diff --git a/docs/_posts/SaiDeepaPeri/2024-05-06-deepa_xlmroberta_ner_large_en_panx_en.md b/docs/_posts/SaiDeepaPeri/2024-05-06-deepa_xlmroberta_ner_large_en_panx_en.md new file mode 100644 index 00000000000000..82384ddfe6e218 --- /dev/null +++ b/docs/_posts/SaiDeepaPeri/2024-05-06-deepa_xlmroberta_ner_large_en_panx_en.md @@ -0,0 +1,76 @@ +--- +layout: model +title: Deepa Panx Model for English +author: SaiDeepaPeri +name: deepa_xlmroberta_ner_large_en_panx +date: 2024-05-06 +tags: [en, open_source] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.1.0 +spark_version: 3.0 +supported: false +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Named Entity Recognition trained on English panx + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/community.johnsnowlabs.com/SaiDeepaPeri/deepa_xlmroberta_ner_large_en_panx_en_4.1.0_3.0_1715017572119.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://community.johnsnowlabs.com/SaiDeepaPeri/deepa_xlmroberta_ner_large_en_panx_en_4.1.0_3.0_1715017572119.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols(["document"]) \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("deepa_xlmroberta_ner_large_en_panx", "en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter() \ + .setInputCols(["document", "token", "ner"]) \ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +``` + +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deepa_xlmroberta_ner_large_en_panx| +|Compatibility:|Spark NLP 4.1.0+| +|License:|Open Source| +|Edition:|Community| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.8 GB| +|Case sensitive:|true| +|Max sentence length:|256| \ No newline at end of file diff --git a/docs/_posts/SaiDeepaPeri/2024-05-06-deepa_xlmroberta_ner_large_panx_dataset_en.md b/docs/_posts/SaiDeepaPeri/2024-05-06-deepa_xlmroberta_ner_large_panx_dataset_en.md new file mode 100644 index 00000000000000..8bf6d9a7b5c199 --- /dev/null +++ b/docs/_posts/SaiDeepaPeri/2024-05-06-deepa_xlmroberta_ner_large_panx_dataset_en.md @@ -0,0 +1,78 @@ +--- +layout: model +title: "Deepa NER XLMRoberta Large Model : deepa_xlmroberta_ner_large_panx" +author: SaiDeepaPeri +name: deepa_xlmroberta_ner_large_panx_dataset +date: 2024-05-06 +tags: [en, open_source] +task: Named Entity Recognition +language: en +edition: Spark NLP 4.1.0 +spark_version: 3.0 +supported: false +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +NER model XLM Roberta Large Model + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/community.johnsnowlabs.com/SaiDeepaPeri/deepa_xlmroberta_ner_large_panx_dataset_en_4.1.0_3.0_1715028210601.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://community.johnsnowlabs.com/SaiDeepaPeri/deepa_xlmroberta_ner_large_panx_dataset_en_4.1.0_3.0_1715028210601.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +# Create a custom Tokenizer that splits text on whitespace +tokenizer = RegexTokenizer() \ + .setInputCols(["document"]) \ + .setOutputCol("token").setPattern("\\s+") + +# deepa_xlmroberta_ner_large_en_panx +token_classifier = XlmRoBertaForTokenClassification.pretrained("deepa_xlmroberta_ner_large_panx", "en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter() \ + .setInputCols(["document", "token", "ner"]) \ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +``` + +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deepa_xlmroberta_ner_large_panx_dataset| +|Compatibility:|Spark NLP 4.1.0+| +|License:|Open Source| +|Edition:|Community| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.8 GB| +|Case sensitive:|true| +|Max sentence length:|256| \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-02-11-bge_m3_xx.md b/docs/_posts/ahmedlone127/2024-02-11-bge_m3_xx.md index b7efcdf8a199c4..717af5c0065301 100644 --- a/docs/_posts/ahmedlone127/2024-02-11-bge_m3_xx.md +++ b/docs/_posts/ahmedlone127/2024-02-11-bge_m3_xx.md @@ -68,7 +68,7 @@ val sentencerDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx .setOutputCol("sentence") val embeddings = XlmRoBertaSentenceEmbeddings - .pretrained("bge_m3", "xx") + .pretrained("bge_m3 ", "xx") .setInputCols(Array("sentence")) .setOutputCol("embeddings") diff --git a/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_1_en.md b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_1_en.md new file mode 100644 index 00000000000000..7bcc1e9c6c2598 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_1_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English baai_bge_base_english_nowr_1_1 BGEEmbeddings from alexakkol +author: John Snow Labs +name: baai_bge_base_english_nowr_1_1 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `baai_bge_base_english_nowr_1_1` is an English model originally trained by alexakkol. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_nowr_1_1_en_5.4.0_3.0_1718060836858.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_nowr_1_1_en_5.4.0_3.0_1718060836858.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("baai_bge_base_english_nowr_1_1","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("baai_bge_base_english_nowr_1_1","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai_bge_base_english_nowr_1_1| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|381.8 MB| + +## References + +https://huggingface.co/alexakkol/BAAI-bge-base-en-nowr-1-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_1_pipeline_en.md new file mode 100644 index 00000000000000..c850ce6f447dfb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English baai_bge_base_english_nowr_1_1_pipeline pipeline BGEEmbeddings from alexakkol +author: John Snow Labs +name: baai_bge_base_english_nowr_1_1_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `baai_bge_base_english_nowr_1_1_pipeline` is an English model originally trained by alexakkol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_nowr_1_1_pipeline_en_5.4.0_3.0_1718060870292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_nowr_1_1_pipeline_en_5.4.0_3.0_1718060870292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("baai_bge_base_english_nowr_1_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("baai_bge_base_english_nowr_1_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai_bge_base_english_nowr_1_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.8 MB| + +## References + +https://huggingface.co/alexakkol/BAAI-bge-base-en-nowr-1-1 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_2_en.md b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_2_en.md new file mode 100644 index 00000000000000..64fad700bf1058 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_2_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English baai_bge_base_english_nowr_1_2 BGEEmbeddings from alexakkol +author: John Snow Labs +name: baai_bge_base_english_nowr_1_2 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `baai_bge_base_english_nowr_1_2` is an English model originally trained by alexakkol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_nowr_1_2_en_5.4.0_3.0_1718061906872.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_nowr_1_2_en_5.4.0_3.0_1718061906872.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("baai_bge_base_english_nowr_1_2","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("baai_bge_base_english_nowr_1_2","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai_bge_base_english_nowr_1_2| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|112.6 MB| + +## References + +https://huggingface.co/alexakkol/BAAI-bge-base-en-nowr-1-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_2_pipeline_en.md new file mode 100644 index 00000000000000..fdf6a00bfe0c65 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_nowr_1_2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English baai_bge_base_english_nowr_1_2_pipeline pipeline BGEEmbeddings from alexakkol +author: John Snow Labs +name: baai_bge_base_english_nowr_1_2_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `baai_bge_base_english_nowr_1_2_pipeline` is an English model originally trained by alexakkol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_nowr_1_2_pipeline_en_5.4.0_3.0_1718061918748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_nowr_1_2_pipeline_en_5.4.0_3.0_1718061918748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("baai_bge_base_english_nowr_1_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("baai_bge_base_english_nowr_1_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai_bge_base_english_nowr_1_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|112.6 MB| + +## References + +https://huggingface.co/alexakkol/BAAI-bge-base-en-nowr-1-2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_v1_5_tunned_for_blender_issues_en.md b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_v1_5_tunned_for_blender_issues_en.md new file mode 100644 index 00000000000000..356e6d5553886d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_v1_5_tunned_for_blender_issues_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English baai_bge_base_english_v1_5_tunned_for_blender_issues BGEEmbeddings from mano-wii +author: John Snow Labs +name: baai_bge_base_english_v1_5_tunned_for_blender_issues +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `baai_bge_base_english_v1_5_tunned_for_blender_issues` is an English model originally trained by mano-wii. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_v1_5_tunned_for_blender_issues_en_5.4.0_3.0_1718061935130.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_v1_5_tunned_for_blender_issues_en_5.4.0_3.0_1718061935130.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("baai_bge_base_english_v1_5_tunned_for_blender_issues","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("baai_bge_base_english_v1_5_tunned_for_blender_issues","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai_bge_base_english_v1_5_tunned_for_blender_issues| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|391.8 MB| + +## References + +https://huggingface.co/mano-wii/BAAI_bge-base-en-v1.5-tunned-for-blender-issues \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline_en.md new file mode 100644 index 00000000000000..c52f180e4f4f9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline pipeline BGEEmbeddings from mano-wii +author: John Snow Labs +name: baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline` is an English model originally trained by mano-wii. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline_en_5.4.0_3.0_1718061967273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline_en_5.4.0_3.0_1718061967273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai_bge_base_english_v1_5_tunned_for_blender_issues_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|391.8 MB| + +## References + +https://huggingface.co/mano-wii/BAAI_bge-base-en-v1.5-tunned-for-blender-issues + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-baai_bge_small_english_nowr_1_1_en.md b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_small_english_nowr_1_1_en.md new file mode 100644 index 00000000000000..87a12a6a465256 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_small_english_nowr_1_1_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English baai_bge_small_english_nowr_1_1 BGEEmbeddings from alexakkol +author: John Snow Labs +name: baai_bge_small_english_nowr_1_1 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `baai_bge_small_english_nowr_1_1` is an English model originally trained by alexakkol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai_bge_small_english_nowr_1_1_en_5.4.0_3.0_1718062330784.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai_bge_small_english_nowr_1_1_en_5.4.0_3.0_1718062330784.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("baai_bge_small_english_nowr_1_1","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("baai_bge_small_english_nowr_1_1","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai_bge_small_english_nowr_1_1| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|112.6 MB| + +## References + +https://huggingface.co/alexakkol/BAAI-bge-small-en-nowr-1-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-baai_bge_small_english_nowr_1_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_small_english_nowr_1_1_pipeline_en.md new file mode 100644 index 00000000000000..fd2c0afffa822e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-baai_bge_small_english_nowr_1_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English baai_bge_small_english_nowr_1_1_pipeline pipeline BGEEmbeddings from alexakkol +author: John Snow Labs +name: baai_bge_small_english_nowr_1_1_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `baai_bge_small_english_nowr_1_1_pipeline` is an English model originally trained by alexakkol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai_bge_small_english_nowr_1_1_pipeline_en_5.4.0_3.0_1718062342271.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai_bge_small_english_nowr_1_1_pipeline_en_5.4.0_3.0_1718062342271.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("baai_bge_small_english_nowr_1_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("baai_bge_small_english_nowr_1_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai_bge_small_english_nowr_1_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|112.6 MB| + +## References + +https://huggingface.co/alexakkol/BAAI-bge-small-en-nowr-1-1 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuned_reels_1_1_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuned_reels_1_1_en.md new file mode 100644 index 00000000000000..83f24c0862a2b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuned_reels_1_1_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_fine_tuned_reels_1_1 BGEEmbeddings from ditengm +author: John Snow Labs +name: bge_base_english_v1_5_fine_tuned_reels_1_1 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_fine_tuned_reels_1_1` is an English model originally trained by ditengm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_fine_tuned_reels_1_1_en_5.4.0_3.0_1718061343041.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_fine_tuned_reels_1_1_en_5.4.0_3.0_1718061343041.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_fine_tuned_reels_1_1","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_fine_tuned_reels_1_1","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_fine_tuned_reels_1_1| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|378.9 MB| + +## References + +https://huggingface.co/ditengm/bge-base-en-v1.5-fine-tuned_reels_1.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline_en.md new file mode 100644 index 00000000000000..e1da2183d9b18d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline pipeline BGEEmbeddings from ditengm +author: John Snow Labs +name: bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline` is an English model originally trained by ditengm. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline_en_5.4.0_3.0_1718061383890.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline_en_5.4.0_3.0_1718061383890.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_fine_tuned_reels_1_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|378.9 MB| + +## References + +https://huggingface.co/ditengm/bge-base-en-v1.5-fine-tuned_reels_1.1 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuning_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuning_en.md new file mode 100644 index 00000000000000..d84a23b6194f27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuning_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_fine_tuning BGEEmbeddings from bespin-global +author: John Snow Labs +name: bge_base_english_v1_5_fine_tuning +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_fine_tuning` is an English model originally trained by bespin-global. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_fine_tuning_en_5.4.0_3.0_1718060292998.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_fine_tuning_en_5.4.0_3.0_1718060292998.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_fine_tuning","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_fine_tuning","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_fine_tuning| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|391.3 MB| + +## References + +https://huggingface.co/bespin-global/bge-base-en-v1.5-fine-tuning \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuning_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuning_pipeline_en.md new file mode 100644 index 00000000000000..cda338a6b9b79d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_fine_tuning_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_fine_tuning_pipeline pipeline BGEEmbeddings from bespin-global +author: John Snow Labs +name: bge_base_english_v1_5_fine_tuning_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_fine_tuning_pipeline` is an English model originally trained by bespin-global. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_fine_tuning_pipeline_en_5.4.0_3.0_1718059920999.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_fine_tuning_pipeline_en_5.4.0_3.0_1718059920999.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_fine_tuning_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_fine_tuning_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_fine_tuning_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|391.3 MB| + +## References + +https://huggingface.co/bespin-global/bge-base-en-v1.5-fine-tuning + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune10epochs_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune10epochs_en.md new file mode 100644 index 00000000000000..3d310da8b86b58 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune10epochs_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_finetune10epochs BGEEmbeddings from DaisyMak +author: John Snow Labs +name: bge_base_english_v1_5_finetune10epochs +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_finetune10epochs` is an English model originally trained by DaisyMak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetune10epochs_en_5.4.0_3.0_1718062128718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetune10epochs_en_5.4.0_3.0_1718062128718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_finetune10epochs","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_finetune10epochs","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_finetune10epochs| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|398.6 MB| + +## References + +https://huggingface.co/DaisyMak/bge_base_en_v1.5_finetune10epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune10epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune10epochs_pipeline_en.md new file mode 100644 index 00000000000000..094a68ba31216f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune10epochs_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_finetune10epochs_pipeline pipeline BGEEmbeddings from DaisyMak +author: John Snow Labs +name: bge_base_english_v1_5_finetune10epochs_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_finetune10epochs_pipeline` is an English model originally trained by DaisyMak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetune10epochs_pipeline_en_5.4.0_3.0_1718062157638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetune10epochs_pipeline_en_5.4.0_3.0_1718062157638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_finetune10epochs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_finetune10epochs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_finetune10epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|398.6 MB| + +## References + +https://huggingface.co/DaisyMak/bge_base_en_v1.5_finetune10epochs + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune_en.md new file mode 100644 index 00000000000000..b203152b83eab1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_finetune BGEEmbeddings from DaisyMak +author: John Snow Labs +name: bge_base_english_v1_5_finetune +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_finetune` is an English model originally trained by DaisyMak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetune_en_5.4.0_3.0_1718060651651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetune_en_5.4.0_3.0_1718060651651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_finetune","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_finetune","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_finetune| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|398.9 MB| + +## References + +https://huggingface.co/DaisyMak/bge_base_en_v1.5_finetune \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune_pipeline_en.md new file mode 100644 index 00000000000000..c54c6e2e895372 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_finetune_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_finetune_pipeline pipeline BGEEmbeddings from DaisyMak +author: John Snow Labs +name: bge_base_english_v1_5_finetune_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_finetune_pipeline` is an English model originally trained by DaisyMak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetune_pipeline_en_5.4.0_3.0_1718060680773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetune_pipeline_en_5.4.0_3.0_1718060680773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_finetune_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_finetune_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_finetune_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|398.9 MB| + +## References + +https://huggingface.co/DaisyMak/bge_base_en_v1.5_finetune + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_1_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_1_en.md new file mode 100644 index 00000000000000..b99239e443add0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_1_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_0_1 BGEEmbeddings from 567-labs +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_0_1 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_0_1` is an English model originally trained by 567-labs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_1_en_5.4.0_3.0_1718061548167.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_1_en_5.4.0_3.0_1718061548167.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_1","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_1","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_0_1| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|391.9 MB| + +## References + +https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_1_pipeline_en.md new file mode 100644 index 00000000000000..3c8bcaa17bf7c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_0_1_pipeline pipeline BGEEmbeddings from 567-labs +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_0_1_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_0_1_pipeline` is an English model originally trained by 567-labs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_1_pipeline_en_5.4.0_3.0_1718061580555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_1_pipeline_en_5.4.0_3.0_1718061580555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_0_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|391.9 MB| + +## References + +https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.1 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_7_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_7_en.md new file mode 100644 index 00000000000000..12156067428e70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_7_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_0_7 BGEEmbeddings from 567-labs +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_0_7 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_0_7` is an English model originally trained by 567-labs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_7_en_5.4.0_3.0_1718062068068.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_7_en_5.4.0_3.0_1718062068068.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_7","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_7","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_0_7| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|398.9 MB| + +## References + +https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.7 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_7_pipeline_en.md new file mode 100644 index 00000000000000..5e7254e4fd493d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_7_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_0_7_pipeline pipeline BGEEmbeddings from 567-labs +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_0_7_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_0_7_pipeline` is an English model originally trained by 567-labs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_7_pipeline_en_5.4.0_3.0_1718062096929.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_7_pipeline_en_5.4.0_3.0_1718062096929.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_7_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_7_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_0_7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|398.9 MB| + +## References + +https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.7 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_9_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_9_en.md new file mode 100644 index 00000000000000..6869e5f94c99ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_9_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_0_9 BGEEmbeddings from 567-labs +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_0_9 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_0_9` is an English model originally trained by 567-labs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_9_en_5.4.0_3.0_1718060415847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_9_en_5.4.0_3.0_1718060415847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_9","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_9","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_0_9| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|399.5 MB| + +## References + +https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.9 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_9_pipeline_en.md new file mode 100644 index 00000000000000..6fd6f87601208a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_english_v1_5_ft_quora_0_9_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_ft_quora_0_9_pipeline pipeline BGEEmbeddings from 567-labs +author: John Snow Labs +name: bge_base_english_v1_5_ft_quora_0_9_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_0_9_pipeline` is an English model originally trained by 567-labs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_9_pipeline_en_5.4.0_3.0_1718060444264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_9_pipeline_en_5.4.0_3.0_1718060444264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_9_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_9_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_ft_quora_0_9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|399.5 MB| + +## References + +https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.9 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_2_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_2_en.md new file mode 100644 index 00000000000000..d68499300ba3aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_2_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_2 BGEEmbeddings from Sailesh9999 +author: John Snow Labs +name: bge_base_financial_matryoshka_2 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_2` is an English model originally trained by Sailesh9999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_2_en_5.4.0_3.0_1718061721095.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_2_en_5.4.0_3.0_1718061721095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_2","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_2","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_2| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.0 MB| + +## References + +https://huggingface.co/Sailesh9999/bge-base-financial-matryoshka_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_2_pipeline_en.md new file mode 100644 index 00000000000000..3858f6113d10d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_2_pipeline pipeline BGEEmbeddings from Sailesh9999 +author: John Snow Labs +name: bge_base_financial_matryoshka_2_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_2_pipeline` is an English model originally trained by Sailesh9999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_2_pipeline_en_5.4.0_3.0_1718061755520.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_2_pipeline_en_5.4.0_3.0_1718061755520.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
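The download links in these cards appear to follow one naming pattern: `<name>_<lang>_<Spark NLP version>_<Spark version>_<timestamp>.zip`, where the trailing number looks like a Unix timestamp in milliseconds. That reading is inferred from the front matter of the cards, not from a documented contract. The sketch below splits an artifact URI under that assumption; `ModelArtifact` and `parse_artifact` are hypothetical helper names.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class ModelArtifact(NamedTuple):
    name: str
    lang: str
    spark_nlp_version: str
    spark_version: str
    uploaded_at: datetime


def parse_artifact(uri: str) -> ModelArtifact:
    """Split a model zip URI into its parts (assumed naming pattern)."""
    filename = uri.rsplit("/", 1)[-1]
    if not filename.endswith(".zip"):
        raise ValueError(f"not a zip artifact: {filename}")
    stem = filename[: -len(".zip")]
    # The model name may itself contain underscores, so split from the right.
    parts = stem.rsplit("_", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected artifact name: {filename}")
    name, lang, nlp_version, spark_version, millis = parts
    # Assumption: the last field is epoch milliseconds in UTC.
    uploaded_at = datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)
    return ModelArtifact(name, lang, nlp_version, spark_version, uploaded_at)


artifact = parse_artifact(
    "s3://auxdata.johnsnowlabs.com/public/models/"
    "bge_base_financial_matryoshka_2_pipeline_en_5.4.0_3.0_1718061755520.zip"
)
```

Splitting from the right keeps names such as `bge_base_financial_matryoshka_2_pipeline` intact even though they contain underscores and digits.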
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Sailesh9999/bge-base-financial-matryoshka_2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_3_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_3_en.md new file mode 100644 index 00000000000000..0d572a5404d689 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_3_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_3 BGEEmbeddings from Sailesh9999 +author: John Snow Labs +name: bge_base_financial_matryoshka_3 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_3` is an English model originally trained by Sailesh9999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_3_en_5.4.0_3.0_1718061341209.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_3_en_5.4.0_3.0_1718061341209.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_3","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_3","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_3| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Sailesh9999/bge-base-financial-matryoshka_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_3_pipeline_en.md new file mode 100644 index 00000000000000..4ec7488f60d454 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_3_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_3_pipeline pipeline BGEEmbeddings from Sailesh9999 +author: John Snow Labs +name: bge_base_financial_matryoshka_3_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_3_pipeline` is an English model originally trained by Sailesh9999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_3_pipeline_en_5.4.0_3.0_1718061378108.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_3_pipeline_en_5.4.0_3.0_1718061378108.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Sailesh9999/bge-base-financial-matryoshka_3 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_andresckamilo_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_andresckamilo_en.md new file mode 100644 index 00000000000000..808ec000635b61 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_andresckamilo_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_andresckamilo BGEEmbeddings from Andresckamilo +author: John Snow Labs +name: bge_base_financial_matryoshka_andresckamilo +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_andresckamilo` is an English model originally trained by Andresckamilo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_andresckamilo_en_5.4.0_3.0_1718062145384.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_andresckamilo_en_5.4.0_3.0_1718062145384.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_andresckamilo","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_andresckamilo","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_andresckamilo| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Andresckamilo/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_andresckamilo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_andresckamilo_pipeline_en.md new file mode 100644 index 00000000000000..b7c266563ecf0a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_andresckamilo_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_andresckamilo_pipeline pipeline BGEEmbeddings from Andresckamilo +author: John Snow Labs +name: bge_base_financial_matryoshka_andresckamilo_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_andresckamilo_pipeline` is an English model originally trained by Andresckamilo. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_andresckamilo_pipeline_en_5.4.0_3.0_1718062180077.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_andresckamilo_pipeline_en_5.4.0_3.0_1718062180077.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_andresckamilo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_andresckamilo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_andresckamilo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Andresckamilo/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_gk29382231121_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_gk29382231121_en.md new file mode 100644 index 00000000000000..8a84e026f64db0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_gk29382231121_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_gk29382231121 BGEEmbeddings from gK29382231121 +author: John Snow Labs +name: bge_base_financial_matryoshka_gk29382231121 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_gk29382231121` is an English model originally trained by gK29382231121. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_gk29382231121_en_5.4.0_3.0_1718063444199.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_gk29382231121_en_5.4.0_3.0_1718063444199.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_gk29382231121","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_gk29382231121","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_gk29382231121| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.2 MB| + +## References + +https://huggingface.co/gK29382231121/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_gk29382231121_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_gk29382231121_pipeline_en.md new file mode 100644 index 00000000000000..01a99bd4096715 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_financial_matryoshka_gk29382231121_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_gk29382231121_pipeline pipeline BGEEmbeddings from gK29382231121 +author: John Snow Labs +name: bge_base_financial_matryoshka_gk29382231121_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_gk29382231121_pipeline` is an English model originally trained by gK29382231121. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_gk29382231121_pipeline_en_5.4.0_3.0_1718063478643.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_gk29382231121_pipeline_en_5.4.0_3.0_1718063478643.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_gk29382231121_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_gk29382231121_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_gk29382231121_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.2 MB| + +## References + +https://huggingface.co/gK29382231121/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_frombge_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_frombge_en.md new file mode 100644 index 00000000000000..495793bd9b5b4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_frombge_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_frombge BGEEmbeddings from joshus +author: John Snow Labs +name: bge_base_frombge +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_frombge` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_frombge_en_5.4.0_3.0_1718061477607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_frombge_en_5.4.0_3.0_1718061477607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_frombge","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_frombge","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_frombge| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|382.9 MB| + +## References + +https://huggingface.co/joshus/bge-base-frombge \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_frombge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_frombge_pipeline_en.md new file mode 100644 index 00000000000000..dfacf14768b528 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_frombge_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_frombge_pipeline pipeline BGEEmbeddings from joshus +author: John Snow Labs +name: bge_base_frombge_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_frombge_pipeline` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_frombge_pipeline_en_5.4.0_3.0_1718061525807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_frombge_pipeline_en_5.4.0_3.0_1718061525807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_frombge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_frombge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_frombge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|382.9 MB| + +## References + +https://huggingface.co/joshus/bge-base-frombge + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v2_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v2_en.md new file mode 100644 index 00000000000000..2d1287e6ae3cb6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v2_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v2 BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v2 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v2` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v2_en_5.4.0_3.0_1718061325859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v2_en_5.4.0_3.0_1718061325859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v2","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v2","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v2| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|376.4 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v2_pipeline_en.md new file mode 100644 index 00000000000000..1567a52029b15e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v2_pipeline pipeline BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v2_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v2_pipeline` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v2_pipeline_en_5.4.0_3.0_1718061365578.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v2_pipeline_en_5.4.0_3.0_1718061365578.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_securiti_dataset_1_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_securiti_dataset_1_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|376.4 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v4_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v4_en.md new file mode 100644 index 00000000000000..8eaa661bca6c21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v4_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v4 BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v4 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v4` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v4_en_5.4.0_3.0_1718063769712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v4_en_5.4.0_3.0_1718063769712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v4","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v4","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v4| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|376.5 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v4_pipeline_en.md new file mode 100644 index 00000000000000..404e6a88fb916d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v4_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v4_pipeline pipeline BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v4_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v4_pipeline` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v4_pipeline_en_5.4.0_3.0_1718063809314.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v4_pipeline_en_5.4.0_3.0_1718063809314.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_securiti_dataset_1_v4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_securiti_dataset_1_v4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|376.5 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v4 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v6_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v6_en.md new file mode 100644 index 00000000000000..194149ad581739 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v6_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v6 BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v6 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v6` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v6_en_5.4.0_3.0_1718061536674.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v6_en_5.4.0_3.0_1718061536674.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v6","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v6","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v6| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|376.6 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v6_pipeline_en.md new file mode 100644 index 00000000000000..97706ca1155d91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_base_securiti_dataset_1_v6_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v6_pipeline pipeline BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v6_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v6_pipeline` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v6_pipeline_en_5.4.0_3.0_1718061576401.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v6_pipeline_en_5.4.0_3.0_1718061576401.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_securiti_dataset_1_v6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_securiti_dataset_1_v6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|376.6 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v6 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_large_0846_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_large_0846_en.md new file mode 100644 index 00000000000000..014163b572c2fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_large_0846_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_0846 BGEEmbeddings from joshus +author: John Snow Labs +name: bge_large_0846 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_0846` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_0846_en_5.4.0_3.0_1718063123051.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_0846_en_5.4.0_3.0_1718063123051.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_0846","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_0846","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_0846| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/joshus/bge_large_0846 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_large_0846_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_large_0846_pipeline_en.md new file mode 100644 index 00000000000000..f77370ec97bb1d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_large_0846_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_0846_pipeline pipeline BGEEmbeddings from joshus +author: John Snow Labs +name: bge_large_0846_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_0846_pipeline` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_0846_pipeline_en_5.4.0_3.0_1718063214529.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_0846_pipeline_en_5.4.0_3.0_1718063214529.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_0846_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_0846_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_0846_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/joshus/bge_large_0846 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_large_english_v1_5_semicon_ym_0122_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_large_english_v1_5_semicon_ym_0122_en.md new file mode 100644 index 00000000000000..9c8c85ce5c4eb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_large_english_v1_5_semicon_ym_0122_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_english_v1_5_semicon_ym_0122 BGEEmbeddings from Niraya666 +author: John Snow Labs +name: bge_large_english_v1_5_semicon_ym_0122 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_english_v1_5_semicon_ym_0122` is an English model originally trained by Niraya666. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_semicon_ym_0122_en_5.4.0_3.0_1718063790188.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_semicon_ym_0122_en_5.4.0_3.0_1718063790188.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_english_v1_5_semicon_ym_0122","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_english_v1_5_semicon_ym_0122","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_english_v1_5_semicon_ym_0122| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Niraya666/bge-large-en-v1.5-semicon-ym-0122 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_large_english_v1_5_semicon_ym_0122_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_large_english_v1_5_semicon_ym_0122_pipeline_en.md new file mode 100644 index 00000000000000..da0ffa58114412 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_large_english_v1_5_semicon_ym_0122_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_english_v1_5_semicon_ym_0122_pipeline pipeline BGEEmbeddings from Niraya666 +author: John Snow Labs +name: bge_large_english_v1_5_semicon_ym_0122_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_english_v1_5_semicon_ym_0122_pipeline` is an English model originally trained by Niraya666. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_semicon_ym_0122_pipeline_en_5.4.0_3.0_1718063908042.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_semicon_ym_0122_pipeline_en_5.4.0_3.0_1718063908042.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_english_v1_5_semicon_ym_0122_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_english_v1_5_semicon_ym_0122_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_english_v1_5_semicon_ym_0122_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Niraya666/bge-large-en-v1.5-semicon-ym-0122 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_large_fine_tuned_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_large_fine_tuned_en.md new file mode 100644 index 00000000000000..4c5f62f201cb3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_large_fine_tuned_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_fine_tuned BGEEmbeddings from kwang123 +author: John Snow Labs +name: bge_large_fine_tuned +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_fine_tuned` is an English model originally trained by kwang123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_fine_tuned_en_5.4.0_3.0_1718060992320.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_fine_tuned_en_5.4.0_3.0_1718060992320.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_fine_tuned","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_fine_tuned","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_fine_tuned| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/kwang123/bge-large-fine-tuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_large_fine_tuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_large_fine_tuned_pipeline_en.md new file mode 100644 index 00000000000000..dfb7734b8ace56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_large_fine_tuned_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_fine_tuned_pipeline pipeline BGEEmbeddings from kwang123 +author: John Snow Labs +name: bge_large_fine_tuned_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_fine_tuned_pipeline` is an English model originally trained by kwang123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_fine_tuned_pipeline_en_5.4.0_3.0_1718061087284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_fine_tuned_pipeline_en_5.4.0_3.0_1718061087284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_fine_tuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_fine_tuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_fine_tuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/kwang123/bge-large-fine-tuned + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_micro_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_micro_en.md new file mode 100644 index 00000000000000..1bd652e80cabd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_micro_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_micro BGEEmbeddings from TaylorAI +author: John Snow Labs +name: bge_micro +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_micro` is an English model originally trained by TaylorAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_micro_en_5.4.0_3.0_1718060403391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_micro_en_5.4.0_3.0_1718060403391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_micro","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_micro","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_micro| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|41.5 MB| + +## References + +https://huggingface.co/TaylorAI/bge-micro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_micro_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_micro_pipeline_en.md new file mode 100644 index 00000000000000..14c90bfbcccb21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_micro_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_micro_pipeline pipeline BGEEmbeddings from TaylorAI +author: John Snow Labs +name: bge_micro_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_micro_pipeline` is an English model originally trained by TaylorAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_micro_pipeline_en_5.4.0_3.0_1718060418880.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_micro_pipeline_en_5.4.0_3.0_1718060418880.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_micro_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_micro_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_micro_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|41.5 MB| + +## References + +https://huggingface.co/TaylorAI/bge-micro + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_micro_v2_smartcomponents_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_micro_v2_smartcomponents_en.md new file mode 100644 index 00000000000000..7df6863e4b8462 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_micro_v2_smartcomponents_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_micro_v2_smartcomponents BGEEmbeddings from SmartComponents +author: John Snow Labs +name: bge_micro_v2_smartcomponents +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_micro_v2_smartcomponents` is an English model originally trained by SmartComponents. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_micro_v2_smartcomponents_en_5.4.0_3.0_1718062026135.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_micro_v2_smartcomponents_en_5.4.0_3.0_1718062026135.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_micro_v2_smartcomponents","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_micro_v2_smartcomponents","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_micro_v2_smartcomponents| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|41.5 MB| + +## References + +https://huggingface.co/SmartComponents/bge-micro-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_micro_v2_smartcomponents_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_micro_v2_smartcomponents_pipeline_en.md new file mode 100644 index 00000000000000..ec3ea424eca66e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_micro_v2_smartcomponents_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_micro_v2_smartcomponents_pipeline pipeline BGEEmbeddings from SmartComponents +author: John Snow Labs +name: bge_micro_v2_smartcomponents_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_micro_v2_smartcomponents_pipeline` is an English model originally trained by SmartComponents. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_micro_v2_smartcomponents_pipeline_en_5.4.0_3.0_1718062041678.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_micro_v2_smartcomponents_pipeline_en_5.4.0_3.0_1718062041678.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_micro_v2_smartcomponents_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_micro_v2_smartcomponents_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_micro_v2_smartcomponents_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|41.5 MB| + +## References + +https://huggingface.co/SmartComponents/bge-micro-v2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_en.md new file mode 100644 index 00000000000000..bd1c9cd5a1b5c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_small_english BGEEmbeddings from vectoriseai +author: John Snow Labs +name: bge_small_english +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_english` is an English model originally trained by vectoriseai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_english_en_5.4.0_3.0_1718060625255.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_english_en_5.4.0_3.0_1718060625255.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_small_english","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_small_english","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_english| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|79.9 MB| + +## References + +https://huggingface.co/vectoriseai/bge-small-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_ft_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_ft_en.md new file mode 100644 index 00000000000000..21884d2a12f108 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_ft_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_small_english_ft BGEEmbeddings from PetroGPT +author: John Snow Labs +name: bge_small_english_ft +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_english_ft` is an English model originally trained by PetroGPT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_english_ft_en_5.4.0_3.0_1718060820706.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_english_ft_en_5.4.0_3.0_1718060820706.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_small_english_ft","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_small_english_ft","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_english_ft| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|112.9 MB| + +## References + +https://huggingface.co/PetroGPT/bge-small-en-ft \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_ft_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_ft_pipeline_en.md new file mode 100644 index 00000000000000..4b54dbe9de1555 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_ft_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_small_english_ft_pipeline pipeline BGEEmbeddings from PetroGPT +author: John Snow Labs +name: bge_small_english_ft_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_english_ft_pipeline` is an English model originally trained by PetroGPT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_english_ft_pipeline_en_5.4.0_3.0_1718060832479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_english_ft_pipeline_en_5.4.0_3.0_1718060832479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_small_english_ft_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_small_english_ft_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_english_ft_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|112.9 MB| + +## References + +https://huggingface.co/PetroGPT/bge-small-en-ft + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_pipeline_en.md new file mode 100644 index 00000000000000..bc5fa48e8bf6e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_small_english_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_small_english_pipeline pipeline BGEEmbeddings from vectoriseai +author: John Snow Labs +name: bge_small_english_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_english_pipeline` is an English model originally trained by vectoriseai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_english_pipeline_en_5.4.0_3.0_1718060654155.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_english_pipeline_en_5.4.0_3.0_1718060654155.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_small_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_small_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|79.9 MB| + +## References + +https://huggingface.co/vectoriseai/bge-small-en + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_small_qq_qa_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_small_qq_qa_en.md new file mode 100644 index 00000000000000..a2b9f5eaf281ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_small_qq_qa_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_small_qq_qa BGEEmbeddings from svjack +author: John Snow Labs +name: bge_small_qq_qa +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_qq_qa` is an English model originally trained by svjack. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_qq_qa_en_5.4.0_3.0_1718060259331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_qq_qa_en_5.4.0_3.0_1718060259331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_small_qq_qa","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_small_qq_qa","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_qq_qa| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|78.1 MB| + +## References + +https://huggingface.co/svjack/bge-small-qq-qa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_small_qq_qa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_small_qq_qa_pipeline_en.md new file mode 100644 index 00000000000000..6e1077ab573770 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_small_qq_qa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_small_qq_qa_pipeline pipeline BGEEmbeddings from svjack +author: John Snow Labs +name: bge_small_qq_qa_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_qq_qa_pipeline` is an English model originally trained by svjack. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_qq_qa_pipeline_en_5.4.0_3.0_1718060268489.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_qq_qa_pipeline_en_5.4.0_3.0_1718060268489.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_small_qq_qa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_small_qq_qa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_qq_qa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|78.1 MB| + +## References + +https://huggingface.co/svjack/bge-small-qq-qa + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_waray_philippines_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_waray_philippines_en.md new file mode 100644 index 00000000000000..9da60cbd6bbd87 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_waray_philippines_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_waray_philippines BGEEmbeddings from YoungPanda +author: John Snow Labs +name: bge_waray_philippines +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_waray_philippines` is an English model originally trained by YoungPanda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_waray_philippines_en_5.4.0_3.0_1718063799676.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_waray_philippines_en_5.4.0_3.0_1718063799676.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_waray_philippines","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_waray_philippines","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_waray_philippines| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/YoungPanda/bge_war \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-bge_waray_philippines_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-bge_waray_philippines_pipeline_en.md new file mode 100644 index 00000000000000..6f2375cfc258f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-bge_waray_philippines_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_waray_philippines_pipeline pipeline BGEEmbeddings from YoungPanda +author: John Snow Labs +name: bge_waray_philippines_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_waray_philippines_pipeline` is an English model originally trained by YoungPanda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_waray_philippines_pipeline_en_5.4.0_3.0_1718063901462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_waray_philippines_pipeline_en_5.4.0_3.0_1718063901462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_waray_philippines_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_waray_philippines_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_waray_philippines_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/YoungPanda/bge_war + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-embed_bge_base_edu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-embed_bge_base_edu_pipeline_en.md new file mode 100644 index 00000000000000..73f17eb4de3755 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-embed_bge_base_edu_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English embed_bge_base_edu_pipeline pipeline BGEEmbeddings from HelixAI +author: John Snow Labs +name: embed_bge_base_edu_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `embed_bge_base_edu_pipeline` is an English model originally trained by HelixAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/embed_bge_base_edu_pipeline_en_5.4.0_3.0_1718059904353.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/embed_bge_base_edu_pipeline_en_5.4.0_3.0_1718059904353.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("embed_bge_base_edu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("embed_bge_base_edu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|embed_bge_base_edu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|384.4 MB| + +## References + +https://huggingface.co/HelixAI/embed_bge_base_edu + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-finetuned_bge_embeddings_en.md b/docs/_posts/ahmedlone127/2024-06-10-finetuned_bge_embeddings_en.md new file mode 100644 index 00000000000000..26bb67d7f4db90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-finetuned_bge_embeddings_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English finetuned_bge_embeddings BGEEmbeddings from austinpatrickm +author: John Snow Labs +name: finetuned_bge_embeddings +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `finetuned_bge_embeddings` is an English model originally trained by austinpatrickm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bge_embeddings_en_5.4.0_3.0_1718061952445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bge_embeddings_en_5.4.0_3.0_1718061952445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("finetuned_bge_embeddings","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("finetuned_bge_embeddings","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bge_embeddings| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|388.4 MB| + +## References + +https://huggingface.co/austinpatrickm/finetuned-bge-embeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-finetuned_bge_embeddings_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-finetuned_bge_embeddings_pipeline_en.md new file mode 100644 index 00000000000000..a8aa8f7c5e7cc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-finetuned_bge_embeddings_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English finetuned_bge_embeddings_pipeline pipeline BGEEmbeddings from austinpatrickm +author: John Snow Labs +name: finetuned_bge_embeddings_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `finetuned_bge_embeddings_pipeline` is an English model originally trained by austinpatrickm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bge_embeddings_pipeline_en_5.4.0_3.0_1718061985354.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bge_embeddings_pipeline_en_5.4.0_3.0_1718061985354.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_bge_embeddings_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_bge_embeddings_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bge_embeddings_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|388.4 MB| + +## References + +https://huggingface.co/austinpatrickm/finetuned-bge-embeddings + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-minebge_en.md b/docs/_posts/ahmedlone127/2024-06-10-minebge_en.md new file mode 100644 index 00000000000000..1889f40d32b123 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-minebge_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English minebge BGEEmbeddings from arjunsama +author: John Snow Labs +name: minebge +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `minebge` is an English model originally trained by arjunsama. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/minebge_en_5.4.0_3.0_1718062753734.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/minebge_en_5.4.0_3.0_1718062753734.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("minebge","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("minebge","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|minebge| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|394.2 MB| + +## References + +https://huggingface.co/arjunsama/minebge \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-minebge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-minebge_pipeline_en.md new file mode 100644 index 00000000000000..05d7237d66ae8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-minebge_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English minebge_pipeline pipeline BGEEmbeddings from arjunsama +author: John Snow Labs +name: minebge_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `minebge_pipeline` is an English model originally trained by arjunsama. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/minebge_pipeline_en_5.4.0_3.0_1718062786389.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/minebge_pipeline_en_5.4.0_3.0_1718062786389.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("minebge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("minebge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|minebge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|394.2 MB| + +## References + +https://huggingface.co/arjunsama/minebge + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_large_english_v1_5v2_en.md b/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_large_english_v1_5v2_en.md new file mode 100644 index 00000000000000..54f63dd800ed31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_large_english_v1_5v2_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English mps_invoice_product_baai_bge_large_english_v1_5v2 BGEEmbeddings from vincentpremise +author: John Snow Labs +name: mps_invoice_product_baai_bge_large_english_v1_5v2 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mps_invoice_product_baai_bge_large_english_v1_5v2` is an English model originally trained by vincentpremise. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_large_english_v1_5v2_en_5.4.0_3.0_1718062744479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_large_english_v1_5v2_en_5.4.0_3.0_1718062744479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("mps_invoice_product_baai_bge_large_english_v1_5v2","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("mps_invoice_product_baai_bge_large_english_v1_5v2","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mps_invoice_product_baai_bge_large_english_v1_5v2| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/vincentpremise/mps-invoice-product-BAAI_bge_large_en_v1.5v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline_en.md new file mode 100644 index 00000000000000..518df6b538396a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline pipeline BGEEmbeddings from vincentpremise +author: John Snow Labs +name: mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline` is an English model originally trained by vincentpremise. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline_en_5.4.0_3.0_1718062838559.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline_en_5.4.0_3.0_1718062838559.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mps_invoice_product_baai_bge_large_english_v1_5v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/vincentpremise/mps-invoice-product-BAAI_bge_large_en_v1.5v2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_small_english_v1_5_en.md b/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_small_english_v1_5_en.md new file mode 100644 index 00000000000000..03bb3cbf245d95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_small_english_v1_5_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English mps_invoice_product_baai_bge_small_english_v1_5 BGEEmbeddings from vincentpremise +author: John Snow Labs +name: mps_invoice_product_baai_bge_small_english_v1_5 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mps_invoice_product_baai_bge_small_english_v1_5` is an English model originally trained by vincentpremise. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_small_english_v1_5_en_5.4.0_3.0_1718063691061.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_small_english_v1_5_en_5.4.0_3.0_1718063691061.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("mps_invoice_product_baai_bge_small_english_v1_5","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("mps_invoice_product_baai_bge_small_english_v1_5","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mps_invoice_product_baai_bge_small_english_v1_5| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|110.7 MB| + +## References + +https://huggingface.co/vincentpremise/mps-invoice-product-BAAI_bge_small_en_v1.5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_small_english_v1_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_small_english_v1_5_pipeline_en.md new file mode 100644 index 00000000000000..866a603eae26ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-mps_invoice_product_baai_bge_small_english_v1_5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English mps_invoice_product_baai_bge_small_english_v1_5_pipeline pipeline BGEEmbeddings from vincentpremise +author: John Snow Labs +name: mps_invoice_product_baai_bge_small_english_v1_5_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mps_invoice_product_baai_bge_small_english_v1_5_pipeline` is an English model originally trained by vincentpremise. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_small_english_v1_5_pipeline_en_5.4.0_3.0_1718063703355.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_small_english_v1_5_pipeline_en_5.4.0_3.0_1718063703355.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mps_invoice_product_baai_bge_small_english_v1_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mps_invoice_product_baai_bge_small_english_v1_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mps_invoice_product_baai_bge_small_english_v1_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|110.7 MB| + +## References + +https://huggingface.co/vincentpremise/mps-invoice-product-BAAI_bge_small_en_v1.5 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-oosinc_bge_finetune_en.md b/docs/_posts/ahmedlone127/2024-06-10-oosinc_bge_finetune_en.md new file mode 100644 index 00000000000000..d85bb9bb2b0e97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-oosinc_bge_finetune_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English oosinc_bge_finetune BGEEmbeddings from oosinc +author: John Snow Labs +name: oosinc_bge_finetune +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `oosinc_bge_finetune` is an English model originally trained by oosinc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/oosinc_bge_finetune_en_5.4.0_3.0_1718060779820.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/oosinc_bge_finetune_en_5.4.0_3.0_1718060779820.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("oosinc_bge_finetune","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("oosinc_bge_finetune","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|oosinc_bge_finetune| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|119.3 MB| + +## References + +https://huggingface.co/oosinc/oosinc-bge-finetune \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-oosinc_bge_finetune_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-oosinc_bge_finetune_pipeline_en.md new file mode 100644 index 00000000000000..9282f10d290d50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-oosinc_bge_finetune_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English oosinc_bge_finetune_pipeline pipeline BGEEmbeddings from oosinc +author: John Snow Labs +name: oosinc_bge_finetune_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `oosinc_bge_finetune_pipeline` is an English model originally trained by oosinc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/oosinc_bge_finetune_pipeline_en_5.4.0_3.0_1718060789282.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/oosinc_bge_finetune_pipeline_en_5.4.0_3.0_1718060789282.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("oosinc_bge_finetune_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("oosinc_bge_finetune_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|oosinc_bge_finetune_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|119.3 MB| + +## References + +https://huggingface.co/oosinc/oosinc-bge-finetune + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_en.md b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_en.md new file mode 100644 index 00000000000000..198a635e2e6401 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English philai_bge_2e BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_2e +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_2e` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_2e_en_5.4.0_3.0_1718063787326.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_2e_en_5.4.0_3.0_1718063787326.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("philai_bge_2e","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("philai_bge_2e","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_2e| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-2e \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_pipeline_en.md new file mode 100644 index 00000000000000..8de834ed1e4ef9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_bge_2e_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_2e_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_2e_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_2e_pipeline_en_5.4.0_3.0_1718063878798.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_2e_pipeline_en_5.4.0_3.0_1718063878798.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("philai_bge_2e_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("philai_bge_2e_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_2e_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-2e + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_t_en.md b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_t_en.md new file mode 100644 index 00000000000000..183fa4335e7c17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_t_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English philai_bge_2e_t BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_2e_t +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_2e_t` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_2e_t_en_5.4.0_3.0_1718062730968.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_2e_t_en_5.4.0_3.0_1718062730968.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("philai_bge_2e_t","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("philai_bge_2e_t","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_2e_t| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-2e-t \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_t_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_t_pipeline_en.md new file mode 100644 index 00000000000000..db0c00489c6d24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_2e_t_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_bge_2e_t_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_2e_t_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_2e_t_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_2e_t_pipeline_en_5.4.0_3.0_1718062808030.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_2e_t_pipeline_en_5.4.0_3.0_1718062808030.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("philai_bge_2e_t_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("philai_bge_2e_t_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_2e_t_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-2e-t + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-philai_bge_6e_en.md b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_6e_en.md new file mode 100644 index 00000000000000..3bd9e61331eb52 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_6e_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English philai_bge_6e BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_6e +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_6e` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_6e_en_5.4.0_3.0_1718061469404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_6e_en_5.4.0_3.0_1718061469404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("philai_bge_6e","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("philai_bge_6e","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_6e| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-6e \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-philai_bge_6e_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_6e_pipeline_en.md new file mode 100644 index 00000000000000..8a6e813e61a007 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-philai_bge_6e_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_bge_6e_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_6e_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_6e_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_6e_pipeline_en_5.4.0_3.0_1718061548633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_6e_pipeline_en_5.4.0_3.0_1718061548633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("philai_bge_6e_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("philai_bge_6e_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_6e_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-6e + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-squirtle_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-squirtle_pipeline_en.md new file mode 100644 index 00000000000000..53564e60dfa83f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-squirtle_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English squirtle_pipeline pipeline BGEEmbeddings from Mihaiii +author: John Snow Labs +name: squirtle_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `squirtle_pipeline` is an English model originally trained by Mihaiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/squirtle_pipeline_en_5.4.0_3.0_1718059716161.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/squirtle_pipeline_en_5.4.0_3.0_1718059716161.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("squirtle_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("squirtle_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|squirtle_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|56.9 MB| + +## References + +https://huggingface.co/Mihaiii/Squirtle + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-test24_en.md b/docs/_posts/ahmedlone127/2024-06-10-test24_en.md new file mode 100644 index 00000000000000..588425b9513346 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-test24_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English test24 BGEEmbeddings from Mihaiii +author: John Snow Labs +name: test24 +date: 2024-06-10 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `test24` is an English model originally trained by Mihaiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test24_en_5.4.0_3.0_1718061894558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test24_en_5.4.0_3.0_1718061894558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("test24","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("test24","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test24| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|64.3 MB| + +## References + +https://huggingface.co/Mihaiii/test24 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-10-test24_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-test24_pipeline_en.md new file mode 100644 index 00000000000000..bdccfb667078aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-10-test24_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English test24_pipeline pipeline BGEEmbeddings from Mihaiii +author: John Snow Labs +name: test24_pipeline +date: 2024-06-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `test24_pipeline` is an English model originally trained by Mihaiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test24_pipeline_en_5.4.0_3.0_1718061898769.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test24_pipeline_en_5.4.0_3.0_1718061898769.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("test24_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("test24_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|test24_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|64.3 MB|
+
+## References
+
+https://huggingface.co/Mihaiii/test24
+
+## Included Models
+
+- DocumentAssembler
+- BGEEmbeddings
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-10-test25_en.md b/docs/_posts/ahmedlone127/2024-06-10-test25_en.md
new file mode 100644
index 00000000000000..8faf0422fa6486
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-10-test25_en.md
@@ -0,0 +1,87 @@
+---
+layout: model
+title: English test25 BGEEmbeddings from Mihaiii
+author: John Snow Labs
+name: test25
+date: 2024-06-10
+tags: [en, open_source, onnx, embeddings, bge]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BGEEmbeddings
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `test25` is an English model originally trained by Mihaiii.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test25_en_5.4.0_3.0_1718059799876.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test25_en_5.4.0_3.0_1718059799876.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+embeddings = BGEEmbeddings.pretrained("test25","en") \
+    .setInputCols(["document"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+
+val embeddings = BGEEmbeddings.pretrained("test25","en")
+    .setInputCols(Array("document"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|test25|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document]|
+|Output Labels:|[bge]|
+|Language:|en|
+|Size:|64.2 MB|
+
+## References
+
+https://huggingface.co/Mihaiii/test25
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-10-test25_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-test25_pipeline_en.md
new file mode 100644
index 00000000000000..a05397f7f7b6d9
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-10-test25_pipeline_en.md
@@ -0,0 +1,69 @@
+---
+layout: model
+title: English test25_pipeline pipeline BGEEmbeddings from Mihaiii
+author: John Snow Labs
+name: test25_pipeline
+date: 2024-06-10
+tags: [en, open_source, pipeline, onnx]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `test25_pipeline` is an English model originally trained by Mihaiii.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test25_pipeline_en_5.4.0_3.0_1718059803927.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test25_pipeline_en_5.4.0_3.0_1718059803927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("test25_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("test25_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|test25_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|64.2 MB|
+
+## References
+
+https://huggingface.co/Mihaiii/test25
+
+## Included Models
+
+- DocumentAssembler
+- BGEEmbeddings
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-10-testbge_en.md b/docs/_posts/ahmedlone127/2024-06-10-testbge_en.md
new file mode 100644
index 00000000000000..244bf991315253
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-10-testbge_en.md
@@ -0,0 +1,87 @@
+---
+layout: model
+title: English testbge BGEEmbeddings from Neokun004
+author: John Snow Labs
+name: testbge
+date: 2024-06-10
+tags: [en, open_source, onnx, embeddings, bge]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BGEEmbeddings
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `testbge` is an English model originally trained by Neokun004.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/testbge_en_5.4.0_3.0_1718059726765.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/testbge_en_5.4.0_3.0_1718059726765.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+embeddings = BGEEmbeddings.pretrained("testbge","en") \
+    .setInputCols(["document"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+
+val embeddings = BGEEmbeddings.pretrained("testbge","en")
+    .setInputCols(Array("document"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|testbge|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document]|
+|Output Labels:|[bge]|
+|Language:|en|
+|Size:|112.3 MB|
+
+## References
+
+https://huggingface.co/Neokun004/Testbge
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-10-wartortle_en.md b/docs/_posts/ahmedlone127/2024-06-10-wartortle_en.md
new file mode 100644
index 00000000000000..2c6c36de036979
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-10-wartortle_en.md
@@ -0,0 +1,87 @@
+---
+layout: model
+title: English wartortle BGEEmbeddings from Mihaiii
+author: John Snow Labs
+name: wartortle
+date: 2024-06-10
+tags: [en, open_source, onnx, embeddings, bge]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BGEEmbeddings
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `wartortle` is an English model originally trained by Mihaiii.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wartortle_en_5.4.0_3.0_1718060253302.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wartortle_en_5.4.0_3.0_1718060253302.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+embeddings = BGEEmbeddings.pretrained("wartortle","en") \
+    .setInputCols(["document"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+
+val embeddings = BGEEmbeddings.pretrained("wartortle","en")
+    .setInputCols(Array("document"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|wartortle|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document]|
+|Output Labels:|[bge]|
+|Language:|en|
+|Size:|63.5 MB|
+
+## References
+
+https://huggingface.co/Mihaiii/Wartortle
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-10-wartortle_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-10-wartortle_pipeline_en.md
new file mode 100644
index 00000000000000..fb1d2e65a49db7
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-10-wartortle_pipeline_en.md
@@ -0,0 +1,69 @@
+---
+layout: model
+title: English wartortle_pipeline pipeline BGEEmbeddings from Mihaiii
+author: John Snow Labs
+name: wartortle_pipeline
+date: 2024-06-10
+tags: [en, open_source, pipeline, onnx]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `wartortle_pipeline` is an English model originally trained by Mihaiii.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wartortle_pipeline_en_5.4.0_3.0_1718060257643.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wartortle_pipeline_en_5.4.0_3.0_1718060257643.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("wartortle_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("wartortle_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|wartortle_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|63.5 MB|
+
+## References
+
+https://huggingface.co/Mihaiii/Wartortle
+
+## Included Models
+
+- DocumentAssembler
+- BGEEmbeddings
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_finetuned_hausa_2e_3_en.md b/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_finetuned_hausa_2e_3_en.md
new file mode 100644
index 00000000000000..150227d1823670
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_finetuned_hausa_2e_3_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English afriberta_base_finetuned_hausa_2e_3 XlmRoBertaForTokenClassification from grace-pro
+author: John Snow Labs
+name: afriberta_base_finetuned_hausa_2e_3
+date: 2024-06-11
+tags: [en, open_source, onnx, token_classification, xlm_roberta, ner]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afriberta_base_finetuned_hausa_2e_3` is an English model originally trained by grace-pro.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_base_finetuned_hausa_2e_3_en_5.4.0_3.0_1718133900684.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_base_finetuned_hausa_2e_3_en_5.4.0_3.0_1718133900684.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_base_finetuned_hausa_2e_3","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_base_finetuned_hausa_2e_3", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
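The `ner` column holds one label per token. The card does not list the model's label set, so the sketch below assumes IOB-style tags such as `B-PER`, `I-PER` and `O`, and shows how consecutive tags could be merged into entity spans. `group_entities` is a hypothetical plain-Python helper with made-up inputs. Inside a Spark NLP pipeline, the `NerConverter` annotator serves this purpose.

```python
def group_entities(tokens, tags):
    """Merge IOB-tagged tokens into (entity_text, label) pairs."""
    entities = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A new entity starts: flush whatever was open.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [token], label
        elif prefix == "I":
            # Continuation of the entity that is currently open.
            current_tokens.append(token)
        else:
            # "O" (or anything unrecognised) closes the open entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_label))
            current_tokens, current_label = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_label))
    return entities


# Illustrative tokens and tags, not real model output.
tokens = ["Aisha", "Bello", "lives", "in", "Kano"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
entities = group_entities(tokens, tags)
```

An `I-` tag that does not continue an open entity of the same label is treated as the start of a new entity, which is one common way to tolerate imperfect tag sequences.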
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|afriberta_base_finetuned_hausa_2e_3|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|415.3 MB|
+
+## References
+
+https://huggingface.co/grace-pro/afriberta-base-finetuned-hausa-2e-3
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_finetuned_hausa_2e_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_finetuned_hausa_2e_3_pipeline_en.md
new file mode 100644
index 00000000000000..3c3ffca869378f
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_finetuned_hausa_2e_3_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English afriberta_base_finetuned_hausa_2e_3_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro
+author: John Snow Labs
+name: afriberta_base_finetuned_hausa_2e_3_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afriberta_base_finetuned_hausa_2e_3_pipeline` is an English model originally trained by grace-pro.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_base_finetuned_hausa_2e_3_pipeline_en_5.4.0_3.0_1718133928444.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_base_finetuned_hausa_2e_3_pipeline_en_5.4.0_3.0_1718133928444.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("afriberta_base_finetuned_hausa_2e_3_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("afriberta_base_finetuned_hausa_2e_3_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
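The archive name in this card's download link appears to encode the model name, language, Spark NLP version, Spark version and a numeric timestamp. The sketch below splits a link into those parts. The pattern is inferred from the links in these cards, not taken from any documented naming contract, and `parse_archive_name` and the field names are hypothetical.

```python
import re

# Pattern inferred from the download links in these cards; treat it as an
# assumption rather than a documented naming contract.
ARCHIVE_RE = re.compile(
    r"^(?P<name>.+)_(?P<lang>[a-z]{2,3})"
    r"_(?P<sparknlp>\d+\.\d+\.\d+)_(?P<spark>\d+\.\d+)"
    r"_(?P<timestamp>\d+)\.zip$"
)


def parse_archive_name(url):
    """Split a model archive URL into its apparent components."""
    file_name = url.rsplit("/", 1)[-1]
    match = ARCHIVE_RE.match(file_name)
    if match is None:
        raise ValueError(f"unrecognised archive name: {file_name}")
    return match.groupdict()


info = parse_archive_name(
    "s3://auxdata.johnsnowlabs.com/public/models/"
    "afriberta_base_finetuned_hausa_2e_3_pipeline_en_5.4.0_3.0_1718133928444.zip"
)
```

The parsed `sparknlp` field can be checked against the Compatibility row of the Model Information table when verifying that a downloaded archive matches the card.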
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|afriberta_base_finetuned_hausa_2e_3_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|415.4 MB|
+
+## References
+
+https://huggingface.co/grace-pro/afriberta-base-finetuned-hausa-2e-3
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_hausa_5e_5_en.md b/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_hausa_5e_5_en.md
new file mode 100644
index 00000000000000..2b09386204c934
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_hausa_5e_5_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English afriberta_base_hausa_5e_5 XlmRoBertaForTokenClassification from grace-pro
+author: John Snow Labs
+name: afriberta_base_hausa_5e_5
+date: 2024-06-11
+tags: [en, open_source, onnx, token_classification, xlm_roberta, ner]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afriberta_base_hausa_5e_5` is an English model originally trained by grace-pro.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_base_hausa_5e_5_en_5.4.0_3.0_1718131983889.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_base_hausa_5e_5_en_5.4.0_3.0_1718131983889.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_base_hausa_5e_5","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_base_hausa_5e_5", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|afriberta_base_hausa_5e_5|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|415.2 MB|
+
+## References
+
+https://huggingface.co/grace-pro/afriberta-base-hausa-5e-5
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_hausa_5e_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_hausa_5e_5_pipeline_en.md
new file mode 100644
index 00000000000000..e9fc651caac2c3
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-afriberta_base_hausa_5e_5_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English afriberta_base_hausa_5e_5_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro
+author: John Snow Labs
+name: afriberta_base_hausa_5e_5_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afriberta_base_hausa_5e_5_pipeline` is an English model originally trained by grace-pro.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_base_hausa_5e_5_pipeline_en_5.4.0_3.0_1718132010249.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_base_hausa_5e_5_pipeline_en_5.4.0_3.0_1718132010249.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("afriberta_base_hausa_5e_5_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("afriberta_base_hausa_5e_5_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|afriberta_base_hausa_5e_5_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|415.3 MB|
+
+## References
+
+https://huggingface.co/grace-pro/afriberta-base-hausa-5e-5
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_finetuned_hausa_2e_4_en.md b/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_finetuned_hausa_2e_4_en.md
new file mode 100644
index 00000000000000..6014a859cb5e7d
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_finetuned_hausa_2e_4_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English afriberta_large_finetuned_hausa_2e_4 XlmRoBertaForTokenClassification from grace-pro
+author: John Snow Labs
+name: afriberta_large_finetuned_hausa_2e_4
+date: 2024-06-11
+tags: [en, open_source, onnx, token_classification, xlm_roberta, ner]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afriberta_large_finetuned_hausa_2e_4` is an English model originally trained by grace-pro.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_large_finetuned_hausa_2e_4_en_5.4.0_3.0_1718130334818.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_large_finetuned_hausa_2e_4_en_5.4.0_3.0_1718130334818.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_large_finetuned_hausa_2e_4","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_large_finetuned_hausa_2e_4", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|afriberta_large_finetuned_hausa_2e_4|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|468.3 MB|
+
+## References
+
+https://huggingface.co/grace-pro/afriberta-large-finetuned-hausa-2e-4
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_finetuned_hausa_2e_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_finetuned_hausa_2e_4_pipeline_en.md
new file mode 100644
index 00000000000000..ae5541862f4711
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_finetuned_hausa_2e_4_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English afriberta_large_finetuned_hausa_2e_4_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro
+author: John Snow Labs
+name: afriberta_large_finetuned_hausa_2e_4_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afriberta_large_finetuned_hausa_2e_4_pipeline` is an English model originally trained by grace-pro.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_large_finetuned_hausa_2e_4_pipeline_en_5.4.0_3.0_1718130365197.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_large_finetuned_hausa_2e_4_pipeline_en_5.4.0_3.0_1718130365197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("afriberta_large_finetuned_hausa_2e_4_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("afriberta_large_finetuned_hausa_2e_4_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|afriberta_large_finetuned_hausa_2e_4_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|468.3 MB|
+
+## References
+
+https://huggingface.co/grace-pro/afriberta-large-finetuned-hausa-2e-4
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_hausa_5e_5_en.md b/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_hausa_5e_5_en.md
new file mode 100644
index 00000000000000..12a93c143e95de
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_hausa_5e_5_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English afriberta_large_hausa_5e_5 XlmRoBertaForTokenClassification from grace-pro
+author: John Snow Labs
+name: afriberta_large_hausa_5e_5
+date: 2024-06-11
+tags: [en, open_source, onnx, token_classification, xlm_roberta, ner]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afriberta_large_hausa_5e_5` is an English model originally trained by grace-pro.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_large_hausa_5e_5_en_5.4.0_3.0_1718130022818.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_large_hausa_5e_5_en_5.4.0_3.0_1718130022818.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_large_hausa_5e_5","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afriberta_large_hausa_5e_5", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afriberta_large_hausa_5e_5| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/grace-pro/afriberta-large-hausa-5e-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_hausa_5e_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_hausa_5e_5_pipeline_en.md new file mode 100644 index 00000000000000..9ae7f4eb760aa8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afriberta_large_hausa_5e_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English afriberta_large_hausa_5e_5_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afriberta_large_hausa_5e_5_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afriberta_large_hausa_5e_5_pipeline` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afriberta_large_hausa_5e_5_pipeline_en_5.4.0_3.0_1718130052562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afriberta_large_hausa_5e_5_pipeline_en_5.4.0_3.0_1718130052562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("afriberta_large_hausa_5e_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("afriberta_large_hausa_5e_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afriberta_large_hausa_5e_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/grace-pro/afriberta-large-hausa-5e-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_base_hausa_5e_5_en.md b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_base_hausa_5e_5_en.md new file mode 100644 index 00000000000000..fc381e68be2362 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_base_hausa_5e_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English afro_xlmr_base_hausa_5e_5 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afro_xlmr_base_hausa_5e_5 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afro_xlmr_base_hausa_5e_5` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_5e_5_en_5.4.0_3.0_1718137940018.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_5e_5_en_5.4.0_3.0_1718137940018.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afro_xlmr_base_hausa_5e_5","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afro_xlmr_base_hausa_5e_5", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_base_hausa_5e_5| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/afro-xlmr-base-hausa-5e-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_base_hausa_5e_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_base_hausa_5e_5_pipeline_en.md new file mode 100644 index 00000000000000..3ffff0e2000e8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_base_hausa_5e_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English afro_xlmr_base_hausa_5e_5_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afro_xlmr_base_hausa_5e_5_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afro_xlmr_base_hausa_5e_5_pipeline` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_5e_5_pipeline_en_5.4.0_3.0_1718138005485.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_hausa_5e_5_pipeline_en_5.4.0_3.0_1718138005485.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("afro_xlmr_base_hausa_5e_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("afro_xlmr_base_hausa_5e_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_base_hausa_5e_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/afro-xlmr-base-hausa-5e-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_hausa_2e_4_en.md b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_hausa_2e_4_en.md new file mode 100644 index 00000000000000..c6dae8975eeff0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_hausa_2e_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English afro_xlmr_mini_finetuned_hausa_2e_4 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afro_xlmr_mini_finetuned_hausa_2e_4 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afro_xlmr_mini_finetuned_hausa_2e_4` is an English model originally trained by grace-pro. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_mini_finetuned_hausa_2e_4_en_5.4.0_3.0_1718138748756.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_mini_finetuned_hausa_2e_4_en_5.4.0_3.0_1718138748756.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afro_xlmr_mini_finetuned_hausa_2e_4","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afro_xlmr_mini_finetuned_hausa_2e_4", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_mini_finetuned_hausa_2e_4| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|443.1 MB| + +## References + +https://huggingface.co/grace-pro/afro-xlmr-mini-finetuned-hausa-2e-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_hausa_2e_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_hausa_2e_4_pipeline_en.md new file mode 100644 index 00000000000000..b1090fd40fe20c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_hausa_2e_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English afro_xlmr_mini_finetuned_hausa_2e_4_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afro_xlmr_mini_finetuned_hausa_2e_4_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afro_xlmr_mini_finetuned_hausa_2e_4_pipeline` is an English model originally trained by grace-pro. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_mini_finetuned_hausa_2e_4_pipeline_en_5.4.0_3.0_1718138776673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_mini_finetuned_hausa_2e_4_pipeline_en_5.4.0_3.0_1718138776673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("afro_xlmr_mini_finetuned_hausa_2e_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("afro_xlmr_mini_finetuned_hausa_2e_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_mini_finetuned_hausa_2e_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|443.2 MB| + +## References + +https://huggingface.co/grace-pro/afro-xlmr-mini-finetuned-hausa-2e-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_igbo_en.md b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_igbo_en.md new file mode 100644 index 00000000000000..ba163430a87271 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_igbo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English afro_xlmr_mini_finetuned_igbo XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afro_xlmr_mini_finetuned_igbo +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afro_xlmr_mini_finetuned_igbo` is an English model originally trained by grace-pro. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_mini_finetuned_igbo_en_5.4.0_3.0_1718139553148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_mini_finetuned_igbo_en_5.4.0_3.0_1718139553148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afro_xlmr_mini_finetuned_igbo","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("afro_xlmr_mini_finetuned_igbo", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_mini_finetuned_igbo| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|443.1 MB| + +## References + +https://huggingface.co/grace-pro/afro-xlmr-mini-finetuned-igbo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_igbo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_igbo_pipeline_en.md new file mode 100644 index 00000000000000..73d7e223e5517d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-afro_xlmr_mini_finetuned_igbo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English afro_xlmr_mini_finetuned_igbo_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: afro_xlmr_mini_finetuned_igbo_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `afro_xlmr_mini_finetuned_igbo_pipeline` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_mini_finetuned_igbo_pipeline_en_5.4.0_3.0_1718139581600.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_mini_finetuned_igbo_pipeline_en_5.4.0_3.0_1718139581600.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("afro_xlmr_mini_finetuned_igbo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("afro_xlmr_mini_finetuned_igbo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_mini_finetuned_igbo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|443.1 MB| + +## References + +https://huggingface.co/grace-pro/afro-xlmr-mini-finetuned-igbo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-aligned_source_5e_5_en.md b/docs/_posts/ahmedlone127/2024-06-11-aligned_source_5e_5_en.md new file mode 100644 index 00000000000000..3025f5630262a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-aligned_source_5e_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English aligned_source_5e_5 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: aligned_source_5e_5 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `aligned_source_5e_5` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/aligned_source_5e_5_en_5.4.0_3.0_1718138757574.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/aligned_source_5e_5_en_5.4.0_3.0_1718138757574.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("aligned_source_5e_5","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("aligned_source_5e_5", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|aligned_source_5e_5| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/aligned_source_5e-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-aligned_source_5e_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-aligned_source_5e_5_pipeline_en.md new file mode 100644 index 00000000000000..76ad28119a6e42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-aligned_source_5e_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English aligned_source_5e_5_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: aligned_source_5e_5_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `aligned_source_5e_5_pipeline` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/aligned_source_5e_5_pipeline_en_5.4.0_3.0_1718138840466.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/aligned_source_5e_5_pipeline_en_5.4.0_3.0_1718138840466.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("aligned_source_5e_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("aligned_source_5e_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|aligned_source_5e_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/aligned_source_5e-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_diacritics_shuffle_eval_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_diacritics_shuffle_eval_en.md new file mode 100644 index 00000000000000..187b8b394ca86d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_diacritics_shuffle_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angela_diacritics_shuffle_eval XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_diacritics_shuffle_eval +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `angela_diacritics_shuffle_eval` is an English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_diacritics_shuffle_eval_en_5.4.0_3.0_1718139666989.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_diacritics_shuffle_eval_en_5.4.0_3.0_1718139666989.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_diacritics_shuffle_eval","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_diacritics_shuffle_eval", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_diacritics_shuffle_eval| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_diacritics_shuffle_eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_diacritics_shuffle_eval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_diacritics_shuffle_eval_pipeline_en.md new file mode 100644 index 00000000000000..d349d9b7db2943 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_diacritics_shuffle_eval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angela_diacritics_shuffle_eval_pipeline pipeline XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_diacritics_shuffle_eval_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `angela_diacritics_shuffle_eval_pipeline` is an English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_diacritics_shuffle_eval_pipeline_en_5.4.0_3.0_1718139732623.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_diacritics_shuffle_eval_pipeline_en_5.4.0_3.0_1718139732623.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("angela_diacritics_shuffle_eval_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("angela_diacritics_shuffle_eval_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_diacritics_shuffle_eval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_diacritics_shuffle_eval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_punc_untranslated_eval_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_punc_untranslated_eval_en.md new file mode 100644 index 00000000000000..87d99314e563e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_punc_untranslated_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angela_punc_untranslated_eval XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_punc_untranslated_eval +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `angela_punc_untranslated_eval` is an English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_punc_untranslated_eval_en_5.4.0_3.0_1718130085204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_punc_untranslated_eval_en_5.4.0_3.0_1718130085204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_punc_untranslated_eval","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_punc_untranslated_eval", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_punc_untranslated_eval| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_punc_untranslated_eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_punc_untranslated_eval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_punc_untranslated_eval_pipeline_en.md new file mode 100644 index 00000000000000..f814b01a7bceba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_punc_untranslated_eval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angela_punc_untranslated_eval_pipeline pipeline XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_punc_untranslated_eval_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `angela_punc_untranslated_eval_pipeline` is an English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_punc_untranslated_eval_pipeline_en_5.4.0_3.0_1718130151231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_punc_untranslated_eval_pipeline_en_5.4.0_3.0_1718130151231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("angela_punc_untranslated_eval_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("angela_punc_untranslated_eval_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_punc_untranslated_eval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_punc_untranslated_eval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_punctuation_test_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_punctuation_test_en.md new file mode 100644 index 00000000000000..7359539fcd1706 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_punctuation_test_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angela_punctuation_test XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_punctuation_test +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `angela_punctuation_test` is an English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_punctuation_test_en_5.4.0_3.0_1718139132372.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_punctuation_test_en_5.4.0_3.0_1718139132372.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_punctuation_test","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_punctuation_test", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_punctuation_test| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_punctuation_test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_punctuation_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_punctuation_test_pipeline_en.md new file mode 100644 index 00000000000000..f4911c1dfe47d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_punctuation_test_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angela_punctuation_test_pipeline pipeline XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_punctuation_test_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `angela_punctuation_test_pipeline` is an English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_punctuation_test_pipeline_en_5.4.0_3.0_1718139197594.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_punctuation_test_pipeline_en_5.4.0_3.0_1718139197594.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("angela_punctuation_test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("angela_punctuation_test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_punctuation_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_punctuation_test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_diacritics_eval_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_diacritics_eval_en.md new file mode 100644 index 00000000000000..83ba08542d2bbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_diacritics_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angela_shuffle_diacritics_eval XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_shuffle_diacritics_eval +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `angela_shuffle_diacritics_eval` is an English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_shuffle_diacritics_eval_en_5.4.0_3.0_1718138759438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_shuffle_diacritics_eval_en_5.4.0_3.0_1718138759438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_shuffle_diacritics_eval","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_shuffle_diacritics_eval", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_shuffle_diacritics_eval| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_shuffle_diacritics_eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_diacritics_eval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_diacritics_eval_pipeline_en.md new file mode 100644 index 00000000000000..3b87176ad0b099 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_diacritics_eval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angela_shuffle_diacritics_eval_pipeline pipeline XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_shuffle_diacritics_eval_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `angela_shuffle_diacritics_eval_pipeline` is an English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_shuffle_diacritics_eval_pipeline_en_5.4.0_3.0_1718138838117.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_shuffle_diacritics_eval_pipeline_en_5.4.0_3.0_1718138838117.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("angela_shuffle_diacritics_eval_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("angela_shuffle_diacritics_eval_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_shuffle_diacritics_eval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_shuffle_diacritics_eval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_punc_eval_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_punc_eval_en.md new file mode 100644 index 00000000000000..099bbaae33bbc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_punc_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angela_shuffle_punc_eval XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_shuffle_punc_eval +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `angela_shuffle_punc_eval` is an English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_shuffle_punc_eval_en_5.4.0_3.0_1718138230713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_shuffle_punc_eval_en_5.4.0_3.0_1718138230713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_shuffle_punc_eval","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_shuffle_punc_eval", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_shuffle_punc_eval| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_shuffle_punc_eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_punc_eval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_punc_eval_pipeline_en.md new file mode 100644 index 00000000000000..94e322fef6cac9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_shuffle_punc_eval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angela_shuffle_punc_eval_pipeline pipeline XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_shuffle_punc_eval_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `angela_shuffle_punc_eval_pipeline` is an English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_shuffle_punc_eval_pipeline_en_5.4.0_3.0_1718138295997.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_shuffle_punc_eval_pipeline_en_5.4.0_3.0_1718138295997.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("angela_shuffle_punc_eval_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("angela_shuffle_punc_eval_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_shuffle_punc_eval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_shuffle_punc_eval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_entities_regular_eval_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_entities_regular_eval_en.md new file mode 100644 index 00000000000000..e53e701628e301 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_entities_regular_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angela_untranslated_entities_regular_eval XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_untranslated_entities_regular_eval +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `angela_untranslated_entities_regular_eval` is an English model originally trained by azhang1212.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_untranslated_entities_regular_eval_en_5.4.0_3.0_1718139714365.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_untranslated_entities_regular_eval_en_5.4.0_3.0_1718139714365.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_untranslated_entities_regular_eval","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_untranslated_entities_regular_eval", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_untranslated_entities_regular_eval| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_untranslated_entities_regular_eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_entities_regular_eval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_entities_regular_eval_pipeline_en.md new file mode 100644 index 00000000000000..5e2fc7cc286dad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_entities_regular_eval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angela_untranslated_entities_regular_eval_pipeline pipeline XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_untranslated_entities_regular_eval_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `angela_untranslated_entities_regular_eval_pipeline` is an English model originally trained by azhang1212.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_untranslated_entities_regular_eval_pipeline_en_5.4.0_3.0_1718139780015.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_untranslated_entities_regular_eval_pipeline_en_5.4.0_3.0_1718139780015.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("angela_untranslated_entities_regular_eval_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("angela_untranslated_entities_regular_eval_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_untranslated_entities_regular_eval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_untranslated_entities_regular_eval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_shuffle_eval_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_shuffle_eval_en.md new file mode 100644 index 00000000000000..b573ffa3ca9ecd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_shuffle_eval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angela_untranslated_shuffle_eval XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_untranslated_shuffle_eval +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `angela_untranslated_shuffle_eval` is an English model originally trained by azhang1212.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_untranslated_shuffle_eval_en_5.4.0_3.0_1718137796639.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_untranslated_shuffle_eval_en_5.4.0_3.0_1718137796639.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_untranslated_shuffle_eval","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("angela_untranslated_shuffle_eval", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_untranslated_shuffle_eval| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_untranslated_shuffle_eval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_shuffle_eval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_shuffle_eval_pipeline_en.md new file mode 100644 index 00000000000000..6f89dfebd5b2fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-angela_untranslated_shuffle_eval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angela_untranslated_shuffle_eval_pipeline pipeline XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_untranslated_shuffle_eval_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `angela_untranslated_shuffle_eval_pipeline` is an English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_untranslated_shuffle_eval_pipeline_en_5.4.0_3.0_1718137873209.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_untranslated_shuffle_eval_pipeline_en_5.4.0_3.0_1718137873209.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("angela_untranslated_shuffle_eval_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("angela_untranslated_shuffle_eval_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_untranslated_shuffle_eval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_untranslated_shuffle_eval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-arabnizer_xlmr_panx_arabic_ar.md b/docs/_posts/ahmedlone127/2024-06-11-arabnizer_xlmr_panx_arabic_ar.md new file mode 100644 index 00000000000000..593170b6f94e0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-arabnizer_xlmr_panx_arabic_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic arabnizer_xlmr_panx_arabic XlmRoBertaForTokenClassification from mohammedaly22 +author: John Snow Labs +name: arabnizer_xlmr_panx_arabic +date: 2024-06-11 +tags: [ar, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `arabnizer_xlmr_panx_arabic` is an Arabic model originally trained by mohammedaly22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arabnizer_xlmr_panx_arabic_ar_5.4.0_3.0_1718131128743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arabnizer_xlmr_panx_arabic_ar_5.4.0_3.0_1718131128743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("arabnizer_xlmr_panx_arabic","ar") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("arabnizer_xlmr_panx_arabic", "ar") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arabnizer_xlmr_panx_arabic| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|ar| +|Size:|831.3 MB| + +## References + +https://huggingface.co/mohammedaly22/arabnizer-xlmr-panx-ar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-arabnizer_xlmr_panx_arabic_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-06-11-arabnizer_xlmr_panx_arabic_pipeline_ar.md new file mode 100644 index 00000000000000..fbcb17ed1594e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-arabnizer_xlmr_panx_arabic_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic arabnizer_xlmr_panx_arabic_pipeline pipeline XlmRoBertaForTokenClassification from mohammedaly22 +author: John Snow Labs +name: arabnizer_xlmr_panx_arabic_pipeline +date: 2024-06-11 +tags: [ar, open_source, pipeline, onnx] +task: Named Entity Recognition +language: ar +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `arabnizer_xlmr_panx_arabic_pipeline` is an Arabic model originally trained by mohammedaly22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arabnizer_xlmr_panx_arabic_pipeline_ar_5.4.0_3.0_1718131237090.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arabnizer_xlmr_panx_arabic_pipeline_ar_5.4.0_3.0_1718131237090.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("arabnizer_xlmr_panx_arabic_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("arabnizer_xlmr_panx_arabic_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arabnizer_xlmr_panx_arabic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|831.3 MB| + +## References + +https://huggingface.co/mohammedaly22/arabnizer-xlmr-panx-ar + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-baai__bge_small_english_v1_5__mozart_fine_tuned_10_en.md b/docs/_posts/ahmedlone127/2024-06-11-baai__bge_small_english_v1_5__mozart_fine_tuned_10_en.md new file mode 100644 index 00000000000000..0b92d00a07d491 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-baai__bge_small_english_v1_5__mozart_fine_tuned_10_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English baai__bge_small_english_v1_5__mozart_fine_tuned_10 BGEEmbeddings from mozart-ai +author: John Snow Labs +name: baai__bge_small_english_v1_5__mozart_fine_tuned_10 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `baai__bge_small_english_v1_5__mozart_fine_tuned_10` is an English model originally trained by mozart-ai.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai__bge_small_english_v1_5__mozart_fine_tuned_10_en_5.4.0_3.0_1718069220842.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai__bge_small_english_v1_5__mozart_fine_tuned_10_en_5.4.0_3.0_1718069220842.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("baai__bge_small_english_v1_5__mozart_fine_tuned_10","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("baai__bge_small_english_v1_5__mozart_fine_tuned_10","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai__bge_small_english_v1_5__mozart_fine_tuned_10| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|115.3 MB| + +## References + +https://huggingface.co/mozart-ai/BAAI__bge-small-en-v1.5__Mozart_Fine_Tuned-10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline_en.md new file mode 100644 index 00000000000000..e3b2ebd3d546c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline pipeline BGEEmbeddings from mozart-ai +author: John Snow Labs +name: baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline` is an English model originally trained by mozart-ai.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline_en_5.4.0_3.0_1718069231479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline_en_5.4.0_3.0_1718069231479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|baai__bge_small_english_v1_5__mozart_fine_tuned_10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|115.3 MB| + +## References + +https://huggingface.co/mozart-ai/BAAI__bge-small-en-v1.5__Mozart_Fine_Tuned-10 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-base_finetuned_frombge_en.md b/docs/_posts/ahmedlone127/2024-06-11-base_finetuned_frombge_en.md new file mode 100644 index 00000000000000..406a30aada41cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-base_finetuned_frombge_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English base_finetuned_frombge BGEEmbeddings from joshus +author: John Snow Labs +name: base_finetuned_frombge +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `base_finetuned_frombge` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_finetuned_frombge_en_5.4.0_3.0_1718065219693.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_finetuned_frombge_en_5.4.0_3.0_1718065219693.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("base_finetuned_frombge","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("base_finetuned_frombge","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_finetuned_frombge| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|382.6 MB| + +## References + +https://huggingface.co/joshus/base-finetuned-frombge \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-base_finetuned_frombge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-base_finetuned_frombge_pipeline_en.md new file mode 100644 index 00000000000000..e40b2e28c9d383 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-base_finetuned_frombge_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English base_finetuned_frombge_pipeline pipeline BGEEmbeddings from joshus +author: John Snow Labs +name: base_finetuned_frombge_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `base_finetuned_frombge_pipeline` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_finetuned_frombge_pipeline_en_5.4.0_3.0_1718065257507.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_finetuned_frombge_pipeline_en_5.4.0_3.0_1718065257507.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("base_finetuned_frombge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("base_finetuned_frombge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_finetuned_frombge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|382.6 MB| + +## References + +https://huggingface.co/joshus/base-finetuned-frombge + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_0803_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_0803_en.md new file mode 100644 index 00000000000000..0783809e83c42b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_0803_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_0803 BGEEmbeddings from joshus +author: John Snow Labs +name: bge_base_0803 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_0803` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_0803_en_5.4.0_3.0_1718065572021.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_0803_en_5.4.0_3.0_1718065572021.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_0803","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_0803","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_0803| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|382.6 MB| + +## References + +https://huggingface.co/joshus/bge-base-0803 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_0803_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_0803_pipeline_en.md new file mode 100644 index 00000000000000..181080a01df278 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_0803_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_0803_pipeline pipeline BGEEmbeddings from joshus +author: John Snow Labs +name: bge_base_0803_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_0803_pipeline` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_0803_pipeline_en_5.4.0_3.0_1718065612145.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_0803_pipeline_en_5.4.0_3.0_1718065612145.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_0803_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_0803_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_0803_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|382.7 MB| + +## References + +https://huggingface.co/joshus/bge-base-0803 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_argilla_sdk_matryoshka_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_argilla_sdk_matryoshka_en.md new file mode 100644 index 00000000000000..2606766a049f23 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_argilla_sdk_matryoshka_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_argilla_sdk_matryoshka BGEEmbeddings from plaguss +author: John Snow Labs +name: bge_base_argilla_sdk_matryoshka +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_argilla_sdk_matryoshka` is an English model originally trained by plaguss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_argilla_sdk_matryoshka_en_5.4.0_3.0_1718070324668.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_argilla_sdk_matryoshka_en_5.4.0_3.0_1718070324668.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_argilla_sdk_matryoshka","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_argilla_sdk_matryoshka","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_argilla_sdk_matryoshka| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|377.9 MB| + +## References + +https://huggingface.co/plaguss/bge-base-argilla-sdk-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_argilla_sdk_matryoshka_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_argilla_sdk_matryoshka_pipeline_en.md new file mode 100644 index 00000000000000..9c9d9ae39ebcca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_argilla_sdk_matryoshka_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_argilla_sdk_matryoshka_pipeline pipeline BGEEmbeddings from plaguss +author: John Snow Labs +name: bge_base_argilla_sdk_matryoshka_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_argilla_sdk_matryoshka_pipeline` is an English model originally trained by plaguss. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_argilla_sdk_matryoshka_pipeline_en_5.4.0_3.0_1718070363620.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_argilla_sdk_matryoshka_pipeline_en_5.4.0_3.0_1718070363620.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_argilla_sdk_matryoshka_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_argilla_sdk_matryoshka_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_argilla_sdk_matryoshka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|377.9 MB| + +## References + +https://huggingface.co/plaguss/bge-base-argilla-sdk-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_en.md new file mode 100644 index 00000000000000..98d476f7a9f93c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english BGEEmbeddings from Narsil +author: John Snow Labs +name: bge_base_english +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english` is an English model originally trained by Narsil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_en_5.4.0_3.0_1718067511636.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_en_5.4.0_3.0_1718067511636.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|259.0 MB| + +## References + +https://huggingface.co/Narsil/bge-base-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_pipeline_en.md new file mode 100644 index 00000000000000..5150d05a9a5827 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_pipeline pipeline BGEEmbeddings from Narsil +author: John Snow Labs +name: bge_base_english_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_pipeline` is an English model originally trained by Narsil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_pipeline_en_5.4.0_3.0_1718067609835.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_pipeline_en_5.4.0_3.0_1718067609835.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|259.0 MB| + +## References + +https://huggingface.co/Narsil/bge-base-en + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_finetuned_300_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_finetuned_300_en.md new file mode 100644 index 00000000000000..d30a28f802dfbb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_finetuned_300_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_v1_5_finetuned_300 BGEEmbeddings from ramnathv +author: John Snow Labs +name: bge_base_english_v1_5_finetuned_300 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_finetuned_300` is an English model originally trained by ramnathv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetuned_300_en_5.4.0_3.0_1718064744487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetuned_300_en_5.4.0_3.0_1718064744487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_finetuned_300","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_finetuned_300","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_v1_5_finetuned_300| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|384.8 MB| + +## References + +https://huggingface.co/ramnathv/bge-base-en-v1.5-finetuned-300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_finetuned_300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_finetuned_300_pipeline_en.md new file mode 100644 index 00000000000000..08638750898f10 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_finetuned_300_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_v1_5_finetuned_300_pipeline pipeline BGEEmbeddings from ramnathv +author: John Snow Labs +name: bge_base_english_v1_5_finetuned_300_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_finetuned_300_pipeline` is an English model originally trained by ramnathv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetuned_300_pipeline_en_5.4.0_3.0_1718064776631.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_finetuned_300_pipeline_en_5.4.0_3.0_1718064776631.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("bge_base_english_v1_5_finetuned_300_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("bge_base_english_v1_5_finetuned_300_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bge_base_english_v1_5_finetuned_300_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|384.8 MB|
+
+## References
+
+https://huggingface.co/ramnathv/bge-base-en-v1.5-finetuned-300
+
+## Included Models
+
+- DocumentAssembler
+- BGEEmbeddings
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_3_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_3_en.md
new file mode 100644
index 00000000000000..53f523107806c0
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_3_en.md
@@ -0,0 +1,87 @@
+---
+layout: model
+title: English bge_base_english_v1_5_ft_quora_0_3 BGEEmbeddings from 567-labs
+author: John Snow Labs
+name: bge_base_english_v1_5_ft_quora_0_3
+date: 2024-06-11
+tags: [en, open_source, onnx, embeddings, bge]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BGEEmbeddings
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_0_3` is an English model originally trained by 567-labs.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_3_en_5.4.0_3.0_1718065501803.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_3_en_5.4.0_3.0_1718065501803.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_3","en") \
+    .setInputCols(["document"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+
+val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_3","en")
+    .setInputCols(Array("document"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bge_base_english_v1_5_ft_quora_0_3|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document]|
+|Output Labels:|[bge]|
+|Language:|en|
+|Size:|396.1 MB|
+
+## References
+
+https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.3
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_3_pipeline_en.md
new file mode 100644
index 00000000000000..b70cc7fb5614eb
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_3_pipeline_en.md
@@ -0,0 +1,69 @@
+---
+layout: model
+title: English bge_base_english_v1_5_ft_quora_0_3_pipeline pipeline BGEEmbeddings from 567-labs
+author: John Snow Labs
+name: bge_base_english_v1_5_ft_quora_0_3_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_0_3_pipeline` is an English model originally trained by 567-labs.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_3_pipeline_en_5.4.0_3.0_1718065530886.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_3_pipeline_en_5.4.0_3.0_1718065530886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_3_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_3_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bge_base_english_v1_5_ft_quora_0_3_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|396.1 MB|
+
+## References
+
+https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.3
+
+## Included Models
+
+- DocumentAssembler
+- BGEEmbeddings
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_5_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_5_en.md
new file mode 100644
index 00000000000000..c436bf742247b7
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_5_en.md
@@ -0,0 +1,87 @@
+---
+layout: model
+title: English bge_base_english_v1_5_ft_quora_0_5 BGEEmbeddings from 567-labs
+author: John Snow Labs
+name: bge_base_english_v1_5_ft_quora_0_5
+date: 2024-06-11
+tags: [en, open_source, onnx, embeddings, bge]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BGEEmbeddings
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_0_5` is an English model originally trained by 567-labs.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_5_en_5.4.0_3.0_1718067747603.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_5_en_5.4.0_3.0_1718067747603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_5","en") \
+    .setInputCols(["document"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+
+val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_0_5","en")
+    .setInputCols(Array("document"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bge_base_english_v1_5_ft_quora_0_5|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document]|
+|Output Labels:|[bge]|
+|Language:|en|
+|Size:|398.0 MB|
+
+## References
+
+https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.5
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_5_pipeline_en.md
new file mode 100644
index 00000000000000..6239d621717779
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_0_5_pipeline_en.md
@@ -0,0 +1,69 @@
+---
+layout: model
+title: English bge_base_english_v1_5_ft_quora_0_5_pipeline pipeline BGEEmbeddings from 567-labs
+author: John Snow Labs
+name: bge_base_english_v1_5_ft_quora_0_5_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_0_5_pipeline` is an English model originally trained by 567-labs.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_5_pipeline_en_5.4.0_3.0_1718067776480.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_0_5_pipeline_en_5.4.0_3.0_1718067776480.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_5_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("bge_base_english_v1_5_ft_quora_0_5_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bge_base_english_v1_5_ft_quora_0_5_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|398.0 MB|
+
+## References
+
+https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora-0.5
+
+## Included Models
+
+- DocumentAssembler
+- BGEEmbeddings
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_567_labs_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_567_labs_en.md
new file mode 100644
index 00000000000000..63a0976fb25615
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_567_labs_en.md
@@ -0,0 +1,87 @@
+---
+layout: model
+title: English bge_base_english_v1_5_ft_quora_567_labs BGEEmbeddings from 567-labs
+author: John Snow Labs
+name: bge_base_english_v1_5_ft_quora_567_labs
+date: 2024-06-11
+tags: [en, open_source, onnx, embeddings, bge]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BGEEmbeddings
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_567_labs` is an English model originally trained by 567-labs.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_567_labs_en_5.4.0_3.0_1718064656500.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_567_labs_en_5.4.0_3.0_1718064656500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_567_labs","en") \
+    .setInputCols(["document"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+
+val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_567_labs","en")
+    .setInputCols(Array("document"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bge_base_english_v1_5_ft_quora_567_labs|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document]|
+|Output Labels:|[bge]|
+|Language:|en|
+|Size:|400.5 MB|
+
+## References
+
+https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_567_labs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_567_labs_pipeline_en.md
new file mode 100644
index 00000000000000..cbecc2ac583324
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_567_labs_pipeline_en.md
@@ -0,0 +1,69 @@
+---
+layout: model
+title: English bge_base_english_v1_5_ft_quora_567_labs_pipeline pipeline BGEEmbeddings from 567-labs
+author: John Snow Labs
+name: bge_base_english_v1_5_ft_quora_567_labs_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_567_labs_pipeline` is an English model originally trained by 567-labs.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_567_labs_pipeline_en_5.4.0_3.0_1718064684009.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_567_labs_pipeline_en_5.4.0_3.0_1718064684009.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("bge_base_english_v1_5_ft_quora_567_labs_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("bge_base_english_v1_5_ft_quora_567_labs_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bge_base_english_v1_5_ft_quora_567_labs_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|400.5 MB|
+
+## References
+
+https://huggingface.co/567-labs/bge-base-en-v1.5-ft-quora
+
+## Included Models
+
+- DocumentAssembler
+- BGEEmbeddings
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_krunchykat_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_krunchykat_en.md
new file mode 100644
index 00000000000000..2a5c085f849dc3
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_krunchykat_en.md
@@ -0,0 +1,87 @@
+---
+layout: model
+title: English bge_base_english_v1_5_ft_quora_krunchykat BGEEmbeddings from krunchykat
+author: John Snow Labs
+name: bge_base_english_v1_5_ft_quora_krunchykat
+date: 2024-06-11
+tags: [en, open_source, onnx, embeddings, bge]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BGEEmbeddings
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_krunchykat` is an English model originally trained by krunchykat.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_krunchykat_en_5.4.0_3.0_1718069916603.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_krunchykat_en_5.4.0_3.0_1718069916603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_krunchykat","en") \
+    .setInputCols(["document"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+
+val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_ft_quora_krunchykat","en")
+    .setInputCols(Array("document"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bge_base_english_v1_5_ft_quora_krunchykat|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document]|
+|Output Labels:|[bge]|
+|Language:|en|
+|Size:|400.5 MB|
+
+## References
+
+https://huggingface.co/krunchykat/bge-base-en-v1.5-ft-quora
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_krunchykat_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_krunchykat_pipeline_en.md
new file mode 100644
index 00000000000000..7ca476a511f305
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_ft_quora_krunchykat_pipeline_en.md
@@ -0,0 +1,69 @@
+---
+layout: model
+title: English bge_base_english_v1_5_ft_quora_krunchykat_pipeline pipeline BGEEmbeddings from krunchykat
+author: John Snow Labs
+name: bge_base_english_v1_5_ft_quora_krunchykat_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_ft_quora_krunchykat_pipeline` is an English model originally trained by krunchykat.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_krunchykat_pipeline_en_5.4.0_3.0_1718069950790.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_ft_quora_krunchykat_pipeline_en_5.4.0_3.0_1718069950790.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("bge_base_english_v1_5_ft_quora_krunchykat_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("bge_base_english_v1_5_ft_quora_krunchykat_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bge_base_english_v1_5_ft_quora_krunchykat_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|400.5 MB|
+
+## References
+
+https://huggingface.co/krunchykat/bge-base-en-v1.5-ft-quora
+
+## Included Models
+
+- DocumentAssembler
+- BGEEmbeddings
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_semicon_ym_0122_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_semicon_ym_0122_en.md
new file mode 100644
index 00000000000000..9774e57c56b2f9
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_semicon_ym_0122_en.md
@@ -0,0 +1,87 @@
+---
+layout: model
+title: English bge_base_english_v1_5_semicon_ym_0122 BGEEmbeddings from Niraya666
+author: John Snow Labs
+name: bge_base_english_v1_5_semicon_ym_0122
+date: 2024-06-11
+tags: [en, open_source, onnx, embeddings, bge]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BGEEmbeddings
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_semicon_ym_0122` is an English model originally trained by Niraya666.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_semicon_ym_0122_en_5.4.0_3.0_1718069614829.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_semicon_ym_0122_en_5.4.0_3.0_1718069614829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_semicon_ym_0122","en") \
+    .setInputCols(["document"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+
+val embeddings = BGEEmbeddings.pretrained("bge_base_english_v1_5_semicon_ym_0122","en")
+    .setInputCols(Array("document"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bge_base_english_v1_5_semicon_ym_0122|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document]|
+|Output Labels:|[bge]|
+|Language:|en|
+|Size:|380.6 MB|
+
+## References
+
+https://huggingface.co/Niraya666/bge-base-en-v1.5-semicon-ym-0122
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_semicon_ym_0122_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_semicon_ym_0122_pipeline_en.md
new file mode 100644
index 00000000000000..611495211ace7f
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_english_v1_5_semicon_ym_0122_pipeline_en.md
@@ -0,0 +1,69 @@
+---
+layout: model
+title: English bge_base_english_v1_5_semicon_ym_0122_pipeline pipeline BGEEmbeddings from Niraya666
+author: John Snow Labs
+name: bge_base_english_v1_5_semicon_ym_0122_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_v1_5_semicon_ym_0122_pipeline` is an English model originally trained by Niraya666.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_semicon_ym_0122_pipeline_en_5.4.0_3.0_1718069652500.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_v1_5_semicon_ym_0122_pipeline_en_5.4.0_3.0_1718069652500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("bge_base_english_v1_5_semicon_ym_0122_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("bge_base_english_v1_5_semicon_ym_0122_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bge_base_english_v1_5_semicon_ym_0122_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|380.6 MB|
+
+## References
+
+https://huggingface.co/Niraya666/bge-base-en-v1.5-semicon-ym-0122
+
+## Included Models
+
+- DocumentAssembler
+- BGEEmbeddings
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_en.md
new file mode 100644
index 00000000000000..5bf2e7eb5673c4
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_en.md
@@ -0,0 +1,87 @@
+---
+layout: model
+title: English bge_base_financial BGEEmbeddings from riphunter7001x
+author: John Snow Labs
+name: bge_base_financial
+date: 2024-06-11
+tags: [en, open_source, onnx, embeddings, bge]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BGEEmbeddings
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial` is an English model originally trained by riphunter7001x.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_en_5.4.0_3.0_1718071167837.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_en_5.4.0_3.0_1718071167837.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+embeddings = BGEEmbeddings.pretrained("bge_base_financial","en") \
+    .setInputCols(["document"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+
+val embeddings = BGEEmbeddings.pretrained("bge_base_financial","en")
+    .setInputCols(Array("document"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|bge_base_financial|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document]|
+|Output Labels:|[bge]|
+|Language:|en|
+|Size:|387.5 MB|
+
+## References
+
+https://huggingface.co/riphunter7001x/bge-base-financial
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_mugheesawan11_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_mugheesawan11_en.md
new file mode 100644
index 00000000000000..0a5e6f303f94e0
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_mugheesawan11_en.md
@@ -0,0 +1,87 @@
+---
+layout: model
+title: English bge_base_financial_matryoshka_mugheesawan11 BGEEmbeddings from MugheesAwan11
+author: John Snow Labs
+name: bge_base_financial_matryoshka_mugheesawan11
+date: 2024-06-11
+tags: [en, open_source, onnx, embeddings, bge]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BGEEmbeddings
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_mugheesawan11` is an English model originally trained by MugheesAwan11.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_mugheesawan11_en_5.4.0_3.0_1718068377569.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_mugheesawan11_en_5.4.0_3.0_1718068377569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_mugheesawan11","en") \
+    .setInputCols(["document"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+
+val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_mugheesawan11","en")
+    .setInputCols(Array("document"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_mugheesawan11| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.0 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_mugheesawan11_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_mugheesawan11_pipeline_en.md new file mode 100644 index 00000000000000..8b8d3a3a789389 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_mugheesawan11_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_mugheesawan11_pipeline pipeline BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_financial_matryoshka_mugheesawan11_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_mugheesawan11_pipeline` is an English model originally trained by MugheesAwan11. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_mugheesawan11_pipeline_en_5.4.0_3.0_1718068413353.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_mugheesawan11_pipeline_en_5.4.0_3.0_1718068413353.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_mugheesawan11_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_mugheesawan11_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_mugheesawan11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_phamkinhquoc2002_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_phamkinhquoc2002_en.md new file mode 100644 index 00000000000000..ef5c7514f6b225 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_phamkinhquoc2002_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_phamkinhquoc2002 BGEEmbeddings from phamkinhquoc2002 +author: John Snow Labs +name: bge_base_financial_matryoshka_phamkinhquoc2002 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_phamkinhquoc2002` is an English model originally trained by phamkinhquoc2002. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_phamkinhquoc2002_en_5.4.0_3.0_1718066241417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_phamkinhquoc2002_en_5.4.0_3.0_1718066241417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_phamkinhquoc2002","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_phamkinhquoc2002","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_phamkinhquoc2002| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|256.0 MB| + +## References + +https://huggingface.co/phamkinhquoc2002/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_phamkinhquoc2002_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_phamkinhquoc2002_pipeline_en.md new file mode 100644 index 00000000000000..25411e24a51fe9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_phamkinhquoc2002_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_phamkinhquoc2002_pipeline pipeline BGEEmbeddings from phamkinhquoc2002 +author: John Snow Labs +name: bge_base_financial_matryoshka_phamkinhquoc2002_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_phamkinhquoc2002_pipeline` is an English model originally trained by phamkinhquoc2002. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_phamkinhquoc2002_pipeline_en_5.4.0_3.0_1718066340084.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_phamkinhquoc2002_pipeline_en_5.4.0_3.0_1718066340084.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_phamkinhquoc2002_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_phamkinhquoc2002_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_phamkinhquoc2002_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|256.0 MB| + +## References + +https://huggingface.co/phamkinhquoc2002/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_philschmid_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_philschmid_en.md new file mode 100644 index 00000000000000..8986d33677907e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_philschmid_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_philschmid BGEEmbeddings from philschmid +author: John Snow Labs +name: bge_base_financial_matryoshka_philschmid +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_philschmid` is an English model originally trained by philschmid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_philschmid_en_5.4.0_3.0_1718066188278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_philschmid_en_5.4.0_3.0_1718066188278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_philschmid","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_philschmid","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_philschmid| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/philschmid/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_philschmid_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_philschmid_pipeline_en.md new file mode 100644 index 00000000000000..62703a59100de4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_philschmid_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_philschmid_pipeline pipeline BGEEmbeddings from philschmid +author: John Snow Labs +name: bge_base_financial_matryoshka_philschmid_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_philschmid_pipeline` is an English model originally trained by philschmid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_philschmid_pipeline_en_5.4.0_3.0_1718066222589.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_philschmid_pipeline_en_5.4.0_3.0_1718066222589.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_philschmid_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_philschmid_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_philschmid_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/philschmid/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_sailesh9999_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_sailesh9999_en.md new file mode 100644 index 00000000000000..fbc56f1811476a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_sailesh9999_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_sailesh9999 BGEEmbeddings from Sailesh9999 +author: John Snow Labs +name: bge_base_financial_matryoshka_sailesh9999 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_sailesh9999` is an English model originally trained by Sailesh9999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_sailesh9999_en_5.4.0_3.0_1718066452795.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_sailesh9999_en_5.4.0_3.0_1718066452795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_sailesh9999","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_sailesh9999","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_sailesh9999| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Sailesh9999/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_sailesh9999_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_sailesh9999_pipeline_en.md new file mode 100644 index 00000000000000..1bc5b18b5d337f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_sailesh9999_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_sailesh9999_pipeline pipeline BGEEmbeddings from Sailesh9999 +author: John Snow Labs +name: bge_base_financial_matryoshka_sailesh9999_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_sailesh9999_pipeline` is an English model originally trained by Sailesh9999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_sailesh9999_pipeline_en_5.4.0_3.0_1718066493968.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_sailesh9999_pipeline_en_5.4.0_3.0_1718066493968.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_sailesh9999_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_sailesh9999_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_sailesh9999_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Sailesh9999/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_test_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_test_en.md new file mode 100644 index 00000000000000..a9c0f4d93566b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_test_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_test BGEEmbeddings from phamkinhquoc2002 +author: John Snow Labs +name: bge_base_financial_matryoshka_test +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_test` is an English model originally trained by phamkinhquoc2002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_test_en_5.4.0_3.0_1718068288916.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_test_en_5.4.0_3.0_1718068288916.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_test","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_test","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_test| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|256.0 MB| + +## References + +https://huggingface.co/phamkinhquoc2002/bge-base-financial-matryoshka_test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_test_pipeline_en.md new file mode 100644 index 00000000000000..762a6ce99ca92a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_test_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_test_pipeline pipeline BGEEmbeddings from phamkinhquoc2002 +author: John Snow Labs +name: bge_base_financial_matryoshka_test_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_test_pipeline` is an English model originally trained by phamkinhquoc2002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_test_pipeline_en_5.4.0_3.0_1718068388194.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_test_pipeline_en_5.4.0_3.0_1718068388194.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|256.0 MB| + +## References + +https://huggingface.co/phamkinhquoc2002/bge-base-financial-matryoshka_test + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_uonyeka_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_uonyeka_en.md new file mode 100644 index 00000000000000..924bb44192a483 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_uonyeka_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_uonyeka BGEEmbeddings from uonyeka +author: John Snow Labs +name: bge_base_financial_matryoshka_uonyeka +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_uonyeka` is an English model originally trained by uonyeka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_uonyeka_en_5.4.0_3.0_1718064614997.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_uonyeka_en_5.4.0_3.0_1718064614997.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_uonyeka","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_uonyeka","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_uonyeka| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.2 MB| + +## References + +https://huggingface.co/uonyeka/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_uonyeka_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_uonyeka_pipeline_en.md new file mode 100644 index 00000000000000..36d6b4838499ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_uonyeka_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_uonyeka_pipeline pipeline BGEEmbeddings from uonyeka +author: John Snow Labs +name: bge_base_financial_matryoshka_uonyeka_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_uonyeka_pipeline` is an English model originally trained by uonyeka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_uonyeka_pipeline_en_5.4.0_3.0_1718064649259.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_uonyeka_pipeline_en_5.4.0_3.0_1718064649259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_uonyeka_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_uonyeka_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_uonyeka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.2 MB| + +## References + +https://huggingface.co/uonyeka/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_waheedlone_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_waheedlone_en.md new file mode 100644 index 00000000000000..b23dce53c3e121 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_waheedlone_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_waheedlone BGEEmbeddings from WaheedLone +author: John Snow Labs +name: bge_base_financial_matryoshka_waheedlone +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_waheedlone` is an English model originally trained by WaheedLone. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_waheedlone_en_5.4.0_3.0_1718068545032.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_waheedlone_en_5.4.0_3.0_1718068545032.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_waheedlone","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_waheedlone","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_waheedlone| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/WaheedLone/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_waheedlone_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_waheedlone_pipeline_en.md new file mode 100644 index 00000000000000..42e4c6cdc454f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_matryoshka_waheedlone_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_waheedlone_pipeline pipeline BGEEmbeddings from WaheedLone +author: John Snow Labs +name: bge_base_financial_matryoshka_waheedlone_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_waheedlone_pipeline` is an English model originally trained by WaheedLone. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_waheedlone_pipeline_en_5.4.0_3.0_1718068579611.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_waheedlone_pipeline_en_5.4.0_3.0_1718068579611.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_waheedlone_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_waheedlone_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_waheedlone_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/WaheedLone/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_pipeline_en.md new file mode 100644 index 00000000000000..30d93e721ec776 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_financial_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_pipeline pipeline BGEEmbeddings from riphunter7001x +author: John Snow Labs +name: bge_base_financial_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_pipeline` is an English model originally trained by riphunter7001x. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_pipeline_en_5.4.0_3.0_1718071202731.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_pipeline_en_5.4.0_3.0_1718071202731.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.5 MB| + +## References + +https://huggingface.co/riphunter7001x/bge-base-financial + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetune_v2_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetune_v2_en.md new file mode 100644 index 00000000000000..53398215ae0f70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetune_v2_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_finetune_v2 BGEEmbeddings from Suva +author: John Snow Labs +name: bge_base_finetune_v2 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_finetune_v2` is an English model originally trained by Suva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_finetune_v2_en_5.4.0_3.0_1718066856172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_finetune_v2_en_5.4.0_3.0_1718066856172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_finetune_v2","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_finetune_v2","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_finetune_v2| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|385.4 MB| + +## References + +https://huggingface.co/Suva/bge-base-finetune-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetune_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetune_v2_pipeline_en.md new file mode 100644 index 00000000000000..c49798537b71c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetune_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_finetune_v2_pipeline pipeline BGEEmbeddings from Suva +author: John Snow Labs +name: bge_base_finetune_v2_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_finetune_v2_pipeline` is an English model originally trained by Suva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_finetune_v2_pipeline_en_5.4.0_3.0_1718066889529.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_finetune_v2_pipeline_en_5.4.0_3.0_1718066889529.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_finetune_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_finetune_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_finetune_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|385.4 MB| + +## References + +https://huggingface.co/Suva/bge-base-finetune-v2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_en.md new file mode 100644 index 00000000000000..80980ea4ade301 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_finetuned BGEEmbeddings from Suva +author: John Snow Labs +name: bge_base_finetuned +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_finetuned` is an English model originally trained by Suva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_finetuned_en_5.4.0_3.0_1718067158517.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_finetuned_en_5.4.0_3.0_1718067158517.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_finetuned","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_finetuned","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_finetuned| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|381.7 MB| + +## References + +https://huggingface.co/Suva/bge-base-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_financial_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_financial_en.md new file mode 100644 index 00000000000000..41154bdff09b71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_financial_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_finetuned_financial BGEEmbeddings from Nishanth7803 +author: John Snow Labs +name: bge_base_finetuned_financial +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_finetuned_financial` is an English model originally trained by Nishanth7803. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_finetuned_financial_en_5.4.0_3.0_1718092788141.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_finetuned_financial_en_5.4.0_3.0_1718092788141.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_finetuned_financial","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_finetuned_financial","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_finetuned_financial| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Nishanth7803/bge-base-finetuned-financial \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_financial_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_financial_pipeline_en.md new file mode 100644 index 00000000000000..8de2b578bda5d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_financial_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_finetuned_financial_pipeline pipeline BGEEmbeddings from Nishanth7803 +author: John Snow Labs +name: bge_base_finetuned_financial_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_finetuned_financial_pipeline` is an English model originally trained by Nishanth7803. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_finetuned_financial_pipeline_en_5.4.0_3.0_1718092823761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_finetuned_financial_pipeline_en_5.4.0_3.0_1718092823761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_finetuned_financial_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_finetuned_financial_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_finetuned_financial_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Nishanth7803/bge-base-finetuned-financial + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..ccc001fb802b37 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_finetuned_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_finetuned_pipeline pipeline BGEEmbeddings from Suva +author: John Snow Labs +name: bge_base_finetuned_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_finetuned_pipeline` is an English model originally trained by Suva. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_finetuned_pipeline_en_5.4.0_3.0_1718067193949.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_finetuned_pipeline_en_5.4.0_3.0_1718067193949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.7 MB| + +## References + +https://huggingface.co/Suva/bge-base-finetuned + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v3_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v3_en.md new file mode 100644 index 00000000000000..45e78b26efba89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v3_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v3 BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v3 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v3` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v3_en_5.4.0_3.0_1718068326018.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v3_en_5.4.0_3.0_1718068326018.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v3","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v3","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v3| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|376.5 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v3_pipeline_en.md new file mode 100644 index 00000000000000..bacb064ba4a2de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v3_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v3_pipeline pipeline BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v3_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v3_pipeline` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v3_pipeline_en_5.4.0_3.0_1718068367075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v3_pipeline_en_5.4.0_3.0_1718068367075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_securiti_dataset_1_v3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_securiti_dataset_1_v3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|376.5 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v3 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v5_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v5_en.md new file mode 100644 index 00000000000000..fed9a54052012f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v5_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v5 BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v5 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v5` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v5_en_5.4.0_3.0_1718066529630.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v5_en_5.4.0_3.0_1718066529630.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v5","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v5","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v5| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|376.6 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v5_pipeline_en.md new file mode 100644 index 00000000000000..11dfb8e72c0160 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_base_securiti_dataset_1_v5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v5_pipeline pipeline BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v5_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v5_pipeline` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v5_pipeline_en_5.4.0_3.0_1718066569289.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v5_pipeline_en_5.4.0_3.0_1718066569289.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_securiti_dataset_1_v5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_securiti_dataset_1_v5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|376.6 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v5 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_fin_intent_large_chinese_v1_5_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_fin_intent_large_chinese_v1_5_en.md new file mode 100644 index 00000000000000..48bbe26ef4cdfe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_fin_intent_large_chinese_v1_5_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_fin_intent_large_chinese_v1_5 BGEEmbeddings from luchun +author: John Snow Labs +name: bge_fin_intent_large_chinese_v1_5 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_fin_intent_large_chinese_v1_5` is an English model originally trained by luchun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_fin_intent_large_chinese_v1_5_en_5.4.0_3.0_1718069318884.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_fin_intent_large_chinese_v1_5_en_5.4.0_3.0_1718069318884.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_fin_intent_large_chinese_v1_5","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_fin_intent_large_chinese_v1_5","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_fin_intent_large_chinese_v1_5| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/luchun/bge_fin_intent_large_zh_v1.5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_fin_intent_large_chinese_v1_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_fin_intent_large_chinese_v1_5_pipeline_en.md new file mode 100644 index 00000000000000..90d1f9253e582b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_fin_intent_large_chinese_v1_5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_fin_intent_large_chinese_v1_5_pipeline pipeline BGEEmbeddings from luchun +author: John Snow Labs +name: bge_fin_intent_large_chinese_v1_5_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_fin_intent_large_chinese_v1_5_pipeline` is an English model originally trained by luchun. + +{:.btn-box} +<button class="button button-orange" disabled>Live Demo</button> +<button class="button button-orange" disabled>Open in Colab</button> +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_fin_intent_large_chinese_v1_5_pipeline_en_5.4.0_3.0_1718069406837.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_fin_intent_large_chinese_v1_5_pipeline_en_5.4.0_3.0_1718069406837.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_fin_intent_large_chinese_v1_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_fin_intent_large_chinese_v1_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_fin_intent_large_chinese_v1_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/luchun/bge_fin_intent_large_zh_v1.5 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_chinese_v2_2_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_chinese_v2_2_en.md new file mode 100644 index 00000000000000..53dded9bc32fb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_chinese_v2_2_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_chinese_v2_2 BGEEmbeddings from clinno +author: John Snow Labs +name: bge_large_chinese_v2_2 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_chinese_v2_2` is an English model originally trained by clinno. + +{:.btn-box} +<button class="button button-orange" disabled>Live Demo</button> +<button class="button button-orange" disabled>Open in Colab</button> +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_chinese_v2_2_en_5.4.0_3.0_1718068973981.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_chinese_v2_2_en_5.4.0_3.0_1718068973981.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_chinese_v2_2","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_chinese_v2_2","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_chinese_v2_2| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/clinno/bge-large-zh-v2.2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_chinese_v2_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_chinese_v2_2_pipeline_en.md new file mode 100644 index 00000000000000..851f0c9cc64cc9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_chinese_v2_2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_chinese_v2_2_pipeline pipeline BGEEmbeddings from clinno +author: John Snow Labs +name: bge_large_chinese_v2_2_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_chinese_v2_2_pipeline` is an English model originally trained by clinno. + +{:.btn-box} +<button class="button button-orange" disabled>Live Demo</button> +<button class="button button-orange" disabled>Open in Colab</button> +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_chinese_v2_2_pipeline_en_5.4.0_3.0_1718069063205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_chinese_v2_2_pipeline_en_5.4.0_3.0_1718069063205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_chinese_v2_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_chinese_v2_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_chinese_v2_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/clinno/bge-large-zh-v2.2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_finetuned_300_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_finetuned_300_en.md new file mode 100644 index 00000000000000..8244f4edb1f34d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_finetuned_300_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_english_v1_5_finetuned_300 BGEEmbeddings from ramnathv +author: John Snow Labs +name: bge_large_english_v1_5_finetuned_300 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_english_v1_5_finetuned_300` is an English model originally trained by ramnathv. + +{:.btn-box} +<button class="button button-orange" disabled>Live Demo</button> +<button class="button button-orange" disabled>Open in Colab</button> +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_finetuned_300_en_5.4.0_3.0_1718070925146.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_finetuned_300_en_5.4.0_3.0_1718070925146.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_english_v1_5_finetuned_300","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_english_v1_5_finetuned_300","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_english_v1_5_finetuned_300| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/ramnathv/bge-large-en-v1.5-finetuned-300 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_finetuned_300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_finetuned_300_pipeline_en.md new file mode 100644 index 00000000000000..ca76448b27a082 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_finetuned_300_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_english_v1_5_finetuned_300_pipeline pipeline BGEEmbeddings from ramnathv +author: John Snow Labs +name: bge_large_english_v1_5_finetuned_300_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_english_v1_5_finetuned_300_pipeline` is an English model originally trained by ramnathv. + +{:.btn-box} +<button class="button button-orange" disabled>Live Demo</button> +<button class="button button-orange" disabled>Open in Colab</button> +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_finetuned_300_pipeline_en_5.4.0_3.0_1718071024233.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_finetuned_300_pipeline_en_5.4.0_3.0_1718071024233.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_english_v1_5_finetuned_300_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_english_v1_5_finetuned_300_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_english_v1_5_finetuned_300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/ramnathv/bge-large-en-v1.5-finetuned-300 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_isoko_27001_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_isoko_27001_en.md new file mode 100644 index 00000000000000..ea6669643dd4a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_isoko_27001_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_english_v1_5_isoko_27001 BGEEmbeddings from Basti8499 +author: John Snow Labs +name: bge_large_english_v1_5_isoko_27001 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_english_v1_5_isoko_27001` is an English model originally trained by Basti8499. + +{:.btn-box} +<button class="button button-orange" disabled>Live Demo</button> +<button class="button button-orange" disabled>Open in Colab</button> +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_isoko_27001_en_5.4.0_3.0_1718067937932.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_isoko_27001_en_5.4.0_3.0_1718067937932.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_english_v1_5_isoko_27001","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_english_v1_5_isoko_27001","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_english_v1_5_isoko_27001| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Basti8499/bge-large-en-v1.5-ISO-27001 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_isoko_27001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_isoko_27001_pipeline_en.md new file mode 100644 index 00000000000000..42c2fe245f18c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_english_v1_5_isoko_27001_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_english_v1_5_isoko_27001_pipeline pipeline BGEEmbeddings from Basti8499 +author: John Snow Labs +name: bge_large_english_v1_5_isoko_27001_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_english_v1_5_isoko_27001_pipeline` is an English model originally trained by Basti8499. + +{:.btn-box} +<button class="button button-orange" disabled>Live Demo</button> +<button class="button button-orange" disabled>Open in Colab</button> +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_isoko_27001_pipeline_en_5.4.0_3.0_1718068026491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_english_v1_5_isoko_27001_pipeline_en_5.4.0_3.0_1718068026491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_english_v1_5_isoko_27001_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_english_v1_5_isoko_27001_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_english_v1_5_isoko_27001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Basti8499/bge-large-en-v1.5-ISO-27001 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_fine_tuned_paraphrase_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_fine_tuned_paraphrase_en.md new file mode 100644 index 00000000000000..4b3fd17641b451 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_fine_tuned_paraphrase_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_fine_tuned_paraphrase BGEEmbeddings from kwang123 +author: John Snow Labs +name: bge_large_fine_tuned_paraphrase +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_fine_tuned_paraphrase` is an English model originally trained by kwang123. + +{:.btn-box} +<button class="button button-orange" disabled>Live Demo</button> +<button class="button button-orange" disabled>Open in Colab</button> +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_fine_tuned_paraphrase_en_5.4.0_3.0_1718065837377.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_fine_tuned_paraphrase_en_5.4.0_3.0_1718065837377.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_fine_tuned_paraphrase","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_fine_tuned_paraphrase","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_fine_tuned_paraphrase| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/kwang123/bge-large-fine-tuned-paraphrase \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_fine_tuned_paraphrase_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_fine_tuned_paraphrase_pipeline_en.md new file mode 100644 index 00000000000000..1fbe6a70fdf418 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_fine_tuned_paraphrase_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_fine_tuned_paraphrase_pipeline pipeline BGEEmbeddings from kwang123 +author: John Snow Labs +name: bge_large_fine_tuned_paraphrase_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_fine_tuned_paraphrase_pipeline` is an English model originally trained by kwang123. + +{:.btn-box} +<button class="button button-orange" disabled>Live Demo</button> +<button class="button button-orange" disabled>Open in Colab</button> +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_fine_tuned_paraphrase_pipeline_en_5.4.0_3.0_1718065932307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_fine_tuned_paraphrase_pipeline_en_5.4.0_3.0_1718065932307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_fine_tuned_paraphrase_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_fine_tuned_paraphrase_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_fine_tuned_paraphrase_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/kwang123/bge-large-fine-tuned-paraphrase + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_finetuned_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_finetuned_en.md new file mode 100644 index 00000000000000..3b989d26e56de8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_finetuned_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_finetuned BGEEmbeddings from Suva +author: John Snow Labs +name: bge_large_finetuned +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_finetuned` is an English model originally trained by Suva. + +{:.btn-box} +<button class="button button-orange" disabled>Live Demo</button> +<button class="button button-orange" disabled>Open in Colab</button> +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_finetuned_en_5.4.0_3.0_1718067513153.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_finetuned_en_5.4.0_3.0_1718067513153.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_finetuned","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_finetuned","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_finetuned| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Suva/bge-large-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..83fe488d9f9947 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_finetuned_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_finetuned_pipeline pipeline BGEEmbeddings from Suva +author: John Snow Labs +name: bge_large_finetuned_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_finetuned_pipeline` is an English model originally trained by Suva. + +{:.btn-box} +<button class="button button-orange" disabled>Live Demo</button> +<button class="button button-orange" disabled>Open in Colab</button> +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_finetuned_pipeline_en_5.4.0_3.0_1718067606500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_finetuned_pipeline_en_5.4.0_3.0_1718067606500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Suva/bge-large-finetuned + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_frombge_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_frombge_en.md new file mode 100644 index 00000000000000..b42137c6096a8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_frombge_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_frombge BGEEmbeddings from joshus +author: John Snow Labs +name: bge_large_frombge +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_frombge` is an English model originally trained by joshus. + +{:.btn-box} +<button class="button button-orange" disabled>Live Demo</button> +<button class="button button-orange" disabled>Open in Colab</button> +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_frombge_en_5.4.0_3.0_1718067524307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_frombge_en_5.4.0_3.0_1718067524307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_frombge","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_frombge","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +</div>
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_frombge| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/joshus/bge-large-frombge \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_frombge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_frombge_pipeline_en.md new file mode 100644 index 00000000000000..39e67c73b8b3bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_frombge_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_frombge_pipeline pipeline BGEEmbeddings from joshus +author: John Snow Labs +name: bge_large_frombge_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_frombge_pipeline` is an English model originally trained by joshus. + +{:.btn-box} +<button class="button button-orange" disabled>Live Demo</button> +<button class="button button-orange" disabled>Open in Colab</button> +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_frombge_pipeline_en_5.4.0_3.0_1718067620106.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_frombge_pipeline_en_5.4.0_3.0_1718067620106.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +<div class="tabs-box" markdown="1">
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_frombge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_frombge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_frombge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/joshus/bge-large-frombge + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_medical_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_medical_en.md new file mode 100644 index 00000000000000..6c668cb96d7758 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_medical_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_medical BGEEmbeddings from ls-da3m0ns +author: John Snow Labs +name: bge_large_medical +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_medical` is an English model originally trained by ls-da3m0ns. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_medical_en_5.4.0_3.0_1718068972650.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_medical_en_5.4.0_3.0_1718068972650.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_medical","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_medical","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_medical| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/ls-da3m0ns/bge_large_medical \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_medical_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_medical_pipeline_en.md new file mode 100644 index 00000000000000..472b989cc427e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_medical_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_medical_pipeline pipeline BGEEmbeddings from ls-da3m0ns +author: John Snow Labs +name: bge_large_medical_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_medical_pipeline` is an English model originally trained by ls-da3m0ns. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_medical_pipeline_en_5.4.0_3.0_1718069069828.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_medical_pipeline_en_5.4.0_3.0_1718069069828.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_medical_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_medical_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_medical_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/ls-da3m0ns/bge_large_medical + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_v1_5_fine_tuning_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_v1_5_fine_tuning_en.md new file mode 100644 index 00000000000000..b00c834ef09006 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_v1_5_fine_tuning_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_large_v1_5_fine_tuning BGEEmbeddings from bespin-global +author: John Snow Labs +name: bge_large_v1_5_fine_tuning +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_v1_5_fine_tuning` is an English model originally trained by bespin-global. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_v1_5_fine_tuning_en_5.4.0_3.0_1718070424136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_v1_5_fine_tuning_en_5.4.0_3.0_1718070424136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_large_v1_5_fine_tuning","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_large_v1_5_fine_tuning","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_v1_5_fine_tuning| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/bespin-global/bge-large-v1.5-fine-tuning \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_large_v1_5_fine_tuning_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_large_v1_5_fine_tuning_pipeline_en.md new file mode 100644 index 00000000000000..d4b058517c2ba3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_large_v1_5_fine_tuning_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_large_v1_5_fine_tuning_pipeline pipeline BGEEmbeddings from bespin-global +author: John Snow Labs +name: bge_large_v1_5_fine_tuning_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_large_v1_5_fine_tuning_pipeline` is an English model originally trained by bespin-global. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_large_v1_5_fine_tuning_pipeline_en_5.4.0_3.0_1718070509670.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_large_v1_5_fine_tuning_pipeline_en_5.4.0_3.0_1718070509670.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_large_v1_5_fine_tuning_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_large_v1_5_fine_tuning_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_large_v1_5_fine_tuning_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/bespin-global/bge-large-v1.5-fine-tuning + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_micro_v2_taylorai_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_micro_v2_taylorai_en.md new file mode 100644 index 00000000000000..20510e703e76ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_micro_v2_taylorai_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_micro_v2_taylorai BGEEmbeddings from TaylorAI +author: John Snow Labs +name: bge_micro_v2_taylorai +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_micro_v2_taylorai` is an English model originally trained by TaylorAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_micro_v2_taylorai_en_5.4.0_3.0_1718066828545.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_micro_v2_taylorai_en_5.4.0_3.0_1718066828545.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_micro_v2_taylorai","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_micro_v2_taylorai","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_micro_v2_taylorai| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|41.5 MB| + +## References + +https://huggingface.co/TaylorAI/bge-micro-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_micro_v2_taylorai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_micro_v2_taylorai_pipeline_en.md new file mode 100644 index 00000000000000..b60ef1ab07770e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_micro_v2_taylorai_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_micro_v2_taylorai_pipeline pipeline BGEEmbeddings from TaylorAI +author: John Snow Labs +name: bge_micro_v2_taylorai_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_micro_v2_taylorai_pipeline` is an English model originally trained by TaylorAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_micro_v2_taylorai_pipeline_en_5.4.0_3.0_1718066843713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_micro_v2_taylorai_pipeline_en_5.4.0_3.0_1718066843713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_micro_v2_taylorai_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_micro_v2_taylorai_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_micro_v2_taylorai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|41.5 MB| + +## References + +https://huggingface.co/TaylorAI/bge-micro-v2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_small_book_qa_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_small_book_qa_en.md new file mode 100644 index 00000000000000..91abe6d7db1944 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_small_book_qa_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_small_book_qa BGEEmbeddings from svjack +author: John Snow Labs +name: bge_small_book_qa +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_book_qa` is an English model originally trained by svjack. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_book_qa_en_5.4.0_3.0_1718067110698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_book_qa_en_5.4.0_3.0_1718067110698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_small_book_qa","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_small_book_qa","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_book_qa| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|77.9 MB| + +## References + +https://huggingface.co/svjack/bge-small-book-qa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_small_book_qa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_small_book_qa_pipeline_en.md new file mode 100644 index 00000000000000..3e07c2195c6844 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_small_book_qa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_small_book_qa_pipeline pipeline BGEEmbeddings from svjack +author: John Snow Labs +name: bge_small_book_qa_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_book_qa_pipeline` is an English model originally trained by svjack. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_book_qa_pipeline_en_5.4.0_3.0_1718067120445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_book_qa_pipeline_en_5.4.0_3.0_1718067120445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_small_book_qa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_small_book_qa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_book_qa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|77.9 MB| + +## References + +https://huggingface.co/svjack/bge-small-book-qa + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_small_dmr_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_small_dmr_en.md new file mode 100644 index 00000000000000..c888ae34eb3bce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_small_dmr_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_small_dmr BGEEmbeddings from McGill-NLP +author: John Snow Labs +name: bge_small_dmr +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_dmr` is an English model originally trained by McGill-NLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_dmr_en_5.4.0_3.0_1718068959912.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_dmr_en_5.4.0_3.0_1718068959912.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_small_dmr","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_small_dmr","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_dmr| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|118.8 MB| + +## References + +https://huggingface.co/McGill-NLP/bge-small-dmr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_small_dmr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_small_dmr_pipeline_en.md new file mode 100644 index 00000000000000..af0d4cc93cd003 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_small_dmr_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_small_dmr_pipeline pipeline BGEEmbeddings from McGill-NLP +author: John Snow Labs +name: bge_small_dmr_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_dmr_pipeline` is an English model originally trained by McGill-NLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_dmr_pipeline_en_5.4.0_3.0_1718068970032.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_dmr_pipeline_en_5.4.0_3.0_1718068970032.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_small_dmr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_small_dmr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_dmr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|118.8 MB| + +## References + +https://huggingface.co/McGill-NLP/bge-small-dmr + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_fine_tuned_v0_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_fine_tuned_v0_en.md new file mode 100644 index 00000000000000..019ad695fe0539 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_fine_tuned_v0_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_small_english_v1_5_fine_tuned_v0 BGEEmbeddings from RMWeerasinghe +author: John Snow Labs +name: bge_small_english_v1_5_fine_tuned_v0 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_english_v1_5_fine_tuned_v0` is an English model originally trained by RMWeerasinghe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_english_v1_5_fine_tuned_v0_en_5.4.0_3.0_1718070869960.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_english_v1_5_fine_tuned_v0_en_5.4.0_3.0_1718070869960.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_small_english_v1_5_fine_tuned_v0","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_small_english_v1_5_fine_tuned_v0","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_english_v1_5_fine_tuned_v0| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|112.3 MB| + +## References + +https://huggingface.co/RMWeerasinghe/bge-small-en-v1.5-fine-tuned-v0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_fine_tuned_v0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_fine_tuned_v0_pipeline_en.md new file mode 100644 index 00000000000000..9d43bac7187b85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_fine_tuned_v0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_small_english_v1_5_fine_tuned_v0_pipeline pipeline BGEEmbeddings from RMWeerasinghe +author: John Snow Labs +name: bge_small_english_v1_5_fine_tuned_v0_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_english_v1_5_fine_tuned_v0_pipeline` is an English model originally trained by RMWeerasinghe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_english_v1_5_fine_tuned_v0_pipeline_en_5.4.0_3.0_1718070881475.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_english_v1_5_fine_tuned_v0_pipeline_en_5.4.0_3.0_1718070881475.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_small_english_v1_5_fine_tuned_v0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_small_english_v1_5_fine_tuned_v0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_english_v1_5_fine_tuned_v0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|112.4 MB| + +## References + +https://huggingface.co/RMWeerasinghe/bge-small-en-v1.5-fine-tuned-v0 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_ft_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_ft_en.md new file mode 100644 index 00000000000000..0f08260ae09d24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_ft_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_small_english_v1_5_ft BGEEmbeddings from Rebecca19990101 +author: John Snow Labs +name: bge_small_english_v1_5_ft +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_english_v1_5_ft` is an English model originally trained by Rebecca19990101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_english_v1_5_ft_en_5.4.0_3.0_1718066588306.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_english_v1_5_ft_en_5.4.0_3.0_1718066588306.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_small_english_v1_5_ft","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_small_english_v1_5_ft","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_english_v1_5_ft| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|112.6 MB| + +## References + +https://huggingface.co/Rebecca19990101/bge-small-en-v1.5-ft \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_ft_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_ft_pipeline_en.md new file mode 100644 index 00000000000000..28164eba7d3997 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-bge_small_english_v1_5_ft_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_small_english_v1_5_ft_pipeline pipeline BGEEmbeddings from Rebecca19990101 +author: John Snow Labs +name: bge_small_english_v1_5_ft_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_small_english_v1_5_ft_pipeline` is an English model originally trained by Rebecca19990101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_small_english_v1_5_ft_pipeline_en_5.4.0_3.0_1718066599896.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_small_english_v1_5_ft_pipeline_en_5.4.0_3.0_1718066599896.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_small_english_v1_5_ft_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_small_english_v1_5_ft_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_small_english_v1_5_ft_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|112.6 MB| + +## References + +https://huggingface.co/Rebecca19990101/bge-small-en-v1.5-ft + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-cat_ner_xlmr_en.md b/docs/_posts/ahmedlone127/2024-06-11-cat_ner_xlmr_en.md new file mode 100644 index 00000000000000..d2c10e42153585 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-cat_ner_xlmr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cat_ner_xlmr XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_xlmr +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `cat_ner_xlmr` is an English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_xlmr_en_5.4.0_3.0_1718133095328.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_xlmr_en_5.4.0_3.0_1718133095328.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_ner_xlmr","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("cat_ner_xlmr", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_xlmr| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-xlmr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-cat_ner_xlmr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-cat_ner_xlmr_pipeline_en.md new file mode 100644 index 00000000000000..385f4e30bd760f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-cat_ner_xlmr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cat_ner_xlmr_pipeline pipeline XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_xlmr_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `cat_ner_xlmr_pipeline` is an English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_xlmr_pipeline_en_5.4.0_3.0_1718133250662.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_xlmr_pipeline_en_5.4.0_3.0_1718133250662.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cat_ner_xlmr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cat_ner_xlmr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_xlmr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-xlmr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_en.md b/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_en.md new file mode 100644 index 00000000000000..b5d0c1a46f3feb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English clinico_xlm_roberta XlmRoBertaForTokenClassification from joheras +author: John Snow Labs +name: clinico_xlm_roberta +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `clinico_xlm_roberta` is an English model originally trained by joheras. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinico_xlm_roberta_en_5.4.0_3.0_1718128896490.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinico_xlm_roberta_en_5.4.0_3.0_1718128896490.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("clinico_xlm_roberta","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("clinico_xlm_roberta", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinico_xlm_roberta| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|809.2 MB| + +## References + +https://huggingface.co/joheras/clinico-xlm-roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_large_finetuned_augmented1_en.md b/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_large_finetuned_augmented1_en.md new file mode 100644 index 00000000000000..743dfadef873e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_large_finetuned_augmented1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English clinico_xlm_roberta_large_finetuned_augmented1 XlmRoBertaForTokenClassification from joheras +author: John Snow Labs +name: clinico_xlm_roberta_large_finetuned_augmented1 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `clinico_xlm_roberta_large_finetuned_augmented1` is an English model originally trained by joheras.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinico_xlm_roberta_large_finetuned_augmented1_en_5.4.0_3.0_1718116724883.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinico_xlm_roberta_large_finetuned_augmented1_en_5.4.0_3.0_1718116724883.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("clinico_xlm_roberta_large_finetuned_augmented1","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("clinico_xlm_roberta_large_finetuned_augmented1", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinico_xlm_roberta_large_finetuned_augmented1| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|990.7 MB| + +## References + +https://huggingface.co/joheras/clinico-xlm-roberta-large-finetuned-augmented1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_large_finetuned_augmented1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_large_finetuned_augmented1_pipeline_en.md new file mode 100644 index 00000000000000..ab91d907073976 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_large_finetuned_augmented1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English clinico_xlm_roberta_large_finetuned_augmented1_pipeline pipeline XlmRoBertaForTokenClassification from joheras +author: John Snow Labs +name: clinico_xlm_roberta_large_finetuned_augmented1_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `clinico_xlm_roberta_large_finetuned_augmented1_pipeline` is an English model originally trained by joheras.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinico_xlm_roberta_large_finetuned_augmented1_pipeline_en_5.4.0_3.0_1718116804876.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinico_xlm_roberta_large_finetuned_augmented1_pipeline_en_5.4.0_3.0_1718116804876.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("clinico_xlm_roberta_large_finetuned_augmented1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("clinico_xlm_roberta_large_finetuned_augmented1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinico_xlm_roberta_large_finetuned_augmented1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|990.7 MB| + +## References + +https://huggingface.co/joheras/clinico-xlm-roberta-large-finetuned-augmented1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_pipeline_en.md new file mode 100644 index 00000000000000..6b368408600786 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-clinico_xlm_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English clinico_xlm_roberta_pipeline pipeline XlmRoBertaForTokenClassification from joheras +author: John Snow Labs +name: clinico_xlm_roberta_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `clinico_xlm_roberta_pipeline` is an English model originally trained by joheras. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinico_xlm_roberta_pipeline_en_5.4.0_3.0_1718129063045.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinico_xlm_roberta_pipeline_en_5.4.0_3.0_1718129063045.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("clinico_xlm_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("clinico_xlm_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinico_xlm_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|809.2 MB| + +## References + +https://huggingface.co/joheras/clinico-xlm-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-embed_bge_base_edu_en.md b/docs/_posts/ahmedlone127/2024-06-11-embed_bge_base_edu_en.md new file mode 100644 index 00000000000000..7a26233be7f622 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-embed_bge_base_edu_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English embed_bge_base_edu BGEEmbeddings from HelixAI +author: John Snow Labs +name: embed_bge_base_edu +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `embed_bge_base_edu` is an English model originally trained by HelixAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/embed_bge_base_edu_en_5.4.0_3.0_1718064591261.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/embed_bge_base_edu_en_5.4.0_3.0_1718064591261.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("embed_bge_base_edu","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("embed_bge_base_edu","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|embed_bge_base_edu| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|384.4 MB| + +## References + +https://huggingface.co/HelixAI/embed_bge_base_edu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-embed_bge_base_edu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-embed_bge_base_edu_pipeline_en.md new file mode 100644 index 00000000000000..28d954abd52b07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-embed_bge_base_edu_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English embed_bge_base_edu_pipeline pipeline BGEEmbeddings from HelixAI +author: John Snow Labs +name: embed_bge_base_edu_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `embed_bge_base_edu_pipeline` is an English model originally trained by HelixAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/embed_bge_base_edu_pipeline_en_5.4.0_3.0_1718064624942.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/embed_bge_base_edu_pipeline_en_5.4.0_3.0_1718064624942.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("embed_bge_base_edu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("embed_bge_base_edu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|embed_bge_base_edu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|384.4 MB| + +## References + +https://huggingface.co/HelixAI/embed_bge_base_edu + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-enlm_roberta_conll2003_final_stemmed_en.md b/docs/_posts/ahmedlone127/2024-06-11-enlm_roberta_conll2003_final_stemmed_en.md new file mode 100644 index 00000000000000..cec67577fae4b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-enlm_roberta_conll2003_final_stemmed_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English enlm_roberta_conll2003_final_stemmed XlmRoBertaForTokenClassification from manirai91 +author: John Snow Labs +name: enlm_roberta_conll2003_final_stemmed +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `enlm_roberta_conll2003_final_stemmed` is an English model originally trained by manirai91. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/enlm_roberta_conll2003_final_stemmed_en_5.4.0_3.0_1718130791037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/enlm_roberta_conll2003_final_stemmed_en_5.4.0_3.0_1718130791037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("enlm_roberta_conll2003_final_stemmed","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("enlm_roberta_conll2003_final_stemmed", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|enlm_roberta_conll2003_final_stemmed| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|464.4 MB| + +## References + +https://huggingface.co/manirai91/enlm-roberta-conll2003-final-stemmed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-enlm_roberta_conll2003_final_stemmed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-enlm_roberta_conll2003_final_stemmed_pipeline_en.md new file mode 100644 index 00000000000000..1ab1ad8976349f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-enlm_roberta_conll2003_final_stemmed_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English enlm_roberta_conll2003_final_stemmed_pipeline pipeline XlmRoBertaForTokenClassification from manirai91 +author: John Snow Labs +name: enlm_roberta_conll2003_final_stemmed_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `enlm_roberta_conll2003_final_stemmed_pipeline` is an English model originally trained by manirai91.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/enlm_roberta_conll2003_final_stemmed_pipeline_en_5.4.0_3.0_1718130820497.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/enlm_roberta_conll2003_final_stemmed_pipeline_en_5.4.0_3.0_1718130820497.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("enlm_roberta_conll2003_final_stemmed_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("enlm_roberta_conll2003_final_stemmed_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|enlm_roberta_conll2003_final_stemmed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.4 MB| + +## References + +https://huggingface.co/manirai91/enlm-roberta-conll2003-final-stemmed + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_en.md b/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_en.md new file mode 100644 index 00000000000000..a40343ecc6ea23 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English finetune_bge_small_english BGEEmbeddings from srmishra +author: John Snow Labs +name: finetune_bge_small_english +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `finetune_bge_small_english` is an English model originally trained by srmishra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetune_bge_small_english_en_5.4.0_3.0_1718070674961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetune_bge_small_english_en_5.4.0_3.0_1718070674961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("finetune_bge_small_english","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("finetune_bge_small_english","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetune_bge_small_english| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|111.4 MB| + +## References + +https://huggingface.co/srmishra/finetune-bge-small-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_pipeline_en.md new file mode 100644 index 00000000000000..e50735f7c776a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English finetune_bge_small_english_pipeline pipeline BGEEmbeddings from srmishra +author: John Snow Labs +name: finetune_bge_small_english_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetune_bge_small_english_pipeline` is a English model originally trained by srmishra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetune_bge_small_english_pipeline_en_5.4.0_3.0_1718070687403.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetune_bge_small_english_pipeline_en_5.4.0_3.0_1718070687403.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("finetune_bge_small_english_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("finetune_bge_small_english_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetune_bge_small_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|111.4 MB| + +## References + +https://huggingface.co/srmishra/finetune-bge-small-en + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_v2_en.md b/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_v2_en.md new file mode 100644 index 00000000000000..2abfc6f2f68179 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_v2_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English finetune_bge_small_english_v2 BGEEmbeddings from srmishra +author: John Snow Labs +name: finetune_bge_small_english_v2 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetune_bge_small_english_v2` is a English model originally trained by srmishra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetune_bge_small_english_v2_en_5.4.0_3.0_1718068366186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetune_bge_small_english_v2_en_5.4.0_3.0_1718068366186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+embeddings = BGEEmbeddings.pretrained("finetune_bge_small_english_v2","en") \
+    .setInputCols(["document"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+
+val embeddings = BGEEmbeddings.pretrained("finetune_bge_small_english_v2","en")
+    .setInputCols(Array("document"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetune_bge_small_english_v2| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|111.6 MB| + +## References + +https://huggingface.co/srmishra/finetune-bge-small-en-v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_v2_pipeline_en.md new file mode 100644 index 00000000000000..c8bcfa07388df2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-finetune_bge_small_english_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English finetune_bge_small_english_v2_pipeline pipeline BGEEmbeddings from srmishra +author: John Snow Labs +name: finetune_bge_small_english_v2_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetune_bge_small_english_v2_pipeline` is a English model originally trained by srmishra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetune_bge_small_english_v2_pipeline_en_5.4.0_3.0_1718068378155.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetune_bge_small_english_v2_pipeline_en_5.4.0_3.0_1718068378155.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("finetune_bge_small_english_v2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("finetune_bge_small_english_v2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetune_bge_small_english_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|111.6 MB| + +## References + +https://huggingface.co/srmishra/finetune-bge-small-en-v2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-finetuned_bge_embeddings_v2_en.md b/docs/_posts/ahmedlone127/2024-06-11-finetuned_bge_embeddings_v2_en.md new file mode 100644 index 00000000000000..3439602576ebea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-finetuned_bge_embeddings_v2_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English finetuned_bge_embeddings_v2 BGEEmbeddings from austinpatrickm +author: John Snow Labs +name: finetuned_bge_embeddings_v2 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_bge_embeddings_v2` is a English model originally trained by austinpatrickm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bge_embeddings_v2_en_5.4.0_3.0_1718068767959.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bge_embeddings_v2_en_5.4.0_3.0_1718068767959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+embeddings = BGEEmbeddings.pretrained("finetuned_bge_embeddings_v2","en") \
+    .setInputCols(["document"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+
+val embeddings = BGEEmbeddings.pretrained("finetuned_bge_embeddings_v2","en")
+    .setInputCols(Array("document"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bge_embeddings_v2| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.3 MB| + +## References + +https://huggingface.co/austinpatrickm/finetuned_bge_embeddings_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-finetuned_bge_embeddings_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-finetuned_bge_embeddings_v2_pipeline_en.md new file mode 100644 index 00000000000000..e05a358c20f686 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-finetuned_bge_embeddings_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English finetuned_bge_embeddings_v2_pipeline pipeline BGEEmbeddings from austinpatrickm +author: John Snow Labs +name: finetuned_bge_embeddings_v2_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_bge_embeddings_v2_pipeline` is a English model originally trained by austinpatrickm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bge_embeddings_v2_pipeline_en_5.4.0_3.0_1718068801953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bge_embeddings_v2_pipeline_en_5.4.0_3.0_1718068801953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("finetuned_bge_embeddings_v2_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("finetuned_bge_embeddings_v2_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bge_embeddings_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.3 MB| + +## References + +https://huggingface.co/austinpatrickm/finetuned_bge_embeddings_v2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-flipped_2e_4_hausa_en.md b/docs/_posts/ahmedlone127/2024-06-11-flipped_2e_4_hausa_en.md new file mode 100644 index 00000000000000..4eade3f65eb966 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-flipped_2e_4_hausa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English flipped_2e_4_hausa XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: flipped_2e_4_hausa +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`flipped_2e_4_hausa` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/flipped_2e_4_hausa_en_5.4.0_3.0_1718135046057.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/flipped_2e_4_hausa_en_5.4.0_3.0_1718135046057.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("flipped_2e_4_hausa","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("flipped_2e_4_hausa", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|flipped_2e_4_hausa| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/flipped_2e-4_hausa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-flipped_2e_4_hausa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-flipped_2e_4_hausa_pipeline_en.md new file mode 100644 index 00000000000000..469450664c9c63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-flipped_2e_4_hausa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English flipped_2e_4_hausa_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: flipped_2e_4_hausa_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`flipped_2e_4_hausa_pipeline` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/flipped_2e_4_hausa_pipeline_en_5.4.0_3.0_1718135123982.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/flipped_2e_4_hausa_pipeline_en_5.4.0_3.0_1718135123982.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("flipped_2e_4_hausa_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("flipped_2e_4_hausa_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|flipped_2e_4_hausa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/flipped_2e-4_hausa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_14000_en.md b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_14000_en.md new file mode 100644 index 00000000000000..6838ad98a55e03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_14000_en.md @@ -0,0 +1,66 @@ +--- +layout: model +title: English frpile_gpl_test_pipeline_baai_bge_large_english_14000 pipeline BGEEmbeddings from DragosGorduza +author: John Snow Labs +name: frpile_gpl_test_pipeline_baai_bge_large_english_14000 +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frpile_gpl_test_pipeline_baai_bge_large_english_14000` is a English model originally trained by DragosGorduza. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_baai_bge_large_english_14000_en_5.4.0_3.0_1718070029524.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_baai_bge_large_english_14000_en_5.4.0_3.0_1718070029524.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("frpile_gpl_test_pipeline_baai_bge_large_english_14000", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("frpile_gpl_test_pipeline_baai_bge_large_english_14000", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frpile_gpl_test_pipeline_baai_bge_large_english_14000| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/DragosGorduza/FRPile_GPL_test_pipeline_BAAI-bge-large-en_14000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline_en.md new file mode 100644 index 00000000000000..db4921b31a6ec8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline pipeline BGEEmbeddings from DragosGorduza +author: John Snow Labs +name: frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline` is a English model originally trained by DragosGorduza. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline_en_5.4.0_3.0_1718070131741.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline_en_5.4.0_3.0_1718070131741.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frpile_gpl_test_pipeline_baai_bge_large_english_14000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/DragosGorduza/FRPile_GPL_test_pipeline_BAAI-bge-large-en_14000 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_1400_en.md b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_1400_en.md new file mode 100644 index 00000000000000..9f460e5389976c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_1400_en.md @@ -0,0 +1,66 @@ +--- +layout: model +title: English frpile_gpl_test_pipeline_baai_bge_large_english_1400 pipeline BGEEmbeddings from DragosGorduza +author: John Snow Labs +name: frpile_gpl_test_pipeline_baai_bge_large_english_1400 +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frpile_gpl_test_pipeline_baai_bge_large_english_1400` is a English model originally trained by DragosGorduza. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_baai_bge_large_english_1400_en_5.4.0_3.0_1718070029081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_baai_bge_large_english_1400_en_5.4.0_3.0_1718070029081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("frpile_gpl_test_pipeline_baai_bge_large_english_1400", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("frpile_gpl_test_pipeline_baai_bge_large_english_1400", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frpile_gpl_test_pipeline_baai_bge_large_english_1400| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/DragosGorduza/FRPile_GPL_test_pipeline_BAAI-bge-large-en_1400 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline_en.md new file mode 100644 index 00000000000000..8c4da8abd50941 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline pipeline BGEEmbeddings from DragosGorduza +author: John Snow Labs +name: frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline` is a English model originally trained by DragosGorduza. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline_en_5.4.0_3.0_1718070130971.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline_en_5.4.0_3.0_1718070130971.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frpile_gpl_test_pipeline_baai_bge_large_english_1400_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/DragosGorduza/FRPile_GPL_test_pipeline_BAAI-bge-large-en_1400 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_bge_large_english_1400_en.md b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_bge_large_english_1400_en.md new file mode 100644 index 00000000000000..a98067398edf6a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_bge_large_english_1400_en.md @@ -0,0 +1,66 @@ +--- +layout: model +title: English frpile_gpl_test_pipeline_bge_large_english_1400 pipeline BGEEmbeddings from DragosGorduza +author: John Snow Labs +name: frpile_gpl_test_pipeline_bge_large_english_1400 +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frpile_gpl_test_pipeline_bge_large_english_1400` is a English model originally trained by DragosGorduza. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_bge_large_english_1400_en_5.4.0_3.0_1718064838241.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_bge_large_english_1400_en_5.4.0_3.0_1718064838241.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("frpile_gpl_test_pipeline_bge_large_english_1400", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("frpile_gpl_test_pipeline_bge_large_english_1400", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frpile_gpl_test_pipeline_bge_large_english_1400| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/DragosGorduza/FRPile_GPL_test_pipeline_bge-large-en_1400 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_bge_large_english_1400_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_bge_large_english_1400_pipeline_en.md new file mode 100644 index 00000000000000..181a06f1c99df7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-frpile_gpl_test_pipeline_bge_large_english_1400_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English frpile_gpl_test_pipeline_bge_large_english_1400_pipeline pipeline BGEEmbeddings from DragosGorduza +author: John Snow Labs +name: frpile_gpl_test_pipeline_bge_large_english_1400_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frpile_gpl_test_pipeline_bge_large_english_1400_pipeline` is a English model originally trained by DragosGorduza. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_bge_large_english_1400_pipeline_en_5.4.0_3.0_1718064864537.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frpile_gpl_test_pipeline_bge_large_english_1400_pipeline_en_5.4.0_3.0_1718064864537.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("frpile_gpl_test_pipeline_bge_large_english_1400_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("frpile_gpl_test_pipeline_bge_large_english_1400_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frpile_gpl_test_pipeline_bge_large_english_1400_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/DragosGorduza/FRPile_GPL_test_pipeline_bge-large-en_1400 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-gal_ensp_xlm_r_gl.md b/docs/_posts/ahmedlone127/2024-06-11-gal_ensp_xlm_r_gl.md new file mode 100644 index 00000000000000..13c83ee82a7c63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-gal_ensp_xlm_r_gl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Galician gal_ensp_xlm_r XlmRoBertaForTokenClassification from mbruton +author: John Snow Labs +name: gal_ensp_xlm_r +date: 2024-06-11 +tags: [gl, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: gl +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_ensp_xlm_r` is a Galician model originally trained by mbruton. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_ensp_xlm_r_gl_5.4.0_3.0_1718137952557.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_ensp_xlm_r_gl_5.4.0_3.0_1718137952557.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_ensp_xlm_r","gl") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_ensp_xlm_r", "gl") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_ensp_xlm_r| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|gl| +|Size:|875.6 MB| + +## References + +https://huggingface.co/mbruton/gal_ensp_XLM-R \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-gal_ensp_xlm_r_pipeline_gl.md b/docs/_posts/ahmedlone127/2024-06-11-gal_ensp_xlm_r_pipeline_gl.md new file mode 100644 index 00000000000000..71b08a0753314d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-gal_ensp_xlm_r_pipeline_gl.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Galician gal_ensp_xlm_r_pipeline pipeline XlmRoBertaForTokenClassification from mbruton +author: John Snow Labs +name: gal_ensp_xlm_r_pipeline +date: 2024-06-11 +tags: [gl, open_source, pipeline, onnx] +task: Named Entity Recognition +language: gl +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_ensp_xlm_r_pipeline` is a Galician model originally trained by mbruton. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_ensp_xlm_r_pipeline_gl_5.4.0_3.0_1718138026562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_ensp_xlm_r_pipeline_gl_5.4.0_3.0_1718138026562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gal_ensp_xlm_r_pipeline", lang = "gl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gal_ensp_xlm_r_pipeline", lang = "gl") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_ensp_xlm_r_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|gl| +|Size:|875.6 MB| + +## References + +https://huggingface.co/mbruton/gal_ensp_XLM-R + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-gal_sayula_popoluca_iw_catalan_galician_en.md b/docs/_posts/ahmedlone127/2024-06-11-gal_sayula_popoluca_iw_catalan_galician_en.md new file mode 100644 index 00000000000000..f3a33b1d5ed94b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-gal_sayula_popoluca_iw_catalan_galician_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English gal_sayula_popoluca_iw_catalan_galician XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_sayula_popoluca_iw_catalan_galician +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `gal_sayula_popoluca_iw_catalan_galician` is an English model originally trained by homersimpson. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_sayula_popoluca_iw_catalan_galician_en_5.4.0_3.0_1718135375655.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_sayula_popoluca_iw_catalan_galician_en_5.4.0_3.0_1718135375655.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_sayula_popoluca_iw_catalan_galician","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_sayula_popoluca_iw_catalan_galician", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_sayula_popoluca_iw_catalan_galician| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|424.0 MB| + +## References + +https://huggingface.co/homersimpson/gal-pos-iw-ca-gl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-gal_sayula_popoluca_iw_catalan_galician_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-gal_sayula_popoluca_iw_catalan_galician_pipeline_en.md new file mode 100644 index 00000000000000..ce62633ad8bacd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-gal_sayula_popoluca_iw_catalan_galician_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English gal_sayula_popoluca_iw_catalan_galician_pipeline pipeline XlmRoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: gal_sayula_popoluca_iw_catalan_galician_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `gal_sayula_popoluca_iw_catalan_galician_pipeline` is an English model originally trained by homersimpson. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_sayula_popoluca_iw_catalan_galician_pipeline_en_5.4.0_3.0_1718135417434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_sayula_popoluca_iw_catalan_galician_pipeline_en_5.4.0_3.0_1718135417434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gal_sayula_popoluca_iw_catalan_galician_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gal_sayula_popoluca_iw_catalan_galician_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_sayula_popoluca_iw_catalan_galician_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|424.0 MB| + +## References + +https://huggingface.co/homersimpson/gal-pos-iw-ca-gl + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-gal_xlm_r_gl.md b/docs/_posts/ahmedlone127/2024-06-11-gal_xlm_r_gl.md new file mode 100644 index 00000000000000..669f074a9a94fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-gal_xlm_r_gl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Galician gal_xlm_r XlmRoBertaForTokenClassification from mbruton +author: John Snow Labs +name: gal_xlm_r +date: 2024-06-11 +tags: [gl, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: gl +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_xlm_r` is a Galician model originally trained by mbruton. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_xlm_r_gl_5.4.0_3.0_1718129399804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_xlm_r_gl_5.4.0_3.0_1718129399804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_xlm_r","gl") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("gal_xlm_r", "gl") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_xlm_r| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|gl| +|Size:|811.1 MB| + +## References + +https://huggingface.co/mbruton/gal_XLM-R \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-gal_xlm_r_pipeline_gl.md b/docs/_posts/ahmedlone127/2024-06-11-gal_xlm_r_pipeline_gl.md new file mode 100644 index 00000000000000..2d851f15540bdc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-gal_xlm_r_pipeline_gl.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Galician gal_xlm_r_pipeline pipeline XlmRoBertaForTokenClassification from mbruton +author: John Snow Labs +name: gal_xlm_r_pipeline +date: 2024-06-11 +tags: [gl, open_source, pipeline, onnx] +task: Named Entity Recognition +language: gl +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gal_xlm_r_pipeline` is a Galician model originally trained by mbruton. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gal_xlm_r_pipeline_gl_5.4.0_3.0_1718129547679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gal_xlm_r_pipeline_gl_5.4.0_3.0_1718129547679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gal_xlm_r_pipeline", lang = "gl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gal_xlm_r_pipeline", lang = "gl") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gal_xlm_r_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|gl| +|Size:|811.1 MB| + +## References + +https://huggingface.co/mbruton/gal_XLM-R + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-innox_roberta_xlm_en.md b/docs/_posts/ahmedlone127/2024-06-11-innox_roberta_xlm_en.md new file mode 100644 index 00000000000000..5223fe9b18eee2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-innox_roberta_xlm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English innox_roberta_xlm XlmRoBertaForTokenClassification from brao +author: John Snow Labs +name: innox_roberta_xlm +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `innox_roberta_xlm` is an English model originally trained by brao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/innox_roberta_xlm_en_5.4.0_3.0_1718113890181.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/innox_roberta_xlm_en_5.4.0_3.0_1718113890181.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("innox_roberta_xlm","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("innox_roberta_xlm", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|innox_roberta_xlm| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|844.8 MB| + +## References + +https://huggingface.co/brao/innox-roberta-xlm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-innox_roberta_xlm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-innox_roberta_xlm_pipeline_en.md new file mode 100644 index 00000000000000..569c837e768789 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-innox_roberta_xlm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English innox_roberta_xlm_pipeline pipeline XlmRoBertaForTokenClassification from brao +author: John Snow Labs +name: innox_roberta_xlm_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `innox_roberta_xlm_pipeline` is an English model originally trained by brao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/innox_roberta_xlm_pipeline_en_5.4.0_3.0_1718113971358.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/innox_roberta_xlm_pipeline_en_5.4.0_3.0_1718113971358.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("innox_roberta_xlm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("innox_roberta_xlm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|innox_roberta_xlm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|844.9 MB| + +## References + +https://huggingface.co/brao/innox-roberta-xlm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-khmer_sayula_popoluca_roberta_km.md b/docs/_posts/ahmedlone127/2024-06-11-khmer_sayula_popoluca_roberta_km.md new file mode 100644 index 00000000000000..c829ec7de4738b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-khmer_sayula_popoluca_roberta_km.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Central Khmer, Khmer khmer_sayula_popoluca_roberta XlmRoBertaForTokenClassification from seanghay +author: John Snow Labs +name: khmer_sayula_popoluca_roberta +date: 2024-06-11 +tags: [km, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: km +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`khmer_sayula_popoluca_roberta` is a Central Khmer, Khmer model originally trained by seanghay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/khmer_sayula_popoluca_roberta_km_5.4.0_3.0_1718101584580.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/khmer_sayula_popoluca_roberta_km_5.4.0_3.0_1718101584580.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("khmer_sayula_popoluca_roberta","km") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("khmer_sayula_popoluca_roberta", "km") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|khmer_sayula_popoluca_roberta| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|km| +|Size:|834.1 MB| + +## References + +https://huggingface.co/seanghay/khmer-pos-roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-khmer_sayula_popoluca_roberta_pipeline_km.md b/docs/_posts/ahmedlone127/2024-06-11-khmer_sayula_popoluca_roberta_pipeline_km.md new file mode 100644 index 00000000000000..43068fb297c137 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-khmer_sayula_popoluca_roberta_pipeline_km.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Central Khmer, Khmer khmer_sayula_popoluca_roberta_pipeline pipeline XlmRoBertaForTokenClassification from seanghay +author: John Snow Labs +name: khmer_sayula_popoluca_roberta_pipeline +date: 2024-06-11 +tags: [km, open_source, pipeline, onnx] +task: Named Entity Recognition +language: km +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`khmer_sayula_popoluca_roberta_pipeline` is a Central Khmer, Khmer model originally trained by seanghay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/khmer_sayula_popoluca_roberta_pipeline_km_5.4.0_3.0_1718101675243.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/khmer_sayula_popoluca_roberta_pipeline_km_5.4.0_3.0_1718101675243.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("khmer_sayula_popoluca_roberta_pipeline", lang = "km") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("khmer_sayula_popoluca_roberta_pipeline", lang = "km") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|khmer_sayula_popoluca_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|km| +|Size:|834.1 MB| + +## References + +https://huggingface.co/seanghay/khmer-pos-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-large_finetuned_frombge_en.md b/docs/_posts/ahmedlone127/2024-06-11-large_finetuned_frombge_en.md new file mode 100644 index 00000000000000..2fd99bf5101158 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-large_finetuned_frombge_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English large_finetuned_frombge BGEEmbeddings from joshus +author: John Snow Labs +name: large_finetuned_frombge +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `large_finetuned_frombge` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/large_finetuned_frombge_en_5.4.0_3.0_1718065476775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/large_finetuned_frombge_en_5.4.0_3.0_1718065476775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("large_finetuned_frombge","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("large_finetuned_frombge","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|large_finetuned_frombge| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/joshus/large-finetuned-frombge \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-large_finetuned_frombge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-large_finetuned_frombge_pipeline_en.md new file mode 100644 index 00000000000000..49411c809d1c89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-large_finetuned_frombge_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English large_finetuned_frombge_pipeline pipeline BGEEmbeddings from joshus +author: John Snow Labs +name: large_finetuned_frombge_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `large_finetuned_frombge_pipeline` is an English model originally trained by joshus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/large_finetuned_frombge_pipeline_en_5.4.0_3.0_1718065569338.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/large_finetuned_frombge_pipeline_en_5.4.0_3.0_1718065569338.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
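+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +# Assumes an active SparkSession named spark and that the pipeline reads a "text" column +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipeline = PretrainedPipeline("large_finetuned_frombge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +// Assumes spark.implicits._ is imported and that the pipeline reads a "text" column +val df = Seq("I love spark-nlp").toDS.toDF("text") +val pipeline = new PretrainedPipeline("large_finetuned_frombge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +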
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|large_finetuned_frombge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/joshus/large-finetuned-frombge + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-mongolian_davlan_xlm_roberta_base_ner_hrl_mn.md b/docs/_posts/ahmedlone127/2024-06-11-mongolian_davlan_xlm_roberta_base_ner_hrl_mn.md new file mode 100644 index 00000000000000..cbf49865cad2cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-mongolian_davlan_xlm_roberta_base_ner_hrl_mn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Mongolian mongolian_davlan_xlm_roberta_base_ner_hrl XlmRoBertaForTokenClassification from Blgn94 +author: John Snow Labs +name: mongolian_davlan_xlm_roberta_base_ner_hrl +date: 2024-06-11 +tags: [mn, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mongolian_davlan_xlm_roberta_base_ner_hrl` is a Mongolian model originally trained by Blgn94. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mongolian_davlan_xlm_roberta_base_ner_hrl_mn_5.4.0_3.0_1718117634412.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mongolian_davlan_xlm_roberta_base_ner_hrl_mn_5.4.0_3.0_1718117634412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("mongolian_davlan_xlm_roberta_base_ner_hrl","mn") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("mongolian_davlan_xlm_roberta_base_ner_hrl", "mn") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mongolian_davlan_xlm_roberta_base_ner_hrl| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|mn| +|Size:|911.8 MB| + +## References + +https://huggingface.co/Blgn94/mongolian-davlan-xlm-roberta-base-ner-hrl \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline_mn.md b/docs/_posts/ahmedlone127/2024-06-11-mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline_mn.md new file mode 100644 index 00000000000000..a827050beccd15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline_mn.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Mongolian mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline pipeline XlmRoBertaForTokenClassification from Blgn94 +author: John Snow Labs +name: mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline +date: 2024-06-11 +tags: [mn, open_source, pipeline, onnx] +task: Named Entity Recognition +language: mn +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline` is a Mongolian model originally trained by Blgn94. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline_mn_5.4.0_3.0_1718117724257.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline_mn_5.4.0_3.0_1718117724257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
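+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +# Assumes an active SparkSession named spark and that the pipeline reads a "text" column +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipeline = PretrainedPipeline("mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline", lang = "mn") +annotations = pipeline.transform(df) + +``` +```scala + +// Assumes spark.implicits._ is imported and that the pipeline reads a "text" column +val df = Seq("I love spark-nlp").toDS.toDF("text") +val pipeline = new PretrainedPipeline("mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline", lang = "mn") +val annotations = pipeline.transform(df) + +``` +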
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mongolian_davlan_xlm_roberta_base_ner_hrl_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mn| +|Size:|911.8 MB| + +## References + +https://huggingface.co/Blgn94/mongolian-davlan-xlm-roberta-base-ner-hrl + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-monolingual_hindi_ner_model_hi.md b/docs/_posts/ahmedlone127/2024-06-11-monolingual_hindi_ner_model_hi.md new file mode 100644 index 00000000000000..d612a61d9ba4d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-monolingual_hindi_ner_model_hi.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Hindi monolingual_hindi_ner_model XlmRoBertaForTokenClassification from Sankalp-Bahad +author: John Snow Labs +name: monolingual_hindi_ner_model +date: 2024-06-11 +tags: [hi, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: hi +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`monolingual_hindi_ner_model` is a Hindi model originally trained by Sankalp-Bahad. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/monolingual_hindi_ner_model_hi_5.4.0_3.0_1718097802374.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/monolingual_hindi_ner_model_hi_5.4.0_3.0_1718097802374.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("monolingual_hindi_ner_model","hi") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("monolingual_hindi_ner_model", "hi") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|monolingual_hindi_ner_model| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|hi| +|Size:|777.8 MB| + +## References + +https://huggingface.co/Sankalp-Bahad/Monolingual-Hindi-NER-Model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-monolingual_hindi_ner_model_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-06-11-monolingual_hindi_ner_model_pipeline_hi.md new file mode 100644 index 00000000000000..16aa982adfba3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-monolingual_hindi_ner_model_pipeline_hi.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Hindi monolingual_hindi_ner_model_pipeline pipeline XlmRoBertaForTokenClassification from Sankalp-Bahad +author: John Snow Labs +name: monolingual_hindi_ner_model_pipeline +date: 2024-06-11 +tags: [hi, open_source, pipeline, onnx] +task: Named Entity Recognition +language: hi +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`monolingual_hindi_ner_model_pipeline` is a Hindi model originally trained by Sankalp-Bahad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/monolingual_hindi_ner_model_pipeline_hi_5.4.0_3.0_1718097981576.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/monolingual_hindi_ner_model_pipeline_hi_5.4.0_3.0_1718097981576.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
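+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +# Assumes an active SparkSession named spark and that the pipeline reads a "text" column +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipeline = PretrainedPipeline("monolingual_hindi_ner_model_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +// Assumes spark.implicits._ is imported and that the pipeline reads a "text" column +val df = Seq("I love spark-nlp").toDS.toDF("text") +val pipeline = new PretrainedPipeline("monolingual_hindi_ner_model_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +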
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|monolingual_hindi_ner_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|777.8 MB| + +## References + +https://huggingface.co/Sankalp-Bahad/Monolingual-Hindi-NER-Model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_large_english_v1_5_en.md b/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_large_english_v1_5_en.md new file mode 100644 index 00000000000000..161e51533e8c5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_large_english_v1_5_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English mps_invoice_product_baai_bge_large_english_v1_5 BGEEmbeddings from vincentpremise +author: John Snow Labs +name: mps_invoice_product_baai_bge_large_english_v1_5 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mps_invoice_product_baai_bge_large_english_v1_5` is an English model originally trained by vincentpremise. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_large_english_v1_5_en_5.4.0_3.0_1718066567656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_large_english_v1_5_en_5.4.0_3.0_1718066567656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("mps_invoice_product_baai_bge_large_english_v1_5","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("mps_invoice_product_baai_bge_large_english_v1_5","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mps_invoice_product_baai_bge_large_english_v1_5| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/vincentpremise/mps-invoice-product-BAAI_bge_large_en_v1.5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_large_english_v1_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_large_english_v1_5_pipeline_en.md new file mode 100644 index 00000000000000..6b8fb98cdec4f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_large_english_v1_5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English mps_invoice_product_baai_bge_large_english_v1_5_pipeline pipeline BGEEmbeddings from vincentpremise +author: John Snow Labs +name: mps_invoice_product_baai_bge_large_english_v1_5_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mps_invoice_product_baai_bge_large_english_v1_5_pipeline` is an English model originally trained by vincentpremise. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_large_english_v1_5_pipeline_en_5.4.0_3.0_1718066659282.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_large_english_v1_5_pipeline_en_5.4.0_3.0_1718066659282.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
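+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +# Assumes an active SparkSession named spark and that the pipeline reads a "text" column +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipeline = PretrainedPipeline("mps_invoice_product_baai_bge_large_english_v1_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +// Assumes spark.implicits._ is imported and that the pipeline reads a "text" column +val df = Seq("I love spark-nlp").toDS.toDF("text") +val pipeline = new PretrainedPipeline("mps_invoice_product_baai_bge_large_english_v1_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +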
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mps_invoice_product_baai_bge_large_english_v1_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/vincentpremise/mps-invoice-product-BAAI_bge_large_en_v1.5 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_small_english_v1_5v2_en.md b/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_small_english_v1_5v2_en.md new file mode 100644 index 00000000000000..8eb54a118813ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_small_english_v1_5v2_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English mps_invoice_product_baai_bge_small_english_v1_5v2 BGEEmbeddings from vincentpremise +author: John Snow Labs +name: mps_invoice_product_baai_bge_small_english_v1_5v2 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mps_invoice_product_baai_bge_small_english_v1_5v2` is an English model originally trained by vincentpremise. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_small_english_v1_5v2_en_5.4.0_3.0_1718068196639.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_small_english_v1_5v2_en_5.4.0_3.0_1718068196639.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("mps_invoice_product_baai_bge_small_english_v1_5v2","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("mps_invoice_product_baai_bge_small_english_v1_5v2","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mps_invoice_product_baai_bge_small_english_v1_5v2| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|110.8 MB| + +## References + +https://huggingface.co/vincentpremise/mps-invoice-product-BAAI_bge_small_en_v1.5v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline_en.md new file mode 100644 index 00000000000000..64d6cc513ea316 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline pipeline BGEEmbeddings from vincentpremise +author: John Snow Labs +name: mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline` is an English model originally trained by vincentpremise. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline_en_5.4.0_3.0_1718068209256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline_en_5.4.0_3.0_1718068209256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
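+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +# Assumes an active SparkSession named spark and that the pipeline reads a "text" column +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipeline = PretrainedPipeline("mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +// Assumes spark.implicits._ is imported and that the pipeline reads a "text" column +val df = Seq("I love spark-nlp").toDS.toDF("text") +val pipeline = new PretrainedPipeline("mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +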
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mps_invoice_product_baai_bge_small_english_v1_5v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|110.8 MB| + +## References + +https://huggingface.co/vincentpremise/mps-invoice-product-BAAI_bge_small_en_v1.5v2 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-ner_indian_xlm_roberta_en.md b/docs/_posts/ahmedlone127/2024-06-11-ner_indian_xlm_roberta_en.md new file mode 100644 index 00000000000000..e274706d317b27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-ner_indian_xlm_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_indian_xlm_roberta XlmRoBertaForTokenClassification from Venkatesh4342 +author: John Snow Labs +name: ner_indian_xlm_roberta +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ner_indian_xlm_roberta` is an English model originally trained by Venkatesh4342. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_indian_xlm_roberta_en_5.4.0_3.0_1718097570966.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_indian_xlm_roberta_en_5.4.0_3.0_1718097570966.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("ner_indian_xlm_roberta","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("ner_indian_xlm_roberta", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_indian_xlm_roberta| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|882.7 MB| + +## References + +https://huggingface.co/Venkatesh4342/NER-Indian-xlm-roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-ner_indian_xlm_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-ner_indian_xlm_roberta_pipeline_en.md new file mode 100644 index 00000000000000..d008e302c3fc22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-ner_indian_xlm_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_indian_xlm_roberta_pipeline pipeline XlmRoBertaForTokenClassification from Venkatesh4342 +author: John Snow Labs +name: ner_indian_xlm_roberta_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ner_indian_xlm_roberta_pipeline` is an English model originally trained by Venkatesh4342. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_indian_xlm_roberta_pipeline_en_5.4.0_3.0_1718097649711.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_indian_xlm_roberta_pipeline_en_5.4.0_3.0_1718097649711.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
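+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +# Assumes an active SparkSession named spark and that the pipeline reads a "text" column +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipeline = PretrainedPipeline("ner_indian_xlm_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +// Assumes spark.implicits._ is imported and that the pipeline reads a "text" column +val df = Seq("I love spark-nlp").toDS.toDF("text") +val pipeline = new PretrainedPipeline("ner_indian_xlm_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +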
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_indian_xlm_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|882.7 MB| + +## References + +https://huggingface.co/Venkatesh4342/NER-Indian-xlm-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-nicher_embedder_bge_en.md b/docs/_posts/ahmedlone127/2024-06-11-nicher_embedder_bge_en.md new file mode 100644 index 00000000000000..b502b81f00abef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-nicher_embedder_bge_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English nicher_embedder_bge BGEEmbeddings from nicher92 +author: John Snow Labs +name: nicher_embedder_bge +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `nicher_embedder_bge` is an English model originally trained by nicher92. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nicher_embedder_bge_en_5.4.0_3.0_1718070249595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nicher_embedder_bge_en_5.4.0_3.0_1718070249595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("nicher_embedder_bge","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("nicher_embedder_bge","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nicher_embedder_bge| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.4 GB| + +## References + +https://huggingface.co/nicher92/nicher-embedder-bge \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-nicher_embedder_bge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-nicher_embedder_bge_pipeline_en.md new file mode 100644 index 00000000000000..31d0df35424c94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-nicher_embedder_bge_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English nicher_embedder_bge_pipeline pipeline BGEEmbeddings from nicher92 +author: John Snow Labs +name: nicher_embedder_bge_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `nicher_embedder_bge_pipeline` is an English model originally trained by nicher92. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nicher_embedder_bge_pipeline_en_5.4.0_3.0_1718070334483.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nicher_embedder_bge_pipeline_en_5.4.0_3.0_1718070334483.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
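+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # sample input; assumes an active session `spark` and a "text" input column +pipeline = PretrainedPipeline("nicher_embedder_bge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +import spark.implicits._ // assumes an active SparkSession named `spark` +val df = Seq("I love spark-nlp").toDF("text") // sample input; the "text" column name is an assumption +val pipeline = new PretrainedPipeline("nicher_embedder_bge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +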
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nicher_embedder_bge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.4 GB| + +## References + +https://huggingface.co/nicher92/nicher-embedder-bge + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-norwegian_delete_5e_5_hausa_en.md b/docs/_posts/ahmedlone127/2024-06-11-norwegian_delete_5e_5_hausa_en.md new file mode 100644 index 00000000000000..8885a768eb247b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-norwegian_delete_5e_5_hausa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English norwegian_delete_5e_5_hausa XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: norwegian_delete_5e_5_hausa +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `norwegian_delete_5e_5_hausa` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_delete_5e_5_hausa_en_5.4.0_3.0_1718134235499.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_delete_5e_5_hausa_en_5.4.0_3.0_1718134235499.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import Tokenizer, XlmRoBertaForTokenClassification +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("norwegian_delete_5e_5_hausa","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base.DocumentAssembler +import com.johnsnowlabs.nlp.annotators.Tokenizer +import com.johnsnowlabs.nlp.annotators.classifier.dl.XlmRoBertaForTokenClassification +import org.apache.spark.ml.Pipeline +import spark.implicits._ // assumes an active SparkSession named `spark` + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("norwegian_delete_5e_5_hausa", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
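The token classifier above emits one tag per token in the `ner` column. Spark NLP's `NerConverter` can merge those tags into entity chunks inside the pipeline; the sketch below illustrates the same idea in plain Python for readers who post-process collected results themselves. It assumes IOB2-style tags (`B-…`, `I-…`, `O`), which this model card does not confirm, and the function name `group_entities`, the sample tokens, and the `PER`/`LOC` labels are hypothetical.

```python
from typing import List, Tuple


def group_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge IOB2-tagged tokens into (entity_text, label) pairs."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    label = None
    for token, tag in zip(tokens, tags):
        prefix, _, kind = tag.partition("-")
        if prefix == "B" or (prefix == "I" and kind != label):
            # A new entity starts; flush whatever was open.
            if current:
                entities.append((" ".join(current), label))
            current, label = [token], kind
        elif prefix == "I":
            # Continuation of the entity that is currently open.
            current.append(token)
        else:
            # "O" or any unrecognised tag closes the open entity.
            if current:
                entities.append((" ".join(current), label))
            current, label = [], None
    if current:
        entities.append((" ".join(current), label))
    return entities


# Hypothetical tokens and tags, standing in for the "token" and "ner" results.
tokens = ["John", "Snow", "visited", "Oslo"]
tags = ["B-PER", "I-PER", "O", "B-LOC"]
print(group_entities(tokens, tags))
```

An `I-` tag that does not continue the open entity is treated leniently here as the start of a new one; a stricter variant could discard such tokens instead.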
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_delete_5e_5_hausa| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/no-delete_5e-5_hausa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-norwegian_delete_5e_5_hausa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-norwegian_delete_5e_5_hausa_pipeline_en.md new file mode 100644 index 00000000000000..1eeab0e59f33ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-norwegian_delete_5e_5_hausa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English norwegian_delete_5e_5_hausa_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: norwegian_delete_5e_5_hausa_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `norwegian_delete_5e_5_hausa_pipeline` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_delete_5e_5_hausa_pipeline_en_5.4.0_3.0_1718134300332.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_delete_5e_5_hausa_pipeline_en_5.4.0_3.0_1718134300332.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
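+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # sample input; assumes an active session `spark` and a "text" input column +pipeline = PretrainedPipeline("norwegian_delete_5e_5_hausa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +import spark.implicits._ // assumes an active SparkSession named `spark` +val df = Seq("I love spark-nlp").toDF("text") // sample input; the "text" column name is an assumption +val pipeline = new PretrainedPipeline("norwegian_delete_5e_5_hausa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +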
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_delete_5e_5_hausa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/no-delete_5e-5_hausa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f5_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f5_en.md new file mode 100644 index 00000000000000..79fca25a6b9919 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f5_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English philai_bge_2et_f5 BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_2et_f5 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_2et_f5` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f5_en_5.4.0_3.0_1718064224905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f5_en_5.4.0_3.0_1718064224905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import BGEEmbeddings +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("philai_bge_2et_f5","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base.DocumentAssembler +import com.johnsnowlabs.nlp.embeddings.BGEEmbeddings +import org.apache.spark.ml.Pipeline +import spark.implicits._ // assumes an active SparkSession named `spark` + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("philai_bge_2et_f5","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
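Sentence embeddings such as the ones this model produces are commonly compared with cosine similarity, for example to rank candidate texts against a query. The following is a minimal sketch of that comparison in plain Python, assuming the vectors have already been extracted from the `embeddings` column as lists of floats. The names `cosine_similarity`, `query`, and `candidates`, as well as the toy three-dimensional vectors, are illustrative only.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Toy vectors standing in for real embeddings.
query = [1.0, 0.0, 1.0]
candidates = {"doc_a": [1.0, 0.0, 1.0], "doc_b": [0.0, 1.0, 0.0]}

# Rank candidate ids from most to least similar to the query.
ranked = sorted(
    candidates,
    key=lambda name: cosine_similarity(query, candidates[name]),
    reverse=True,
)
print(ranked)
```

For large collections, a vectorised library or a dedicated vector index would normally replace this loop; the pure-Python version is kept only for clarity.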
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_2et_f5| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-2et-f5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f5_pipeline_en.md new file mode 100644 index 00000000000000..6a904d3cab16d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f5_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_bge_2et_f5_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_2et_f5_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_2et_f5_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f5_pipeline_en_5.4.0_3.0_1718064302729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f5_pipeline_en_5.4.0_3.0_1718064302729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
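+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # sample input; assumes an active session `spark` and a "text" input column +pipeline = PretrainedPipeline("philai_bge_2et_f5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +import spark.implicits._ // assumes an active SparkSession named `spark` +val df = Seq("I love spark-nlp").toDF("text") // sample input; the "text" column name is an assumption +val pipeline = new PretrainedPipeline("philai_bge_2et_f5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +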
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_2et_f5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-2et-f5 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f8_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f8_en.md new file mode 100644 index 00000000000000..c7ecb3852fc74e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f8_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English philai_bge_2et_f8 BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_2et_f8 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_2et_f8` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f8_en_5.4.0_3.0_1718070917192.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f8_en_5.4.0_3.0_1718070917192.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import BGEEmbeddings +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("philai_bge_2et_f8","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base.DocumentAssembler +import com.johnsnowlabs.nlp.embeddings.BGEEmbeddings +import org.apache.spark.ml.Pipeline +import spark.implicits._ // assumes an active SparkSession named `spark` + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("philai_bge_2et_f8","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_2et_f8| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-2et-f8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f8_pipeline_en.md new file mode 100644 index 00000000000000..9f17c2c909b6bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f8_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_bge_2et_f8_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_2et_f8_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_2et_f8_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f8_pipeline_en_5.4.0_3.0_1718070995255.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f8_pipeline_en_5.4.0_3.0_1718070995255.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
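+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # sample input; assumes an active session `spark` and a "text" input column +pipeline = PretrainedPipeline("philai_bge_2et_f8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +import spark.implicits._ // assumes an active SparkSession named `spark` +val df = Seq("I love spark-nlp").toDF("text") // sample input; the "text" column name is an assumption +val pipeline = new PretrainedPipeline("philai_bge_2et_f8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +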
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_2et_f8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-2et-f8 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f_again_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f_again_pipeline_en.md new file mode 100644 index 00000000000000..95b9a505d36c86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_2et_f_again_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_bge_2et_f_again_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_2et_f_again_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_2et_f_again_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f_again_pipeline_en_5.4.0_3.0_1718068000718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_2et_f_again_pipeline_en_5.4.0_3.0_1718068000718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
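+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # sample input; assumes an active session `spark` and a "text" input column +pipeline = PretrainedPipeline("philai_bge_2et_f_again_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +import spark.implicits._ // assumes an active SparkSession named `spark` +val df = Seq("I love spark-nlp").toDF("text") // sample input; the "text" column name is an assumption +val pipeline = new PretrainedPipeline("philai_bge_2et_f_again_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +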
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_2et_f_again_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-2et-f-again + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_10f_fp16_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_10f_fp16_en.md new file mode 100644 index 00000000000000..614bb9a833c3c3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_10f_fp16_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English philai_bge_6e_10f_fp16 BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_6e_10f_fp16 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_6e_10f_fp16` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_6e_10f_fp16_en_5.4.0_3.0_1718064279193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_6e_10f_fp16_en_5.4.0_3.0_1718064279193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import BGEEmbeddings +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("philai_bge_6e_10f_fp16","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base.DocumentAssembler +import com.johnsnowlabs.nlp.embeddings.BGEEmbeddings +import org.apache.spark.ml.Pipeline +import spark.implicits._ // assumes an active SparkSession named `spark` + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("philai_bge_6e_10f_fp16","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_6e_10f_fp16| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-6e-10f-fp16 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_10f_fp16_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_10f_fp16_pipeline_en.md new file mode 100644 index 00000000000000..21982c70f6013f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_10f_fp16_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_bge_6e_10f_fp16_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_6e_10f_fp16_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_6e_10f_fp16_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_6e_10f_fp16_pipeline_en_5.4.0_3.0_1718064357206.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_6e_10f_fp16_pipeline_en_5.4.0_3.0_1718064357206.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
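+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # sample input; assumes an active session `spark` and a "text" input column +pipeline = PretrainedPipeline("philai_bge_6e_10f_fp16_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +import spark.implicits._ // assumes an active SparkSession named `spark` +val df = Seq("I love spark-nlp").toDF("text") // sample input; the "text" column name is an assumption +val pipeline = new PretrainedPipeline("philai_bge_6e_10f_fp16_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +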
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_6e_10f_fp16_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-6e-10f-fp16 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_f10_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_f10_en.md new file mode 100644 index 00000000000000..0f5735b2171fdb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_f10_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English philai_bge_6e_f10 BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_6e_f10 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_6e_f10` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_6e_f10_en_5.4.0_3.0_1718069903534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_6e_f10_en_5.4.0_3.0_1718069903534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import BGEEmbeddings +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("philai_bge_6e_f10","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base.DocumentAssembler +import com.johnsnowlabs.nlp.embeddings.BGEEmbeddings +import org.apache.spark.ml.Pipeline +import spark.implicits._ // assumes an active SparkSession named `spark` + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("philai_bge_6e_f10","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_6e_f10| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-6e-f10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_f10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_f10_pipeline_en.md new file mode 100644 index 00000000000000..9aca0e506bcdf2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_bge_6e_f10_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_bge_6e_f10_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_bge_6e_f10_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_bge_6e_f10_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_bge_6e_f10_pipeline_en_5.4.0_3.0_1718069982119.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_bge_6e_f10_pipeline_en_5.4.0_3.0_1718069982119.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
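+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +from sparknlp.pretrained import PretrainedPipeline +df = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")  # sample input; assumes an active session `spark` and a "text" input column +pipeline = PretrainedPipeline("philai_bge_6e_f10_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +import spark.implicits._ // assumes an active SparkSession named `spark` +val df = Seq("I love spark-nlp").toDF("text") // sample input; the "text" column name is an assumption +val pipeline = new PretrainedPipeline("philai_bge_6e_f10_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +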
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_bge_6e_f10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-bge-6e-f10 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_embed_bge_test_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_embed_bge_test_en.md new file mode 100644 index 00000000000000..7a9a63f843ee8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_embed_bge_test_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English philai_embed_bge_test BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_embed_bge_test +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_embed_bge_test` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_embed_bge_test_en_5.4.0_3.0_1718070083201.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_embed_bge_test_en_5.4.0_3.0_1718070083201.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +import sparknlp +from sparknlp.base import DocumentAssembler +from sparknlp.annotator import BGEEmbeddings +from pyspark.ml import Pipeline + +spark = sparknlp.start() + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("philai_embed_bge_test","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +import com.johnsnowlabs.nlp.base.DocumentAssembler +import com.johnsnowlabs.nlp.embeddings.BGEEmbeddings +import org.apache.spark.ml.Pipeline +import spark.implicits._ // assumes an active SparkSession named `spark` + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("philai_embed_bge_test","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_embed_bge_test| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/dbourget/philai-embed-bge-test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_embed_bge_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_embed_bge_test_pipeline_en.md new file mode 100644 index 00000000000000..4195deefe9d758 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_embed_bge_test_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_embed_bge_test_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_embed_bge_test_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_embed_bge_test_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_embed_bge_test_pipeline_en_5.4.0_3.0_1718070162532.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_embed_bge_test_pipeline_en_5.4.0_3.0_1718070162532.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("philai_embed_bge_test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("philai_embed_bge_test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_embed_bge_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/dbourget/philai-embed-bge-test + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_tsdae_6e_bge_ft_5e_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_tsdae_6e_bge_ft_5e_en.md new file mode 100644 index 00000000000000..fb82e4d9976556 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_tsdae_6e_bge_ft_5e_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English philai_tsdae_6e_bge_ft_5e BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_tsdae_6e_bge_ft_5e +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_tsdae_6e_bge_ft_5e` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_tsdae_6e_bge_ft_5e_en_5.4.0_3.0_1718071269830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_tsdae_6e_bge_ft_5e_en_5.4.0_3.0_1718071269830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("philai_tsdae_6e_bge_ft_5e","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("philai_tsdae_6e_bge_ft_5e","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_tsdae_6e_bge_ft_5e| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-tsdae-6e-bge-ft-5e \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-philai_tsdae_6e_bge_ft_5e_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-philai_tsdae_6e_bge_ft_5e_pipeline_en.md new file mode 100644 index 00000000000000..98b943b691213f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-philai_tsdae_6e_bge_ft_5e_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English philai_tsdae_6e_bge_ft_5e_pipeline pipeline BGEEmbeddings from dbourget +author: John Snow Labs +name: philai_tsdae_6e_bge_ft_5e_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `philai_tsdae_6e_bge_ft_5e_pipeline` is an English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/philai_tsdae_6e_bge_ft_5e_pipeline_en_5.4.0_3.0_1718071347273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/philai_tsdae_6e_bge_ft_5e_pipeline_en_5.4.0_3.0_1718071347273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("philai_tsdae_6e_bge_ft_5e_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("philai_tsdae_6e_bge_ft_5e_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|philai_tsdae_6e_bge_ft_5e_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/philai-tsdae-6e-bge-ft-5e + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_1600_en.md b/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_1600_en.md new file mode 100644 index 00000000000000..92af3edb17afd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_1600_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English pmc_bge_1600 BGEEmbeddings from Labib11 +author: John Snow Labs +name: pmc_bge_1600 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `pmc_bge_1600` is an English model originally trained by Labib11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pmc_bge_1600_en_5.4.0_3.0_1718066360640.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pmc_bge_1600_en_5.4.0_3.0_1718066360640.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("pmc_bge_1600","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("pmc_bge_1600","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pmc_bge_1600| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Labib11/PMC_bge_1600 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_1600_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_1600_pipeline_en.md new file mode 100644 index 00000000000000..461e0f694cc5aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_1600_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English pmc_bge_1600_pipeline pipeline BGEEmbeddings from Labib11 +author: John Snow Labs +name: pmc_bge_1600_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `pmc_bge_1600_pipeline` is an English model originally trained by Labib11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pmc_bge_1600_pipeline_en_5.4.0_3.0_1718066441204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pmc_bge_1600_pipeline_en_5.4.0_3.0_1718066441204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("pmc_bge_1600_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("pmc_bge_1600_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pmc_bge_1600_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Labib11/PMC_bge_1600 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_800_en.md b/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_800_en.md new file mode 100644 index 00000000000000..d241d711b21112 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_800_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English pmc_bge_800 BGEEmbeddings from Labib11 +author: John Snow Labs +name: pmc_bge_800 +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `pmc_bge_800` is an English model originally trained by Labib11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pmc_bge_800_en_5.4.0_3.0_1718067490438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pmc_bge_800_en_5.4.0_3.0_1718067490438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("pmc_bge_800","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("pmc_bge_800","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pmc_bge_800| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Labib11/PMC_bge_800 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_800_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_800_pipeline_en.md new file mode 100644 index 00000000000000..66a2b9f3bc11d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-pmc_bge_800_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English pmc_bge_800_pipeline pipeline BGEEmbeddings from Labib11 +author: John Snow Labs +name: pmc_bge_800_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `pmc_bge_800_pipeline` is an English model originally trained by Labib11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pmc_bge_800_pipeline_en_5.4.0_3.0_1718067571477.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pmc_bge_800_pipeline_en_5.4.0_3.0_1718067571477.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("pmc_bge_800_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("pmc_bge_800_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pmc_bge_800_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/Labib11/PMC_bge_800 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-roberta_base_ner_aimlab_en.md b/docs/_posts/ahmedlone127/2024-06-11-roberta_base_ner_aimlab_en.md new file mode 100644 index 00000000000000..b77ddfbb9b8ef4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-roberta_base_ner_aimlab_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_ner_aimlab XlmRoBertaForTokenClassification from Aimlab +author: John Snow Labs +name: roberta_base_ner_aimlab +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `roberta_base_ner_aimlab` is an English model originally trained by Aimlab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ner_aimlab_en_5.4.0_3.0_1718109052399.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ner_aimlab_en_5.4.0_3.0_1718109052399.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("roberta_base_ner_aimlab","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("roberta_base_ner_aimlab", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ner_aimlab| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|661.0 MB| + +## References + +https://huggingface.co/Aimlab/Roberta-Base-NER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-roberta_base_ner_aimlab_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-roberta_base_ner_aimlab_pipeline_en.md new file mode 100644 index 00000000000000..aed36a6cdf8eea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-roberta_base_ner_aimlab_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_ner_aimlab_pipeline pipeline XlmRoBertaForTokenClassification from Aimlab +author: John Snow Labs +name: roberta_base_ner_aimlab_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `roberta_base_ner_aimlab_pipeline` is an English model originally trained by Aimlab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ner_aimlab_pipeline_en_5.4.0_3.0_1718109288508.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ner_aimlab_pipeline_en_5.4.0_3.0_1718109288508.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_ner_aimlab_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_ner_aimlab_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ner_aimlab_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|661.0 MB| + +## References + +https://huggingface.co/Aimlab/Roberta-Base-NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-spa_enpt_xlm_r_es.md b/docs/_posts/ahmedlone127/2024-06-11-spa_enpt_xlm_r_es.md new file mode 100644 index 00000000000000..d922ab33c0951a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-spa_enpt_xlm_r_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish spa_enpt_xlm_r XlmRoBertaForTokenClassification from mbruton +author: John Snow Labs +name: spa_enpt_xlm_r +date: 2024-06-11 +tags: [es, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `spa_enpt_xlm_r` is a Castilian, Spanish model originally trained by mbruton. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spa_enpt_xlm_r_es_5.4.0_3.0_1718131430548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spa_enpt_xlm_r_es_5.4.0_3.0_1718131430548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("spa_enpt_xlm_r","es") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("spa_enpt_xlm_r", "es") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spa_enpt_xlm_r| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|877.9 MB| + +## References + +https://huggingface.co/mbruton/spa_enpt_XLM-R \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-spa_enpt_xlm_r_pipeline_es.md b/docs/_posts/ahmedlone127/2024-06-11-spa_enpt_xlm_r_pipeline_es.md new file mode 100644 index 00000000000000..80e95e0c3ba61e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-spa_enpt_xlm_r_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish spa_enpt_xlm_r_pipeline pipeline XlmRoBertaForTokenClassification from mbruton +author: John Snow Labs +name: spa_enpt_xlm_r_pipeline +date: 2024-06-11 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `spa_enpt_xlm_r_pipeline` is a Castilian, Spanish model originally trained by mbruton. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spa_enpt_xlm_r_pipeline_es_5.4.0_3.0_1718131504737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spa_enpt_xlm_r_pipeline_es_5.4.0_3.0_1718131504737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("spa_enpt_xlm_r_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("spa_enpt_xlm_r_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spa_enpt_xlm_r_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|877.9 MB| + +## References + +https://huggingface.co/mbruton/spa_enpt_XLM-R + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-squirtle_en.md b/docs/_posts/ahmedlone127/2024-06-11-squirtle_en.md new file mode 100644 index 00000000000000..f916472a53856a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-squirtle_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English squirtle BGEEmbeddings from Mihaiii +author: John Snow Labs +name: squirtle +date: 2024-06-11 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `squirtle` is an English model originally trained by Mihaiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/squirtle_en_5.4.0_3.0_1718068737550.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/squirtle_en_5.4.0_3.0_1718068737550.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("squirtle","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("squirtle","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|squirtle| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|56.9 MB| + +## References + +https://huggingface.co/Mihaiii/Squirtle \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-squirtle_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-squirtle_pipeline_en.md new file mode 100644 index 00000000000000..d2c34144840e04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-squirtle_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English squirtle_pipeline pipeline BGEEmbeddings from Mihaiii +author: John Snow Labs +name: squirtle_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `squirtle_pipeline` is an English model originally trained by Mihaiii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/squirtle_pipeline_en_5.4.0_3.0_1718068741228.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/squirtle_pipeline_en_5.4.0_3.0_1718068741228.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("squirtle_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("squirtle_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|squirtle_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|56.9 MB| + +## References + +https://huggingface.co/Mihaiii/Squirtle + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-ter_class_5e_5_hausa_en.md b/docs/_posts/ahmedlone127/2024-06-11-ter_class_5e_5_hausa_en.md new file mode 100644 index 00000000000000..7a4f59bf50d1f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-ter_class_5e_5_hausa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ter_class_5e_5_hausa XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: ter_class_5e_5_hausa +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ter_class_5e_5_hausa` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ter_class_5e_5_hausa_en_5.4.0_3.0_1718134138177.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ter_class_5e_5_hausa_en_5.4.0_3.0_1718134138177.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("ter_class_5e_5_hausa","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("ter_class_5e_5_hausa", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|ter_class_5e_5_hausa|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|1.0 GB|
+
+## References
+
+https://huggingface.co/grace-pro/ter_class_5e-5_hausa
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-ter_class_5e_5_hausa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-ter_class_5e_5_hausa_pipeline_en.md
new file mode 100644
index 00000000000000..a01b8cb2c993d6
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-ter_class_5e_5_hausa_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English ter_class_5e_5_hausa_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro
+author: John Snow Labs
+name: ter_class_5e_5_hausa_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `ter_class_5e_5_hausa_pipeline` is an English model originally trained by grace-pro.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ter_class_5e_5_hausa_pipeline_en_5.4.0_3.0_1718134205150.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ter_class_5e_5_hausa_pipeline_en_5.4.0_3.0_1718134205150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ter_class_5e_5_hausa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ter_class_5e_5_hausa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|ter_class_5e_5_hausa_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|1.0 GB|
+
+## References
+
+https://huggingface.co/grace-pro/ter_class_5e-5_hausa
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-test25_en.md b/docs/_posts/ahmedlone127/2024-06-11-test25_en.md
new file mode 100644
index 00000000000000..c5ba5eeab3785e
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-test25_en.md
@@ -0,0 +1,87 @@
+---
+layout: model
+title: English test25 BGEEmbeddings from Mihaiii
+author: John Snow Labs
+name: test25
+date: 2024-06-11
+tags: [en, open_source, onnx, embeddings, bge]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BGEEmbeddings
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `test25` is an English model originally trained by Mihaiii.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test25_en_5.4.0_3.0_1718066814999.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test25_en_5.4.0_3.0_1718066814999.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+embeddings = BGEEmbeddings.pretrained("test25","en") \
+    .setInputCols(["document"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+
+val embeddings = BGEEmbeddings.pretrained("test25","en")
+    .setInputCols(Array("document"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|test25|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document]|
+|Output Labels:|[bge]|
+|Language:|en|
+|Size:|64.2 MB|
+
+## References
+
+https://huggingface.co/Mihaiii/test25
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-test25_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-test25_pipeline_en.md
new file mode 100644
index 00000000000000..002ac821579257
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-test25_pipeline_en.md
@@ -0,0 +1,69 @@
+---
+layout: model
+title: English test25_pipeline pipeline BGEEmbeddings from Mihaiii
+author: John Snow Labs
+name: test25_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `test25_pipeline` is an English model originally trained by Mihaiii.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test25_pipeline_en_5.4.0_3.0_1718066819304.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test25_pipeline_en_5.4.0_3.0_1718066819304.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test25_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test25_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|test25_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|64.2 MB|
+
+## References
+
+https://huggingface.co/Mihaiii/test25
+
+## Included Models
+
+- DocumentAssembler
+- BGEEmbeddings
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-testbge_en.md b/docs/_posts/ahmedlone127/2024-06-11-testbge_en.md
new file mode 100644
index 00000000000000..7b45a4ccb30976
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-testbge_en.md
@@ -0,0 +1,87 @@
+---
+layout: model
+title: English testbge BGEEmbeddings from Neokun004
+author: John Snow Labs
+name: testbge
+date: 2024-06-11
+tags: [en, open_source, onnx, embeddings, bge]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: BGEEmbeddings
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `testbge` is an English model originally trained by Neokun004.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/testbge_en_5.4.0_3.0_1718068731591.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/testbge_en_5.4.0_3.0_1718068731591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+embeddings = BGEEmbeddings.pretrained("testbge","en") \
+    .setInputCols(["document"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+
+val embeddings = BGEEmbeddings.pretrained("testbge","en")
+    .setInputCols(Array("document"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|testbge|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document]|
+|Output Labels:|[bge]|
+|Language:|en|
+|Size:|112.3 MB|
+
+## References
+
+https://huggingface.co/Neokun004/Testbge
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-testbge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-testbge_pipeline_en.md
new file mode 100644
index 00000000000000..2b0d2e7c410cf7
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-testbge_pipeline_en.md
@@ -0,0 +1,69 @@
+---
+layout: model
+title: English testbge_pipeline pipeline BGEEmbeddings from Neokun004
+author: John Snow Labs
+name: testbge_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Embeddings
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `testbge_pipeline` is an English model originally trained by Neokun004.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/testbge_pipeline_en_5.4.0_3.0_1718068743671.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/testbge_pipeline_en_5.4.0_3.0_1718068743671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("testbge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("testbge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|testbge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|112.3 MB| + +## References + +https://huggingface.co/Neokun004/Testbge + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-text2vec_bge_large_chinese_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-06-11-text2vec_bge_large_chinese_pipeline_zh.md new file mode 100644 index 00000000000000..1b6090b5995b50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-text2vec_bge_large_chinese_pipeline_zh.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Chinese text2vec_bge_large_chinese_pipeline pipeline BGEEmbeddings from shibing624 +author: John Snow Labs +name: text2vec_bge_large_chinese_pipeline +date: 2024-06-11 +tags: [zh, open_source, pipeline, onnx] +task: Embeddings +language: zh +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text2vec_bge_large_chinese_pipeline` is a Chinese model originally trained by shibing624. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text2vec_bge_large_chinese_pipeline_zh_5.4.0_3.0_1718064982625.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text2vec_bge_large_chinese_pipeline_zh_5.4.0_3.0_1718064982625.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("text2vec_bge_large_chinese_pipeline", lang = "zh") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("text2vec_bge_large_chinese_pipeline", lang = "zh") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|text2vec_bge_large_chinese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|1.2 GB| + +## References + +https://huggingface.co/shibing624/text2vec-bge-large-chinese + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-text2vec_bge_large_chinese_zh.md b/docs/_posts/ahmedlone127/2024-06-11-text2vec_bge_large_chinese_zh.md new file mode 100644 index 00000000000000..1401c5576630d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-text2vec_bge_large_chinese_zh.md @@ -0,0 +1,87 @@ +--- +layout: model +title: Chinese text2vec_bge_large_chinese BGEEmbeddings from shibing624 +author: John Snow Labs +name: text2vec_bge_large_chinese +date: 2024-06-11 +tags: [zh, open_source, onnx, embeddings, bge] +task: Embeddings +language: zh +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`text2vec_bge_large_chinese` is a Chinese model originally trained by shibing624. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/text2vec_bge_large_chinese_zh_5.4.0_3.0_1718064908171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/text2vec_bge_large_chinese_zh_5.4.0_3.0_1718064908171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+embeddings = BGEEmbeddings.pretrained("text2vec_bge_large_chinese","zh") \
+    .setInputCols(["document"]) \
+    .setOutputCol("embeddings")
+
+pipeline = Pipeline().setStages([documentAssembler, embeddings])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+
+val embeddings = BGEEmbeddings.pretrained("text2vec_bge_large_chinese","zh")
+    .setInputCols(Array("document"))
+    .setOutputCol("embeddings")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|text2vec_bge_large_chinese|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document]|
+|Output Labels:|[bge]|
+|Language:|zh|
+|Size:|1.2 GB|
+
+## References
+
+https://huggingface.co/shibing624/text2vec-bge-large-chinese
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_en.md b/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_en.md
new file mode 100644
index 00000000000000..627f1a99ef967c
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized XlmRoBertaForTokenClassification from anonymoussubmissions
+author: John Snow Labs
+name: tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized
+date: 2024-06-11
+tags: [en, open_source, onnx, token_classification, xlm_roberta, ner]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized` is an English model originally trained by anonymoussubmissions.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_en_5.4.0_3.0_1718116878754.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_en_5.4.0_3.0_1718116878754.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|859.4 MB|
+
+## References
+
+https://huggingface.co/anonymoussubmissions/tner-xlm-roberta-base-ontonotes5-switchboard-earnings21-normalized
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline_en.md
new file mode 100644
index 00000000000000..66721eacebed9d
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline pipeline XlmRoBertaForTokenClassification from anonymoussubmissions
+author: John Snow Labs
+name: tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline` is an English model originally trained by anonymoussubmissions.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline_en_5.4.0_3.0_1718116969169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline_en_5.4.0_3.0_1718116969169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|tner_xlm_roberta_base_ontonotes5_switchboard_earnings21_normalized_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|859.4 MB|
+
+## References
+
+https://huggingface.co/anonymoussubmissions/tner-xlm-roberta-base-ontonotes5-switchboard-earnings21-normalized
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_normalized_en.md b/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_normalized_en.md
new file mode 100644
index 00000000000000..7a7e479adbd179
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_normalized_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English tner_xlm_roberta_base_ontonotes5_switchboard_normalized XlmRoBertaForTokenClassification from anonymoussubmissions
+author: John Snow Labs
+name: tner_xlm_roberta_base_ontonotes5_switchboard_normalized
+date: 2024-06-11
+tags: [en, open_source, onnx, token_classification, xlm_roberta, ner]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `tner_xlm_roberta_base_ontonotes5_switchboard_normalized` is an English model originally trained by anonymoussubmissions.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_switchboard_normalized_en_5.4.0_3.0_1718132044391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_switchboard_normalized_en_5.4.0_3.0_1718132044391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("tner_xlm_roberta_base_ontonotes5_switchboard_normalized","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("tner_xlm_roberta_base_ontonotes5_switchboard_normalized", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|tner_xlm_roberta_base_ontonotes5_switchboard_normalized|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|858.7 MB|
+
+## References
+
+https://huggingface.co/anonymoussubmissions/tner-xlm-roberta-base-ontonotes5-switchboard-normalized
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline_en.md
new file mode 100644
index 00000000000000..b4f1e299e74fb6
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline pipeline XlmRoBertaForTokenClassification from anonymoussubmissions
+author: John Snow Labs
+name: tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline` is an English model originally trained by anonymoussubmissions.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline_en_5.4.0_3.0_1718132124356.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline_en_5.4.0_3.0_1718132124356.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|tner_xlm_roberta_base_ontonotes5_switchboard_normalized_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|858.8 MB|
+
+## References
+
+https://huggingface.co/anonymoussubmissions/tner-xlm-roberta-base-ontonotes5-switchboard-normalized
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-unfiltered_norwegian_delete_hausa_en.md b/docs/_posts/ahmedlone127/2024-06-11-unfiltered_norwegian_delete_hausa_en.md
new file mode 100644
index 00000000000000..7dd0fe1ea14ec7
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-unfiltered_norwegian_delete_hausa_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English unfiltered_norwegian_delete_hausa XlmRoBertaForTokenClassification from grace-pro
+author: John Snow Labs
+name: unfiltered_norwegian_delete_hausa
+date: 2024-06-11
+tags: [en, open_source, onnx, token_classification, xlm_roberta, ner]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `unfiltered_norwegian_delete_hausa` is an English model originally trained by grace-pro.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/unfiltered_norwegian_delete_hausa_en_5.4.0_3.0_1718137800494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/unfiltered_norwegian_delete_hausa_en_5.4.0_3.0_1718137800494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("unfiltered_norwegian_delete_hausa","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("unfiltered_norwegian_delete_hausa", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|unfiltered_norwegian_delete_hausa| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/unfiltered_no_delete_hausa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-unfiltered_norwegian_delete_hausa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-unfiltered_norwegian_delete_hausa_pipeline_en.md new file mode 100644 index 00000000000000..3e581b2104e7d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-unfiltered_norwegian_delete_hausa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English unfiltered_norwegian_delete_hausa_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: unfiltered_norwegian_delete_hausa_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `unfiltered_norwegian_delete_hausa_pipeline` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/unfiltered_norwegian_delete_hausa_pipeline_en_5.4.0_3.0_1718137873213.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/unfiltered_norwegian_delete_hausa_pipeline_en_5.4.0_3.0_1718137873213.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("unfiltered_norwegian_delete_hausa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("unfiltered_norwegian_delete_hausa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|unfiltered_norwegian_delete_hausa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/unfiltered_no_delete_hausa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_norwegian_i_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_norwegian_i_en.md new file mode 100644 index 00000000000000..00419890b231cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_norwegian_i_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_norwegian_i XlmRoBertaForTokenClassification from HyungYoun +author: John Snow Labs +name: xlm_norwegian_i +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_norwegian_i` is an English model originally trained by HyungYoun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_norwegian_i_en_5.4.0_3.0_1718125739303.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_norwegian_i_en_5.4.0_3.0_1718125739303.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_norwegian_i","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_norwegian_i", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_norwegian_i| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|794.3 MB| + +## References + +https://huggingface.co/HyungYoun/xlm-no-I \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_norwegian_i_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_norwegian_i_pipeline_en.md new file mode 100644 index 00000000000000..45414c70584b55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_norwegian_i_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_norwegian_i_pipeline pipeline XlmRoBertaForTokenClassification from HyungYoun +author: John Snow Labs +name: xlm_norwegian_i_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_norwegian_i_pipeline` is an English model originally trained by HyungYoun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_norwegian_i_pipeline_en_5.4.0_3.0_1718125912980.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_norwegian_i_pipeline_en_5.4.0_3.0_1718125912980.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_norwegian_i_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_norwegian_i_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_norwegian_i_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|794.3 MB| + +## References + +https://huggingface.co/HyungYoun/xlm-no-I + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_distemist_es.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_distemist_es.md new file mode 100644 index 00000000000000..7ae38f4c8ffd2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_distemist_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish xlm_r_galen_distemist XlmRoBertaForTokenClassification from IIC +author: John Snow Labs +name: xlm_r_galen_distemist +date: 2024-06-11 +tags: [es, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_r_galen_distemist` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_r_galen_distemist_es_5.4.0_3.0_1718123981684.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_r_galen_distemist_es_5.4.0_3.0_1718123981684.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_r_galen_distemist","es") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_r_galen_distemist", "es") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_r_galen_distemist| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/IIC/XLM_R_Galen-distemist \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_distemist_pipeline_es.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_distemist_pipeline_es.md new file mode 100644 index 00000000000000..538c3a69c4367b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_distemist_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish xlm_r_galen_distemist_pipeline pipeline XlmRoBertaForTokenClassification from IIC +author: John Snow Labs +name: xlm_r_galen_distemist_pipeline +date: 2024-06-11 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_r_galen_distemist_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_r_galen_distemist_pipeline_es_5.4.0_3.0_1718124047204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_r_galen_distemist_pipeline_es_5.4.0_3.0_1718124047204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_r_galen_distemist_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_r_galen_distemist_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_r_galen_distemist_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/IIC/XLM_R_Galen-distemist + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_livingner1_es.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_livingner1_es.md new file mode 100644 index 00000000000000..4046a53a1db1b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_livingner1_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish xlm_r_galen_livingner1 XlmRoBertaForTokenClassification from IIC +author: John Snow Labs +name: xlm_r_galen_livingner1 +date: 2024-06-11 +tags: [es, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_r_galen_livingner1` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_r_galen_livingner1_es_5.4.0_3.0_1718115800669.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_r_galen_livingner1_es_5.4.0_3.0_1718115800669.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_r_galen_livingner1","es") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_r_galen_livingner1", "es") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_r_galen_livingner1| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/IIC/XLM_R_Galen-livingner1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_livingner1_pipeline_es.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_livingner1_pipeline_es.md new file mode 100644 index 00000000000000..9f117e7066b01c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_livingner1_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish xlm_r_galen_livingner1_pipeline pipeline XlmRoBertaForTokenClassification from IIC +author: John Snow Labs +name: xlm_r_galen_livingner1_pipeline +date: 2024-06-11 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_r_galen_livingner1_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_r_galen_livingner1_pipeline_es_5.4.0_3.0_1718115866772.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_r_galen_livingner1_pipeline_es_5.4.0_3.0_1718115866772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_r_galen_livingner1_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_r_galen_livingner1_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_r_galen_livingner1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/IIC/XLM_R_Galen-livingner1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_socialdisner_es.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_socialdisner_es.md new file mode 100644 index 00000000000000..7722ed651dbf8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_socialdisner_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish xlm_r_galen_socialdisner XlmRoBertaForTokenClassification from IIC +author: John Snow Labs +name: xlm_r_galen_socialdisner +date: 2024-06-11 +tags: [es, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_r_galen_socialdisner` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_r_galen_socialdisner_es_5.4.0_3.0_1718131094159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_r_galen_socialdisner_es_5.4.0_3.0_1718131094159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_r_galen_socialdisner","es") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_r_galen_socialdisner", "es") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_r_galen_socialdisner| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/IIC/XLM_R_Galen-socialdisner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_socialdisner_pipeline_es.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_socialdisner_pipeline_es.md new file mode 100644 index 00000000000000..e7642db99b0f09 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_r_galen_socialdisner_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish xlm_r_galen_socialdisner_pipeline pipeline XlmRoBertaForTokenClassification from IIC +author: John Snow Labs +name: xlm_r_galen_socialdisner_pipeline +date: 2024-06-11 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_r_galen_socialdisner_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_r_galen_socialdisner_pipeline_es_5.4.0_3.0_1718131160107.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_r_galen_socialdisner_pipeline_es_5.4.0_3.0_1718131160107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_r_galen_socialdisner_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_r_galen_socialdisner_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_r_galen_socialdisner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/IIC/XLM_R_Galen-socialdisner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_char_shopsign_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_char_shopsign_en.md new file mode 100644 index 00000000000000..5cdb772aecf4c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_char_shopsign_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_char_shopsign XlmRoBertaForTokenClassification from HyungYoun +author: John Snow Labs +name: xlm_roberta_base_char_shopsign +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_char_shopsign` is an English model originally trained by HyungYoun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_char_shopsign_en_5.4.0_3.0_1718107970920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_char_shopsign_en_5.4.0_3.0_1718107970920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_char_shopsign","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_char_shopsign", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_char_shopsign| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|791.5 MB| + +## References + +https://huggingface.co/HyungYoun/xlm-roberta-base-char-shopsign \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_char_shopsign_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_char_shopsign_pipeline_en.md new file mode 100644 index 00000000000000..3c7de744f5d2b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_char_shopsign_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_char_shopsign_pipeline pipeline XlmRoBertaForTokenClassification from HyungYoun +author: John Snow Labs +name: xlm_roberta_base_char_shopsign_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_char_shopsign_pipeline` is an English model originally trained by HyungYoun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_char_shopsign_pipeline_en_5.4.0_3.0_1718108153585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_char_shopsign_pipeline_en_5.4.0_3.0_1718108153585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_char_shopsign_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_char_shopsign_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_char_shopsign_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|791.5 MB| + +## References + +https://huggingface.co/HyungYoun/xlm-roberta-base-char-shopsign + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_germeval_14_4_labels_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_germeval_14_4_labels_en.md new file mode 100644 index 00000000000000..1b1f7ca7222786 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_germeval_14_4_labels_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_germeval_14_4_labels XlmRoBertaForTokenClassification from stefanieZ +author: John Snow Labs +name: xlm_roberta_base_finetuned_germeval_14_4_labels +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_germeval_14_4_labels` is an English model originally trained by stefanieZ. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_germeval_14_4_labels_en_5.4.0_3.0_1718098637257.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_germeval_14_4_labels_en_5.4.0_3.0_1718098637257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_germeval_14_4_labels","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_germeval_14_4_labels", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_germeval_14_4_labels| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.2 MB| + +## References + +https://huggingface.co/stefanieZ/xlm-roberta-base-finetuned-germeval-14-4-labels \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline_en.md new file mode 100644 index 00000000000000..f5a31d0eeb2a86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline pipeline XlmRoBertaForTokenClassification from stefanieZ +author: John Snow Labs +name: xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline` is an English model originally trained by stefanieZ.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline_en_5.4.0_3.0_1718098725921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline_en_5.4.0_3.0_1718098725921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_germeval_14_4_labels_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.2 MB| + +## References + +https://huggingface.co/stefanieZ/xlm-roberta-base-finetuned-germeval-14-4-labels + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_aiventurer_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_aiventurer_en.md new file mode 100644 index 00000000000000..3844f5afe769db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_aiventurer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_aiventurer XlmRoBertaForTokenClassification from AIventurer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_aiventurer +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_aiventurer` is an English model originally trained by AIventurer.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_aiventurer_en_5.4.0_3.0_1718125283570.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_aiventurer_en_5.4.0_3.0_1718125283570.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_aiventurer","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_aiventurer", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_aiventurer| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|859.5 MB| + +## References + +https://huggingface.co/AIventurer/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline_en.md new file mode 100644 index 00000000000000..56967708485852 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline pipeline XlmRoBertaForTokenClassification from AIventurer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline` is an English model originally trained by AIventurer.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline_en_5.4.0_3.0_1718125381551.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline_en_5.4.0_3.0_1718125381551.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_aiventurer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|859.5 MB| + +## References + +https://huggingface.co/AIventurer/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_alkampfer_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_alkampfer_en.md new file mode 100644 index 00000000000000..46274025db88f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_alkampfer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_alkampfer XlmRoBertaForTokenClassification from alkampfer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_alkampfer +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_alkampfer` is an English model originally trained by alkampfer.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_alkampfer_en_5.4.0_3.0_1718116175555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_alkampfer_en_5.4.0_3.0_1718116175555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_alkampfer","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_alkampfer", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_alkampfer| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/alkampfer/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline_en.md new file mode 100644 index 00000000000000..0303e00498c028 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline pipeline XlmRoBertaForTokenClassification from alkampfer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline` is an English model originally trained by alkampfer.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline_en_5.4.0_3.0_1718116261198.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline_en_5.4.0_3.0_1718116261198.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_alkampfer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/alkampfer/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ankit15nov_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ankit15nov_en.md new file mode 100644 index 00000000000000..a6e1c91b6b94fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ankit15nov_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_ankit15nov XlmRoBertaForTokenClassification from Ankit15nov +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_ankit15nov +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_ankit15nov` is an English model originally trained by Ankit15nov.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ankit15nov_en_5.4.0_3.0_1718108947475.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ankit15nov_en_5.4.0_3.0_1718108947475.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_ankit15nov","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_ankit15nov", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_ankit15nov| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/Ankit15nov/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline_en.md new file mode 100644 index 00000000000000..f432269e9e6353 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline pipeline XlmRoBertaForTokenClassification from Ankit15nov +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline` is an English model originally trained by Ankit15nov.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline_en_5.4.0_3.0_1718109030207.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline_en_5.4.0_3.0_1718109030207.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_ankit15nov_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/Ankit15nov/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chaoli_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chaoli_en.md new file mode 100644 index 00000000000000..86f2fe49d26f0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chaoli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_chaoli XlmRoBertaForTokenClassification from ChaoLi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_chaoli +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_chaoli` is an English model originally trained by ChaoLi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_chaoli_en_5.4.0_3.0_1718098637821.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_chaoli_en_5.4.0_3.0_1718098637821.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_chaoli","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_chaoli", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_chaoli| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/ChaoLi/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chaoli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chaoli_pipeline_en.md new file mode 100644 index 00000000000000..52866d837414ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chaoli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_chaoli_pipeline pipeline XlmRoBertaForTokenClassification from ChaoLi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_chaoli_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_chaoli_pipeline` is an English model originally trained by ChaoLi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_chaoli_pipeline_en_5.4.0_3.0_1718098729223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_chaoli_pipeline_en_5.4.0_3.0_1718098729223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_chaoli_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_chaoli_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_chaoli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/ChaoLi/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chris_choi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chris_choi_en.md new file mode 100644 index 00000000000000..64fc34d4e85285 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chris_choi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_chris_choi XlmRoBertaForTokenClassification from Chris-choi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_chris_choi +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_chris_choi` is an English model originally trained by Chris-choi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_chris_choi_en_5.4.0_3.0_1718118485792.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_chris_choi_en_5.4.0_3.0_1718118485792.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_chris_choi","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_chris_choi", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_chris_choi| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/Chris-choi/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline_en.md new file mode 100644 index 00000000000000..aed01a62dd47e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline pipeline XlmRoBertaForTokenClassification from Chris-choi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline` is an English model originally trained by Chris-choi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline_en_5.4.0_3.0_1718118601294.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline_en_5.4.0_3.0_1718118601294.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_chris_choi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|848.0 MB| + +## References + +https://huggingface.co/Chris-choi/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ericklerouge123_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ericklerouge123_en.md new file mode 100644 index 00000000000000..556280e1ae0191 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ericklerouge123_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_ericklerouge123 XlmRoBertaForTokenClassification from ericklerouge123 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_ericklerouge123 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_ericklerouge123` is an English model originally trained by ericklerouge123.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ericklerouge123_en_5.4.0_3.0_1718108261451.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ericklerouge123_en_5.4.0_3.0_1718108261451.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_ericklerouge123","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_ericklerouge123", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_ericklerouge123| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/ericklerouge123/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline_en.md new file mode 100644 index 00000000000000..8b1c2246620583 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline pipeline XlmRoBertaForTokenClassification from ericklerouge123 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline` is an English model originally trained by ericklerouge123.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline_en_5.4.0_3.0_1718108344098.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline_en_5.4.0_3.0_1718108344098.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_ericklerouge123_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/ericklerouge123/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_handun_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_handun_en.md new file mode 100644 index 00000000000000..d3f6a85e9ffab2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_handun_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_handun XlmRoBertaForTokenClassification from Handun +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_handun +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_handun` is an English model originally trained by Handun.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_handun_en_5.4.0_3.0_1718118887243.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_handun_en_5.4.0_3.0_1718118887243.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_handun","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_handun", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_handun| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/Handun/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_handun_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_handun_pipeline_en.md new file mode 100644 index 00000000000000..b6e3cc15539aa2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_handun_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_handun_pipeline pipeline XlmRoBertaForTokenClassification from Handun +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_handun_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_handun_pipeline` is an English model originally trained by Handun.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_handun_pipeline_en_5.4.0_3.0_1718118969768.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_handun_pipeline_en_5.4.0_3.0_1718118969768.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_handun_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_handun_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_handun_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/Handun/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_hhffxx_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_hhffxx_en.md new file mode 100644 index 00000000000000..d02cb71c1daff1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_hhffxx_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_hhffxx XlmRoBertaForTokenClassification from hhffxx +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_hhffxx +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_hhffxx` is an English model originally trained by hhffxx.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_hhffxx_en_5.4.0_3.0_1718110991637.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_hhffxx_en_5.4.0_3.0_1718110991637.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_hhffxx","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_hhffxx", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_hhffxx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.6 MB| + +## References + +https://huggingface.co/hhffxx/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline_en.md new file mode 100644 index 00000000000000..a94f99e2f230e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline pipeline XlmRoBertaForTokenClassification from hhffxx +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline` is an English model originally trained by hhffxx.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline_en_5.4.0_3.0_1718111085484.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline_en_5.4.0_3.0_1718111085484.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_hhffxx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.6 MB| + +## References + +https://huggingface.co/hhffxx/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_jbreunig_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_jbreunig_en.md new file mode 100644 index 00000000000000..036b9187f80a75 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_jbreunig_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_jbreunig XlmRoBertaForTokenClassification from jbreunig +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_jbreunig +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_jbreunig` is an English model originally trained by jbreunig.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jbreunig_en_5.4.0_3.0_1718107874411.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jbreunig_en_5.4.0_3.0_1718107874411.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_jbreunig","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_jbreunig", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_jbreunig| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/jbreunig/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline_en.md new file mode 100644 index 00000000000000..5bcdf6b1791a41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline pipeline XlmRoBertaForTokenClassification from jbreunig +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline` is an English model originally trained by jbreunig.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline_en_5.4.0_3.0_1718107964056.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline_en_5.4.0_3.0_1718107964056.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_jbreunig_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/jbreunig/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_kenhoffman_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_kenhoffman_en.md new file mode 100644 index 00000000000000..66f8046465c485 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_kenhoffman_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_kenhoffman XlmRoBertaForTokenClassification from kenhoffman +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_kenhoffman +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_kenhoffman` is an English model originally trained by kenhoffman.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_kenhoffman_en_5.4.0_3.0_1718127818537.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_kenhoffman_en_5.4.0_3.0_1718127818537.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_kenhoffman","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_kenhoffman", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_kenhoffman| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|859.5 MB| + +## References + +https://huggingface.co/kenhoffman/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline_en.md new file mode 100644 index 00000000000000..5d804192119f93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline pipeline XlmRoBertaForTokenClassification from kenhoffman +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline` is an English model originally trained by kenhoffman.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline_en_5.4.0_3.0_1718127905178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline_en_5.4.0_3.0_1718127905178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_kenhoffman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|859.5 MB| + +## References + +https://huggingface.co/kenhoffman/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_nobody138_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_nobody138_en.md new file mode 100644 index 00000000000000..523b7b3310d18a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_nobody138_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_nobody138 XlmRoBertaForTokenClassification from Nobody138 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_nobody138 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_nobody138` is an English model originally trained by Nobody138.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_nobody138_en_5.4.0_3.0_1718113345863.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_nobody138_en_5.4.0_3.0_1718113345863.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_nobody138","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_nobody138", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_nobody138| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/Nobody138/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_nobody138_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_nobody138_pipeline_en.md new file mode 100644 index 00000000000000..b904c04a4b07e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_nobody138_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_nobody138_pipeline pipeline XlmRoBertaForTokenClassification from Nobody138 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_nobody138_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_nobody138_pipeline` is an English model originally trained by Nobody138. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_nobody138_pipeline_en_5.4.0_3.0_1718113438712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_nobody138_pipeline_en_5.4.0_3.0_1718113438712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_nobody138_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_nobody138_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_nobody138_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/Nobody138/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_philosucker_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_philosucker_en.md new file mode 100644 index 00000000000000..d051161b93902f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_philosucker_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_philosucker XlmRoBertaForTokenClassification from philosucker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_philosucker +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_philosucker` is an English model originally trained by philosucker. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_philosucker_en_5.4.0_3.0_1718110030921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_philosucker_en_5.4.0_3.0_1718110030921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_philosucker","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_philosucker", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_philosucker| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|862.0 MB| + +## References + +https://huggingface.co/philosucker/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_philosucker_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_philosucker_pipeline_en.md new file mode 100644 index 00000000000000..bf4cd7e01a1d92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_philosucker_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_philosucker_pipeline pipeline XlmRoBertaForTokenClassification from philosucker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_philosucker_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_philosucker_pipeline` is an English model originally trained by philosucker. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_philosucker_pipeline_en_5.4.0_3.0_1718110125093.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_philosucker_pipeline_en_5.4.0_3.0_1718110125093.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_philosucker_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_philosucker_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_philosucker_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|862.0 MB| + +## References + +https://huggingface.co/philosucker/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_reaverlee_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_reaverlee_en.md new file mode 100644 index 00000000000000..b199be992ae80a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_reaverlee_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_reaverlee XlmRoBertaForTokenClassification from reaverlee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_reaverlee +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_reaverlee` is an English model originally trained by reaverlee. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_reaverlee_en_5.4.0_3.0_1718103752220.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_reaverlee_en_5.4.0_3.0_1718103752220.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_reaverlee","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_reaverlee", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_reaverlee| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/reaverlee/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline_en.md new file mode 100644 index 00000000000000..97353356d1528b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline pipeline XlmRoBertaForTokenClassification from reaverlee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline` is an English model originally trained by reaverlee. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline_en_5.4.0_3.0_1718103838430.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline_en_5.4.0_3.0_1718103838430.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_reaverlee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/reaverlee/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_skr1125_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_skr1125_en.md new file mode 100644 index 00000000000000..060ff871a141fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_skr1125_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_skr1125 XlmRoBertaForTokenClassification from skr1125 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_skr1125 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_skr1125` is an English model originally trained by skr1125. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_skr1125_en_5.4.0_3.0_1718108896876.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_skr1125_en_5.4.0_3.0_1718108896876.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_skr1125","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_skr1125", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_skr1125| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/skr1125/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_skr1125_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_skr1125_pipeline_en.md new file mode 100644 index 00000000000000..706b57f1b6db53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_skr1125_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_skr1125_pipeline pipeline XlmRoBertaForTokenClassification from skr1125 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_skr1125_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_skr1125_pipeline` is an English model originally trained by skr1125. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_skr1125_pipeline_en_5.4.0_3.0_1718108978942.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_skr1125_pipeline_en_5.4.0_3.0_1718108978942.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_skr1125_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_skr1125_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_skr1125_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.1 MB| + +## References + +https://huggingface.co/skr1125/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_songys_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_songys_en.md new file mode 100644 index 00000000000000..9720eb3500e060 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_songys_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_songys XlmRoBertaForTokenClassification from songys +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_songys +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_songys` is an English model originally trained by songys. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_songys_en_5.4.0_3.0_1718119568776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_songys_en_5.4.0_3.0_1718119568776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_songys","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_songys", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_songys| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.8 MB| + +## References + +https://huggingface.co/songys/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_songys_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_songys_pipeline_en.md new file mode 100644 index 00000000000000..6490785b2de051 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_songys_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_songys_pipeline pipeline XlmRoBertaForTokenClassification from songys +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_songys_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_songys_pipeline` is an English model originally trained by songys. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_songys_pipeline_en_5.4.0_3.0_1718119676969.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_songys_pipeline_en_5.4.0_3.0_1718119676969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_songys_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_songys_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_songys_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.8 MB| + +## References + +https://huggingface.co/songys/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_sponomary_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_sponomary_en.md new file mode 100644 index 00000000000000..0d56c5ea6ac6b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_sponomary_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_sponomary XlmRoBertaForTokenClassification from sponomary +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_sponomary +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_sponomary` is an English model originally trained by sponomary. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_sponomary_en_5.4.0_3.0_1718107876684.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_sponomary_en_5.4.0_3.0_1718107876684.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_sponomary","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_sponomary", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_sponomary| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/sponomary/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_sponomary_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_sponomary_pipeline_en.md new file mode 100644 index 00000000000000..407c8a481c0abf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_sponomary_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_sponomary_pipeline pipeline XlmRoBertaForTokenClassification from sponomary +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_sponomary_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_sponomary_pipeline` is an English model originally trained by sponomary. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_sponomary_pipeline_en_5.4.0_3.0_1718107965097.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_sponomary_pipeline_en_5.4.0_3.0_1718107965097.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_sponomary_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_sponomary_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_sponomary_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/sponomary/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_takehirako_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_takehirako_en.md new file mode 100644 index 00000000000000..179408e31044ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_takehirako_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_takehirako XlmRoBertaForTokenClassification from TakeHirako +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_takehirako +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_takehirako` is an English model originally trained by TakeHirako.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_takehirako_en_5.4.0_3.0_1718113812321.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_takehirako_en_5.4.0_3.0_1718113812321.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_takehirako","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_takehirako", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_takehirako| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/TakeHirako/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_takehirako_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_takehirako_pipeline_en.md new file mode 100644 index 00000000000000..e79784f7645c21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_takehirako_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_takehirako_pipeline pipeline XlmRoBertaForTokenClassification from TakeHirako +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_takehirako_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_takehirako_pipeline` is an English model originally trained by TakeHirako.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_takehirako_pipeline_en_5.4.0_3.0_1718113894990.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_takehirako_pipeline_en_5.4.0_3.0_1718113894990.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_takehirako_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_takehirako_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_takehirako_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/TakeHirako/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_team_nave_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_team_nave_en.md new file mode 100644 index 00000000000000..43de1129ad50c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_team_nave_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_team_nave XlmRoBertaForTokenClassification from team-nave +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_team_nave +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_team_nave` is an English model originally trained by team-nave.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_team_nave_en_5.4.0_3.0_1718120120396.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_team_nave_en_5.4.0_3.0_1718120120396.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_team_nave","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_team_nave", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_team_nave| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|857.0 MB| + +## References + +https://huggingface.co/team-nave/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_team_nave_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_team_nave_pipeline_en.md new file mode 100644 index 00000000000000..97d772bf409807 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_team_nave_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_team_nave_pipeline pipeline XlmRoBertaForTokenClassification from team-nave +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_team_nave_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_team_nave_pipeline` is an English model originally trained by team-nave.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_team_nave_pipeline_en_5.4.0_3.0_1718120206717.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_team_nave_pipeline_en_5.4.0_3.0_1718120206717.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_team_nave_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_team_nave_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_team_nave_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|857.0 MB| + +## References + +https://huggingface.co/team-nave/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_yasu320001_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_yasu320001_en.md new file mode 100644 index 00000000000000..2a4024f3b62295 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_yasu320001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_yasu320001 XlmRoBertaForTokenClassification from yasu320001 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_yasu320001 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_yasu320001` is an English model originally trained by yasu320001.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_yasu320001_en_5.4.0_3.0_1718110226831.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_yasu320001_en_5.4.0_3.0_1718110226831.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_yasu320001","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_yasu320001", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_yasu320001| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/yasu320001/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline_en.md new file mode 100644 index 00000000000000..db39c0d29d11da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline pipeline XlmRoBertaForTokenClassification from yasu320001 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline` is an English model originally trained by yasu320001.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline_en_5.4.0_3.0_1718110311879.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline_en_5.4.0_3.0_1718110311879.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_yasu320001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/yasu320001/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_yazannasser_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_yazannasser_en.md new file mode 100644 index 00000000000000..016e17113ba51a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_yazannasser_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_arabic_yazannasser XlmRoBertaForTokenClassification from Yazannasser +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_arabic_yazannasser +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_arabic_yazannasser` is an English model originally trained by Yazannasser.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_yazannasser_en_5.4.0_3.0_1718117424894.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_yazannasser_en_5.4.0_3.0_1718117424894.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_arabic_yazannasser","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_arabic_yazannasser", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_arabic_yazannasser| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.6 MB| + +## References + +https://huggingface.co/Yazannasser/xlm-roberta-base-finetuned-panx-ar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline_en.md new file mode 100644 index 00000000000000..0938047b94ab51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline pipeline XlmRoBertaForTokenClassification from Yazannasser +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline` is an English model originally trained by Yazannasser.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline_en_5.4.0_3.0_1718117513792.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline_en_5.4.0_3.0_1718117513792.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_arabic_yazannasser_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.7 MB| + +## References + +https://huggingface.co/Yazannasser/xlm-roberta-base-finetuned-panx-ar + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_zaid_33_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_zaid_33_en.md new file mode 100644 index 00000000000000..28e368ceee2b51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_zaid_33_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_arabic_zaid_33 XlmRoBertaForTokenClassification from zaid-33 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_arabic_zaid_33 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_arabic_zaid_33` is an English model originally trained by zaid-33.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_zaid_33_en_5.4.0_3.0_1718123006984.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_zaid_33_en_5.4.0_3.0_1718123006984.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_arabic_zaid_33","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_arabic_zaid_33", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_arabic_zaid_33| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.6 MB| + +## References + +https://huggingface.co/zaid-33/xlm-roberta-base-finetuned-panx-ar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline_en.md new file mode 100644 index 00000000000000..96d6651f75f6f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline pipeline XlmRoBertaForTokenClassification from zaid-33 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline` is an English model originally trained by zaid-33.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline_en_5.4.0_3.0_1718123110351.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline_en_5.4.0_3.0_1718123110351.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_arabic_zaid_33_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.7 MB| + +## References + +https://huggingface.co/zaid-33/xlm-roberta-base-finetuned-panx-ar + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ajit_transformer_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ajit_transformer_en.md new file mode 100644 index 00000000000000..4b88be830b1e08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ajit_transformer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ajit_transformer XlmRoBertaForTokenClassification from ajit-transformer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ajit_transformer +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ajit_transformer` is an English model originally trained by ajit-transformer.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ajit_transformer_en_5.4.0_3.0_1718133051088.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ajit_transformer_en_5.4.0_3.0_1718133051088.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ajit_transformer","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ajit_transformer", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ajit_transformer| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|827.0 MB| + +## References + +https://huggingface.co/ajit-transformer/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline_en.md new file mode 100644 index 00000000000000..9bbfce58883ca1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline pipeline XlmRoBertaForTokenClassification from ajit-transformer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline` is an English model originally trained by ajit-transformer. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline_en_5.4.0_3.0_1718133163195.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline_en_5.4.0_3.0_1718133163195.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ajit_transformer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.0 MB| + +## References + +https://huggingface.co/ajit-transformer/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ashrielbrian_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ashrielbrian_en.md new file mode 100644 index 00000000000000..c2c2e6544c8261 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ashrielbrian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ashrielbrian XlmRoBertaForTokenClassification from ashrielbrian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ashrielbrian +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ashrielbrian` is an English model originally trained by ashrielbrian. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ashrielbrian_en_5.4.0_3.0_1718099679172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ashrielbrian_en_5.4.0_3.0_1718099679172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ashrielbrian","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ashrielbrian", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ashrielbrian| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/ashrielbrian/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline_en.md new file mode 100644 index 00000000000000..9d6b888dd680b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline pipeline XlmRoBertaForTokenClassification from ashrielbrian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline` is an English model originally trained by ashrielbrian. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline_en_5.4.0_3.0_1718099797797.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline_en_5.4.0_3.0_1718099797797.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ashrielbrian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/ashrielbrian/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_chris_choi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_chris_choi_en.md new file mode 100644 index 00000000000000..f8b4984c68295a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_chris_choi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_chris_choi XlmRoBertaForTokenClassification from Chris-choi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_chris_choi +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_chris_choi` is an English model originally trained by Chris-choi. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_chris_choi_en_5.4.0_3.0_1718123063495.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_chris_choi_en_5.4.0_3.0_1718123063495.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_chris_choi","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_chris_choi", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_chris_choi| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/Chris-choi/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline_en.md new file mode 100644 index 00000000000000..8104da9bd18a6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline pipeline XlmRoBertaForTokenClassification from Chris-choi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline` is an English model originally trained by Chris-choi. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline_en_5.4.0_3.0_1718123201197.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline_en_5.4.0_3.0_1718123201197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_chris_choi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/Chris-choi/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cj_mills_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cj_mills_en.md new file mode 100644 index 00000000000000..3fc7e2610a285a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cj_mills_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_cj_mills XlmRoBertaForTokenClassification from cj-mills +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_cj_mills +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_cj_mills` is an English model originally trained by cj-mills. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_cj_mills_en_5.4.0_3.0_1718105119471.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_cj_mills_en_5.4.0_3.0_1718105119471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_cj_mills","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_cj_mills", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_cj_mills| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|823.0 MB| + +## References + +https://huggingface.co/cj-mills/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline_en.md new file mode 100644 index 00000000000000..4ed018d527bc1f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline pipeline XlmRoBertaForTokenClassification from cj-mills +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline` is an English model originally trained by cj-mills. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline_en_5.4.0_3.0_1718105233563.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline_en_5.4.0_3.0_1718105233563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_cj_mills_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|823.0 MB| + +## References + +https://huggingface.co/cj-mills/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cogitur_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cogitur_en.md new file mode 100644 index 00000000000000..85dfd8735fcb72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cogitur_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_cogitur XlmRoBertaForTokenClassification from cogitur +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_cogitur +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_cogitur` is an English model originally trained by cogitur. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_cogitur_en_5.4.0_3.0_1718113324050.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_cogitur_en_5.4.0_3.0_1718113324050.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_cogitur","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_cogitur", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_cogitur| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/cogitur/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cogitur_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cogitur_pipeline_en.md new file mode 100644 index 00000000000000..4f8a49db01a021 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_cogitur_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_cogitur_pipeline pipeline XlmRoBertaForTokenClassification from cogitur +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_cogitur_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_cogitur_pipeline` is an English model originally trained by cogitur. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_cogitur_pipeline_en_5.4.0_3.0_1718113459211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_cogitur_pipeline_en_5.4.0_3.0_1718113459211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_cogitur_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_cogitur_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_cogitur_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/cogitur/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_flood_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_flood_en.md new file mode 100644 index 00000000000000..5cdd165415d655 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_flood_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_flood XlmRoBertaForTokenClassification from flood +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_flood +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_flood` is an English model originally trained by flood. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_flood_en_5.4.0_3.0_1718102977423.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_flood_en_5.4.0_3.0_1718102977423.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_flood","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_flood", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_flood| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/flood/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_flood_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_flood_pipeline_en.md new file mode 100644 index 00000000000000..eb008c457d44ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_flood_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_flood_pipeline pipeline XlmRoBertaForTokenClassification from flood +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_flood_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_flood_pipeline` is an English model originally trained by flood. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_flood_pipeline_en_5.4.0_3.0_1718103096292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_flood_pipeline_en_5.4.0_3.0_1718103096292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_flood_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_flood_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_flood_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/flood/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_gogd_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_gogd_en.md new file mode 100644 index 00000000000000..0fa4c60c3aa7dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_gogd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_gogd XlmRoBertaForTokenClassification from GoGD +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_gogd +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_gogd` is an English model originally trained by GoGD.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_gogd_en_5.4.0_3.0_1718118864659.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_gogd_en_5.4.0_3.0_1718118864659.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_gogd","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_gogd", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_gogd| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/GoGD/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_gogd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_gogd_pipeline_en.md new file mode 100644 index 00000000000000..5e2458264543aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_gogd_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_gogd_pipeline pipeline XlmRoBertaForTokenClassification from GoGD +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_gogd_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_gogd_pipeline` is an English model originally trained by GoGD.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_gogd_pipeline_en_5.4.0_3.0_1718118983548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_gogd_pipeline_en_5.4.0_3.0_1718118983548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_gogd_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_gogd_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_gogd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/GoGD/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_guruji108_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_guruji108_en.md new file mode 100644 index 00000000000000..27951b2a3b0e32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_guruji108_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_guruji108 XlmRoBertaForTokenClassification from Guruji108 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_guruji108 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_guruji108` is an English model originally trained by Guruji108.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_guruji108_en_5.4.0_3.0_1718104851201.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_guruji108_en_5.4.0_3.0_1718104851201.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_guruji108","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_guruji108", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_guruji108| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/Guruji108/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_guruji108_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_guruji108_pipeline_en.md new file mode 100644 index 00000000000000..3a5c3e51e4bc6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_guruji108_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_guruji108_pipeline pipeline XlmRoBertaForTokenClassification from Guruji108 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_guruji108_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_guruji108_pipeline` is an English model originally trained by Guruji108.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_guruji108_pipeline_en_5.4.0_3.0_1718104971417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_guruji108_pipeline_en_5.4.0_3.0_1718104971417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_guruji108_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_guruji108_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_guruji108_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/Guruji108/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_inniok_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_inniok_en.md new file mode 100644 index 00000000000000..e417733088768a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_inniok_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_inniok XlmRoBertaForTokenClassification from inniok +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_inniok +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_inniok` is an English model originally trained by inniok.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_inniok_en_5.4.0_3.0_1718127165681.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_inniok_en_5.4.0_3.0_1718127165681.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_inniok","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_inniok", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_inniok| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/inniok/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_inniok_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_inniok_pipeline_en.md new file mode 100644 index 00000000000000..7ca89ec08af375 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_inniok_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_inniok_pipeline pipeline XlmRoBertaForTokenClassification from inniok +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_inniok_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_inniok_pipeline` is an English model originally trained by inniok.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_inniok_pipeline_en_5.4.0_3.0_1718127298719.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_inniok_pipeline_en_5.4.0_3.0_1718127298719.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_inniok_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_inniok_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_inniok_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/inniok/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_jjglilleberg_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_jjglilleberg_en.md new file mode 100644 index 00000000000000..4056d3e1efd2bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_jjglilleberg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_jjglilleberg XlmRoBertaForTokenClassification from jjglilleberg +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_jjglilleberg +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_jjglilleberg` is an English model originally trained by jjglilleberg.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_jjglilleberg_en_5.4.0_3.0_1718138915017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_jjglilleberg_en_5.4.0_3.0_1718138915017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_jjglilleberg","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_jjglilleberg", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_jjglilleberg| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/jjglilleberg/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline_en.md new file mode 100644 index 00000000000000..36fb4991bb5740 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline pipeline XlmRoBertaForTokenClassification from jjglilleberg +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline` is an English model originally trained by jjglilleberg.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline_en_5.4.0_3.0_1718139048211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline_en_5.4.0_3.0_1718139048211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_jjglilleberg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/jjglilleberg/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_kiechu_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_kiechu_en.md new file mode 100644 index 00000000000000..6dfadda6624c87 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_kiechu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_kiechu XlmRoBertaForTokenClassification from kiechu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_kiechu +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_kiechu` is an English model originally trained by kiechu.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_kiechu_en_5.4.0_3.0_1718127176349.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_kiechu_en_5.4.0_3.0_1718127176349.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_kiechu","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_kiechu", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_kiechu| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/kiechu/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_kiechu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_kiechu_pipeline_en.md new file mode 100644 index 00000000000000..e94a21df3c2132 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_kiechu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_kiechu_pipeline pipeline XlmRoBertaForTokenClassification from kiechu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_kiechu_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_kiechu_pipeline` is an English model originally trained by kiechu.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_kiechu_pipeline_en_5.4.0_3.0_1718127311992.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_kiechu_pipeline_en_5.4.0_3.0_1718127311992.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_kiechu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_kiechu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_kiechu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/kiechu/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_en.md new file mode 100644 index 00000000000000..5b444caf0e33da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_laurentiustancioiu XlmRoBertaForTokenClassification from LaurentiuStancioiu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_laurentiustancioiu +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_laurentiustancioiu` is an English model originally trained by LaurentiuStancioiu.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_en_5.4.0_3.0_1718111459033.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_en_5.4.0_3.0_1718111459033.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_laurentiustancioiu","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_laurentiustancioiu", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_laurentiustancioiu| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/LaurentiuStancioiu/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline_en.md new file mode 100644 index 00000000000000..bddcdeb979b6f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline pipeline XlmRoBertaForTokenClassification from LaurentiuStancioiu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline` is an English model originally trained by LaurentiuStancioiu.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline_en_5.4.0_3.0_1718111585112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline_en_5.4.0_3.0_1718111585112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_laurentiustancioiu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/LaurentiuStancioiu/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_nobody138_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_nobody138_en.md new file mode 100644 index 00000000000000..793438bc0d02a1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_nobody138_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_nobody138 XlmRoBertaForTokenClassification from Nobody138 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_nobody138 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_nobody138` is an English model originally trained by Nobody138.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_nobody138_en_5.4.0_3.0_1718102724242.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_nobody138_en_5.4.0_3.0_1718102724242.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_nobody138","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_nobody138", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_nobody138| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/Nobody138/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_nobody138_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_nobody138_pipeline_en.md new file mode 100644 index 00000000000000..0e639f379aa7f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_nobody138_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_nobody138_pipeline pipeline XlmRoBertaForTokenClassification from Nobody138 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_nobody138_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_nobody138_pipeline` is an English model originally trained by Nobody138.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_nobody138_pipeline_en_5.4.0_3.0_1718102844591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_nobody138_pipeline_en_5.4.0_3.0_1718102844591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_nobody138_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_nobody138_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_nobody138_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/Nobody138/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_obong_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_obong_en.md new file mode 100644 index 00000000000000..1e0a9c8ebae827 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_obong_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_obong XlmRoBertaForTokenClassification from obong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_obong +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_obong` is an English model originally trained by obong.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_obong_en_5.4.0_3.0_1718115707496.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_obong_en_5.4.0_3.0_1718115707496.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_obong","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_obong", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_obong| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/obong/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_obong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_obong_pipeline_en.md new file mode 100644 index 00000000000000..50ffcdcd801898 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_obong_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_obong_pipeline pipeline XlmRoBertaForTokenClassification from obong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_obong_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_obong_pipeline` is an English model originally trained by obong.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_obong_pipeline_en_5.4.0_3.0_1718115833149.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_obong_pipeline_en_5.4.0_3.0_1718115833149.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_obong_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_obong_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_obong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/obong/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_reinoudbosch_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_reinoudbosch_en.md new file mode 100644 index 00000000000000..ab471ff94f304d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_reinoudbosch_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_reinoudbosch XlmRoBertaForTokenClassification from reinoudbosch +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_reinoudbosch +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_reinoudbosch` is an English model originally trained by reinoudbosch.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_reinoudbosch_en_5.4.0_3.0_1718104038490.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_reinoudbosch_en_5.4.0_3.0_1718104038490.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_reinoudbosch","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_reinoudbosch", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_reinoudbosch| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/reinoudbosch/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline_en.md new file mode 100644 index 00000000000000..2ae926cf361a96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline pipeline XlmRoBertaForTokenClassification from reinoudbosch +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline` is an English model originally trained by reinoudbosch.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline_en_5.4.0_3.0_1718104157046.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline_en_5.4.0_3.0_1718104157046.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_reinoudbosch_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/reinoudbosch/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ridealist_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ridealist_en.md new file mode 100644 index 00000000000000..61939dd6b5ac77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ridealist_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ridealist XlmRoBertaForTokenClassification from Ridealist +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ridealist +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ridealist` is an English model originally trained by Ridealist.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ridealist_en_5.4.0_3.0_1718106344672.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ridealist_en_5.4.0_3.0_1718106344672.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ridealist","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ridealist", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ridealist| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/Ridealist/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ridealist_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ridealist_pipeline_en.md new file mode 100644 index 00000000000000..58077eaf0396f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ridealist_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ridealist_pipeline pipeline XlmRoBertaForTokenClassification from Ridealist +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ridealist_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ridealist_pipeline` is an English model originally trained by Ridealist.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ridealist_pipeline_en_5.4.0_3.0_1718106483814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ridealist_pipeline_en_5.4.0_3.0_1718106483814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ridealist_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ridealist_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ridealist_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/Ridealist/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ryatora_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ryatora_en.md new file mode 100644 index 00000000000000..24da6cbe1f067b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ryatora_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ryatora XlmRoBertaForTokenClassification from ryatora +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ryatora +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ryatora` is an English model originally trained by ryatora.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ryatora_en_5.4.0_3.0_1718137859539.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ryatora_en_5.4.0_3.0_1718137859539.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ryatora","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ryatora", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ryatora| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/ryatora/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ryatora_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ryatora_pipeline_en.md new file mode 100644 index 00000000000000..f8c19a85c74abf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ryatora_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ryatora_pipeline pipeline XlmRoBertaForTokenClassification from ryatora +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ryatora_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ryatora_pipeline` is an English model originally trained by ryatora. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ryatora_pipeline_en_5.4.0_3.0_1718137979348.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ryatora_pipeline_en_5.4.0_3.0_1718137979348.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ryatora_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ryatora_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ryatora_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/ryatora/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_shinta0615_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_shinta0615_en.md new file mode 100644 index 00000000000000..078ca8405a7ec0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_shinta0615_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_shinta0615 XlmRoBertaForTokenClassification from shinta0615 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_shinta0615 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_shinta0615` is an English model originally trained by shinta0615. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_shinta0615_en_5.4.0_3.0_1718127813006.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_shinta0615_en_5.4.0_3.0_1718127813006.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_shinta0615","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_shinta0615", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_shinta0615| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/shinta0615/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline_en.md new file mode 100644 index 00000000000000..749edf37e0dd0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline pipeline XlmRoBertaForTokenClassification from shinta0615 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline` is an English model originally trained by shinta0615. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline_en_5.4.0_3.0_1718127934494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline_en_5.4.0_3.0_1718127934494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_shinta0615_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/shinta0615/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_skr3178_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_skr3178_en.md new file mode 100644 index 00000000000000..6fe4384d8f0ee9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_skr3178_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_skr3178 XlmRoBertaForTokenClassification from skr3178 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_skr3178 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_skr3178` is an English model originally trained by skr3178. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_skr3178_en_5.4.0_3.0_1718098669326.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_skr3178_en_5.4.0_3.0_1718098669326.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_skr3178","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_skr3178", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_skr3178| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/skr3178/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_skr3178_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_skr3178_pipeline_en.md new file mode 100644 index 00000000000000..9b71dfbb185b4d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_skr3178_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_skr3178_pipeline pipeline XlmRoBertaForTokenClassification from skr3178 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_skr3178_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_skr3178_pipeline` is an English model originally trained by skr3178. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_skr3178_pipeline_en_5.4.0_3.0_1718098786935.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_skr3178_pipeline_en_5.4.0_3.0_1718098786935.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_skr3178_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_skr3178_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_skr3178_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/skr3178/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_smallsuper_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_smallsuper_en.md new file mode 100644 index 00000000000000..becb2a842cdb46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_smallsuper_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_smallsuper XlmRoBertaForTokenClassification from smallsuper +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_smallsuper +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_smallsuper` is an English model originally trained by smallsuper. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_smallsuper_en_5.4.0_3.0_1718113334837.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_smallsuper_en_5.4.0_3.0_1718113334837.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_smallsuper","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_smallsuper", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_smallsuper| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/smallsuper/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline_en.md new file mode 100644 index 00000000000000..91387aa8274d72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline pipeline XlmRoBertaForTokenClassification from smallsuper +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline` is an English model originally trained by smallsuper. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline_en_5.4.0_3.0_1718113475489.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline_en_5.4.0_3.0_1718113475489.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_smallsuper_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/smallsuper/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_songys_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_songys_en.md new file mode 100644 index 00000000000000..0f3e0c96dddb91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_songys_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_songys XlmRoBertaForTokenClassification from songys +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_songys +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_songys` is an English model originally trained by songys. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_songys_en_5.4.0_3.0_1718135267211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_songys_en_5.4.0_3.0_1718135267211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_songys","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_songys", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_songys| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|824.2 MB| + +## References + +https://huggingface.co/songys/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_songys_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_songys_pipeline_en.md new file mode 100644 index 00000000000000..fe6d85e35f633d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_songys_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_songys_pipeline pipeline XlmRoBertaForTokenClassification from songys +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_songys_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_songys_pipeline` is an English model originally trained by songys. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_songys_pipeline_en_5.4.0_3.0_1718135383697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_songys_pipeline_en_5.4.0_3.0_1718135383697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_songys_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_songys_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_songys_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|824.2 MB| + +## References + +https://huggingface.co/songys/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungkwangjoong_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungkwangjoong_en.md new file mode 100644 index 00000000000000..6fd193649df9eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungkwangjoong_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_sungkwangjoong XlmRoBertaForTokenClassification from sungkwangjoong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_sungkwangjoong +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_sungkwangjoong` is an English model originally trained by sungkwangjoong. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sungkwangjoong_en_5.4.0_3.0_1718120799741.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sungkwangjoong_en_5.4.0_3.0_1718120799741.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_sungkwangjoong","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_sungkwangjoong", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_sungkwangjoong| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/sungkwangjoong/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline_en.md new file mode 100644 index 00000000000000..ddaee7957d071e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline pipeline XlmRoBertaForTokenClassification from sungkwangjoong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline` is an English model originally trained by sungkwangjoong. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline_en_5.4.0_3.0_1718120933255.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline_en_5.4.0_3.0_1718120933255.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_sungkwangjoong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.3 MB| + +## References + +https://huggingface.co/sungkwangjoong/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungwoo1_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungwoo1_en.md new file mode 100644 index 00000000000000..e26dcebfd009d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungwoo1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_sungwoo1 XlmRoBertaForTokenClassification from sungwoo1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_sungwoo1 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_sungwoo1` is an English model originally trained by sungwoo1.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sungwoo1_en_5.4.0_3.0_1718112155652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sungwoo1_en_5.4.0_3.0_1718112155652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_sungwoo1","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_sungwoo1", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_sungwoo1| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/sungwoo1/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline_en.md new file mode 100644 index 00000000000000..736b24ce435dc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline pipeline XlmRoBertaForTokenClassification from sungwoo1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline` is an English model originally trained by sungwoo1.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline_en_5.4.0_3.0_1718112278867.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline_en_5.4.0_3.0_1718112278867.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_sungwoo1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/sungwoo1/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_tyayoi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_tyayoi_en.md new file mode 100644 index 00000000000000..02fa018ee8fc35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_tyayoi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_tyayoi XlmRoBertaForTokenClassification from tyayoi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_tyayoi +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_tyayoi` is an English model originally trained by tyayoi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_tyayoi_en_5.4.0_3.0_1718109938193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_tyayoi_en_5.4.0_3.0_1718109938193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_tyayoi","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_tyayoi", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_tyayoi| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/tyayoi/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline_en.md new file mode 100644 index 00000000000000..77bba7e682c512 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline pipeline XlmRoBertaForTokenClassification from tyayoi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline` is an English model originally trained by tyayoi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline_en_5.4.0_3.0_1718110062314.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline_en_5.4.0_3.0_1718110062314.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_tyayoi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/tyayoi/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_yasu320001_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_yasu320001_en.md new file mode 100644 index 00000000000000..bca3859ea94d49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_yasu320001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_yasu320001 XlmRoBertaForTokenClassification from yasu320001 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_yasu320001 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_yasu320001` is an English model originally trained by yasu320001.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_yasu320001_en_5.4.0_3.0_1718107907522.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_yasu320001_en_5.4.0_3.0_1718107907522.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_yasu320001","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_yasu320001", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_yasu320001| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/yasu320001/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline_en.md new file mode 100644 index 00000000000000..a72c5f5fc3f2b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline pipeline XlmRoBertaForTokenClassification from yasu320001 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline` is an English model originally trained by yasu320001.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline_en_5.4.0_3.0_1718108035463.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline_en_5.4.0_3.0_1718108035463.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_yasu320001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/yasu320001/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ysige_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ysige_en.md new file mode 100644 index 00000000000000..8f495c25c6513d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ysige_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ysige XlmRoBertaForTokenClassification from ysige +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ysige +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ysige` is an English model originally trained by ysige.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ysige_en_5.4.0_3.0_1718124064401.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ysige_en_5.4.0_3.0_1718124064401.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ysige","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_ysige", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ysige| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/ysige/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ysige_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ysige_pipeline_en.md new file mode 100644 index 00000000000000..82473ee42357dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_english_ysige_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_ysige_pipeline pipeline XlmRoBertaForTokenClassification from ysige +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_ysige_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_english_ysige_pipeline` is an English model originally trained by ysige.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ysige_pipeline_en_5.4.0_3.0_1718124192684.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_ysige_pipeline_en_5.4.0_3.0_1718124192684.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ysige_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_ysige_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_ysige_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/ysige/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_aiventurer_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_aiventurer_en.md new file mode 100644 index 00000000000000..7c9fad75305f92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_aiventurer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_aiventurer XlmRoBertaForTokenClassification from AIventurer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_aiventurer +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_aiventurer` is an English model originally trained by AIventurer.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_aiventurer_en_5.4.0_3.0_1718125781327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_aiventurer_en_5.4.0_3.0_1718125781327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_aiventurer","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_aiventurer", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_aiventurer| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/AIventurer/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline_en.md new file mode 100644 index 00000000000000..c929f7e20152fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline pipeline XlmRoBertaForTokenClassification from AIventurer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline` is an English model originally trained by AIventurer.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline_en_5.4.0_3.0_1718125887605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline_en_5.4.0_3.0_1718125887605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_aiventurer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/AIventurer/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cataluna84_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cataluna84_en.md new file mode 100644 index 00000000000000..c1ca90c5c21fd1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cataluna84_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_cataluna84 XlmRoBertaForTokenClassification from cataluna84 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_cataluna84 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_cataluna84` is an English model originally trained by cataluna84.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cataluna84_en_5.4.0_3.0_1718104824875.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cataluna84_en_5.4.0_3.0_1718104824875.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_cataluna84","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_cataluna84", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
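The `ner` column produced by the pipeline above carries one tag per token. As a minimal sketch, assuming the model emits IOB2 tags of the PAN-X kind (`B-PER`, `I-PER`, `B-ORG`, `I-ORG`, `B-LOC`, `I-LOC`, `O`; the card itself does not list its predicted entities), the tags could be grouped into entity spans as follows. The helper name `group_iob_entities` and the sample sentence are hypothetical and not part of Spark NLP.

```python
from typing import List, Optional, Tuple


def group_iob_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge per-token IOB2 tags into (entity_text, entity_type) pairs.

    Assumption: tags look like "B-PER", "I-PER", "O". An I- tag that does not
    continue an open entity of the same type is dropped (a design choice).
    """
    entities: List[Tuple[str, str]] = []
    words: List[str] = []
    label: Optional[str] = None

    for token, tag in zip(tokens, tags):
        # An I- tag of the same type extends the entity that is currently open.
        if tag.startswith("I-") and tag[2:] == label:
            words.append(token)
            continue
        # Anything else closes the open entity, if there is one.
        if words:
            entities.append((" ".join(words), label))
        if tag.startswith("B-"):
            words, label = [token], tag[2:]
        else:
            words, label = [], None

    # Close an entity that runs to the end of the sentence.
    if words:
        entities.append((" ".join(words), label))
    return entities


# Hypothetical tokens and tags, shaped like one row of token.result and ner.result.
sample_tokens = ["Emmanuel", "Macron", "visited", "Lyon", "."]
sample_tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
print(group_iob_entities(sample_tokens, sample_tags))
```

With a real run, the two lists would typically come from `pipelineDF.select("token.result", "ner.result")`. Spark NLP's own `NerConverter` annotator does this grouping inside the pipeline, so a helper like this is only useful for post-processing collected rows.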
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_cataluna84| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/cataluna84/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline_en.md new file mode 100644 index 00000000000000..3e6e446099294e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline pipeline XlmRoBertaForTokenClassification from cataluna84 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline` is an English model originally trained by cataluna84.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline_en_5.4.0_3.0_1718104925385.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline_en_5.4.0_3.0_1718104925385.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_cataluna84_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/cataluna84/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_chris_choi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_chris_choi_en.md new file mode 100644 index 00000000000000..29240ba54f0d04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_chris_choi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_chris_choi XlmRoBertaForTokenClassification from Chris-choi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_chris_choi +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_chris_choi` is an English model originally trained by Chris-choi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_chris_choi_en_5.4.0_3.0_1718120778592.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_chris_choi_en_5.4.0_3.0_1718120778592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_chris_choi","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_chris_choi", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_chris_choi| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/Chris-choi/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline_en.md new file mode 100644 index 00000000000000..0ee4e7f8873470 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline pipeline XlmRoBertaForTokenClassification from Chris-choi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline` is an English model originally trained by Chris-choi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline_en_5.4.0_3.0_1718120894021.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline_en_5.4.0_3.0_1718120894021.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_chris_choi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/Chris-choi/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cyycyy_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cyycyy_en.md new file mode 100644 index 00000000000000..309ed399919038 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cyycyy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_cyycyy XlmRoBertaForTokenClassification from cyycyy +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_cyycyy +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_cyycyy` is an English model originally trained by cyycyy.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cyycyy_en_5.4.0_3.0_1718114724641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cyycyy_en_5.4.0_3.0_1718114724641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_cyycyy","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_cyycyy", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_cyycyy| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/cyycyy/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline_en.md new file mode 100644 index 00000000000000..c98a7bd504e0a3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline pipeline XlmRoBertaForTokenClassification from cyycyy +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline` is an English model originally trained by cyycyy.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline_en_5.4.0_3.0_1718114840341.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline_en_5.4.0_3.0_1718114840341.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_cyycyy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/cyycyy/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_edwardjross_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_edwardjross_en.md new file mode 100644 index 00000000000000..e0505054f678fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_edwardjross_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_edwardjross XlmRoBertaForTokenClassification from edwardjross +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_edwardjross +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_edwardjross` is an English model originally trained by edwardjross.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_edwardjross_en_5.4.0_3.0_1718105828143.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_edwardjross_en_5.4.0_3.0_1718105828143.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_edwardjross","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_edwardjross", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_edwardjross| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|842.5 MB| + +## References + +https://huggingface.co/edwardjross/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline_en.md new file mode 100644 index 00000000000000..c904449cba8826 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline pipeline XlmRoBertaForTokenClassification from edwardjross +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline` is an English model originally trained by edwardjross.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline_en_5.4.0_3.0_1718105928222.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline_en_5.4.0_3.0_1718105928222.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_edwardjross_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|842.6 MB| + +## References + +https://huggingface.co/edwardjross/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_fraisier_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_fraisier_en.md new file mode 100644 index 00000000000000..8142977278b8de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_fraisier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_fraisier XlmRoBertaForTokenClassification from Fraisier +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_fraisier +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_fraisier` is an English model originally trained by Fraisier.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_fraisier_en_5.4.0_3.0_1718133097044.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_fraisier_en_5.4.0_3.0_1718133097044.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_fraisier","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_fraisier", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_fraisier| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/Fraisier/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_fraisier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_fraisier_pipeline_en.md new file mode 100644 index 00000000000000..2df0d9e40cc8be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_fraisier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_fraisier_pipeline pipeline XlmRoBertaForTokenClassification from Fraisier +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_fraisier_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_fraisier_pipeline` is an English model originally trained by Fraisier.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_fraisier_pipeline_en_5.4.0_3.0_1718133205314.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_fraisier_pipeline_en_5.4.0_3.0_1718133205314.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_fraisier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_fraisier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_fraisier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/Fraisier/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_gogd_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_gogd_en.md new file mode 100644 index 00000000000000..9be26a2b0bc6c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_gogd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_gogd XlmRoBertaForTokenClassification from GoGD +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_gogd +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_gogd` is an English model originally trained by GoGD.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_gogd_en_5.4.0_3.0_1718118536564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_gogd_en_5.4.0_3.0_1718118536564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_gogd","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_gogd", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_gogd| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/GoGD/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_gogd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_gogd_pipeline_en.md new file mode 100644 index 00000000000000..32c63cb92ddd14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_gogd_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_gogd_pipeline pipeline XlmRoBertaForTokenClassification from GoGD +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_gogd_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_gogd_pipeline` is an English model originally trained by GoGD.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_gogd_pipeline_en_5.4.0_3.0_1718118642815.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_gogd_pipeline_en_5.4.0_3.0_1718118642815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_gogd_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_gogd_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
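The archive names in the Download links of these cards appear to follow the shape `<name>_<lang>_<Spark NLP version>_<Spark version>_<timestamp>.zip`. The sketch below splits such a file name into its parts. The pattern is inferred only from the links shown in these cards and is not a documented contract; `ARCHIVE_PATTERN` and `parse_archive_name` are hypothetical names introduced for illustration.

```python
import re
from typing import Dict

# Assumed layout, inferred from the links in these cards:
# <name>_<lang>_<sparknlp version>_<spark version>_<timestamp>.zip
ARCHIVE_PATTERN = re.compile(
    r"^(?P<name>.+)_(?P<lang>[a-z]{2,3})"
    r"_(?P<sparknlp>\d+\.\d+\.\d+)_(?P<spark>\d+\.\d+)"
    r"_(?P<timestamp>\d+)\.zip$"
)


def parse_archive_name(url: str) -> Dict[str, str]:
    """Split a model archive URL (or bare file name) into its named parts."""
    filename = url.rsplit("/", 1)[-1]  # keep only the last path segment
    match = ARCHIVE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"unexpected archive name: {filename!r}")
    return match.groupdict()


parsed = parse_archive_name(
    "s3://auxdata.johnsnowlabs.com/public/models/"
    "xlm_roberta_base_finetuned_panx_french_gogd_pipeline_en_5.4.0_3.0_1718118642815.zip"
)
print(parsed)
```

The parsed `name` and `lang` values correspond to the two arguments passed to `PretrainedPipeline(...)` in the snippet above.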
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_french_gogd_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|831.2 MB|
+
+## References
+
+https://huggingface.co/GoGD/xlm-roberta-base-finetuned-panx-fr
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guroruseru_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guroruseru_en.md
new file mode 100644
index 00000000000000..bceb826ed8308f
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guroruseru_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_french_guroruseru XlmRoBertaForTokenClassification from Guroruseru
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_french_guroruseru
+date: 2024-06-11
+tags: [en, open_source, onnx, token_classification, xlm_roberta, ner]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_guroruseru` is an English model originally trained by Guroruseru.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_guroruseru_en_5.4.0_3.0_1718106997773.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_guroruseru_en_5.4.0_3.0_1718106997773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_guroruseru","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_guroruseru", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_french_guroruseru|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|840.9 MB|
+
+## References
+
+https://huggingface.co/Guroruseru/xlm-roberta-base-finetuned-panx-fr
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline_en.md
new file mode 100644
index 00000000000000..a4668a3a63a69b
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline pipeline XlmRoBertaForTokenClassification from Guroruseru
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline` is an English model originally trained by Guroruseru.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline_en_5.4.0_3.0_1718107099855.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline_en_5.4.0_3.0_1718107099855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_french_guroruseru_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|840.9 MB|
+
+## References
+
+https://huggingface.co/Guroruseru/xlm-roberta-base-finetuned-panx-fr
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guruji108_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guruji108_en.md
new file mode 100644
index 00000000000000..dbe92674e65ee8
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guruji108_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_french_guruji108 XlmRoBertaForTokenClassification from Guruji108
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_french_guruji108
+date: 2024-06-11
+tags: [en, open_source, onnx, token_classification, xlm_roberta, ner]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_guruji108` is an English model originally trained by Guruji108.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_guruji108_en_5.4.0_3.0_1718103664598.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_guruji108_en_5.4.0_3.0_1718103664598.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_guruji108","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_guruji108", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_french_guruji108|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|840.9 MB|
+
+## References
+
+https://huggingface.co/Guruji108/xlm-roberta-base-finetuned-panx-fr
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guruji108_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guruji108_pipeline_en.md
new file mode 100644
index 00000000000000..bd64a9835a8373
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_guruji108_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_french_guruji108_pipeline pipeline XlmRoBertaForTokenClassification from Guruji108
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_french_guruji108_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_guruji108_pipeline` is an English model originally trained by Guruji108.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_guruji108_pipeline_en_5.4.0_3.0_1718103765265.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_guruji108_pipeline_en_5.4.0_3.0_1718103765265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_guruji108_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_guruji108_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_french_guruji108_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|840.9 MB|
+
+## References
+
+https://huggingface.co/Guruji108/xlm-roberta-base-finetuned-panx-fr
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jamie613_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jamie613_en.md
new file mode 100644
index 00000000000000..c85c6ba9ff30e2
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jamie613_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_french_jamie613 XlmRoBertaForTokenClassification from jamie613
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_french_jamie613
+date: 2024-06-11
+tags: [en, open_source, onnx, token_classification, xlm_roberta, ner]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_jamie613` is an English model originally trained by jamie613.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jamie613_en_5.4.0_3.0_1718117445369.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jamie613_en_5.4.0_3.0_1718117445369.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_jamie613","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_jamie613", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_french_jamie613|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|840.9 MB|
+
+## References
+
+https://huggingface.co/jamie613/xlm-roberta-base-finetuned-panx-fr
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jamie613_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jamie613_pipeline_en.md
new file mode 100644
index 00000000000000..a492b8b293890b
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jamie613_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_french_jamie613_pipeline pipeline XlmRoBertaForTokenClassification from jamie613
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_french_jamie613_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_jamie613_pipeline` is an English model originally trained by jamie613.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jamie613_pipeline_en_5.4.0_3.0_1718117556257.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jamie613_pipeline_en_5.4.0_3.0_1718117556257.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_jamie613_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_jamie613_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_french_jamie613_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|840.9 MB|
+
+## References
+
+https://huggingface.co/jamie613/xlm-roberta-base-finetuned-panx-fr
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jgriffi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jgriffi_en.md
new file mode 100644
index 00000000000000..02e9767d7786bc
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jgriffi_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_french_jgriffi XlmRoBertaForTokenClassification from jgriffi
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_french_jgriffi
+date: 2024-06-11
+tags: [en, open_source, onnx, token_classification, xlm_roberta, ner]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_jgriffi` is an English model originally trained by jgriffi.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jgriffi_en_5.4.0_3.0_1718100406513.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jgriffi_en_5.4.0_3.0_1718100406513.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_jgriffi","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_jgriffi", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_french_jgriffi|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|843.5 MB|
+
+## References
+
+https://huggingface.co/jgriffi/xlm-roberta-base-finetuned-panx-fr
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline_en.md
new file mode 100644
index 00000000000000..ddf591632136b8
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline pipeline XlmRoBertaForTokenClassification from jgriffi
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline` is an English model originally trained by jgriffi.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline_en_5.4.0_3.0_1718100503871.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline_en_5.4.0_3.0_1718100503871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_french_jgriffi_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|843.5 MB|
+
+## References
+
+https://huggingface.co/jgriffi/xlm-roberta-base-finetuned-panx-fr
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kenhoffman_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kenhoffman_en.md
new file mode 100644
index 00000000000000..86127aa7075fe4
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kenhoffman_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_french_kenhoffman XlmRoBertaForTokenClassification from kenhoffman
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_french_kenhoffman
+date: 2024-06-11
+tags: [en, open_source, onnx, token_classification, xlm_roberta, ner]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_kenhoffman` is an English model originally trained by kenhoffman.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kenhoffman_en_5.4.0_3.0_1718129000647.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kenhoffman_en_5.4.0_3.0_1718129000647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_kenhoffman","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_kenhoffman", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_french_kenhoffman|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|831.2 MB|
+
+## References
+
+https://huggingface.co/kenhoffman/xlm-roberta-base-finetuned-panx-fr
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline_en.md
new file mode 100644
index 00000000000000..f3bc82863d855d
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline pipeline XlmRoBertaForTokenClassification from kenhoffman
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline` is an English model originally trained by kenhoffman.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline_en_5.4.0_3.0_1718129106082.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline_en_5.4.0_3.0_1718129106082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_french_kenhoffman_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|831.2 MB|
+
+## References
+
+https://huggingface.co/kenhoffman/xlm-roberta-base-finetuned-panx-fr
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kiechu_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kiechu_en.md
new file mode 100644
index 00000000000000..ae2d6d4c1c02ab
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kiechu_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_french_kiechu XlmRoBertaForTokenClassification from kiechu
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_french_kiechu
+date: 2024-06-11
+tags: [en, open_source, onnx, token_classification, xlm_roberta, ner]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_kiechu` is an English model originally trained by kiechu.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kiechu_en_5.4.0_3.0_1718112138878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kiechu_en_5.4.0_3.0_1718112138878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_kiechu","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_kiechu", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
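Reviewer note: the pipeline above stops at the token classifier, so the `ner` column holds one tag per token rather than whole entities. The helper below is a minimal sketch of how those tags could be merged into entity spans in plain Python. It assumes the model emits IOB2-style tags such as `B-PER` and `I-PER`, which is typical for PAN-X fine-tunes but is not stated in this card (its Predicted Entities section is not shown). The function name `group_entities` is hypothetical and not part of Spark NLP.

```python
from typing import List, Optional, Tuple


def group_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Merge IOB2-tagged tokens into (label, text) entity spans.

    Assumption: tags look like "B-PER", "I-PER" or "O".
    """
    entities: List[Tuple[str, str]] = []
    current_label: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Store the entity collected so far, if there is one.
        if current_label is not None and current_tokens:
            entities.append((current_label, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B" or (prefix == "I" and label != current_label):
            # A new entity starts (an I- tag with no matching open entity
            # is treated as a start as well).
            flush()
            current_label, current_tokens = label, [token]
        elif prefix == "I":
            # Continuation of the open entity.
            current_tokens.append(token)
        else:
            # "O" or any unrecognised tag closes the open entity.
            flush()
            current_label, current_tokens = None, []
    flush()
    return entities
```

The two input lists would typically come from something like `pipelineDF.select("token.result", "ner.result")`, collected row by row. Inside a Spark pipeline, appending a `NerConverter` stage is the usual way to get chunks without custom code.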
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_kiechu| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/kiechu/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kiechu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kiechu_pipeline_en.md new file mode 100644 index 00000000000000..c9cc2ba5784aa3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_kiechu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_kiechu_pipeline pipeline XlmRoBertaForTokenClassification from kiechu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_kiechu_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_kiechu_pipeline` is an English model originally trained by kiechu. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kiechu_pipeline_en_5.4.0_3.0_1718112259482.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_kiechu_pipeline_en_5.4.0_3.0_1718112259482.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_kiechu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_kiechu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_kiechu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/kiechu/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_koroku_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_koroku_en.md new file mode 100644 index 00000000000000..6366691e04b6a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_koroku_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_koroku XlmRoBertaForTokenClassification from koroku +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_koroku +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_koroku` is an English model originally trained by koroku. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_koroku_en_5.4.0_3.0_1718126549452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_koroku_en_5.4.0_3.0_1718126549452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_koroku","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_koroku", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_koroku| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.3 MB| + +## References + +https://huggingface.co/koroku/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_koroku_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_koroku_pipeline_en.md new file mode 100644 index 00000000000000..e002f3a9a3530a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_koroku_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_koroku_pipeline pipeline XlmRoBertaForTokenClassification from koroku +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_koroku_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_koroku_pipeline` is an English model originally trained by koroku. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_koroku_pipeline_en_5.4.0_3.0_1718126658951.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_koroku_pipeline_en_5.4.0_3.0_1718126658951.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_koroku_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_koroku_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_koroku_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.3 MB| + +## References + +https://huggingface.co/koroku/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_en.md new file mode 100644 index 00000000000000..95764abdcb3597 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_laurentiustancioiu XlmRoBertaForTokenClassification from LaurentiuStancioiu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_laurentiustancioiu +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_laurentiustancioiu` is an English model originally trained by LaurentiuStancioiu. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_en_5.4.0_3.0_1718107160865.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_en_5.4.0_3.0_1718107160865.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_laurentiustancioiu","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_laurentiustancioiu", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_laurentiustancioiu| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/LaurentiuStancioiu/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline_en.md new file mode 100644 index 00000000000000..774b5b140a875d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline pipeline XlmRoBertaForTokenClassification from LaurentiuStancioiu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline` is an English model originally trained by LaurentiuStancioiu. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline_en_5.4.0_3.0_1718107264003.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline_en_5.4.0_3.0_1718107264003.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_laurentiustancioiu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/LaurentiuStancioiu/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_munsu_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_munsu_en.md new file mode 100644 index 00000000000000..42d71a7a1232a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_munsu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_munsu XlmRoBertaForTokenClassification from MunSu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_munsu +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_munsu` is an English model originally trained by MunSu. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_munsu_en_5.4.0_3.0_1718111168323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_munsu_en_5.4.0_3.0_1718111168323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_munsu","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_munsu", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_munsu| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.5 MB| + +## References + +https://huggingface.co/MunSu/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_munsu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_munsu_pipeline_en.md new file mode 100644 index 00000000000000..8eddb68f1432da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_munsu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_munsu_pipeline pipeline XlmRoBertaForTokenClassification from MunSu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_munsu_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_munsu_pipeline` is an English model originally trained by MunSu. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_munsu_pipeline_en_5.4.0_3.0_1718111251045.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_munsu_pipeline_en_5.4.0_3.0_1718111251045.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_munsu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_munsu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_munsu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.5 MB| + +## References + +https://huggingface.co/MunSu/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_nrazavi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_nrazavi_en.md new file mode 100644 index 00000000000000..addf7d37a2fc79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_nrazavi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_nrazavi XlmRoBertaForTokenClassification from nrazavi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_nrazavi +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_nrazavi` is an English model originally trained by nrazavi. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_nrazavi_en_5.4.0_3.0_1718100833288.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_nrazavi_en_5.4.0_3.0_1718100833288.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_nrazavi","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_nrazavi", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_nrazavi| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/nrazavi/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline_en.md new file mode 100644 index 00000000000000..d18462c96d4709 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline pipeline XlmRoBertaForTokenClassification from nrazavi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline` is an English model originally trained by nrazavi. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline_en_5.4.0_3.0_1718100941022.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline_en_5.4.0_3.0_1718100941022.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_nrazavi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/nrazavi/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_paww_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_paww_en.md new file mode 100644 index 00000000000000..6f944d301a6b45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_paww_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_paww XlmRoBertaForTokenClassification from paww +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_paww +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_paww` is an English model originally trained by paww. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_paww_en_5.4.0_3.0_1718102554223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_paww_en_5.4.0_3.0_1718102554223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_paww","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_paww", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_paww| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/paww/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_paww_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_paww_pipeline_en.md new file mode 100644 index 00000000000000..57cd39917d790e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_paww_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_paww_pipeline pipeline XlmRoBertaForTokenClassification from paww +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_paww_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_paww_pipeline` is an English model originally trained by paww. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_paww_pipeline_en_5.4.0_3.0_1718102656316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_paww_pipeline_en_5.4.0_3.0_1718102656316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_paww_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_paww_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_paww_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/paww/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_praboda_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_praboda_en.md new file mode 100644 index 00000000000000..a37aafab383dce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_praboda_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_praboda XlmRoBertaForTokenClassification from Praboda +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_praboda +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_praboda` is an English model originally trained by Praboda.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_praboda_en_5.4.0_3.0_1718106344629.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_praboda_en_5.4.0_3.0_1718106344629.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_praboda","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_praboda", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_praboda| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/Praboda/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_praboda_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_praboda_pipeline_en.md new file mode 100644 index 00000000000000..28572a0de80c31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_praboda_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_praboda_pipeline pipeline XlmRoBertaForTokenClassification from Praboda +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_praboda_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_praboda_pipeline` is an English model originally trained by Praboda.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_praboda_pipeline_en_5.4.0_3.0_1718106452247.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_praboda_pipeline_en_5.4.0_3.0_1718106452247.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_praboda_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_praboda_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_praboda_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/Praboda/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sbpark_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sbpark_en.md new file mode 100644 index 00000000000000..d844203abaeee8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sbpark_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_sbpark XlmRoBertaForTokenClassification from sbpark +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_sbpark +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_sbpark` is an English model originally trained by sbpark.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sbpark_en_5.4.0_3.0_1718134367294.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sbpark_en_5.4.0_3.0_1718134367294.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_sbpark","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_sbpark", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_sbpark| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/sbpark/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sbpark_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sbpark_pipeline_en.md new file mode 100644 index 00000000000000..ad743fe02741ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sbpark_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_sbpark_pipeline pipeline XlmRoBertaForTokenClassification from sbpark +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_sbpark_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_sbpark_pipeline` is an English model originally trained by sbpark.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sbpark_pipeline_en_5.4.0_3.0_1718134472473.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sbpark_pipeline_en_5.4.0_3.0_1718134472473.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_sbpark_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_sbpark_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_sbpark_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/sbpark/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sreek_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sreek_en.md new file mode 100644 index 00000000000000..5d3c54d8093075 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sreek_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_sreek XlmRoBertaForTokenClassification from Sreek +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_sreek +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_sreek` is an English model originally trained by Sreek.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sreek_en_5.4.0_3.0_1718107166207.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sreek_en_5.4.0_3.0_1718107166207.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_sreek","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_sreek", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_sreek| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/Sreek/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sreek_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sreek_pipeline_en.md new file mode 100644 index 00000000000000..3380edc4d1a56d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_sreek_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_sreek_pipeline pipeline XlmRoBertaForTokenClassification from Sreek +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_sreek_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_sreek_pipeline` is an English model originally trained by Sreek.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sreek_pipeline_en_5.4.0_3.0_1718107269312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_sreek_pipeline_en_5.4.0_3.0_1718107269312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_sreek_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_sreek_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_sreek_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/Sreek/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_team_nave_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_team_nave_en.md new file mode 100644 index 00000000000000..582efe8a8f5d29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_team_nave_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_team_nave XlmRoBertaForTokenClassification from team-nave +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_team_nave +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_team_nave` is an English model originally trained by team-nave.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_team_nave_en_5.4.0_3.0_1718126531548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_team_nave_en_5.4.0_3.0_1718126531548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_team_nave","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_team_nave", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_team_nave| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|835.3 MB| + +## References + +https://huggingface.co/team-nave/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_team_nave_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_team_nave_pipeline_en.md new file mode 100644 index 00000000000000..dc4816266d267e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_team_nave_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_team_nave_pipeline pipeline XlmRoBertaForTokenClassification from team-nave +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_team_nave_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_team_nave_pipeline` is an English model originally trained by team-nave.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_team_nave_pipeline_en_5.4.0_3.0_1718126646597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_team_nave_pipeline_en_5.4.0_3.0_1718126646597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_team_nave_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_team_nave_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_team_nave_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|835.4 MB| + +## References + +https://huggingface.co/team-nave/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_thkkvui_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_thkkvui_en.md new file mode 100644 index 00000000000000..a56cf40981663c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_thkkvui_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_thkkvui XlmRoBertaForTokenClassification from thkkvui +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_thkkvui +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_thkkvui` is an English model originally trained by thkkvui.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_thkkvui_en_5.4.0_3.0_1718111185799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_thkkvui_en_5.4.0_3.0_1718111185799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_thkkvui","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_thkkvui", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_thkkvui| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/thkkvui/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline_en.md new file mode 100644 index 00000000000000..bfb1beff0ce502 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline pipeline XlmRoBertaForTokenClassification from thkkvui +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline` is an English model originally trained by thkkvui.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline_en_5.4.0_3.0_1718111286791.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline_en_5.4.0_3.0_1718111286791.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_thkkvui_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/thkkvui/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_tyayoi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_tyayoi_en.md new file mode 100644 index 00000000000000..dd6b3833cc70ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_tyayoi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_tyayoi XlmRoBertaForTokenClassification from tyayoi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_tyayoi +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_tyayoi` is an English model originally trained by tyayoi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_tyayoi_en_5.4.0_3.0_1718109927062.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_tyayoi_en_5.4.0_3.0_1718109927062.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_tyayoi","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_tyayoi", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_tyayoi| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/tyayoi/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline_en.md new file mode 100644 index 00000000000000..2243b7314d0dd8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline pipeline XlmRoBertaForTokenClassification from tyayoi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline` is an English model originally trained by tyayoi. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline_en_5.4.0_3.0_1718110029878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline_en_5.4.0_3.0_1718110029878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_tyayoi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.9 MB| + +## References + +https://huggingface.co/tyayoi/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_abdus_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_abdus_en.md new file mode 100644 index 00000000000000..0dbacbc3aa6562 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_abdus_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_abdus XlmRoBertaForTokenClassification from abdus +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_abdus +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_abdus` is an English model originally trained by abdus. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_abdus_en_5.4.0_3.0_1718116880454.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_abdus_en_5.4.0_3.0_1718116880454.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_abdus","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_abdus", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_abdus| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/abdus/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_abdus_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_abdus_pipeline_en.md new file mode 100644 index 00000000000000..b5ed1b1140307b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_abdus_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_abdus_pipeline pipeline XlmRoBertaForTokenClassification from abdus +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_abdus_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_abdus_pipeline` is an English model originally trained by abdus. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_abdus_pipeline_en_5.4.0_3.0_1718116976533.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_abdus_pipeline_en_5.4.0_3.0_1718116976533.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_abdus_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_abdus_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_abdus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/abdus/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_alkampfer_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_alkampfer_en.md new file mode 100644 index 00000000000000..21d6b4cd0f7eac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_alkampfer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_alkampfer XlmRoBertaForTokenClassification from alkampfer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_alkampfer +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_alkampfer` is an English model originally trained by alkampfer. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_alkampfer_en_5.4.0_3.0_1718132111994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_alkampfer_en_5.4.0_3.0_1718132111994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_alkampfer","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_alkampfer", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_alkampfer| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/alkampfer/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline_en.md new file mode 100644 index 00000000000000..68f26f89ed8664 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline pipeline XlmRoBertaForTokenClassification from alkampfer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline` is an English model originally trained by alkampfer. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline_en_5.4.0_3.0_1718132198347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline_en_5.4.0_3.0_1718132198347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_alkampfer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/alkampfer/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_amitjain171980_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_amitjain171980_en.md new file mode 100644 index 00000000000000..5c2d5b25981281 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_amitjain171980_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_amitjain171980 XlmRoBertaForTokenClassification from amitjain171980 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_amitjain171980 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_amitjain171980` is an English model originally trained by amitjain171980. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_amitjain171980_en_5.4.0_3.0_1718122378652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_amitjain171980_en_5.4.0_3.0_1718122378652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_amitjain171980","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_amitjain171980", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_amitjain171980| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/amitjain171980/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline_en.md new file mode 100644 index 00000000000000..b569e090a91d8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline pipeline XlmRoBertaForTokenClassification from amitjain171980 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline` is an English model originally trained by amitjain171980. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline_en_5.4.0_3.0_1718122465598.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline_en_5.4.0_3.0_1718122465598.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_amitjain171980_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/amitjain171980/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_anniepyim_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_anniepyim_en.md new file mode 100644 index 00000000000000..e0cd670091607d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_anniepyim_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_anniepyim XlmRoBertaForTokenClassification from anniepyim +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_anniepyim +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_anniepyim` is an English model originally trained by anniepyim. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_anniepyim_en_5.4.0_3.0_1718124324499.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_anniepyim_en_5.4.0_3.0_1718124324499.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_anniepyim","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_anniepyim", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_anniepyim| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/anniepyim/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline_en.md new file mode 100644 index 00000000000000..1653c42e20a3d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline pipeline XlmRoBertaForTokenClassification from anniepyim +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline` is an English model originally trained by anniepyim. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline_en_5.4.0_3.0_1718124411811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline_en_5.4.0_3.0_1718124411811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_anniepyim_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/anniepyim/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_antoinev17_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_antoinev17_en.md new file mode 100644 index 00000000000000..fa3ecba5b92cdc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_antoinev17_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_antoinev17 XlmRoBertaForTokenClassification from antoinev17 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_antoinev17 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_antoinev17` is an English model originally trained by antoinev17. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_antoinev17_en_5.4.0_3.0_1718099320343.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_antoinev17_en_5.4.0_3.0_1718099320343.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_antoinev17","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_antoinev17", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_antoinev17| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|854.5 MB| + +## References + +https://huggingface.co/antoinev17/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline_en.md new file mode 100644 index 00000000000000..a3b31d1bd9f6d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline pipeline XlmRoBertaForTokenClassification from antoinev17 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline` is an English model originally trained by antoinev17. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline_en_5.4.0_3.0_1718099409176.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline_en_5.4.0_3.0_1718099409176.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_antoinev17_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|854.5 MB| + +## References + +https://huggingface.co/antoinev17/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_arthur_75_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_arthur_75_en.md new file mode 100644 index 00000000000000..5e430799aef66b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_arthur_75_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_arthur_75 XlmRoBertaForTokenClassification from Arthur-75 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_arthur_75 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_arthur_75` is an English model originally trained by Arthur-75. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_arthur_75_en_5.4.0_3.0_1718119701751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_arthur_75_en_5.4.0_3.0_1718119701751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_arthur_75","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_arthur_75", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_arthur_75| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/Arthur-75/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline_en.md new file mode 100644 index 00000000000000..9307b14243316e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline pipeline XlmRoBertaForTokenClassification from Arthur-75 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline` is an English model originally trained by Arthur-75. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline_en_5.4.0_3.0_1718119815124.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline_en_5.4.0_3.0_1718119815124.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_arthur_75_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/Arthur-75/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ashrielbrian_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ashrielbrian_en.md new file mode 100644 index 00000000000000..c34922d693412b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ashrielbrian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_ashrielbrian XlmRoBertaForTokenClassification from ashrielbrian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_ashrielbrian +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_ashrielbrian` is an English model originally trained by ashrielbrian. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ashrielbrian_en_5.4.0_3.0_1718104690549.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ashrielbrian_en_5.4.0_3.0_1718104690549.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_ashrielbrian","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_ashrielbrian", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_ashrielbrian| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/ashrielbrian/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline_en.md new file mode 100644 index 00000000000000..e9a8ce4bafcef4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline pipeline XlmRoBertaForTokenClassification from ashrielbrian +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline` is an English model originally trained by ashrielbrian. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline_en_5.4.0_3.0_1718104777282.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline_en_5.4.0_3.0_1718104777282.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_ashrielbrian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/ashrielbrian/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_backdrive_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_backdrive_en.md new file mode 100644 index 00000000000000..c701e397f17608 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_backdrive_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_backdrive XlmRoBertaForTokenClassification from Backdrive +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_backdrive +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_backdrive` is an English model originally trained by Backdrive. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_backdrive_en_5.4.0_3.0_1718127006094.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_backdrive_en_5.4.0_3.0_1718127006094.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_backdrive","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_backdrive", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_backdrive| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Backdrive/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_backdrive_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_backdrive_pipeline_en.md new file mode 100644 index 00000000000000..bfcc7ae94dd772 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_backdrive_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_backdrive_pipeline pipeline XlmRoBertaForTokenClassification from Backdrive +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_backdrive_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_backdrive_pipeline` is an English model originally trained by Backdrive. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_backdrive_pipeline_en_5.4.0_3.0_1718127090486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_backdrive_pipeline_en_5.4.0_3.0_1718127090486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_backdrive_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_backdrive_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_backdrive_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Backdrive/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_benjiccee_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_benjiccee_en.md new file mode 100644 index 00000000000000..1fb496ba903772 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_benjiccee_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_benjiccee XlmRoBertaForTokenClassification from Benjiccee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_benjiccee +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_benjiccee` is an English model originally trained by Benjiccee. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_benjiccee_en_5.4.0_3.0_1718104068280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_benjiccee_en_5.4.0_3.0_1718104068280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_benjiccee","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_benjiccee", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_benjiccee| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Benjiccee/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline_en.md new file mode 100644 index 00000000000000..d7c5b49d576a32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline pipeline XlmRoBertaForTokenClassification from Benjiccee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline` is an English model originally trained by Benjiccee. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline_en_5.4.0_3.0_1718104155327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline_en_5.4.0_3.0_1718104155327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_benjiccee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Benjiccee/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_chris_choi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_chris_choi_en.md new file mode 100644 index 00000000000000..92e7989cf267bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_chris_choi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_chris_choi XlmRoBertaForTokenClassification from Chris-choi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_chris_choi +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_chris_choi` is an English model originally trained by Chris-choi. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_chris_choi_en_5.4.0_3.0_1718125306122.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_chris_choi_en_5.4.0_3.0_1718125306122.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_chris_choi","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_chris_choi", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_chris_choi| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/Chris-choi/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline_en.md new file mode 100644 index 00000000000000..21578ea4300c2c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline pipeline XlmRoBertaForTokenClassification from Chris-choi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline` is an English model originally trained by Chris-choi. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline_en_5.4.0_3.0_1718125424663.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline_en_5.4.0_3.0_1718125424663.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_chris_choi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/Chris-choi/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_coyote78_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_coyote78_en.md new file mode 100644 index 00000000000000..ab929ac32c6849 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_coyote78_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_coyote78 XlmRoBertaForTokenClassification from coyote78 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_coyote78 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_coyote78` is an English model originally trained by coyote78. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_coyote78_en_5.4.0_3.0_1718104705066.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_coyote78_en_5.4.0_3.0_1718104705066.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_coyote78","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_coyote78", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_coyote78| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/coyote78/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_coyote78_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_coyote78_pipeline_en.md new file mode 100644 index 00000000000000..b8e12563a37258 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_coyote78_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_coyote78_pipeline pipeline XlmRoBertaForTokenClassification from coyote78 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_coyote78_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_coyote78_pipeline` is an English model originally trained by coyote78. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_coyote78_pipeline_en_5.4.0_3.0_1718104800436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_coyote78_pipeline_en_5.4.0_3.0_1718104800436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_coyote78_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_coyote78_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_coyote78_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/coyote78/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dazzid_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dazzid_en.md new file mode 100644 index 00000000000000..75a6458375206e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dazzid_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_dazzid XlmRoBertaForTokenClassification from Dazzid +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_dazzid +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_dazzid` is an English model originally trained by Dazzid. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dazzid_en_5.4.0_3.0_1718099433191.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dazzid_en_5.4.0_3.0_1718099433191.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_dazzid","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_dazzid", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_dazzid| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Dazzid/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dazzid_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dazzid_pipeline_en.md new file mode 100644 index 00000000000000..162743d16f4265 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dazzid_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_dazzid_pipeline pipeline XlmRoBertaForTokenClassification from Dazzid +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_dazzid_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_dazzid_pipeline` is an English model originally trained by Dazzid. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dazzid_pipeline_en_5.4.0_3.0_1718099520041.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dazzid_pipeline_en_5.4.0_3.0_1718099520041.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_dazzid_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_dazzid_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_dazzid_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Dazzid/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_deblagoj_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_deblagoj_en.md new file mode 100644 index 00000000000000..253e2f1715fd6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_deblagoj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_deblagoj XlmRoBertaForTokenClassification from deblagoj +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_deblagoj +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_deblagoj` is an English model originally trained by deblagoj. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_deblagoj_en_5.4.0_3.0_1718115016832.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_deblagoj_en_5.4.0_3.0_1718115016832.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_deblagoj","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_deblagoj", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_deblagoj| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|854.1 MB| + +## References + +https://huggingface.co/deblagoj/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline_en.md new file mode 100644 index 00000000000000..3fca7ca1dedde8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline pipeline XlmRoBertaForTokenClassification from deblagoj +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline` is an English model originally trained by deblagoj. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline_en_5.4.0_3.0_1718115097742.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline_en_5.4.0_3.0_1718115097742.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_deblagoj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|854.1 MB| + +## References + +https://huggingface.co/deblagoj/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dochee_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dochee_en.md new file mode 100644 index 00000000000000..851498f688b55b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dochee_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_dochee XlmRoBertaForTokenClassification from Dochee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_dochee +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_dochee` is an English model originally trained by Dochee. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dochee_en_5.4.0_3.0_1718119549955.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dochee_en_5.4.0_3.0_1718119549955.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_dochee","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_dochee", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_dochee| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.4 MB| + +## References + +https://huggingface.co/Dochee/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dochee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dochee_pipeline_en.md new file mode 100644 index 00000000000000..ff3b310985e66b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_dochee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_dochee_pipeline pipeline XlmRoBertaForTokenClassification from Dochee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_dochee_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_dochee_pipeline` is an English model originally trained by Dochee. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dochee_pipeline_en_5.4.0_3.0_1718119646138.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_dochee_pipeline_en_5.4.0_3.0_1718119646138.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_dochee_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_dochee_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_dochee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.4 MB| + +## References + +https://huggingface.co/Dochee/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_donaldyy_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_donaldyy_en.md new file mode 100644 index 00000000000000..2fd1d52f2459b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_donaldyy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_donaldyy XlmRoBertaForTokenClassification from donaldyy +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_donaldyy +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_donaldyy` is an English model originally trained by donaldyy. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_donaldyy_en_5.4.0_3.0_1718130118903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_donaldyy_en_5.4.0_3.0_1718130118903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_donaldyy","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_donaldyy", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_donaldyy| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/donaldyy/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline_en.md new file mode 100644 index 00000000000000..2178aa9b723ea8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline pipeline XlmRoBertaForTokenClassification from donaldyy +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline` is an English model originally trained by donaldyy. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline_en_5.4.0_3.0_1718130226492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline_en_5.4.0_3.0_1718130226492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_donaldyy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/donaldyy/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ducdh1210_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ducdh1210_en.md new file mode 100644 index 00000000000000..3dbe33352c93ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ducdh1210_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_ducdh1210 XlmRoBertaForTokenClassification from ducdh1210 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_ducdh1210 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_ducdh1210` is an English model originally trained by ducdh1210. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ducdh1210_en_5.4.0_3.0_1718125420831.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ducdh1210_en_5.4.0_3.0_1718125420831.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_ducdh1210","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_ducdh1210", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_ducdh1210| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/ducdh1210/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline_en.md new file mode 100644 index 00000000000000..054e366eba54a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline pipeline XlmRoBertaForTokenClassification from ducdh1210 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline` is an English model originally trained by ducdh1210. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline_en_5.4.0_3.0_1718125522574.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline_en_5.4.0_3.0_1718125522574.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_ducdh1210_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/ducdh1210/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_100yen_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_100yen_en.md new file mode 100644 index 00000000000000..1d976d74f1bad9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_100yen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_100yen XlmRoBertaForTokenClassification from 100yen +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_100yen +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_100yen` is an English model originally trained by 100yen.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_100yen_en_5.4.0_3.0_1718131268289.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_100yen_en_5.4.0_3.0_1718131268289.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_100yen","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_100yen", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_100yen| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/100yen/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline_en.md new file mode 100644 index 00000000000000..835291aa9256e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline pipeline XlmRoBertaForTokenClassification from 100yen +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline` is an English model originally trained by 100yen.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline_en_5.4.0_3.0_1718131387496.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline_en_5.4.0_3.0_1718131387496.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_100yen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/100yen/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_en.md new file mode 100644 index 00000000000000..348c7359bfcaee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail XlmRoBertaForTokenClassification from ahmad-alismail +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail` is an English model originally trained by ahmad-alismail.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_en_5.4.0_3.0_1718127118950.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_en_5.4.0_3.0_1718127118950.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/ahmad-alismail/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline_en.md new file mode 100644 index 00000000000000..a13b71430a853e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline pipeline XlmRoBertaForTokenClassification from ahmad-alismail +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline` is an English model originally trained by ahmad-alismail.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline_en_5.4.0_3.0_1718127202473.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline_en_5.4.0_3.0_1718127202473.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_ahmad_alismail_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/ahmad-alismail/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_andrew45_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_andrew45_en.md new file mode 100644 index 00000000000000..ae63727fbe701c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_andrew45_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_andrew45 XlmRoBertaForTokenClassification from andrew45 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_andrew45 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_andrew45` is an English model originally trained by andrew45.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_andrew45_en_5.4.0_3.0_1718120895493.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_andrew45_en_5.4.0_3.0_1718120895493.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_andrew45","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_andrew45", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_andrew45| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/andrew45/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline_en.md new file mode 100644 index 00000000000000..9dc53ab0fefa4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline pipeline XlmRoBertaForTokenClassification from andrew45 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline` is an English model originally trained by andrew45.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline_en_5.4.0_3.0_1718121015964.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline_en_5.4.0_3.0_1718121015964.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_andrew45_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/andrew45/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cogitur_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cogitur_en.md new file mode 100644 index 00000000000000..96ad403833d9d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cogitur_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_cogitur XlmRoBertaForTokenClassification from cogitur +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_cogitur +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_cogitur` is an English model originally trained by cogitur.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_cogitur_en_5.4.0_3.0_1718111374482.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_cogitur_en_5.4.0_3.0_1718111374482.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_cogitur","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_cogitur", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_cogitur| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/cogitur/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline_en.md new file mode 100644 index 00000000000000..665d771bf999b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline pipeline XlmRoBertaForTokenClassification from cogitur +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline` is an English model originally trained by cogitur.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline_en_5.4.0_3.0_1718111458848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline_en_5.4.0_3.0_1718111458848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_cogitur_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/cogitur/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cyycyy_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cyycyy_en.md new file mode 100644 index 00000000000000..80dbf969ebba13 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cyycyy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_cyycyy XlmRoBertaForTokenClassification from cyycyy +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_cyycyy +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_cyycyy` is an English model originally trained by cyycyy.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_cyycyy_en_5.4.0_3.0_1718116757095.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_cyycyy_en_5.4.0_3.0_1718116757095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_cyycyy","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_cyycyy", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_cyycyy| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/cyycyy/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline_en.md new file mode 100644 index 00000000000000..d03477176aad2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline pipeline XlmRoBertaForTokenClassification from cyycyy +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline` is an English model originally trained by cyycyy.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline_en_5.4.0_3.0_1718116841921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline_en_5.4.0_3.0_1718116841921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_cyycyy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/cyycyy/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_haesun_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_haesun_en.md new file mode 100644 index 00000000000000..72d0a570f401c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_haesun_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_haesun XlmRoBertaForTokenClassification from haesun +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_haesun +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_haesun` is an English model originally trained by haesun.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_haesun_en_5.4.0_3.0_1718105842438.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_haesun_en_5.4.0_3.0_1718105842438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_haesun","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_haesun", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_german_french_haesun|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|843.4 MB|
+
+## References
+
+https://huggingface.co/haesun/xlm-roberta-base-finetuned-panx-de-fr
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline_en.md
new file mode 100644
index 00000000000000..c563f494d5b93e
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline pipeline XlmRoBertaForTokenClassification from haesun
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline` is an English model originally trained by haesun.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline_en_5.4.0_3.0_1718105959103.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline_en_5.4.0_3.0_1718105959103.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_german_french_haesun_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|843.4 MB|
+
+## References
+
+https://huggingface.co/haesun/xlm-roberta-base-finetuned-panx-de-fr
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hanlforever_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hanlforever_en.md
new file mode 100644
index 00000000000000..478a00e5ab35e8
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hanlforever_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_german_french_hanlforever XlmRoBertaForTokenClassification from hanlforever
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_german_french_hanlforever
+date: 2024-06-11
+tags: [en, open_source, onnx, token_classification, xlm_roberta, ner]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_hanlforever` is an English model originally trained by hanlforever.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hanlforever_en_5.4.0_3.0_1718135057337.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hanlforever_en_5.4.0_3.0_1718135057337.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_hanlforever","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_hanlforever", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_german_french_hanlforever|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|858.2 MB|
+
+## References
+
+https://huggingface.co/hanlforever/xlm-roberta-base-finetuned-panx-de-fr
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline_en.md
new file mode 100644
index 00000000000000..c83e6a09059d16
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline pipeline XlmRoBertaForTokenClassification from hanlforever
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline` is an English model originally trained by hanlforever.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline_en_5.4.0_3.0_1718135163886.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline_en_5.4.0_3.0_1718135163886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_german_french_hanlforever_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|858.2 MB|
+
+## References
+
+https://huggingface.co/hanlforever/xlm-roberta-base-finetuned-panx-de-fr
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_heerak_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_heerak_en.md
new file mode 100644
index 00000000000000..9cc90e7f2ebe5f
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_heerak_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_german_french_heerak XlmRoBertaForTokenClassification from Heerak
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_german_french_heerak
+date: 2024-06-11
+tags: [en, open_source, onnx, token_classification, xlm_roberta, ner]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_heerak` is an English model originally trained by Heerak.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_heerak_en_5.4.0_3.0_1718124551631.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_heerak_en_5.4.0_3.0_1718124551631.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_heerak","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_heerak", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_german_french_heerak|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|858.2 MB|
+
+## References
+
+https://huggingface.co/Heerak/xlm-roberta-base-finetuned-panx-de-fr
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline_en.md
new file mode 100644
index 00000000000000..15fc40869b3de2
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline pipeline XlmRoBertaForTokenClassification from Heerak
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline` is an English model originally trained by Heerak.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline_en_5.4.0_3.0_1718124636193.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline_en_5.4.0_3.0_1718124636193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_german_french_heerak_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|858.2 MB|
+
+## References
+
+https://huggingface.co/Heerak/xlm-roberta-base-finetuned-panx-de-fr
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_henryjiang_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_henryjiang_en.md
new file mode 100644
index 00000000000000..048de732711d80
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_henryjiang_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_german_french_henryjiang XlmRoBertaForTokenClassification from henryjiang
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_german_french_henryjiang
+date: 2024-06-11
+tags: [en, open_source, onnx, token_classification, xlm_roberta, ner]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_henryjiang` is an English model originally trained by henryjiang.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_henryjiang_en_5.4.0_3.0_1718138157866.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_henryjiang_en_5.4.0_3.0_1718138157866.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_henryjiang","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_henryjiang", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_german_french_henryjiang|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|858.5 MB|
+
+## References
+
+https://huggingface.co/henryjiang/xlm-roberta-base-finetuned-panx-de-fr
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline_en.md
new file mode 100644
index 00000000000000..0ff9066dcf875f
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline pipeline XlmRoBertaForTokenClassification from henryjiang
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline` is an English model originally trained by henryjiang.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline_en_5.4.0_3.0_1718138239265.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline_en_5.4.0_3.0_1718138239265.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_german_french_henryjiang_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|858.5 MB|
+
+## References
+
+https://huggingface.co/henryjiang/xlm-roberta-base-finetuned-panx-de-fr
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hhffxx_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hhffxx_en.md
new file mode 100644
index 00000000000000..ed668a7a2fbe26
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hhffxx_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_german_french_hhffxx XlmRoBertaForTokenClassification from hhffxx
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_german_french_hhffxx
+date: 2024-06-11
+tags: [en, open_source, onnx, token_classification, xlm_roberta, ner]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_hhffxx` is an English model originally trained by hhffxx.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hhffxx_en_5.4.0_3.0_1718102635867.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hhffxx_en_5.4.0_3.0_1718102635867.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_hhffxx","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_hhffxx", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_german_french_hhffxx|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|858.5 MB|
+
+## References
+
+https://huggingface.co/hhffxx/xlm-roberta-base-finetuned-panx-de-fr
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline_en.md
new file mode 100644
index 00000000000000..3812d8f044f7e3
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline pipeline XlmRoBertaForTokenClassification from hhffxx
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline` is an English model originally trained by hhffxx.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline_en_5.4.0_3.0_1718102716868.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline_en_5.4.0_3.0_1718102716868.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlm_roberta_base_finetuned_panx_german_french_hhffxx_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|858.5 MB|
+
+## References
+
+https://huggingface.co/hhffxx/xlm-roberta-base-finetuned-panx-de-fr
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_huggingbase_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_huggingbase_en.md
new file mode 100644
index 00000000000000..d1ebceb3cc1d06
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_huggingbase_en.md
@@ -0,0 +1,94 @@
+---
+layout: model
+title: English xlm_roberta_base_finetuned_panx_german_french_huggingbase XlmRoBertaForTokenClassification from huggingbase
+author: John Snow Labs
+name: xlm_roberta_base_finetuned_panx_german_french_huggingbase
+date: 2024-06-11
+tags: [en, open_source, onnx, token_classification, xlm_roberta, ner]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_huggingbase` is an English model originally trained by huggingbase.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_huggingbase_en_5.4.0_3.0_1718101546763.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_huggingbase_en_5.4.0_3.0_1718101546763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_huggingbase","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_huggingbase", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_huggingbase| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/huggingbase/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline_en.md new file mode 100644 index 00000000000000..93a5813a855cc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline pipeline XlmRoBertaForTokenClassification from huggingbase +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline` is an English model originally trained by huggingbase. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline_en_5.4.0_3.0_1718101630829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline_en_5.4.0_3.0_1718101630829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_huggingbase_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/huggingbase/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jamie613_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jamie613_en.md new file mode 100644 index 00000000000000..748b8b4bc4ea17 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jamie613_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_jamie613 XlmRoBertaForTokenClassification from jamie613 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_jamie613 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_jamie613` is an English model originally trained by jamie613. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jamie613_en_5.4.0_3.0_1718131268033.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jamie613_en_5.4.0_3.0_1718131268033.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_jamie613","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_jamie613", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_jamie613| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/jamie613/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline_en.md new file mode 100644 index 00000000000000..3833ee17d71a12 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline pipeline XlmRoBertaForTokenClassification from jamie613 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline` is an English model originally trained by jamie613. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline_en_5.4.0_3.0_1718131360659.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline_en_5.4.0_3.0_1718131360659.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_jamie613_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/jamie613/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jojeyh_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jojeyh_en.md new file mode 100644 index 00000000000000..0dabb6e853e4ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jojeyh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_jojeyh XlmRoBertaForTokenClassification from jojeyh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_jojeyh +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_jojeyh` is an English model originally trained by jojeyh. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jojeyh_en_5.4.0_3.0_1718109238641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jojeyh_en_5.4.0_3.0_1718109238641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_jojeyh","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_jojeyh", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_jojeyh| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/jojeyh/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline_en.md new file mode 100644 index 00000000000000..b07b12467d2b6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline pipeline XlmRoBertaForTokenClassification from jojeyh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline` is an English model originally trained by jojeyh. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline_en_5.4.0_3.0_1718109322711.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline_en_5.4.0_3.0_1718109322711.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_jojeyh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/jojeyh/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_k3lana_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_k3lana_en.md new file mode 100644 index 00000000000000..2d834a46850450 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_k3lana_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_k3lana XlmRoBertaForTokenClassification from k3lana +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_k3lana +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_k3lana` is an English model originally trained by k3lana. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_k3lana_en_5.4.0_3.0_1718121913813.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_k3lana_en_5.4.0_3.0_1718121913813.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_k3lana","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_k3lana", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_k3lana| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/k3lana/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline_en.md new file mode 100644 index 00000000000000..697dd445e0163e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline pipeline XlmRoBertaForTokenClassification from k3lana +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline` is an English model originally trained by k3lana. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline_en_5.4.0_3.0_1718122001080.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline_en_5.4.0_3.0_1718122001080.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_k3lana_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/k3lana/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_likejazz_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_likejazz_en.md new file mode 100644 index 00000000000000..f587020af0b283 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_likejazz_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_likejazz XlmRoBertaForTokenClassification from likejazz +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_likejazz +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_likejazz` is an English model originally trained by likejazz. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_likejazz_en_5.4.0_3.0_1718121933346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_likejazz_en_5.4.0_3.0_1718121933346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_likejazz","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_likejazz", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_likejazz| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|854.4 MB| + +## References + +https://huggingface.co/likejazz/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline_en.md new file mode 100644 index 00000000000000..70cf58ae55b2d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline pipeline XlmRoBertaForTokenClassification from likejazz +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline` is an English model originally trained by likejazz. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline_en_5.4.0_3.0_1718122058650.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline_en_5.4.0_3.0_1718122058650.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_likejazz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|854.4 MB| + +## References + +https://huggingface.co/likejazz/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_nisimura_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_nisimura_en.md new file mode 100644 index 00000000000000..ea40b7a4d17122 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_nisimura_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_nisimura XlmRoBertaForTokenClassification from nisimura +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_nisimura +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_nisimura` is an English model originally trained by nisimura. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_nisimura_en_5.4.0_3.0_1718109869993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_nisimura_en_5.4.0_3.0_1718109869993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_nisimura","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_nisimura", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_nisimura| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|857.0 MB| + +## References + +https://huggingface.co/nisimura/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline_en.md new file mode 100644 index 00000000000000..d188c6ec66b462 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline pipeline XlmRoBertaForTokenClassification from nisimura +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline` is an English model originally trained by nisimura. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline_en_5.4.0_3.0_1718109964767.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline_en_5.4.0_3.0_1718109964767.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_nisimura_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|857.0 MB| + +## References + +https://huggingface.co/nisimura/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_philosucker_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_philosucker_en.md new file mode 100644 index 00000000000000..e97633ef667745 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_philosucker_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_philosucker XlmRoBertaForTokenClassification from philosucker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_philosucker +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_philosucker` is an English model originally trained by philosucker. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_philosucker_en_5.4.0_3.0_1718110988569.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_philosucker_en_5.4.0_3.0_1718110988569.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_philosucker","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_philosucker", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_philosucker| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.7 MB| + +## References + +https://huggingface.co/philosucker/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline_en.md new file mode 100644 index 00000000000000..b9af566d95a695 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline pipeline XlmRoBertaForTokenClassification from philosucker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline` is an English model originally trained by philosucker. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline_en_5.4.0_3.0_1718111079452.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline_en_5.4.0_3.0_1718111079452.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_philosucker_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.7 MB| + +## References + +https://huggingface.co/philosucker/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_skr1125_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_skr1125_en.md new file mode 100644 index 00000000000000..bc8410ea2aa7bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_skr1125_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_skr1125 XlmRoBertaForTokenClassification from skr1125 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_skr1125 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_skr1125` is an English model originally trained by skr1125. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_skr1125_en_5.4.0_3.0_1718099314528.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_skr1125_en_5.4.0_3.0_1718099314528.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_skr1125","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_skr1125", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_skr1125| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/skr1125/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline_en.md new file mode 100644 index 00000000000000..752271f0ea81ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline pipeline XlmRoBertaForTokenClassification from skr1125 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline` is an English model originally trained by skr1125. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline_en_5.4.0_3.0_1718099409341.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline_en_5.4.0_3.0_1718099409341.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_skr1125_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/skr1125/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_sunwooooong_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_sunwooooong_en.md new file mode 100644 index 00000000000000..27424f723f227f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_sunwooooong_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_sunwooooong XlmRoBertaForTokenClassification from sunwooooong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_sunwooooong +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_sunwooooong` is an English model originally trained by sunwooooong. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_sunwooooong_en_5.4.0_3.0_1718123034112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_sunwooooong_en_5.4.0_3.0_1718123034112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_sunwooooong","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_sunwooooong", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_sunwooooong| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/sunwooooong/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline_en.md new file mode 100644 index 00000000000000..8662f3a913c98d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline pipeline XlmRoBertaForTokenClassification from sunwooooong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline` is an English model originally trained by sunwooooong. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline_en_5.4.0_3.0_1718123118627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline_en_5.4.0_3.0_1718123118627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_sunwooooong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/sunwooooong/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_transll_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_transll_en.md new file mode 100644 index 00000000000000..d633ab9f9427fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_transll_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_transll XlmRoBertaForTokenClassification from TransLL +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_transll +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_transll` is an English model originally trained by TransLL. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_transll_en_5.4.0_3.0_1718105872509.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_transll_en_5.4.0_3.0_1718105872509.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_transll","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_transll", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_transll| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/TransLL/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_transll_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_transll_pipeline_en.md new file mode 100644 index 00000000000000..9ca56f9ba234dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_transll_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_transll_pipeline pipeline XlmRoBertaForTokenClassification from TransLL +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_transll_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_transll_pipeline` is an English model originally trained by TransLL. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_transll_pipeline_en_5.4.0_3.0_1718105956549.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_transll_pipeline_en_5.4.0_3.0_1718105956549.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_transll_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_transll_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_transll_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/TransLL/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_udon3_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_udon3_en.md new file mode 100644 index 00000000000000..47d5b3ba6c98fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_udon3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_udon3 XlmRoBertaForTokenClassification from udon3 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_udon3 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_udon3` is an English model originally trained by udon3. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_udon3_en_5.4.0_3.0_1718117636042.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_udon3_en_5.4.0_3.0_1718117636042.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_udon3","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_udon3", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_udon3| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/udon3/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline_en.md new file mode 100644 index 00000000000000..368c277af3c827 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline pipeline XlmRoBertaForTokenClassification from udon3 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline` is an English model originally trained by udon3. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline_en_5.4.0_3.0_1718117728636.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline_en_5.4.0_3.0_1718117728636.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_udon3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/udon3/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_yuri_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_yuri_en.md new file mode 100644 index 00000000000000..7c5c03b5ffd2c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_yuri_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_yuri XlmRoBertaForTokenClassification from Yuri +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_yuri +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_yuri` is an English model originally trained by Yuri. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_yuri_en_5.4.0_3.0_1718099777534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_yuri_en_5.4.0_3.0_1718099777534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_yuri","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_yuri", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_yuri| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Yuri/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline_en.md new file mode 100644 index 00000000000000..100d6cc9b8df2c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline pipeline XlmRoBertaForTokenClassification from Yuri +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline` is an English model originally trained by Yuri.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline_en_5.4.0_3.0_1718099861772.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline_en_5.4.0_3.0_1718099861772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_yuri_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Yuri/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_gcmsrc_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_gcmsrc_en.md new file mode 100644 index 00000000000000..5620a01a0959d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_gcmsrc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_gcmsrc XlmRoBertaForTokenClassification from gcmsrc +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_gcmsrc +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_gcmsrc` is an English model originally trained by gcmsrc.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_gcmsrc_en_5.4.0_3.0_1718100398700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_gcmsrc_en_5.4.0_3.0_1718100398700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_gcmsrc","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_gcmsrc", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_gcmsrc| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/gcmsrc/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline_en.md new file mode 100644 index 00000000000000..929fdf2491e24d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline pipeline XlmRoBertaForTokenClassification from gcmsrc +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline` is an English model originally trained by gcmsrc.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline_en_5.4.0_3.0_1718100488048.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline_en_5.4.0_3.0_1718100488048.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_gcmsrc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/gcmsrc/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_guruji108_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_guruji108_en.md new file mode 100644 index 00000000000000..79586cc972568d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_guruji108_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_guruji108 XlmRoBertaForTokenClassification from Guruji108 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_guruji108 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_guruji108` is an English model originally trained by Guruji108.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_guruji108_en_5.4.0_3.0_1718104703463.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_guruji108_en_5.4.0_3.0_1718104703463.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_guruji108","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_guruji108", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_guruji108| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Guruji108/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_guruji108_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_guruji108_pipeline_en.md new file mode 100644 index 00000000000000..b02e0578519944 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_guruji108_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_guruji108_pipeline pipeline XlmRoBertaForTokenClassification from Guruji108 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_guruji108_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_guruji108_pipeline` is an English model originally trained by Guruji108.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_guruji108_pipeline_en_5.4.0_3.0_1718104798218.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_guruji108_pipeline_en_5.4.0_3.0_1718104798218.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_guruji108_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_guruji108_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_guruji108_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Guruji108/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hash1360_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hash1360_en.md new file mode 100644 index 00000000000000..995d926bfb334e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hash1360_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hash1360 XlmRoBertaForTokenClassification from Hash1360 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hash1360 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hash1360` is an English model originally trained by Hash1360.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hash1360_en_5.4.0_3.0_1718132057896.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hash1360_en_5.4.0_3.0_1718132057896.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hash1360","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hash1360", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hash1360| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/Hash1360/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hash1360_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hash1360_pipeline_en.md new file mode 100644 index 00000000000000..6023f1e2434616 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hash1360_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hash1360_pipeline pipeline XlmRoBertaForTokenClassification from Hash1360 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hash1360_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hash1360_pipeline` is an English model originally trained by Hash1360.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hash1360_pipeline_en_5.4.0_3.0_1718132165879.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hash1360_pipeline_en_5.4.0_3.0_1718132165879.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hash1360_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hash1360_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hash1360_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/Hash1360/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hbtemari_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hbtemari_en.md new file mode 100644 index 00000000000000..53b3f8d2867dff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hbtemari_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hbtemari XlmRoBertaForTokenClassification from HBtemari +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hbtemari +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hbtemari` is an English model originally trained by HBtemari.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hbtemari_en_5.4.0_3.0_1718103003850.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hbtemari_en_5.4.0_3.0_1718103003850.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hbtemari","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hbtemari", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hbtemari| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/HBtemari/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline_en.md new file mode 100644 index 00000000000000..99afedf4971389 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline pipeline XlmRoBertaForTokenClassification from HBtemari +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline` is an English model originally trained by HBtemari.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline_en_5.4.0_3.0_1718103090701.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline_en_5.4.0_3.0_1718103090701.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hbtemari_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/HBtemari/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hhffxx_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hhffxx_en.md new file mode 100644 index 00000000000000..3b4d12392032d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hhffxx_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hhffxx XlmRoBertaForTokenClassification from hhffxx +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hhffxx +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hhffxx` is an English model originally trained by hhffxx.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hhffxx_en_5.4.0_3.0_1718106982560.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hhffxx_en_5.4.0_3.0_1718106982560.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hhffxx","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hhffxx", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hhffxx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|854.3 MB| + +## References + +https://huggingface.co/hhffxx/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline_en.md new file mode 100644 index 00000000000000..76613a6796d47a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline pipeline XlmRoBertaForTokenClassification from hhffxx +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline` is an English model originally trained by hhffxx.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline_en_5.4.0_3.0_1718107065701.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline_en_5.4.0_3.0_1718107065701.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hhffxx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|854.3 MB| + +## References + +https://huggingface.co/hhffxx/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitakura_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitakura_en.md new file mode 100644 index 00000000000000..241969dd664da1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitakura_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hitakura XlmRoBertaForTokenClassification from hitakura +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hitakura +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hitakura` is an English model originally trained by hitakura. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hitakura_en_5.4.0_3.0_1718117876796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hitakura_en_5.4.0_3.0_1718117876796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hitakura","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hitakura", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hitakura| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|471.4 MB| + +## References + +https://huggingface.co/hitakura/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitakura_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitakura_pipeline_en.md new file mode 100644 index 00000000000000..3ef1f75a47f906 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitakura_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hitakura_pipeline pipeline XlmRoBertaForTokenClassification from hitakura +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hitakura_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hitakura_pipeline` is an English model originally trained by hitakura. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hitakura_pipeline_en_5.4.0_3.0_1718117963205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hitakura_pipeline_en_5.4.0_3.0_1718117963205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hitakura_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hitakura_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hitakura_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|471.5 MB| + +## References + +https://huggingface.co/hitakura/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_en.md new file mode 100644 index 00000000000000..178987011453a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hitoshinagaoka XlmRoBertaForTokenClassification from hitoshiNagaoka +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hitoshinagaoka +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hitoshinagaoka` is an English model originally trained by hitoshiNagaoka. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_en_5.4.0_3.0_1718127763830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_en_5.4.0_3.0_1718127763830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hitoshinagaoka","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hitoshinagaoka", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hitoshinagaoka| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/hitoshiNagaoka/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline_en.md new file mode 100644 index 00000000000000..3348132b753762 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline pipeline XlmRoBertaForTokenClassification from hitoshiNagaoka +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline` is an English model originally trained by hitoshiNagaoka. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline_en_5.4.0_3.0_1718127850435.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline_en_5.4.0_3.0_1718127850435.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hitoshinagaoka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/hitoshiNagaoka/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hyeonseo_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hyeonseo_en.md new file mode 100644 index 00000000000000..744b3b1449e130 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hyeonseo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hyeonseo XlmRoBertaForTokenClassification from Hyeonseo +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hyeonseo +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hyeonseo` is an English model originally trained by Hyeonseo. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hyeonseo_en_5.4.0_3.0_1718108936407.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hyeonseo_en_5.4.0_3.0_1718108936407.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hyeonseo","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hyeonseo", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hyeonseo| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|808.3 MB| + +## References + +https://huggingface.co/Hyeonseo/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline_en.md new file mode 100644 index 00000000000000..1de0eed145d634 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline pipeline XlmRoBertaForTokenClassification from Hyeonseo +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline` is an English model originally trained by Hyeonseo. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline_en_5.4.0_3.0_1718109056653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline_en_5.4.0_3.0_1718109056653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hyeonseo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|808.3 MB| + +## References + +https://huggingface.co/Hyeonseo/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_inniok_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_inniok_en.md new file mode 100644 index 00000000000000..b7a76a52007481 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_inniok_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_inniok XlmRoBertaForTokenClassification from inniok +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_inniok +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_inniok` is an English model originally trained by inniok. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_inniok_en_5.4.0_3.0_1718139190318.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_inniok_en_5.4.0_3.0_1718139190318.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_inniok","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_inniok", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_inniok| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/inniok/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_inniok_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_inniok_pipeline_en.md new file mode 100644 index 00000000000000..6aae24171a38e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_inniok_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_inniok_pipeline pipeline XlmRoBertaForTokenClassification from inniok +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_inniok_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_inniok_pipeline` is an English model originally trained by inniok. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_inniok_pipeline_en_5.4.0_3.0_1718139297331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_inniok_pipeline_en_5.4.0_3.0_1718139297331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_inniok_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_inniok_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_inniok_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/inniok/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jakobbrunner_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jakobbrunner_en.md new file mode 100644 index 00000000000000..682ad2899330dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jakobbrunner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_jakobbrunner XlmRoBertaForTokenClassification from jakobBrunner +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_jakobbrunner +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_jakobbrunner` is an English model originally trained by jakobBrunner. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jakobbrunner_en_5.4.0_3.0_1718135163047.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jakobbrunner_en_5.4.0_3.0_1718135163047.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_jakobbrunner","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_jakobbrunner", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_jakobbrunner| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/jakobBrunner/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline_en.md new file mode 100644 index 00000000000000..360982b79f14ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline pipeline XlmRoBertaForTokenClassification from jakobBrunner +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline` is an English model originally trained by jakobBrunner. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline_en_5.4.0_3.0_1718135258571.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline_en_5.4.0_3.0_1718135258571.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_jakobbrunner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/jakobBrunner/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jjglilleberg_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jjglilleberg_en.md new file mode 100644 index 00000000000000..da4a1831fe6316 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jjglilleberg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_jjglilleberg XlmRoBertaForTokenClassification from jjglilleberg +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_jjglilleberg +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_jjglilleberg` is an English model originally trained by jjglilleberg. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jjglilleberg_en_5.4.0_3.0_1718120147023.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jjglilleberg_en_5.4.0_3.0_1718120147023.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_jjglilleberg","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_jjglilleberg", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_jjglilleberg| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/jjglilleberg/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline_en.md new file mode 100644 index 00000000000000..6bfd40ee56701e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline pipeline XlmRoBertaForTokenClassification from jjglilleberg +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline` is an English model originally trained by jjglilleberg.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline_en_5.4.0_3.0_1718120254741.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline_en_5.4.0_3.0_1718120254741.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_jjglilleberg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/jjglilleberg/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_junghim_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_junghim_en.md new file mode 100644 index 00000000000000..94dfcae1ba40cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_junghim_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_junghim XlmRoBertaForTokenClassification from Junghim +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_junghim +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_junghim` is an English model originally trained by Junghim.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_junghim_en_5.4.0_3.0_1718115675144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_junghim_en_5.4.0_3.0_1718115675144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_junghim","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_junghim", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_junghim| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Junghim/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_junghim_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_junghim_pipeline_en.md new file mode 100644 index 00000000000000..2371599f22e8bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_junghim_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_junghim_pipeline pipeline XlmRoBertaForTokenClassification from Junghim +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_junghim_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_junghim_pipeline` is an English model originally trained by Junghim.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_junghim_pipeline_en_5.4.0_3.0_1718115772572.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_junghim_pipeline_en_5.4.0_3.0_1718115772572.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_junghim_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_junghim_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_junghim_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Junghim/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jzwk_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jzwk_en.md new file mode 100644 index 00000000000000..96283eecbedc30 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jzwk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_jzwk XlmRoBertaForTokenClassification from Jzwk +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_jzwk +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_jzwk` is an English model originally trained by Jzwk.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jzwk_en_5.4.0_3.0_1718133041002.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jzwk_en_5.4.0_3.0_1718133041002.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_jzwk","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_jzwk", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_jzwk| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|851.7 MB| + +## References + +https://huggingface.co/Jzwk/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jzwk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jzwk_pipeline_en.md new file mode 100644 index 00000000000000..1040619c6d1752 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_jzwk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_jzwk_pipeline pipeline XlmRoBertaForTokenClassification from Jzwk +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_jzwk_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_jzwk_pipeline` is an English model originally trained by Jzwk.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jzwk_pipeline_en_5.4.0_3.0_1718133134180.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_jzwk_pipeline_en_5.4.0_3.0_1718133134180.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_jzwk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_jzwk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_jzwk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|851.8 MB| + +## References + +https://huggingface.co/Jzwk/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_k_masaki_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_k_masaki_en.md new file mode 100644 index 00000000000000..7369b3030b5a03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_k_masaki_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_k_masaki XlmRoBertaForTokenClassification from k-masaki +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_k_masaki +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_k_masaki` is an English model originally trained by k-masaki.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_k_masaki_en_5.4.0_3.0_1718101962125.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_k_masaki_en_5.4.0_3.0_1718101962125.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_k_masaki","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_k_masaki", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_k_masaki| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/k-masaki/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline_en.md new file mode 100644 index 00000000000000..4f362bcebaf1cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline pipeline XlmRoBertaForTokenClassification from k-masaki +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline` is an English model originally trained by k-masaki.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline_en_5.4.0_3.0_1718102048238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline_en_5.4.0_3.0_1718102048238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_k_masaki_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/k-masaki/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_karolk_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_karolk_en.md new file mode 100644 index 00000000000000..b217176047ec0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_karolk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_karolk XlmRoBertaForTokenClassification from KarolK +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_karolk +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_karolk` is an English model originally trained by KarolK.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_karolk_en_5.4.0_3.0_1718123398719.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_karolk_en_5.4.0_3.0_1718123398719.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_karolk","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_karolk", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_karolk| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|851.7 MB| + +## References + +https://huggingface.co/KarolK/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_karolk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_karolk_pipeline_en.md new file mode 100644 index 00000000000000..6cfa60b222ebf2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_karolk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_karolk_pipeline pipeline XlmRoBertaForTokenClassification from KarolK +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_karolk_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_karolk_pipeline` is an English model originally trained by KarolK.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_karolk_pipeline_en_5.4.0_3.0_1718123492213.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_karolk_pipeline_en_5.4.0_3.0_1718123492213.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_karolk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_karolk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_karolk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|851.8 MB| + +## References + +https://huggingface.co/KarolK/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_kenhoffman_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_kenhoffman_en.md new file mode 100644 index 00000000000000..9b6ce3b0499b9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_kenhoffman_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_kenhoffman XlmRoBertaForTokenClassification from kenhoffman +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_kenhoffman +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_kenhoffman` is an English model originally trained by kenhoffman.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_kenhoffman_en_5.4.0_3.0_1718132455810.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_kenhoffman_en_5.4.0_3.0_1718132455810.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_kenhoffman","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_kenhoffman", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_kenhoffman| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/kenhoffman/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline_en.md new file mode 100644 index 00000000000000..c26f4ea797c31c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline pipeline XlmRoBertaForTokenClassification from kenhoffman +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline` is an English model originally trained by kenhoffman.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline_en_5.4.0_3.0_1718132542460.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline_en_5.4.0_3.0_1718132542460.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_kenhoffman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/kenhoffman/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_khadija267_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_khadija267_en.md new file mode 100644 index 00000000000000..8801ae0c40dd22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_khadija267_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_khadija267 XlmRoBertaForTokenClassification from khadija267 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_khadija267 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_khadija267` is an English model originally trained by khadija267.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_khadija267_en_5.4.0_3.0_1718130099628.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_khadija267_en_5.4.0_3.0_1718130099628.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_khadija267","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_khadija267", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_khadija267| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/khadija267/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_khadija267_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_khadija267_pipeline_en.md new file mode 100644 index 00000000000000..2ace6bf733f98d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_khadija267_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_khadija267_pipeline pipeline XlmRoBertaForTokenClassification from khadija267 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_khadija267_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_khadija267_pipeline` is an English model originally trained by khadija267.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_khadija267_pipeline_en_5.4.0_3.0_1718130185906.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_khadija267_pipeline_en_5.4.0_3.0_1718130185906.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_khadija267_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_khadija267_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_khadija267_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/khadija267/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_leotunganh_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_leotunganh_en.md new file mode 100644 index 00000000000000..753b2f7f801794 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_leotunganh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_leotunganh XlmRoBertaForTokenClassification from LeoTungAnh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_leotunganh +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_leotunganh` is an English model originally trained by LeoTungAnh.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_leotunganh_en_5.4.0_3.0_1718122360010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_leotunganh_en_5.4.0_3.0_1718122360010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_leotunganh","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_leotunganh", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_leotunganh| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|835.3 MB| + +## References + +https://huggingface.co/LeoTungAnh/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline_en.md new file mode 100644 index 00000000000000..9727855b2219f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline pipeline XlmRoBertaForTokenClassification from LeoTungAnh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline` is an English model originally trained by LeoTungAnh.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline_en_5.4.0_3.0_1718122474380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline_en_5.4.0_3.0_1718122474380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_leotunganh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|835.3 MB| + +## References + +https://huggingface.co/LeoTungAnh/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_likejazz_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_likejazz_en.md new file mode 100644 index 00000000000000..193790a9e611fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_likejazz_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_likejazz XlmRoBertaForTokenClassification from likejazz +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_likejazz +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_likejazz` is an English model originally trained by likejazz.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_likejazz_en_5.4.0_3.0_1718119556549.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_likejazz_en_5.4.0_3.0_1718119556549.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_likejazz","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_likejazz", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_likejazz| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|847.3 MB| + +## References + +https://huggingface.co/likejazz/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_likejazz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_likejazz_pipeline_en.md new file mode 100644 index 00000000000000..84d6f627873cf4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_likejazz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_likejazz_pipeline pipeline XlmRoBertaForTokenClassification from likejazz +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_likejazz_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_likejazz_pipeline` is an English model originally trained by likejazz.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_likejazz_pipeline_en_5.4.0_3.0_1718119692892.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_likejazz_pipeline_en_5.4.0_3.0_1718119692892.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_likejazz_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_likejazz_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_likejazz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|847.3 MB| + +## References + +https://huggingface.co/likejazz/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_matovu_ronald_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_matovu_ronald_en.md new file mode 100644 index 00000000000000..8f7d74a5431320 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_matovu_ronald_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_matovu_ronald XlmRoBertaForTokenClassification from matovu-ronald +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_matovu_ronald +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_matovu_ronald` is an English model originally trained by matovu-ronald.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_matovu_ronald_en_5.4.0_3.0_1718132072689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_matovu_ronald_en_5.4.0_3.0_1718132072689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_matovu_ronald","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_matovu_ronald", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_matovu_ronald| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/matovu-ronald/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline_en.md new file mode 100644 index 00000000000000..29ef379fe3ec59 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline pipeline XlmRoBertaForTokenClassification from matovu-ronald +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline` is an English model originally trained by matovu-ronald.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline_en_5.4.0_3.0_1718132181729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline_en_5.4.0_3.0_1718132181729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_matovu_ronald_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/matovu-ronald/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_maxnet_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_maxnet_en.md new file mode 100644 index 00000000000000..b2a2ec2a63bfbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_maxnet_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_maxnet XlmRoBertaForTokenClassification from Maxnet +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_maxnet +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_maxnet` is an English model originally trained by Maxnet.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_maxnet_en_5.4.0_3.0_1718126525706.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_maxnet_en_5.4.0_3.0_1718126525706.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_maxnet","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_maxnet", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_maxnet| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Maxnet/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_maxnet_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_maxnet_pipeline_en.md new file mode 100644 index 00000000000000..e182e3ff4a1649 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_maxnet_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_maxnet_pipeline pipeline XlmRoBertaForTokenClassification from Maxnet +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_maxnet_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_maxnet_pipeline` is an English model originally trained by Maxnet.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_maxnet_pipeline_en_5.4.0_3.0_1718126622720.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_maxnet_pipeline_en_5.4.0_3.0_1718126622720.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_maxnet_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_maxnet_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_maxnet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Maxnet/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mj03_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mj03_en.md new file mode 100644 index 00000000000000..c6dccd1ab60648 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mj03_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_mj03 XlmRoBertaForTokenClassification from MJ03 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_mj03 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_mj03` is an English model originally trained by MJ03.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mj03_en_5.4.0_3.0_1718126523601.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mj03_en_5.4.0_3.0_1718126523601.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_mj03","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_mj03", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_mj03| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/MJ03/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mj03_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mj03_pipeline_en.md new file mode 100644 index 00000000000000..fc53c6cda4c1ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mj03_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_mj03_pipeline pipeline XlmRoBertaForTokenClassification from MJ03 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_mj03_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_mj03_pipeline` is an English model originally trained by MJ03.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mj03_pipeline_en_5.4.0_3.0_1718126620244.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mj03_pipeline_en_5.4.0_3.0_1718126620244.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_mj03_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_mj03_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_mj03_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/MJ03/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mlewand_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mlewand_en.md new file mode 100644 index 00000000000000..2877b109bc27bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mlewand_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_mlewand XlmRoBertaForTokenClassification from mlewand +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_mlewand +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_mlewand` is an English model originally trained by mlewand.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mlewand_en_5.4.0_3.0_1718112538480.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mlewand_en_5.4.0_3.0_1718112538480.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_mlewand","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_mlewand", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_mlewand| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/mlewand/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mlewand_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mlewand_pipeline_en.md new file mode 100644 index 00000000000000..6df8610ab38ea4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mlewand_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_mlewand_pipeline pipeline XlmRoBertaForTokenClassification from mlewand +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_mlewand_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_mlewand_pipeline` is an English model originally trained by mlewand.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mlewand_pipeline_en_5.4.0_3.0_1718112625634.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mlewand_pipeline_en_5.4.0_3.0_1718112625634.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_mlewand_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_mlewand_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_mlewand_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/mlewand/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mmiketan_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mmiketan_en.md new file mode 100644 index 00000000000000..c9e0721549dd39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mmiketan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_mmiketan XlmRoBertaForTokenClassification from mmiketan +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_mmiketan +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_mmiketan` is an English model originally trained by mmiketan.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mmiketan_en_5.4.0_3.0_1718101894809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mmiketan_en_5.4.0_3.0_1718101894809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_mmiketan","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_mmiketan", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_mmiketan| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/mmiketan/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline_en.md new file mode 100644 index 00000000000000..27910aa260e048 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline pipeline XlmRoBertaForTokenClassification from mmiketan +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline` is an English model originally trained by mmiketan.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline_en_5.4.0_3.0_1718101981999.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline_en_5.4.0_3.0_1718101981999.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_mmiketan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/mmiketan/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mooface_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mooface_en.md new file mode 100644 index 00000000000000..73a08696e5e6db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mooface_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_mooface XlmRoBertaForTokenClassification from mooface +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_mooface +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_mooface` is an English model originally trained by mooface.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mooface_en_5.4.0_3.0_1718119981574.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mooface_en_5.4.0_3.0_1718119981574.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_mooface","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_mooface", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_mooface| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/mooface/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mooface_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mooface_pipeline_en.md new file mode 100644 index 00000000000000..3bcfba7dc76d9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_mooface_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_mooface_pipeline pipeline XlmRoBertaForTokenClassification from mooface +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_mooface_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_mooface_pipeline` is an English model originally trained by mooface.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mooface_pipeline_en_5.4.0_3.0_1718120069280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_mooface_pipeline_en_5.4.0_3.0_1718120069280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_mooface_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_mooface_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_mooface_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/mooface/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_msrisrujan_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_msrisrujan_en.md new file mode 100644 index 00000000000000..50651616708b98 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_msrisrujan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_msrisrujan XlmRoBertaForTokenClassification from Msrisrujan +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_msrisrujan +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_msrisrujan` is an English model originally trained by Msrisrujan.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_msrisrujan_en_5.4.0_3.0_1718100404032.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_msrisrujan_en_5.4.0_3.0_1718100404032.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_msrisrujan","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_msrisrujan", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_msrisrujan| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Msrisrujan/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline_en.md new file mode 100644 index 00000000000000..8c9401f0fd48e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline pipeline XlmRoBertaForTokenClassification from Msrisrujan +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline` is an English model originally trained by Msrisrujan.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline_en_5.4.0_3.0_1718100501761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline_en_5.4.0_3.0_1718100501761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_msrisrujan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Msrisrujan/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_myasa_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_myasa_en.md new file mode 100644 index 00000000000000..e564b65f356b77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_myasa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_myasa XlmRoBertaForTokenClassification from myasa +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_myasa +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_myasa` is an English model originally trained by myasa.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_myasa_en_5.4.0_3.0_1718118469660.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_myasa_en_5.4.0_3.0_1718118469660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_myasa","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_myasa", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_myasa| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/myasa/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_myasa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_myasa_pipeline_en.md new file mode 100644 index 00000000000000..ff384daefa995d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_myasa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_myasa_pipeline pipeline XlmRoBertaForTokenClassification from myasa +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_myasa_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_myasa_pipeline` is an English model originally trained by myasa.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_myasa_pipeline_en_5.4.0_3.0_1718118556747.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_myasa_pipeline_en_5.4.0_3.0_1718118556747.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_myasa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_myasa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_myasa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/myasa/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_neha2608_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_neha2608_en.md new file mode 100644 index 00000000000000..c5bbb0096da25b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_neha2608_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_neha2608 XlmRoBertaForTokenClassification from Neha2608 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_neha2608 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_neha2608` is an English model originally trained by Neha2608.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_neha2608_en_5.4.0_3.0_1718105115537.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_neha2608_en_5.4.0_3.0_1718105115537.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_neha2608","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_neha2608", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_neha2608| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Neha2608/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_neha2608_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_neha2608_pipeline_en.md new file mode 100644 index 00000000000000..47c3bf7abb6b81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_neha2608_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_neha2608_pipeline pipeline XlmRoBertaForTokenClassification from Neha2608 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_neha2608_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_neha2608_pipeline` is an English model originally trained by Neha2608.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_neha2608_pipeline_en_5.4.0_3.0_1718105206370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_neha2608_pipeline_en_5.4.0_3.0_1718105206370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_neha2608_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_neha2608_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_neha2608_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Neha2608/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_nokomoro3_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_nokomoro3_en.md new file mode 100644 index 00000000000000..631f3a771b673c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_nokomoro3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_nokomoro3 XlmRoBertaForTokenClassification from nokomoro3 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_nokomoro3 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_nokomoro3` is an English model originally trained by nokomoro3.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_nokomoro3_en_5.4.0_3.0_1718106230902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_nokomoro3_en_5.4.0_3.0_1718106230902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_nokomoro3","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_nokomoro3", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_nokomoro3| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/nokomoro3/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline_en.md new file mode 100644 index 00000000000000..cf78a75a877486 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline pipeline XlmRoBertaForTokenClassification from nokomoro3 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline` is an English model originally trained by nokomoro3.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline_en_5.4.0_3.0_1718106317147.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline_en_5.4.0_3.0_1718106317147.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_nokomoro3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/nokomoro3/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ntaka_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ntaka_en.md new file mode 100644 index 00000000000000..03b426721007e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ntaka_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_ntaka XlmRoBertaForTokenClassification from ntaka +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_ntaka +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_ntaka` is an English model originally trained by ntaka.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ntaka_en_5.4.0_3.0_1718105820330.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ntaka_en_5.4.0_3.0_1718105820330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_ntaka","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_ntaka", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_ntaka| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/ntaka/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ntaka_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ntaka_pipeline_en.md new file mode 100644 index 00000000000000..00d8d5f899711d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_ntaka_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_ntaka_pipeline pipeline XlmRoBertaForTokenClassification from ntaka +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_ntaka_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_ntaka_pipeline` is an English model originally trained by ntaka.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ntaka_pipeline_en_5.4.0_3.0_1718105908847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_ntaka_pipeline_en_5.4.0_3.0_1718105908847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_ntaka_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_ntaka_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_ntaka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/ntaka/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_omersubasi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_omersubasi_en.md new file mode 100644 index 00000000000000..dcb2f36ab63c07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_omersubasi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_omersubasi XlmRoBertaForTokenClassification from omersubasi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_omersubasi +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_omersubasi` is an English model originally trained by omersubasi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_omersubasi_en_5.4.0_3.0_1718115741102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_omersubasi_en_5.4.0_3.0_1718115741102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_omersubasi","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_omersubasi", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_omersubasi| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/omersubasi/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline_en.md new file mode 100644 index 00000000000000..52912ee9d525a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline pipeline XlmRoBertaForTokenClassification from omersubasi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline` is an English model originally trained by omersubasi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline_en_5.4.0_3.0_1718115828571.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline_en_5.4.0_3.0_1718115828571.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_omersubasi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/omersubasi/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_otrturn_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_otrturn_en.md new file mode 100644 index 00000000000000..e10c8208b17859 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_otrturn_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_otrturn XlmRoBertaForTokenClassification from otrturn +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_otrturn +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_otrturn` is an English model originally trained by otrturn.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_otrturn_en_5.4.0_3.0_1718112141420.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_otrturn_en_5.4.0_3.0_1718112141420.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_otrturn","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_otrturn", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_otrturn| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/otrturn/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_otrturn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_otrturn_pipeline_en.md new file mode 100644 index 00000000000000..289dc30b13b20a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_otrturn_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_otrturn_pipeline pipeline XlmRoBertaForTokenClassification from otrturn +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_otrturn_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_otrturn_pipeline` is an English model originally trained by otrturn.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_otrturn_pipeline_en_5.4.0_3.0_1718112254043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_otrturn_pipeline_en_5.4.0_3.0_1718112254043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_otrturn_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_otrturn_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_otrturn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/otrturn/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_philosucker_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_philosucker_en.md new file mode 100644 index 00000000000000..cd6199669f8dbb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_philosucker_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_philosucker XlmRoBertaForTokenClassification from philosucker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_philosucker +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_philosucker` is an English model originally trained by philosucker.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_philosucker_en_5.4.0_3.0_1718108315180.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_philosucker_en_5.4.0_3.0_1718108315180.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_philosucker","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_philosucker", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_philosucker| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|854.4 MB| + +## References + +https://huggingface.co/philosucker/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_philosucker_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_philosucker_pipeline_en.md new file mode 100644 index 00000000000000..4616dc9bd5c034 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_philosucker_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_philosucker_pipeline pipeline XlmRoBertaForTokenClassification from philosucker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_philosucker_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_philosucker_pipeline` is an English model originally trained by philosucker. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_philosucker_pipeline_en_5.4.0_3.0_1718108394682.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_philosucker_pipeline_en_5.4.0_3.0_1718108394682.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_philosucker_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_philosucker_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_philosucker_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|854.4 MB| + +## References + +https://huggingface.co/philosucker/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_qilin1_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_qilin1_en.md new file mode 100644 index 00000000000000..65d88b6b560deb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_qilin1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_qilin1 XlmRoBertaForTokenClassification from qilin1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_qilin1 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_qilin1` is an English model originally trained by qilin1. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_qilin1_en_5.4.0_3.0_1718123208307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_qilin1_en_5.4.0_3.0_1718123208307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_qilin1","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_qilin1", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_qilin1| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|854.5 MB| + +## References + +https://huggingface.co/qilin1/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_qilin1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_qilin1_pipeline_en.md new file mode 100644 index 00000000000000..3fa86b4b2cba6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_qilin1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_qilin1_pipeline pipeline XlmRoBertaForTokenClassification from qilin1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_qilin1_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_qilin1_pipeline` is an English model originally trained by qilin1. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_qilin1_pipeline_en_5.4.0_3.0_1718123289809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_qilin1_pipeline_en_5.4.0_3.0_1718123289809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_qilin1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_qilin1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_qilin1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|854.5 MB| + +## References + +https://huggingface.co/qilin1/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_reinoudbosch_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_reinoudbosch_en.md new file mode 100644 index 00000000000000..39f058ab21f6fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_reinoudbosch_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_reinoudbosch XlmRoBertaForTokenClassification from reinoudbosch +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_reinoudbosch +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_reinoudbosch` is an English model originally trained by reinoudbosch. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_reinoudbosch_en_5.4.0_3.0_1718098416619.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_reinoudbosch_en_5.4.0_3.0_1718098416619.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_reinoudbosch","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_reinoudbosch", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_reinoudbosch| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/reinoudbosch/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline_en.md new file mode 100644 index 00000000000000..bfc46e10710746 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline pipeline XlmRoBertaForTokenClassification from reinoudbosch +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline` is an English model originally trained by reinoudbosch. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline_en_5.4.0_3.0_1718098502695.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline_en_5.4.0_3.0_1718098502695.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_reinoudbosch_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/reinoudbosch/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_robinschaefer_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_robinschaefer_en.md new file mode 100644 index 00000000000000..8509437f1c75f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_robinschaefer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_robinschaefer XlmRoBertaForTokenClassification from RobinSchaefer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_robinschaefer +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_robinschaefer` is an English model originally trained by RobinSchaefer. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_robinschaefer_en_5.4.0_3.0_1718121948130.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_robinschaefer_en_5.4.0_3.0_1718121948130.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_robinschaefer","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_robinschaefer", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_robinschaefer| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|839.7 MB| + +## References + +https://huggingface.co/RobinSchaefer/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline_en.md new file mode 100644 index 00000000000000..09b11423260682 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline pipeline XlmRoBertaForTokenClassification from RobinSchaefer +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline` is an English model originally trained by RobinSchaefer. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline_en_5.4.0_3.0_1718122064053.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline_en_5.4.0_3.0_1718122064053.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_robinschaefer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|839.7 MB| + +## References + +https://huggingface.co/RobinSchaefer/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungkwangjoong_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungkwangjoong_en.md new file mode 100644 index 00000000000000..c62f62052cbbda --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungkwangjoong_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_sungkwangjoong XlmRoBertaForTokenClassification from sungkwangjoong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_sungkwangjoong +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_sungkwangjoong` is an English model originally trained by sungkwangjoong. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sungkwangjoong_en_5.4.0_3.0_1718124060651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sungkwangjoong_en_5.4.0_3.0_1718124060651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_sungkwangjoong","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_sungkwangjoong", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_sungkwangjoong| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/sungkwangjoong/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline_en.md new file mode 100644 index 00000000000000..258d48f58a1b70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline pipeline XlmRoBertaForTokenClassification from sungkwangjoong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline` is an English model originally trained by sungkwangjoong. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline_en_5.4.0_3.0_1718124181292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline_en_5.4.0_3.0_1718124181292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_sungkwangjoong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/sungkwangjoong/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungwoo1_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungwoo1_en.md new file mode 100644 index 00000000000000..d262d2798bd3b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungwoo1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_sungwoo1 XlmRoBertaForTokenClassification from sungwoo1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_sungwoo1 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_sungwoo1` is an English model originally trained by sungwoo1. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sungwoo1_en_5.4.0_3.0_1718116194052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sungwoo1_en_5.4.0_3.0_1718116194052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_sungwoo1","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_sungwoo1", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_sungwoo1| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/sungwoo1/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline_en.md new file mode 100644 index 00000000000000..a901b8ae4ea32a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline pipeline XlmRoBertaForTokenClassification from sungwoo1 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline` is an English model originally trained by sungwoo1. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline_en_5.4.0_3.0_1718116283936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline_en_5.4.0_3.0_1718116283936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_sungwoo1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/sungwoo1/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_team_nave_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_team_nave_en.md new file mode 100644 index 00000000000000..d313391d38aa1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_team_nave_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_team_nave XlmRoBertaForTokenClassification from team-nave +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_team_nave +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_team_nave` is an English model originally trained by team-nave. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_team_nave_en_5.4.0_3.0_1718110331261.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_team_nave_en_5.4.0_3.0_1718110331261.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_team_nave","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_team_nave", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_team_nave| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|851.7 MB| + +## References + +https://huggingface.co/team-nave/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_team_nave_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_team_nave_pipeline_en.md new file mode 100644 index 00000000000000..480deaeb34855f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_team_nave_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_team_nave_pipeline pipeline XlmRoBertaForTokenClassification from team-nave +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_team_nave_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_team_nave_pipeline` is an English model originally trained by team-nave. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_team_nave_pipeline_en_5.4.0_3.0_1718110427059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_team_nave_pipeline_en_5.4.0_3.0_1718110427059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_team_nave_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_team_nave_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_team_nave_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|851.8 MB| + +## References + +https://huggingface.co/team-nave/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_udon3_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_udon3_en.md new file mode 100644 index 00000000000000..ce9ef022daace0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_udon3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_udon3 XlmRoBertaForTokenClassification from udon3 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_udon3 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_udon3` is an English model originally trained by udon3. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_udon3_en_5.4.0_3.0_1718128811940.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_udon3_en_5.4.0_3.0_1718128811940.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_udon3","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_udon3", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_udon3| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/udon3/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_udon3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_udon3_pipeline_en.md new file mode 100644 index 00000000000000..71732b4f71d7ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_udon3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_udon3_pipeline pipeline XlmRoBertaForTokenClassification from udon3 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_udon3_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_udon3_pipeline` is an English model originally trained by udon3. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_udon3_pipeline_en_5.4.0_3.0_1718128899502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_udon3_pipeline_en_5.4.0_3.0_1718128899502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_udon3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_udon3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_udon3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/udon3/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_v3rx2000_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_v3rx2000_en.md new file mode 100644 index 00000000000000..079979dd16964f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_v3rx2000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_v3rx2000 XlmRoBertaForTokenClassification from V3RX2000 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_v3rx2000 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_v3rx2000` is an English model originally trained by V3RX2000. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_v3rx2000_en_5.4.0_3.0_1718098454232.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_v3rx2000_en_5.4.0_3.0_1718098454232.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_v3rx2000","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_v3rx2000", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_v3rx2000| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/V3RX2000/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline_en.md new file mode 100644 index 00000000000000..891ffce3c32dee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline pipeline XlmRoBertaForTokenClassification from V3RX2000 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline` is an English model originally trained by V3RX2000. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline_en_5.4.0_3.0_1718098541332.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline_en_5.4.0_3.0_1718098541332.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_v3rx2000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/V3RX2000/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_wilcomply_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_wilcomply_en.md new file mode 100644 index 00000000000000..5bbae6e48fd077 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_wilcomply_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_wilcomply XlmRoBertaForTokenClassification from wilcomply +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_wilcomply +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_wilcomply` is an English model originally trained by wilcomply. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_wilcomply_en_5.4.0_3.0_1718121226377.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_wilcomply_en_5.4.0_3.0_1718121226377.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_wilcomply","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_wilcomply", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_wilcomply| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/wilcomply/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline_en.md new file mode 100644 index 00000000000000..6959cc9937b11c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline pipeline XlmRoBertaForTokenClassification from wilcomply +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline` is an English model originally trained by wilcomply. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline_en_5.4.0_3.0_1718121312919.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline_en_5.4.0_3.0_1718121312919.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_wilcomply_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/wilcomply/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_xyfigo_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_xyfigo_en.md new file mode 100644 index 00000000000000..fddf179aeac293 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_xyfigo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_xyfigo XlmRoBertaForTokenClassification from xyfigo +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_xyfigo +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_xyfigo` is an English model originally trained by xyfigo. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_xyfigo_en_5.4.0_3.0_1718101523698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_xyfigo_en_5.4.0_3.0_1718101523698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_xyfigo","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_xyfigo", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_xyfigo| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/xyfigo/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline_en.md new file mode 100644 index 00000000000000..5fd65eb9964569 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline pipeline XlmRoBertaForTokenClassification from xyfigo +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline` is an English model originally trained by xyfigo. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline_en_5.4.0_3.0_1718101610349.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline_en_5.4.0_3.0_1718101610349.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_xyfigo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/xyfigo/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yong_sik_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yong_sik_en.md new file mode 100644 index 00000000000000..bc45d2cff4f483 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yong_sik_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_yong_sik XlmRoBertaForTokenClassification from Yong-Sik +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_yong_sik +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_yong_sik` is an English model originally trained by Yong-Sik. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yong_sik_en_5.4.0_3.0_1718119569027.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yong_sik_en_5.4.0_3.0_1718119569027.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_yong_sik","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_yong_sik", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_yong_sik| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Yong-Sik/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline_en.md new file mode 100644 index 00000000000000..65da6e3df76ff2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline pipeline XlmRoBertaForTokenClassification from Yong-Sik +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline` is an English model originally trained by Yong-Sik. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline_en_5.4.0_3.0_1718119675272.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline_en_5.4.0_3.0_1718119675272.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_yong_sik_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Yong-Sik/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yyabuki_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yyabuki_en.md new file mode 100644 index 00000000000000..5bf304746ebab3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yyabuki_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_yyabuki XlmRoBertaForTokenClassification from yyabuki +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_yyabuki +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_yyabuki` is an English model originally trained by yyabuki. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yyabuki_en_5.4.0_3.0_1718121921557.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yyabuki_en_5.4.0_3.0_1718121921557.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_yyabuki","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_yyabuki", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_yyabuki| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/yyabuki/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline_en.md new file mode 100644 index 00000000000000..c09f947bcf81c4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline pipeline XlmRoBertaForTokenClassification from yyabuki +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline` is an English model originally trained by yyabuki. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline_en_5.4.0_3.0_1718122014776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline_en_5.4.0_3.0_1718122014776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_yyabuki_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/yyabuki/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_marathi_marh_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_marathi_marh_en.md new file mode 100644 index 00000000000000..9df0c4f0d0395b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_marathi_marh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_hindi_marathi_marh XlmRoBertaForTokenClassification from neelrr +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_hindi_marathi_marh +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_hindi_marathi_marh` is an English model originally trained by neelrr. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_marathi_marh_en_5.4.0_3.0_1718103640468.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_marathi_marh_en_5.4.0_3.0_1718103640468.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_hindi_marathi_marh","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_hindi_marathi_marh", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_hindi_marathi_marh| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|839.8 MB| + +## References + +https://huggingface.co/neelrr/xlm-roberta-base-finetuned-panx-hi-mr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline_en.md new file mode 100644 index 00000000000000..11dbdabdd59739 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline pipeline XlmRoBertaForTokenClassification from neelrr +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline` is an English model originally trained by neelrr. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline_en_5.4.0_3.0_1718103729015.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline_en_5.4.0_3.0_1718103729015.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_hindi_marathi_marh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|839.8 MB| + +## References + +https://huggingface.co/neelrr/xlm-roberta-base-finetuned-panx-hi-mr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_neelrr_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_neelrr_en.md new file mode 100644 index 00000000000000..0eb7a089823c1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_neelrr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_hindi_neelrr XlmRoBertaForTokenClassification from neelrr +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_hindi_neelrr +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_hindi_neelrr` is an English model originally trained by neelrr. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_neelrr_en_5.4.0_3.0_1718102561247.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_neelrr_en_5.4.0_3.0_1718102561247.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_hindi_neelrr","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_hindi_neelrr", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_hindi_neelrr| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|834.6 MB| + +## References + +https://huggingface.co/neelrr/xlm-roberta-base-finetuned-panx-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline_en.md new file mode 100644 index 00000000000000..178d4b19a26458 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline pipeline XlmRoBertaForTokenClassification from neelrr +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline` is an English model originally trained by neelrr. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline_en_5.4.0_3.0_1718102667469.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline_en_5.4.0_3.0_1718102667469.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_hindi_neelrr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|834.6 MB| + +## References + +https://huggingface.co/neelrr/xlm-roberta-base-finetuned-panx-hi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_100yen_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_100yen_en.md new file mode 100644 index 00000000000000..38b0c7dbb91aa2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_100yen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_100yen XlmRoBertaForTokenClassification from 100yen +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_100yen +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_100yen` is an English model originally trained by 100yen. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_100yen_en_5.4.0_3.0_1718130199192.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_100yen_en_5.4.0_3.0_1718130199192.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_100yen","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_100yen", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_100yen| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|816.7 MB| + +## References + +https://huggingface.co/100yen/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_100yen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_100yen_pipeline_en.md new file mode 100644 index 00000000000000..8a2c8c194eb414 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_100yen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_100yen_pipeline pipeline XlmRoBertaForTokenClassification from 100yen +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_100yen_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_100yen_pipeline` is an English model originally trained by 100yen. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_100yen_pipeline_en_5.4.0_3.0_1718130323017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_100yen_pipeline_en_5.4.0_3.0_1718130323017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_100yen_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_100yen_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_100yen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|816.8 MB| + +## References + +https://huggingface.co/100yen/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_54data_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_54data_en.md new file mode 100644 index 00000000000000..7f6062fc25fb48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_54data_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_54data XlmRoBertaForTokenClassification from 54data +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_54data +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_54data` is an English model originally trained by 54data. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_54data_en_5.4.0_3.0_1718113320370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_54data_en_5.4.0_3.0_1718113320370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_54data","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_54data", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_54data| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/54data/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_54data_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_54data_pipeline_en.md new file mode 100644 index 00000000000000..91edaf08f771b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_54data_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_54data_pipeline pipeline XlmRoBertaForTokenClassification from 54data +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_54data_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_54data_pipeline` is an English model originally trained by 54data.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_54data_pipeline_en_5.4.0_3.0_1718113447338.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_54data_pipeline_en_5.4.0_3.0_1718113447338.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_54data_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_54data_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_54data_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/54data/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_en.md new file mode 100644 index 00000000000000..55147169de0811 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_amartyobanerjee XlmRoBertaForTokenClassification from amartyobanerjee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_amartyobanerjee +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_amartyobanerjee` is an English model originally trained by amartyobanerjee.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_en_5.4.0_3.0_1718124457792.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_en_5.4.0_3.0_1718124457792.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_amartyobanerjee","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_amartyobanerjee", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_amartyobanerjee| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/amartyobanerjee/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline_en.md new file mode 100644 index 00000000000000..d0ddf70bcc8954 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline pipeline XlmRoBertaForTokenClassification from amartyobanerjee +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline` is an English model originally trained by amartyobanerjee.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline_en_5.4.0_3.0_1718124566686.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline_en_5.4.0_3.0_1718124566686.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_amartyobanerjee_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/amartyobanerjee/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bluetree99_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bluetree99_en.md new file mode 100644 index 00000000000000..942c84cbb42121 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bluetree99_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_bluetree99 XlmRoBertaForTokenClassification from bluetree99 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_bluetree99 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_bluetree99` is an English model originally trained by bluetree99.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bluetree99_en_5.4.0_3.0_1718110050944.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bluetree99_en_5.4.0_3.0_1718110050944.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_bluetree99","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_bluetree99", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_bluetree99| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/bluetree99/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline_en.md new file mode 100644 index 00000000000000..14f0e7549cdf00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline pipeline XlmRoBertaForTokenClassification from bluetree99 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline` is an English model originally trained by bluetree99.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline_en_5.4.0_3.0_1718110162677.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline_en_5.4.0_3.0_1718110162677.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_bluetree99_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/bluetree99/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bobojjhh_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bobojjhh_en.md new file mode 100644 index 00000000000000..e964ca7ffdb1ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bobojjhh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_bobojjhh XlmRoBertaForTokenClassification from bobojjhh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_bobojjhh +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_bobojjhh` is an English model originally trained by bobojjhh.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bobojjhh_en_5.4.0_3.0_1718130483717.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bobojjhh_en_5.4.0_3.0_1718130483717.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_bobojjhh","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_bobojjhh", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_bobojjhh| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/bobojjhh/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline_en.md new file mode 100644 index 00000000000000..c9269063f20c95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline pipeline XlmRoBertaForTokenClassification from bobojjhh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline` is an English model originally trained by bobojjhh.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline_en_5.4.0_3.0_1718130592374.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline_en_5.4.0_3.0_1718130592374.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_bobojjhh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/bobojjhh/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cataluna84_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cataluna84_en.md new file mode 100644 index 00000000000000..5ed9057d0b4bf5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cataluna84_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_cataluna84 XlmRoBertaForTokenClassification from cataluna84 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_cataluna84 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_cataluna84` is an English model originally trained by cataluna84.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_cataluna84_en_5.4.0_3.0_1718117840298.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_cataluna84_en_5.4.0_3.0_1718117840298.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_cataluna84","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_cataluna84", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_cataluna84| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/cataluna84/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline_en.md new file mode 100644 index 00000000000000..078c1e6871d1ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline pipeline XlmRoBertaForTokenClassification from cataluna84 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline` is an English model originally trained by cataluna84.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline_en_5.4.0_3.0_1718117949402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline_en_5.4.0_3.0_1718117949402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_cataluna84_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.7 MB| + +## References + +https://huggingface.co/cataluna84/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cyycyy_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cyycyy_en.md new file mode 100644 index 00000000000000..b4ee629c276ab7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cyycyy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_cyycyy XlmRoBertaForTokenClassification from cyycyy +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_cyycyy +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_cyycyy` is an English model originally trained by cyycyy.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_cyycyy_en_5.4.0_3.0_1718105827382.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_cyycyy_en_5.4.0_3.0_1718105827382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_cyycyy","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_cyycyy", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_cyycyy| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/cyycyy/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline_en.md new file mode 100644 index 00000000000000..d564b08922d26f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline pipeline XlmRoBertaForTokenClassification from cyycyy +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline` is an English model originally trained by cyycyy.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline_en_5.4.0_3.0_1718105939331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline_en_5.4.0_3.0_1718105939331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_cyycyy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/cyycyy/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_derekbear_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_derekbear_en.md new file mode 100644 index 00000000000000..585fba4fb41397 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_derekbear_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_derekbear XlmRoBertaForTokenClassification from derekbear +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_derekbear +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_derekbear` is an English model originally trained by derekbear. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_derekbear_en_5.4.0_3.0_1718113929190.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_derekbear_en_5.4.0_3.0_1718113929190.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_derekbear","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_derekbear", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_derekbear| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/derekbear/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline_en.md new file mode 100644 index 00000000000000..1a0da6fefadfaf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline pipeline XlmRoBertaForTokenClassification from derekbear +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline` is an English model originally trained by derekbear. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline_en_5.4.0_3.0_1718114038416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline_en_5.4.0_3.0_1718114038416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_derekbear_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.7 MB| + +## References + +https://huggingface.co/derekbear/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_dkasti_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_dkasti_en.md new file mode 100644 index 00000000000000..f17d820d95e91b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_dkasti_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_dkasti XlmRoBertaForTokenClassification from dkasti +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_dkasti +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_dkasti` is an English model originally trained by dkasti. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_dkasti_en_5.4.0_3.0_1718107907652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_dkasti_en_5.4.0_3.0_1718107907652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_dkasti","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_dkasti", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_dkasti| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/dkasti/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline_en.md new file mode 100644 index 00000000000000..52d65ae74bcfd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline pipeline XlmRoBertaForTokenClassification from dkasti +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline` is an English model originally trained by dkasti. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline_en_5.4.0_3.0_1718108023369.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline_en_5.4.0_3.0_1718108023369.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_dkasti_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/dkasti/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_drigb_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_drigb_en.md new file mode 100644 index 00000000000000..132fa88d4c7d5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_drigb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_drigb XlmRoBertaForTokenClassification from drigb +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_drigb +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_drigb` is an English model originally trained by drigb. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_drigb_en_5.4.0_3.0_1718113318010.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_drigb_en_5.4.0_3.0_1718113318010.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_drigb","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_drigb", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_drigb| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/drigb/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_drigb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_drigb_pipeline_en.md new file mode 100644 index 00000000000000..5f73c86660250e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_drigb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_drigb_pipeline pipeline XlmRoBertaForTokenClassification from drigb +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_drigb_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_drigb_pipeline` is an English model originally trained by drigb. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_drigb_pipeline_en_5.4.0_3.0_1718113435296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_drigb_pipeline_en_5.4.0_3.0_1718113435296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_drigb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_drigb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_drigb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/drigb/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gcmsrc_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gcmsrc_en.md new file mode 100644 index 00000000000000..35922f7776b508 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gcmsrc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_gcmsrc XlmRoBertaForTokenClassification from gcmsrc +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_gcmsrc +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_gcmsrc` is an English model originally trained by gcmsrc. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_gcmsrc_en_5.4.0_3.0_1718099807507.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_gcmsrc_en_5.4.0_3.0_1718099807507.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_gcmsrc","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_gcmsrc", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_gcmsrc| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/gcmsrc/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline_en.md new file mode 100644 index 00000000000000..2c05022b27647f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline pipeline XlmRoBertaForTokenClassification from gcmsrc +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline` is an English model originally trained by gcmsrc. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline_en_5.4.0_3.0_1718099919437.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline_en_5.4.0_3.0_1718099919437.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_gcmsrc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/gcmsrc/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gogd_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gogd_en.md new file mode 100644 index 00000000000000..649ef3f8edcd94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gogd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_gogd XlmRoBertaForTokenClassification from GoGD +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_gogd +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_gogd` is an English model originally trained by GoGD. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_gogd_en_5.4.0_3.0_1718120899037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_gogd_en_5.4.0_3.0_1718120899037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_gogd","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_gogd", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_gogd| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/GoGD/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gogd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gogd_pipeline_en.md new file mode 100644 index 00000000000000..733cfd0ec71d43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_gogd_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_gogd_pipeline pipeline XlmRoBertaForTokenClassification from GoGD +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_gogd_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_gogd_pipeline` is an English model originally trained by GoGD. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_gogd_pipeline_en_5.4.0_3.0_1718121022052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_gogd_pipeline_en_5.4.0_3.0_1718121022052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_gogd_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_gogd_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_gogd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.7 MB| + +## References + +https://huggingface.co/GoGD/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_en.md new file mode 100644 index 00000000000000..664068f7cb4d98 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech XlmRoBertaForTokenClassification from h-radiolo-tech +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech` is an English model originally trained by h-radiolo-tech. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_en_5.4.0_3.0_1718128230625.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_en_5.4.0_3.0_1718128230625.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/h-radiolo-tech/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline_en.md new file mode 100644 index 00000000000000..41b56edddadec1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline pipeline XlmRoBertaForTokenClassification from h-radiolo-tech +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline` is an English model originally trained by h-radiolo-tech.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline_en_5.4.0_3.0_1718128339981.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline_en_5.4.0_3.0_1718128339981.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_h_radiolo_tech_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/h-radiolo-tech/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_hhffxx_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_hhffxx_en.md new file mode 100644 index 00000000000000..bfb9bafda0d0e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_hhffxx_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_hhffxx XlmRoBertaForTokenClassification from hhffxx +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_hhffxx +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_hhffxx` is an English model originally trained by hhffxx.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_hhffxx_en_5.4.0_3.0_1718114548530.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_hhffxx_en_5.4.0_3.0_1718114548530.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_hhffxx","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_hhffxx", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_hhffxx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|839.8 MB| + +## References + +https://huggingface.co/hhffxx/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline_en.md new file mode 100644 index 00000000000000..ac2dbf4ca5381f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline pipeline XlmRoBertaForTokenClassification from hhffxx +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline` is an English model originally trained by hhffxx.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline_en_5.4.0_3.0_1718114643972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline_en_5.4.0_3.0_1718114643972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_hhffxx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|839.8 MB| + +## References + +https://huggingface.co/hhffxx/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_isaacp_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_isaacp_en.md new file mode 100644 index 00000000000000..048d30fb8ccb9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_isaacp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_isaacp XlmRoBertaForTokenClassification from Isaacp +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_isaacp +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_isaacp` is an English model originally trained by Isaacp.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_isaacp_en_5.4.0_3.0_1718103653693.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_isaacp_en_5.4.0_3.0_1718103653693.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_isaacp","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_isaacp", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_isaacp| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Isaacp/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline_en.md new file mode 100644 index 00000000000000..36a8f9e1cda187 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline pipeline XlmRoBertaForTokenClassification from Isaacp +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline` is an English model originally trained by Isaacp.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline_en_5.4.0_3.0_1718103763180.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline_en_5.4.0_3.0_1718103763180.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_isaacp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Isaacp/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_en.md new file mode 100644 index 00000000000000..9a4e58dfc820c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu XlmRoBertaForTokenClassification from LaurentiuStancioiu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu` is an English model originally trained by LaurentiuStancioiu.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_en_5.4.0_3.0_1718114611950.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_en_5.4.0_3.0_1718114611950.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/LaurentiuStancioiu/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline_en.md new file mode 100644 index 00000000000000..78f6f3fa9ef30d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline pipeline XlmRoBertaForTokenClassification from LaurentiuStancioiu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline` is an English model originally trained by LaurentiuStancioiu.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline_en_5.4.0_3.0_1718114727595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline_en_5.4.0_3.0_1718114727595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_laurentiustancioiu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.7 MB| + +## References + +https://huggingface.co/LaurentiuStancioiu/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_leotunganh_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_leotunganh_en.md new file mode 100644 index 00000000000000..330f7caefe4e3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_leotunganh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_leotunganh XlmRoBertaForTokenClassification from LeoTungAnh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_leotunganh +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_leotunganh` is an English model originally trained by LeoTungAnh.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_leotunganh_en_5.4.0_3.0_1718124605948.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_leotunganh_en_5.4.0_3.0_1718124605948.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_leotunganh","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_leotunganh", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_leotunganh| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|815.0 MB| + +## References + +https://huggingface.co/LeoTungAnh/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline_en.md new file mode 100644 index 00000000000000..aec39ab2a81d74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline pipeline XlmRoBertaForTokenClassification from LeoTungAnh +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline` is an English model originally trained by LeoTungAnh.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline_en_5.4.0_3.0_1718124729135.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline_en_5.4.0_3.0_1718124729135.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_leotunganh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|815.0 MB| + +## References + +https://huggingface.co/LeoTungAnh/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_malduwais_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_malduwais_en.md new file mode 100644 index 00000000000000..a7ae043c90e949 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_malduwais_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_malduwais XlmRoBertaForTokenClassification from malduwais +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_malduwais +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_malduwais` is an English model originally trained by malduwais.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_malduwais_en_5.4.0_3.0_1718125293878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_malduwais_en_5.4.0_3.0_1718125293878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_malduwais","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_malduwais", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_malduwais| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|836.1 MB| + +## References + +https://huggingface.co/malduwais/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline_en.md new file mode 100644 index 00000000000000..6e7ee773a5ce25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline pipeline XlmRoBertaForTokenClassification from malduwais +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline` is an English model originally trained by malduwais. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline_en_5.4.0_3.0_1718125403695.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline_en_5.4.0_3.0_1718125403695.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_malduwais_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|836.1 MB| + +## References + +https://huggingface.co/malduwais/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_maxnet_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_maxnet_en.md new file mode 100644 index 00000000000000..d3d198fa08bcd9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_maxnet_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_maxnet XlmRoBertaForTokenClassification from Maxnet +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_maxnet +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_maxnet` is an English model originally trained by Maxnet. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_maxnet_en_5.4.0_3.0_1718124012834.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_maxnet_en_5.4.0_3.0_1718124012834.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_maxnet","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_maxnet", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_maxnet| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Maxnet/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline_en.md new file mode 100644 index 00000000000000..2ad0cb98042edf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline pipeline XlmRoBertaForTokenClassification from Maxnet +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline` is an English model originally trained by Maxnet. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline_en_5.4.0_3.0_1718124121656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline_en_5.4.0_3.0_1718124121656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_maxnet_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Maxnet/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_monkdalma_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_monkdalma_en.md new file mode 100644 index 00000000000000..a42dc8a440ca62 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_monkdalma_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_monkdalma XlmRoBertaForTokenClassification from MonkDalma +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_monkdalma +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_monkdalma` is an English model originally trained by MonkDalma. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_monkdalma_en_5.4.0_3.0_1718116073674.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_monkdalma_en_5.4.0_3.0_1718116073674.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_monkdalma","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_monkdalma", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_monkdalma| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/MonkDalma/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline_en.md new file mode 100644 index 00000000000000..ee5478d0732da4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline pipeline XlmRoBertaForTokenClassification from MonkDalma +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline` is an English model originally trained by MonkDalma. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline_en_5.4.0_3.0_1718116183098.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline_en_5.4.0_3.0_1718116183098.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_monkdalma_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/MonkDalma/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_msrisrujan_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_msrisrujan_en.md new file mode 100644 index 00000000000000..0b81074deb0795 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_msrisrujan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_msrisrujan XlmRoBertaForTokenClassification from Msrisrujan +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_msrisrujan +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_msrisrujan` is an English model originally trained by Msrisrujan. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_msrisrujan_en_5.4.0_3.0_1718101574116.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_msrisrujan_en_5.4.0_3.0_1718101574116.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_msrisrujan","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_msrisrujan", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_msrisrujan| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Msrisrujan/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline_en.md new file mode 100644 index 00000000000000..f5ef9107e91769 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline pipeline XlmRoBertaForTokenClassification from Msrisrujan +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline` is an English model originally trained by Msrisrujan. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline_en_5.4.0_3.0_1718101683333.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline_en_5.4.0_3.0_1718101683333.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_msrisrujan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/Msrisrujan/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_obong_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_obong_en.md new file mode 100644 index 00000000000000..7fa83d9b9775e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_obong_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_obong XlmRoBertaForTokenClassification from obong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_obong +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_obong` is an English model originally trained by obong. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_obong_en_5.4.0_3.0_1718117667802.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_obong_en_5.4.0_3.0_1718117667802.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_obong","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_obong", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_obong| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/obong/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_obong_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_obong_pipeline_en.md new file mode 100644 index 00000000000000..d7d7891734336d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_obong_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_obong_pipeline pipeline XlmRoBertaForTokenClassification from obong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_obong_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_obong_pipeline` is an English model originally trained by obong. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_obong_pipeline_en_5.4.0_3.0_1718117786451.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_obong_pipeline_en_5.4.0_3.0_1718117786451.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_obong_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_obong_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_obong_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/obong/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_patnelt60_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_patnelt60_en.md new file mode 100644 index 00000000000000..63096a517d8050 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_patnelt60_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_patnelt60 XlmRoBertaForTokenClassification from patnelt60 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_patnelt60 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_patnelt60` is an English model originally trained by patnelt60. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_patnelt60_en_5.4.0_3.0_1718133562440.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_patnelt60_en_5.4.0_3.0_1718133562440.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_patnelt60","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_patnelt60", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_patnelt60| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|815.0 MB| + +## References + +https://huggingface.co/patnelt60/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline_en.md new file mode 100644 index 00000000000000..eebc75b07f692e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline pipeline XlmRoBertaForTokenClassification from patnelt60 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline` is an English model originally trained by patnelt60. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline_en_5.4.0_3.0_1718133690189.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline_en_5.4.0_3.0_1718133690189.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_patnelt60_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|815.0 MB| + +## References + +https://huggingface.co/patnelt60/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_philosucker_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_philosucker_en.md new file mode 100644 index 00000000000000..62b7b84bb46466 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_philosucker_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_philosucker XlmRoBertaForTokenClassification from philosucker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_philosucker +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_philosucker` is an English model originally trained by philosucker.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_philosucker_en_5.4.0_3.0_1718126516566.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_philosucker_en_5.4.0_3.0_1718126516566.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_philosucker","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_philosucker", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_philosucker| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|838.8 MB| + +## References + +https://huggingface.co/philosucker/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline_en.md new file mode 100644 index 00000000000000..754a3107ef36b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline pipeline XlmRoBertaForTokenClassification from philosucker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline` is an English model originally trained by philosucker.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline_en_5.4.0_3.0_1718126610206.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline_en_5.4.0_3.0_1718126610206.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_philosucker_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|838.8 MB| + +## References + +https://huggingface.co/philosucker/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_thkkvui_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_thkkvui_en.md new file mode 100644 index 00000000000000..f2b3de49f2d634 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_thkkvui_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_thkkvui XlmRoBertaForTokenClassification from thkkvui +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_thkkvui +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_thkkvui` is an English model originally trained by thkkvui.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_thkkvui_en_5.4.0_3.0_1718112606952.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_thkkvui_en_5.4.0_3.0_1718112606952.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_thkkvui","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_thkkvui", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_thkkvui| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/thkkvui/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline_en.md new file mode 100644 index 00000000000000..a1b63ce918a170 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline pipeline XlmRoBertaForTokenClassification from thkkvui +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline` is an English model originally trained by thkkvui.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline_en_5.4.0_3.0_1718112715953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline_en_5.4.0_3.0_1718112715953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_thkkvui_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.7 MB| + +## References + +https://huggingface.co/thkkvui/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_transformersbook_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_transformersbook_en.md new file mode 100644 index 00000000000000..11fe46f7f5fdb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_transformersbook_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_transformersbook XlmRoBertaForTokenClassification from transformersbook +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_transformersbook +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_transformersbook` is an English model originally trained by transformersbook.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_transformersbook_en_5.4.0_3.0_1718111029428.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_transformersbook_en_5.4.0_3.0_1718111029428.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_transformersbook","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_transformersbook", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_transformersbook| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/transformersbook/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline_en.md new file mode 100644 index 00000000000000..87cf5e7b1a4b78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline pipeline XlmRoBertaForTokenClassification from transformersbook +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline` is an English model originally trained by transformersbook.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline_en_5.4.0_3.0_1718111148462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline_en_5.4.0_3.0_1718111148462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_transformersbook_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/transformersbook/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_tyayoi_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_tyayoi_en.md new file mode 100644 index 00000000000000..6a5a4be4c426b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_tyayoi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_tyayoi XlmRoBertaForTokenClassification from tyayoi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_tyayoi +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_tyayoi` is an English model originally trained by tyayoi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_tyayoi_en_5.4.0_3.0_1718115697730.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_tyayoi_en_5.4.0_3.0_1718115697730.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_tyayoi","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_tyayoi", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_tyayoi| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/tyayoi/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline_en.md new file mode 100644 index 00000000000000..eeca13669651e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline pipeline XlmRoBertaForTokenClassification from tyayoi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline` is an English model originally trained by tyayoi.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline_en_5.4.0_3.0_1718115819616.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline_en_5.4.0_3.0_1718115819616.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_tyayoi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/tyayoi/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_yezune_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_yezune_en.md new file mode 100644 index 00000000000000..d7d0700b88cd35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_yezune_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_yezune XlmRoBertaForTokenClassification from yezune +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_yezune +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_yezune` is an English model originally trained by yezune.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_yezune_en_5.4.0_3.0_1718118486794.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_yezune_en_5.4.0_3.0_1718118486794.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_yezune","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_yezune", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_yezune| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/yezune/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_yezune_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_yezune_pipeline_en.md new file mode 100644 index 00000000000000..2fc8d9441e6691 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_italian_yezune_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_yezune_pipeline pipeline XlmRoBertaForTokenClassification from yezune +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_yezune_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_italian_yezune_pipeline` is an English model originally trained by yezune.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_yezune_pipeline_en_5.4.0_3.0_1718118606283.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_yezune_pipeline_en_5.4.0_3.0_1718118606283.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_yezune_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_italian_yezune_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_yezune_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/yezune/xlm-roberta-base-finetuned-panx-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_en.md new file mode 100644 index 00000000000000..24fea6dbd961e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_tamil_the_neural_networker XlmRoBertaForTokenClassification from the-neural-networker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_tamil_the_neural_networker +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_tamil_the_neural_networker` is an English model originally trained by the-neural-networker.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_en_5.4.0_3.0_1718103758096.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_en_5.4.0_3.0_1718103758096.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_tamil_the_neural_networker","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_tamil_the_neural_networker", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_tamil_the_neural_networker| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|837.5 MB| + +## References + +https://huggingface.co/the-neural-networker/xlm-roberta-base-finetuned-panx-ta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline_en.md new file mode 100644 index 00000000000000..a66907a62c9ec9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline pipeline XlmRoBertaForTokenClassification from the-neural-networker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline` is an English model originally trained by the-neural-networker.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline_en_5.4.0_3.0_1718103845164.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline_en_5.4.0_3.0_1718103845164.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_tamil_the_neural_networker_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|837.5 MB| + +## References + +https://huggingface.co/the-neural-networker/xlm-roberta-base-finetuned-panx-ta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_en.md new file mode 100644 index 00000000000000..bec5386a44539f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_telugu_the_neural_networker XlmRoBertaForTokenClassification from the-neural-networker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_telugu_the_neural_networker +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_telugu_the_neural_networker` is an English model originally trained by the-neural-networker.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_en_5.4.0_3.0_1718114617598.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_en_5.4.0_3.0_1718114617598.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_telugu_the_neural_networker","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_telugu_the_neural_networker", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_telugu_the_neural_networker| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|814.1 MB| + +## References + +https://huggingface.co/the-neural-networker/xlm-roberta-base-finetuned-panx-te \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline_en.md new file mode 100644 index 00000000000000..0a6ef9815e3b61 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline pipeline XlmRoBertaForTokenClassification from the-neural-networker +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline` is an English model originally trained by the-neural-networker.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline_en_5.4.0_3.0_1718114741171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline_en_5.4.0_3.0_1718114741171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_telugu_the_neural_networker_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|814.2 MB| + +## References + +https://huggingface.co/the-neural-networker/xlm-roberta-base-finetuned-panx-te + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_pnax_german_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_pnax_german_en.md new file mode 100644 index 00000000000000..5c25df11296f4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_pnax_german_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_pnax_german XlmRoBertaForTokenClassification from Almondpeanuts +author: John Snow Labs +name: xlm_roberta_base_finetuned_pnax_german +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_pnax_german` is an English model originally trained by Almondpeanuts.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_pnax_german_en_5.4.0_3.0_1718121928404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_pnax_german_en_5.4.0_3.0_1718121928404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_pnax_german","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_pnax_german", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_pnax_german| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Almondpeanuts/xlm-roberta-base-finetuned-pnax-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_pnax_german_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_pnax_german_pipeline_en.md new file mode 100644 index 00000000000000..c51896bd43cd5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetuned_pnax_german_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_pnax_german_pipeline pipeline XlmRoBertaForTokenClassification from Almondpeanuts +author: John Snow Labs +name: xlm_roberta_base_finetuned_pnax_german_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetuned_pnax_german_pipeline` is an English model originally trained by Almondpeanuts.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_pnax_german_pipeline_en_5.4.0_3.0_1718122039975.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_pnax_german_pipeline_en_5.4.0_3.0_1718122039975.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_pnax_german_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_pnax_german_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_pnax_german_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/Almondpeanuts/xlm-roberta-base-finetuned-pnax-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetunned_panx_german_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetunned_panx_german_en.md new file mode 100644 index 00000000000000..82642c41316e07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetunned_panx_german_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetunned_panx_german XlmRoBertaForTokenClassification from jhn9803 +author: John Snow Labs +name: xlm_roberta_base_finetunned_panx_german +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetunned_panx_german` is an English model originally trained by jhn9803.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetunned_panx_german_en_5.4.0_3.0_1718114709492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetunned_panx_german_en_5.4.0_3.0_1718114709492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetunned_panx_german","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetunned_panx_german", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetunned_panx_german| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/jhn9803/xlm-roberta-base-finetunned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetunned_panx_german_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetunned_panx_german_pipeline_en.md new file mode 100644 index 00000000000000..2d1b0d1f496d4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_finetunned_panx_german_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetunned_panx_german_pipeline pipeline XlmRoBertaForTokenClassification from jhn9803 +author: John Snow Labs +name: xlm_roberta_base_finetunned_panx_german_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_finetunned_panx_german_pipeline` is an English model originally trained by jhn9803.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetunned_panx_german_pipeline_en_5.4.0_3.0_1718114797628.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetunned_panx_german_pipeline_en_5.4.0_3.0_1718114797628.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetunned_panx_german_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetunned_panx_german_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetunned_panx_german_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/jhn9803/xlm-roberta-base-finetunned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_ner_silvanus_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_ner_silvanus_pipeline_xx.md new file mode 100644 index 00000000000000..97f6bba3ae03ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_ner_silvanus_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual xlm_roberta_base_ner_silvanus_pipeline pipeline XlmRoBertaForTokenClassification from rollerhafeezh-amikom +author: John Snow Labs +name: xlm_roberta_base_ner_silvanus_pipeline +date: 2024-06-11 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_ner_silvanus_pipeline` is a Multilingual model originally trained by rollerhafeezh-amikom.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ner_silvanus_pipeline_xx_5.4.0_3.0_1718097257703.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ner_silvanus_pipeline_xx_5.4.0_3.0_1718097257703.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_ner_silvanus_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_ner_silvanus_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_ner_silvanus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|832.6 MB| + +## References + +https://huggingface.co/rollerhafeezh-amikom/xlm-roberta-base-ner-silvanus + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_ner_silvanus_xx.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_ner_silvanus_xx.md new file mode 100644 index 00000000000000..5e5f3f8b609c2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_ner_silvanus_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual xlm_roberta_base_ner_silvanus XlmRoBertaForTokenClassification from rollerhafeezh-amikom +author: John Snow Labs +name: xlm_roberta_base_ner_silvanus +date: 2024-06-11 +tags: [xx, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_ner_silvanus` is a Multilingual model originally trained by rollerhafeezh-amikom.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ner_silvanus_xx_5.4.0_3.0_1718097143786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ner_silvanus_xx_5.4.0_3.0_1718097143786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_ner_silvanus","xx") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_ner_silvanus", "xx") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_ner_silvanus| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|832.6 MB| + +## References + +https://huggingface.co/rollerhafeezh-amikom/xlm-roberta-base-ner-silvanus \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pie_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pie_en.md new file mode 100644 index 00000000000000..12beccb36a23a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pie_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_pie XlmRoBertaForTokenClassification from Gooogr +author: John Snow Labs +name: xlm_roberta_base_pie +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_pie` is an English model originally trained by Gooogr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pie_en_5.4.0_3.0_1718125418567.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pie_en_5.4.0_3.0_1718125418567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_pie","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_pie", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_pie| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.9 MB| + +## References + +https://huggingface.co/Gooogr/xlm-roberta-base-pie \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pie_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pie_pipeline_en.md new file mode 100644 index 00000000000000..0154b675eba48b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pie_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_pie_pipeline pipeline XlmRoBertaForTokenClassification from Gooogr +author: John Snow Labs +name: xlm_roberta_base_pie_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_pie_pipeline` is an English model originally trained by Gooogr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pie_pipeline_en_5.4.0_3.0_1718125512193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pie_pipeline_en_5.4.0_3.0_1718125512193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_pie_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_pie_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_pie_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.9 MB| + +## References + +https://huggingface.co/Gooogr/xlm-roberta-base-pie + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pii_finetuned_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pii_finetuned_en.md new file mode 100644 index 00000000000000..508ffdb7c541ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pii_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_pii_finetuned XlmRoBertaForTokenClassification from 1-13-am +author: John Snow Labs +name: xlm_roberta_base_pii_finetuned +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_pii_finetuned` is an English model originally trained by 1-13-am. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pii_finetuned_en_5.4.0_3.0_1718100836969.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pii_finetuned_en_5.4.0_3.0_1718100836969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_pii_finetuned","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_pii_finetuned", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_pii_finetuned| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|819.6 MB| + +## References + +https://huggingface.co/1-13-am/xlm-roberta-base-pii-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pii_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pii_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..cdc969da76bcd7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_pii_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_pii_finetuned_pipeline pipeline XlmRoBertaForTokenClassification from 1-13-am +author: John Snow Labs +name: xlm_roberta_base_pii_finetuned_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_pii_finetuned_pipeline` is an English model originally trained by 1-13-am. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pii_finetuned_pipeline_en_5.4.0_3.0_1718100991138.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pii_finetuned_pipeline_en_5.4.0_3.0_1718100991138.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_pii_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_pii_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_pii_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|819.6 MB| + +## References + +https://huggingface.co/1-13-am/xlm-roberta-base-pii-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_postagging_urdu_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_postagging_urdu_en.md new file mode 100644 index 00000000000000..130eae93e61e4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_postagging_urdu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_postagging_urdu XlmRoBertaForTokenClassification from Aimlab +author: John Snow Labs +name: xlm_roberta_base_postagging_urdu +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_postagging_urdu` is an English model originally trained by Aimlab. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_postagging_urdu_en_5.4.0_3.0_1718102837887.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_postagging_urdu_en_5.4.0_3.0_1718102837887.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_postagging_urdu","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_postagging_urdu", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_postagging_urdu| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|681.0 MB| + +## References + +https://huggingface.co/Aimlab/xlm-roberta-base-postagging-urdu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_postagging_urdu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_postagging_urdu_pipeline_en.md new file mode 100644 index 00000000000000..562a0252b39cf7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_base_postagging_urdu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_postagging_urdu_pipeline pipeline XlmRoBertaForTokenClassification from Aimlab +author: John Snow Labs +name: xlm_roberta_base_postagging_urdu_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_base_postagging_urdu_pipeline` is an English model originally trained by Aimlab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_postagging_urdu_pipeline_en_5.4.0_3.0_1718103064313.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_postagging_urdu_pipeline_en_5.4.0_3.0_1718103064313.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_postagging_urdu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_postagging_urdu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_postagging_urdu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|681.0 MB| + +## References + +https://huggingface.co/Aimlab/xlm-roberta-base-postagging-urdu + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_panx_uzbek_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_panx_uzbek_en.md new file mode 100644 index 00000000000000..1d42962ef6cbd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_panx_uzbek_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_panx_uzbek XlmRoBertaForTokenClassification from murodbek +author: John Snow Labs +name: xlm_roberta_panx_uzbek +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_panx_uzbek` is an English model originally trained by murodbek. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_panx_uzbek_en_5.4.0_3.0_1718134383603.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_panx_uzbek_en_5.4.0_3.0_1718134383603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_panx_uzbek","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_panx_uzbek", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_panx_uzbek| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|836.7 MB| + +## References + +https://huggingface.co/murodbek/xlm-roberta-panx-uz \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_panx_uzbek_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_panx_uzbek_pipeline_en.md new file mode 100644 index 00000000000000..0305698493ccc5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlm_roberta_panx_uzbek_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_panx_uzbek_pipeline pipeline XlmRoBertaForTokenClassification from murodbek +author: John Snow Labs +name: xlm_roberta_panx_uzbek_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm_roberta_panx_uzbek_pipeline` is an English model originally trained by murodbek. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_panx_uzbek_pipeline_en_5.4.0_3.0_1718134471717.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_panx_uzbek_pipeline_en_5.4.0_3.0_1718134471717.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_panx_uzbek_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_panx_uzbek_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_panx_uzbek_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|836.7 MB| + +## References + +https://huggingface.co/murodbek/xlm-roberta-panx-uz + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmr_base_finetuned_hausa_2e_4_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmr_base_finetuned_hausa_2e_4_en.md new file mode 100644 index 00000000000000..68b964d19458a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmr_base_finetuned_hausa_2e_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_base_finetuned_hausa_2e_4 XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: xlmr_base_finetuned_hausa_2e_4 +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmr_base_finetuned_hausa_2e_4` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_base_finetuned_hausa_2e_4_en_5.4.0_3.0_1718135056261.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_base_finetuned_hausa_2e_4_en_5.4.0_3.0_1718135056261.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmr_base_finetuned_hausa_2e_4","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmr_base_finetuned_hausa_2e_4", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_base_finetuned_hausa_2e_4| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|868.5 MB| + +## References + +https://huggingface.co/grace-pro/xlmr-base-finetuned-hausa-2e-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmr_base_finetuned_hausa_2e_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmr_base_finetuned_hausa_2e_4_pipeline_en.md new file mode 100644 index 00000000000000..3da738b22d0912 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmr_base_finetuned_hausa_2e_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_base_finetuned_hausa_2e_4_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: xlmr_base_finetuned_hausa_2e_4_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmr_base_finetuned_hausa_2e_4_pipeline` is an English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_base_finetuned_hausa_2e_4_pipeline_en_5.4.0_3.0_1718135160923.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_base_finetuned_hausa_2e_4_pipeline_en_5.4.0_3.0_1718135160923.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmr_base_finetuned_hausa_2e_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmr_base_finetuned_hausa_2e_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_base_finetuned_hausa_2e_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|868.5 MB| + +## References + +https://huggingface.co/grace-pro/xlmr-base-finetuned-hausa-2e-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmr_medical_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmr_medical_en.md new file mode 100644 index 00000000000000..b715c4750b0e74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmr_medical_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_medical XlmRoBertaForTokenClassification from aaaksenova +author: John Snow Labs +name: xlmr_medical +date: 2024-06-11 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmr_medical` is an English model originally trained by aaaksenova. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_medical_en_5.4.0_3.0_1718099301347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_medical_en_5.4.0_3.0_1718099301347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmr_medical","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmr_medical", "en") + .setInputCols(Array("document","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_medical| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/aaaksenova/xlmr_medical \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmr_medical_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmr_medical_pipeline_en.md new file mode 100644 index 00000000000000..d6eec4297a3ed1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmr_medical_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmr_medical_pipeline pipeline XlmRoBertaForTokenClassification from aaaksenova +author: John Snow Labs +name: xlmr_medical_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmr_medical_pipeline` is an English model originally trained by aaaksenova. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_medical_pipeline_en_5.4.0_3.0_1718099367588.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_medical_pipeline_en_5.4.0_3.0_1718099367588.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmr_medical_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmr_medical_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_medical_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/aaaksenova/xlmr_medical + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_arned_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_arned_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..d29d4f330d1e5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_arned_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from ArneD) +author: John Snow Labs +name: xlmroberta_ner_arned_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `ArneD`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_arned_base_finetuned_panx_de_5.4.0_3.0_1718071934527.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_arned_base_finetuned_panx_de_5.4.0_3.0_1718071934527.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_arned_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_arned_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_ArneD").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_arned_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/ArneD/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_arned_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_arned_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..7f7ac64f64892e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_arned_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_arned_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from ArneD +author: John Snow Labs +name: xlmroberta_ner_arned_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_arned_base_finetuned_panx_pipeline` is a German model originally trained by ArneD. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_arned_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718072021578.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_arned_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718072021578.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlmroberta_ner_arned_base_finetuned_panx_pipeline", lang = "de")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlmroberta_ner_arned_base_finetuned_panx_pipeline", lang = "de")
+val annotations = pipeline.transform(df)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlmroberta_ner_arned_base_finetuned_panx_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|de|
+|Size:|853.8 MB|
+
+## References
+
+https://huggingface.co/ArneD/xlm-roberta-base-finetuned-panx-de
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_bc5cdr_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_bc5cdr_en.md
new file mode 100644
index 00000000000000..8d746630f1093e
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_bc5cdr_en.md
@@ -0,0 +1,113 @@
+---
+layout: model
+title: English XLMRobertaForTokenClassification Base Cased model (from tner)
+author: John Snow Labs
+name: xlmroberta_ner_base_bc5cdr
+date: 2024-06-11
+tags: [en, open_source, xlm_roberta, ner, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-bc5cdr` is an English model originally trained by `tner`.
+
+## Predicted Entities
+
+`chemical`, `disease`
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_bc5cdr_en_5.4.0_3.0_1718072396890.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_bc5cdr_en_5.4.0_3.0_1718072396890.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_bc5cdr","en") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+ner_converter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_bc5cdr","en")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val ner_converter = new NerConverter()
+    .setInputCols(Array("document", "token", "ner"))
+    .setOutputCol("ner_chunk")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("en.ner.xlmr_roberta.bc5cdr.base").predict("""PUT YOUR STRING HERE""")
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlmroberta_ner_base_bc5cdr|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[document, token]|
+|Output Labels:|[ner]|
+|Language:|en|
+|Size:|779.7 MB|
+
+## References
+
+References
+
+- https://huggingface.co/tner/xlm-roberta-base-bc5cdr
+- https://github.com/asahi417/tner
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_bc5cdr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_bc5cdr_pipeline_en.md
new file mode 100644
index 00000000000000..3e2f54c300ef6a
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_bc5cdr_pipeline_en.md
@@ -0,0 +1,70 @@
+---
+layout: model
+title: English xlmroberta_ner_base_bc5cdr_pipeline pipeline XlmRoBertaForTokenClassification from tner
+author: John Snow Labs
+name: xlmroberta_ner_base_bc5cdr_pipeline
+date: 2024-06-11
+tags: [en, open_source, pipeline, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+annotator: PipelineModel
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_base_bc5cdr_pipeline` is an English model originally trained by tner.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_bc5cdr_pipeline_en_5.4.0_3.0_1718072584099.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_bc5cdr_pipeline_en_5.4.0_3.0_1718072584099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlmroberta_ner_base_bc5cdr_pipeline", lang = "en")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlmroberta_ner_base_bc5cdr_pipeline", lang = "en")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_bc5cdr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|779.7 MB| + +## References + +https://huggingface.co/tner/xlm-roberta-base-bc5cdr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline_sw.md new file mode 100644 index 00000000000000..73bdfa8278ed4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline_sw.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Swahili (macrolanguage) xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline +date: 2024-06-11 +tags: [sw, open_source, pipeline, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline` is a Swahili (macrolanguage) model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072130572.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072130572.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline", lang = "sw")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline", lang = "sw")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sw| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-amharic-finetuned-ner-swahili + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_sw.md new file mode 100644 index 00000000000000..738effba8f4942 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_sw.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Swahili XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili +date: 2024-06-11 +tags: [sw, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-amharic-finetuned-ner-swahili` is a Swahili model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG`, `DATE` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_sw_5.4.0_3.0_1718072036241.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili_sw_5.4.0_3.0_1718072036241.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili","sw") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+ner_converter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili","sw")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val ner_converter = new NerConverter()
+    .setInputCols(Array("document", "token", "ner"))
+    .setOutputCol("ner_chunk")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("sw.ner.xlmr_roberta.base_finetuned").predict("""PUT YOUR STRING HERE""")
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_amharic_finetuned_ner_swahili| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sw| +|Size:|1.0 GB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-amharic-finetuned-ner-swahili +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline_sw.md new file mode 100644 index 00000000000000..04ecbf84774610 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline_sw.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Swahili (macrolanguage) xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline +date: 2024-06-11 +tags: [sw, open_source, pipeline, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline` is a Swahili (macrolanguage) model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072135649.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072135649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline", lang = "sw")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline", lang = "sw")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sw| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-luo-finetuned-ner-swahili + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_sw.md new file mode 100644 index 00000000000000..99387e381f7e79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_sw.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Swahili (macrolanguage) xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili +date: 2024-06-11 +tags: [sw, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili` is a Swahili (macrolanguage) model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_sw_5.4.0_3.0_1718072026461.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili_sw_5.4.0_3.0_1718072026461.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili","sw") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili", "sw")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
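The pipeline above stops at the `ner` column, so its output is one tag per token rather than merged entity chunks. Assuming the model emits IOB-style tags such as `B-PER`, `I-PER`, and `O` (this model card does not list its labels), a rough pure-Python sketch of grouping tagged tokens into entities could look like the following. `group_entities` is a hypothetical helper, and the sample tokens and tags are made up for illustration; they are not model output:

```python
def group_entities(tokens, tags):
    """Merge IOB-tagged tokens into (entity_text, label) pairs."""
    entities = []
    current_tokens, current_label = [], None
    for token, tag in zip(tokens, tags):
        prefix, _, label = tag.partition("-")
        if prefix == "I" and label == current_label:
            # Same entity continues.
            current_tokens.append(token)
            continue
        # Anything else ends the entity collected so far.
        if current_tokens:
            entities.append((" ".join(current_tokens), current_label))
        if prefix in ("B", "I"):
            # Start a new entity (a stray I- tag is treated like B-).
            current_tokens, current_label = [token], label
        else:
            current_tokens, current_label = [], None
    if current_tokens:
        entities.append((" ".join(current_tokens), current_label))
    return entities


# Illustrative input only; not produced by the model.
tokens = ["Amina", "Juma", "anaishi", "Dar", "es", "Salaam"]
tags = ["B-PER", "I-PER", "O", "B-LOC", "I-LOC", "I-LOC"]
print(group_entities(tokens, tags))
```

This only illustrates the grouping idea in isolation; inside a Spark NLP pipeline, the `NerConverter` stage used by the other model cards in this diff does this job.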
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_dholuo_finetuned_ner_swahili| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sw| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-luo-finetuned-ner-swahili \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline_sw.md new file mode 100644 index 00000000000000..904e23c78b218f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline_sw.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Swahili (macrolanguage) xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline +date: 2024-06-11 +tags: [sw, open_source, pipeline, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline` is a Swahili (macrolanguage) model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072733391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072733391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline", lang = "sw")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline", lang = "sw")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sw| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-luganda-finetuned-ner-swahili + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_sw.md new file mode 100644 index 00000000000000..7478afe2726f49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_sw.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Swahili XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili +date: 2024-06-11 +tags: [sw, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-luganda-finetuned-ner-swahili` is a Swahili model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`PER`, `DATE`, `ORG`, `LOC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_sw_5.4.0_3.0_1718072658418.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili_sw_5.4.0_3.0_1718072658418.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili","sw") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+ner_converter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili","sw")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val ner_converter = new NerConverter()
+    .setInputCols(Array("document", "token", "ner"))
+    .setOutputCol("ner_chunk")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("sw.ner.xlmr_roberta.base_finetuned_luganda.by_mbeukman").predict("""PUT YOUR STRING HERE""")
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_luganda_finetuned_ner_swahili| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sw| +|Size:|1.0 GB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-luganda-finetuned-ner-swahili +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline_sw.md new file mode 100644 index 00000000000000..2a89ccf46bc217 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline_sw.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Swahili (macrolanguage) xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline +date: 2024-06-11 +tags: [sw, open_source, pipeline, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline` is a Swahili (macrolanguage) model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072732776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072732776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline", lang = "sw")
+annotations = pipeline.transform(df)
+
+```
+```scala
+
+val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline", lang = "sw")
+val annotations = pipeline.transform(df)
+
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sw| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-naija-finetuned-ner-swahili + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_sw.md new file mode 100644 index 00000000000000..391e681e214f99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_sw.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Swahili XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili +date: 2024-06-11 +tags: [sw, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-naija-finetuned-ner-swahili` is a Swahili model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`PER`, `DATE`, `ORG`, `LOC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_sw_5.4.0_3.0_1718072656266.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili_sw_5.4.0_3.0_1718072656266.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili","sw") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+ner_converter = NerConverter() \
+    .setInputCols(["document", "token", "ner"]) \
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili","sw")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val ner_converter = new NerConverter()
+    .setInputCols(Array("document", "token", "ner"))
+    .setOutputCol("ner_chunk")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("sw.ner.xlmr_roberta.base_finetuned_naija.by_mbeukman").predict("""PUT YOUR STRING HERE""")
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_naija_finetuned_ner_swahili| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sw| +|Size:|1.0 GB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-naija-finetuned-ner-swahili +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_recipe_all_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_recipe_all_en.md new file mode 100644 index 00000000000000..b7d6316b89b058 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_recipe_all_en.md @@ -0,0 +1,117 @@ +--- +layout: model +title: English XLMRobertaForTokenClassification Base Cased model (from edwardjross) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_recipe_all +date: 2024-06-11 +tags: [en, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-recipe-all` is an English model originally trained by `edwardjross`. 
+ +## Predicted Entities + +`UNIT`, `DF`, `QUANTITY`, `TEMP`, `SIZE`, `NAME`, `STATE` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_recipe_all_en_5.4.0_3.0_1718093425591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_recipe_all_en_5.4.0_3.0_1718093425591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_recipe_all","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_recipe_all","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.xlmr_roberta.base_finetuned").predict("""PUT YOUR STRING HERE""") +``` +
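The archive names behind the Download and Copy S3 URI links appear to follow a `{name}_{lang}_{spark_nlp_version}_{spark_version}_{number}.zip` pattern, which matches the `name`, `language`, `edition` and `spark_version` fields in the front matter. The sketch below splits such a name with a regular expression. The pattern is inferred from the links in these posts rather than from any documented naming rule, the helper `parse_archive_name` is hypothetical, and the trailing number is only assumed to be a build timestamp.

```python
import re
from typing import Dict

# Inferred layout: <name>_<lang>_<Spark NLP x.y.z>_<Spark x.y>_<digits>.zip
ARCHIVE_PATTERN = re.compile(
    r"^(?P<name>.+)_(?P<lang>[a-z]{2,3})"
    r"_(?P<spark_nlp>\d+\.\d+\.\d+)_(?P<spark>\d+\.\d+)"
    r"_(?P<build_stamp>\d+)\.zip$"
)


def parse_archive_name(url: str) -> Dict[str, str]:
    """Split a model archive URL into its name, language and version parts."""
    # Keep only the final path segment, so https:// and s3:// URLs both work.
    filename = url.rsplit("/", 1)[-1]
    match = ARCHIVE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"Unrecognised archive name: {filename}")
    return match.groupdict()


recipe_archive = parse_archive_name(
    "s3://auxdata.johnsnowlabs.com/public/models/"
    "xlmroberta_ner_base_finetuned_recipe_all_en_5.4.0_3.0_1718093425591.zip"
)
```

The `name` and `lang` parts recovered this way are the same two arguments passed to `pretrained(...)` in the examples above.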
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_recipe_all| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|834.4 MB| + +## References + +References + +- https://huggingface.co/edwardjross/xlm-roberta-base-finetuned-recipe-all +- https://github.com/cosylabiiit/recipe-knowledge-mining +- https://arxiv.org/abs/2004.12184 +- https://github.com/cosylabiiit/recipe-knowledge-mining +- https://www.oreilly.com/library/view/natural-language-processing/9781098103231/ +- https://github.com/EdwardJRoss/nlp_transformers_exercises/blob/master/notebooks/ch4-ner-recipe-stanford-crf.ipynb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_recipe_all_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_recipe_all_pipeline_en.md new file mode 100644 index 00000000000000..f728205ce2ee39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_recipe_all_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmroberta_ner_base_finetuned_recipe_all_pipeline pipeline XlmRoBertaForTokenClassification from edwardjross +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_recipe_all_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_base_finetuned_recipe_all_pipeline` is an English model originally trained by edwardjross. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_recipe_all_pipeline_en_5.4.0_3.0_1718093515879.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_recipe_all_pipeline_en_5.4.0_3.0_1718093515879.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
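+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +df = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")  # assumes the pipeline reads the "text" column +pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_recipe_all_pipeline", lang="en") +annotations = pipeline.transform(df) + +``` +```scala +val df = Seq("PUT YOUR STRING HERE").toDS.toDF("text") // assumes the pipeline reads the "text" column +val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_recipe_all_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +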
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_recipe_all_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|834.4 MB| + +## References + +https://huggingface.co/edwardjross/xlm-roberta-base-finetuned-recipe-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pcm.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pcm.md new file mode 100644 index 00000000000000..742881afa86855 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pcm.md @@ -0,0 +1,120 @@ +--- +layout: model +title: Nigerian Pidgin XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_swahili_finetuned_naija +date: 2024-06-11 +tags: [pcm, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: pcm +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-swahili-finetuned-ner-naija` is a Nigerian Pidgin model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`PER`, `LOC`, `DATE`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pcm_5.4.0_3.0_1718072035316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pcm_5.4.0_3.0_1718072035316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_swahili_finetuned_naija","pcm") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_swahili_finetuned_naija","pcm") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("pcm.ner.xlmr_roberta.base_finetuned_swahili.by_mbeukman").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_swahili_finetuned_naija| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|pcm| +|Size:|1.0 GB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-swahili-finetuned-ner-naija +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://www.apache.org/licenses/LICENSE-2.0 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner +- https://arxiv.org/pdf/2103.11811.pdf +- https://arxiv.org/abs/2103.11811 +- https://arxiv.org/abs/2103.11811 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline_pcm.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline_pcm.md new file mode 100644 index 00000000000000..c8c98075671bfd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline_pcm.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Nigerian Pidgin xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline +date: 2024-06-11 +tags: [pcm, open_source, pipeline, onnx] +task: Named Entity Recognition +language: pcm +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark 
NLP.`xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline` is a Nigerian Pidgin model originally trained by mbeukman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline_pcm_5.4.0_3.0_1718072108386.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline_pcm_5.4.0_3.0_1718072108386.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
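+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +df = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")  # assumes the pipeline reads the "text" column +pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline", lang="pcm") +annotations = pipeline.transform(df) + +``` +```scala +val df = Seq("PUT YOUR STRING HERE").toDS.toDF("text") // assumes the pipeline reads the "text" column +val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline", lang = "pcm") +val annotations = pipeline.transform(df) + +``` +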
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_swahili_finetuned_naija_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pcm| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-swahili-finetuned-ner-naija + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline_rw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline_rw.md new file mode 100644 index 00000000000000..2c2d684d1097be --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline_rw.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Kinyarwanda xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline +date: 2024-06-11 +tags: [rw, open_source, pipeline, onnx] +task: Named Entity Recognition +language: rw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline` is a Kinyarwanda model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline_rw_5.4.0_3.0_1718072120242.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline_rw_5.4.0_3.0_1718072120242.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
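+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +df = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")  # assumes the pipeline reads the "text" column +pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline", lang="rw") +annotations = pipeline.transform(df) + +``` +```scala +val df = Seq("PUT YOUR STRING HERE").toDS.toDF("text") // assumes the pipeline reads the "text" column +val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline", lang = "rw") +val annotations = pipeline.transform(df) + +``` +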
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|rw| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-swahili-finetuned-ner-kinyarwanda + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_rw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_rw.md new file mode 100644 index 00000000000000..8c2178b7bf263d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_rw.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Kinyarwanda XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand +date: 2024-06-11 +tags: [rw, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: rw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-swahili-finetuned-ner-kinyarwanda` is a Kinyarwanda model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`DATE`, `PER`, `ORG`, `LOC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_rw_5.4.0_3.0_1718072039614.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand_rw_5.4.0_3.0_1718072039614.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand","rw") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand","rw") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("rw.ner.xlmr_roberta.base_finetuned_swahili.by_mbeukman").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_swahili_finetuned_ner_kinyarwand| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|rw| +|Size:|1.0 GB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-swahili-finetuned-ner-kinyarwanda +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_luo.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_luo.md new file mode 100644 index 00000000000000..02d165c02ef2ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_luo.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Luo (Kenya and Tanzania) XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_swahili_finetuned_ner +date: 2024-06-11 +tags: [luo, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: luo +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-swahili-finetuned-ner-luo` is a Luo (Kenya and Tanzania) model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`DATE`, `PER`, `ORG`, `LOC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_luo_5.4.0_3.0_1718072557607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_luo_5.4.0_3.0_1718072557607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_swahili_finetuned_ner","luo") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_swahili_finetuned_ner","luo") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("luo.ner.xlmr_roberta.base_finetuned_swahili.by_mbeukman").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_swahili_finetuned_ner| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|luo| +|Size:|1.0 GB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-swahili-finetuned-ner-luo +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline_luo.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline_luo.md new file mode 100644 index 00000000000000..e9142fc3b4377a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline_luo.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Dholuo, Luo (Kenya and Tanzania) xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline +date: 2024-06-11 +tags: [luo, open_source, pipeline, onnx] +task: Named Entity Recognition +language: luo +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline` is a Dholuo, Luo (Kenya and Tanzania) model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline_luo_5.4.0_3.0_1718072623410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline_luo_5.4.0_3.0_1718072623410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
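+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +df = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")  # assumes the pipeline reads the "text" column +pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline", lang="luo") +annotations = pipeline.transform(df) + +``` +```scala +val df = Seq("PUT YOUR STRING HERE").toDS.toDF("text") // assumes the pipeline reads the "text" column +val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline", lang = "luo") +val annotations = pipeline.transform(df) + +``` +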
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_swahili_finetuned_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|luo| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-swahili-finetuned-ner-luo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline_wo.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline_wo.md new file mode 100644 index 00000000000000..b52a026c19a540 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline_wo.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Wolof xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline +date: 2024-06-11 +tags: [wo, open_source, pipeline, onnx] +task: Named Entity Recognition +language: wo +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline` is a Wolof model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline_wo_5.4.0_3.0_1718093466874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline_wo_5.4.0_3.0_1718093466874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
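+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +df = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")  # assumes the pipeline reads the "text" column +pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline", lang="wo") +annotations = pipeline.transform(df) + +``` +```scala +val df = Seq("PUT YOUR STRING HERE").toDS.toDF("text") // assumes the pipeline reads the "text" column +val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline", lang = "wo") +val annotations = pipeline.transform(df) + +``` +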
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|wo| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-swahili-finetuned-ner-wolof + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_wo.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_wo.md new file mode 100644 index 00000000000000..277d50be428ba4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_wo.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Wolof XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof +date: 2024-06-11 +tags: [wo, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: wo +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-swahili-finetuned-ner-wolof` is a Wolof model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`DATE`, `PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_wo_5.4.0_3.0_1718093401811.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof_wo_5.4.0_3.0_1718093401811.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof","wo") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof","wo") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("wo.ner.xlmr_roberta.base_finetuned_swahili.by_mbeukman").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_swahili_finetuned_ner_wolof| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|wo| +|Size:|1.0 GB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-swahili-finetuned-ner-wolof +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline_sw.md new file mode 100644 index 00000000000000..da49e9b863f0f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline_sw.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Swahili (macrolanguage) xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline +date: 2024-06-11 +tags: [sw, open_source, pipeline, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline` is a Swahili (macrolanguage) model originally trained by mbeukman.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072696880.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718072696880.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
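+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from sparknlp.pretrained import PretrainedPipeline +pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline", lang = "sw") +df = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +annotations = pipeline.transform(df) +``` +```scala +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline", lang = "sw") +val df = spark.createDataFrame(Seq(Tuple1("PUT YOUR STRING HERE"))).toDF("text") +val annotations = pipeline.transform(df) +``` +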
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sw| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-wolof-finetuned-ner-swahili + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_sw.md new file mode 100644 index 00000000000000..76a70c39f0fed8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_sw.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Swahili XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili +date: 2024-06-11 +tags: [sw, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-wolof-finetuned-ner-swahili` is a Swahili model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`DATE`, `PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_sw_5.4.0_3.0_1718072618924.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili_sw_5.4.0_3.0_1718072618924.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili","sw") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili","sw") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("sw.ner.xlmr_roberta.base_finetuned_wolof.by_mbeukman").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_wolof_finetuned_ner_swahili| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sw| +|Size:|1.0 GB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-wolof-finetuned-ner-swahili +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline_sw.md new file mode 100644 index 00000000000000..90a464e3475e2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline_sw.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Swahili (macrolanguage) xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline +date: 2024-06-11 +tags: [sw, open_source, pipeline, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline` is a Swahili (macrolanguage) model originally trained by mbeukman.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718093797916.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline_sw_5.4.0_3.0_1718093797916.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
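+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from sparknlp.pretrained import PretrainedPipeline +pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline", lang = "sw") +df = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +annotations = pipeline.transform(df) +``` +```scala +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline", lang = "sw") +val df = spark.createDataFrame(Seq(Tuple1("PUT YOUR STRING HERE"))).toDF("text") +val annotations = pipeline.transform(df) +``` +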
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sw| +|Size:|1.0 GB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-yoruba-finetuned-ner-swahili + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_sw.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_sw.md new file mode 100644 index 00000000000000..57a5df026e4ba9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_sw.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Swahili XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili +date: 2024-06-11 +tags: [sw, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: sw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-yoruba-finetuned-ner-swahili` is a Swahili model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`DATE`, `PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_sw_5.4.0_3.0_1718093731937.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili_sw_5.4.0_3.0_1718093731937.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili","sw") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili","sw") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("sw.ner.xlmr_roberta.base_finetuned_yoruba.by_mbeukman").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_yoruba_finetuned_ner_swahili| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|sw| +|Size:|1.0 GB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-yoruba-finetuned-ner-swahili +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_uncased_all_english_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_uncased_all_english_en.md new file mode 100644 index 00000000000000..867522a8579c99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_uncased_all_english_en.md @@ -0,0 +1,113 @@ +--- +layout: model +title: English XLMRobertaForTokenClassification Base Uncased model (from tner) +author: John Snow Labs +name: xlmroberta_ner_base_uncased_all_english +date: 2024-06-11 +tags: [en, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-uncased-all-english` is an English model originally trained by `tner`.
+ +## Predicted Entities + +`actor`, `time`, `corporation`, `ordinal number`, `cardinal number`, `restaurant`, `director`, `rna`, `geopolitical area`, `rating`, `protein`, `percent`, `product`, `plot`, `dna`, `disease`, `cell line`, `law`, `other`, `quote`, `date`, `soundtrack`, `origin`, `amenity`, `chemical`, `event`, `cuisine`, `dish`, `work of art`, `genre`, `cell type`, `location`, `language`, `quantity`, `award`, `character name`, `facility`, `relationship`, `organization`, `opinion`, `group`, `money`, `person` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_uncased_all_english_en_5.4.0_3.0_1718093500047.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_uncased_all_english_en_5.4.0_3.0_1718093500047.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_uncased_all_english","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_uncased_all_english","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.xlmr_roberta.all_english.uncased_base.by_tner").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_uncased_all_english| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|803.9 MB| + +## References + +References + +- https://huggingface.co/tner/xlm-roberta-base-uncased-all-english +- https://github.com/asahi417/tner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_uncased_all_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_uncased_all_english_pipeline_en.md new file mode 100644 index 00000000000000..89dad2bcd7cadf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_base_uncased_all_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmroberta_ner_base_uncased_all_english_pipeline pipeline XlmRoBertaForTokenClassification from tner +author: John Snow Labs +name: xlmroberta_ner_base_uncased_all_english_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_base_uncased_all_english_pipeline` is an English model originally trained by tner.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_uncased_all_english_pipeline_en_5.4.0_3.0_1718093669223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_uncased_all_english_pipeline_en_5.4.0_3.0_1718093669223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
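+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from sparknlp.pretrained import PretrainedPipeline +pipeline = PretrainedPipeline("xlmroberta_ner_base_uncased_all_english_pipeline", lang = "en") +df = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +annotations = pipeline.transform(df) +``` +```scala +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +val pipeline = new PretrainedPipeline("xlmroberta_ner_base_uncased_all_english_pipeline", lang = "en") +val df = spark.createDataFrame(Seq(Tuple1("PUT YOUR STRING HERE"))).toDF("text") +val annotations = pipeline.transform(df) +``` +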
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_uncased_all_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|803.9 MB| + +## References + +https://huggingface.co/tner/xlm-roberta-base-uncased-all-english + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline_xx.md new file mode 100644 index 00000000000000..e5817bdbfc15e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline pipeline XlmRoBertaForTokenClassification from cj-mills +author: John Snow Labs +name: xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline +date: 2024-06-11 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline` is a Multilingual model originally trained by cj-mills.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline_xx_5.4.0_3.0_1718094020656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline_xx_5.4.0_3.0_1718094020656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
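+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from sparknlp.pretrained import PretrainedPipeline +pipeline = PretrainedPipeline("xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline", lang = "xx") +df = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") +annotations = pipeline.transform(df) +``` +```scala +import com.johnsnowlabs.nlp.pretrained.PretrainedPipeline +val pipeline = new PretrainedPipeline("xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline", lang = "xx") +val df = spark.createDataFrame(Seq(Tuple1("PUT YOUR STRING HERE"))).toDF("text") +val annotations = pipeline.transform(df) +``` +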
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_cj_mills_base_finetuned_panx_all_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|859.8 MB| + +## References + +https://huggingface.co/cj-mills/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_cj_mills_base_finetuned_panx_all_xx.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_cj_mills_base_finetuned_panx_all_xx.md new file mode 100644 index 00000000000000..fc5824794545a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_cj_mills_base_finetuned_panx_all_xx.md @@ -0,0 +1,112 @@ +--- +layout: model +title: Multilingual XLMRobertaForTokenClassification Base Cased model (from cj-mills) +author: John Snow Labs +name: xlmroberta_ner_cj_mills_base_finetuned_panx_all +date: 2024-06-11 +tags: [xx, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-all` is a Multilingual model originally trained by `cj-mills`. 
+ +## Predicted Entities + +`ORG`, `LOC`, `PER` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_cj_mills_base_finetuned_panx_all_xx_5.4.0_3.0_1718093933639.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_cj_mills_base_finetuned_panx_all_xx_5.4.0_3.0_1718093933639.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_cj_mills_base_finetuned_panx_all","xx") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_cj_mills_base_finetuned_panx_all","xx") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("xx.ner.xlmr_roberta.base_finetuned").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_cj_mills_base_finetuned_panx_all| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|859.8 MB| + +## References + +References + +- https://huggingface.co/cj-mills/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dfsj_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dfsj_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..e7b02ecfbb3fb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dfsj_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from dfsj) +author: John Snow Labs +name: xlmroberta_ner_dfsj_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `dfsj`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_dfsj_base_finetuned_panx_de_5.4.0_3.0_1718093332007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_dfsj_base_finetuned_panx_de_5.4.0_3.0_1718093332007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_dfsj_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_dfsj_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_dfsj").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_dfsj_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/dfsj/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dfsj_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dfsj_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..2eac378fe59e89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dfsj_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_dfsj_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from dfsj +author: John Snow Labs +name: xlmroberta_ner_dfsj_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_dfsj_base_finetuned_panx_pipeline` is a German model originally trained by dfsj.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_dfsj_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718093418856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_dfsj_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718093418856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_dfsj_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_dfsj_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_dfsj_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/dfsj/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dkasti_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dkasti_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..a30233ee8013eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dkasti_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from dkasti) +author: John Snow Labs +name: xlmroberta_ner_dkasti_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `dkasti`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_dkasti_base_finetuned_panx_de_5.4.0_3.0_1718093695826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_dkasti_base_finetuned_panx_de_5.4.0_3.0_1718093695826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_dkasti_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_dkasti_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_dkasti").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_dkasti_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/dkasti/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dkasti_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dkasti_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..1e7e6ad17330ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_dkasti_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_dkasti_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from dkasti +author: John Snow Labs +name: xlmroberta_ner_dkasti_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_dkasti_base_finetuned_panx_pipeline` is a German model originally trained by dkasti.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_dkasti_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718093782823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_dkasti_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718093782823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_dkasti_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_dkasti_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_dkasti_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/dkasti/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_furyhawk_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_furyhawk_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..be3b24e55f6e26 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_furyhawk_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from furyhawk) +author: John Snow Labs +name: xlmroberta_ner_furyhawk_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `furyhawk`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_furyhawk_base_finetuned_panx_de_5.4.0_3.0_1718094653555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_furyhawk_base_finetuned_panx_de_5.4.0_3.0_1718094653555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_furyhawk_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_furyhawk_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_furyhawk").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_furyhawk_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/furyhawk/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..e10461957cce32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from furyhawk +author: John Snow Labs +name: xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline` is a German model originally trained by furyhawk.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718094742018.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718094742018.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_furyhawk_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/furyhawk/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_harish3110_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_harish3110_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..81e204a34817ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_harish3110_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from harish3110) +author: John Snow Labs +name: xlmroberta_ner_harish3110_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `harish3110`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_harish3110_base_finetuned_panx_de_5.4.0_3.0_1718094551313.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_harish3110_base_finetuned_panx_de_5.4.0_3.0_1718094551313.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_harish3110_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_harish3110_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_harish3110").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_harish3110_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/harish3110/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_harish3110_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_harish3110_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..7a7d89e41ca0b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_harish3110_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_harish3110_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from harish3110 +author: John Snow Labs +name: xlmroberta_ner_harish3110_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_harish3110_base_finetuned_panx_pipeline` is a German model originally trained by harish3110.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_harish3110_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718094647055.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_harish3110_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718094647055.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_harish3110_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_harish3110_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_harish3110_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/harish3110/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_kaykozaronek_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_kaykozaronek_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..6d81ae2b7f9a46 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_kaykozaronek_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from KayKozaronek) +author: John Snow Labs +name: xlmroberta_ner_kaykozaronek_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `KayKozaronek`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_kaykozaronek_base_finetuned_panx_de_5.4.0_3.0_1718094928242.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_kaykozaronek_base_finetuned_panx_de_5.4.0_3.0_1718094928242.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_kaykozaronek_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_kaykozaronek_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_KayKozaronek").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_kaykozaronek_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/KayKozaronek/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..db44f2601f4890 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from KayKozaronek +author: John Snow Labs +name: xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline` is a German model originally trained by KayKozaronek.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718095014624.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718095014624.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_kaykozaronek_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/KayKozaronek/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_miyagawaorj_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_miyagawaorj_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..09e7357963249d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_miyagawaorj_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from miyagawaorj) +author: John Snow Labs +name: xlmroberta_ner_miyagawaorj_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `miyagawaorj`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_miyagawaorj_base_finetuned_panx_de_5.4.0_3.0_1718094814679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_miyagawaorj_base_finetuned_panx_de_5.4.0_3.0_1718094814679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_miyagawaorj_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_miyagawaorj_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_miyagawaorj").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_miyagawaorj_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/miyagawaorj/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..48a88ee030274f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from miyagawaorj +author: John Snow Labs +name: xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline` is a German model originally trained by miyagawaorj.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718094901754.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718094901754.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_miyagawaorj_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/miyagawaorj/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_neha2608_base_finetuned_panx_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_neha2608_base_finetuned_panx_en.md new file mode 100644 index 00000000000000..146f60829330b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_neha2608_base_finetuned_panx_en.md @@ -0,0 +1,112 @@ +--- +layout: model +title: English XLMRobertaForTokenClassification Base Cased model (from Neha2608) +author: John Snow Labs +name: xlmroberta_ner_neha2608_base_finetuned_panx +date: 2024-06-11 +tags: [en, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-en` is an English model originally trained by `Neha2608`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_neha2608_base_finetuned_panx_en_5.4.0_3.0_1718095594405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_neha2608_base_finetuned_panx_en_5.4.0_3.0_1718095594405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_neha2608_base_finetuned_panx","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_neha2608_base_finetuned_panx","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.xlmr_roberta.xtreme.base_finetuned.by_Neha2608").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_neha2608_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|825.6 MB| + +## References + +References + +- https://huggingface.co/Neha2608/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_neha2608_base_finetuned_panx_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_neha2608_base_finetuned_panx_pipeline_en.md new file mode 100644 index 00000000000000..7aaa6034c99d49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_neha2608_base_finetuned_panx_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmroberta_ner_neha2608_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from Neha2608 +author: John Snow Labs +name: xlmroberta_ner_neha2608_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_neha2608_base_finetuned_panx_pipeline` is an English model originally trained by Neha2608. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_neha2608_base_finetuned_panx_pipeline_en_5.4.0_3.0_1718095718736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_neha2608_base_finetuned_panx_pipeline_en_5.4.0_3.0_1718095718736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_neha2608_base_finetuned_panx_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_neha2608_base_finetuned_panx_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_neha2608_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|825.6 MB| + +## References + +https://huggingface.co/Neha2608/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_novarac23_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_novarac23_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..7a94a94549ea96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_novarac23_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from novarac23) +author: John Snow Labs +name: xlmroberta_ner_novarac23_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `novarac23`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_novarac23_base_finetuned_panx_de_5.4.0_3.0_1718096022977.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_novarac23_base_finetuned_panx_de_5.4.0_3.0_1718096022977.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_novarac23_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_novarac23_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_novarac23").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_novarac23_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/novarac23/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_novarac23_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_novarac23_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..32cf17d69683b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_novarac23_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_novarac23_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from novarac23 +author: John Snow Labs +name: xlmroberta_ner_novarac23_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_novarac23_base_finetuned_panx_pipeline` is a German model originally trained by novarac23. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_novarac23_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718096108829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_novarac23_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718096108829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_novarac23_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_novarac23_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_novarac23_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/novarac23/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_pdroberts_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_pdroberts_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..ba4ac333efba67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_pdroberts_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from pdroberts) +author: John Snow Labs +name: xlmroberta_ner_pdroberts_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `pdroberts`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_pdroberts_base_finetuned_panx_de_5.4.0_3.0_1718095555716.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_pdroberts_base_finetuned_panx_de_5.4.0_3.0_1718095555716.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_pdroberts_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_pdroberts_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_pdroberts").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_pdroberts_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/pdroberts/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..497614a9404610 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from pdroberts +author: John Snow Labs +name: xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline` is a German model originally trained by pdroberts. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718095659548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718095659548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_pdroberts_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/pdroberts/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_rishiyoung_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_rishiyoung_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..4e259830f95ee0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_rishiyoung_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from rishiyoung) +author: John Snow Labs +name: xlmroberta_ner_rishiyoung_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `rishiyoung`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_rishiyoung_base_finetuned_panx_de_5.4.0_3.0_1718095992221.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_rishiyoung_base_finetuned_panx_de_5.4.0_3.0_1718095992221.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_rishiyoung_base_finetuned_panx","de") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_rishiyoung_base_finetuned_panx","de") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_rishiyoung").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_rishiyoung_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/rishiyoung/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..8248a5f7212722 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from rishiyoung +author: John Snow Labs +name: xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline` is a German model originally trained by rishiyoung. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718096079724.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718096079724.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_rishiyoung_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/rishiyoung/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline_xx.md new file mode 100644 index 00000000000000..7060f8844f3ca8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline pipeline XlmRoBertaForTokenClassification from robkayinto +author: John Snow Labs +name: xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline +date: 2024-06-11 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline` is a Multilingual model originally trained by robkayinto. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline_xx_5.4.0_3.0_1718095764653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline_xx_5.4.0_3.0_1718095764653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_robkayinto_base_finetuned_panx_all_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|861.0 MB| + +## References + +https://huggingface.co/robkayinto/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_robkayinto_base_finetuned_panx_all_xx.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_robkayinto_base_finetuned_panx_all_xx.md new file mode 100644 index 00000000000000..4ea506dfac5ad0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_robkayinto_base_finetuned_panx_all_xx.md @@ -0,0 +1,112 @@ +--- +layout: model +title: Multilingual XLMRobertaForTokenClassification Base Cased model (from robkayinto) +author: John Snow Labs +name: xlmroberta_ner_robkayinto_base_finetuned_panx_all +date: 2024-06-11 +tags: [xx, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-all` is a Multilingual model originally trained by `robkayinto`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_robkayinto_base_finetuned_panx_all_xx_5.4.0_3.0_1718095681806.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_robkayinto_base_finetuned_panx_all_xx_5.4.0_3.0_1718095681806.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_robkayinto_base_finetuned_panx_all","xx") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_robkayinto_base_finetuned_panx_all","xx") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("xx.ner.xlmr_roberta.base_finetuned_panx_all.by_robkayinto").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_robkayinto_base_finetuned_panx_all| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|861.0 MB| + +## References + +References + +- https://huggingface.co/robkayinto/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_simulst_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_simulst_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..183d6b762dde6f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_simulst_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from SimulSt) +author: John Snow Labs +name: xlmroberta_ner_simulst_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `SimulSt`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_simulst_base_finetuned_panx_de_5.4.0_3.0_1718095557497.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_simulst_base_finetuned_panx_de_5.4.0_3.0_1718095557497.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_simulst_base_finetuned_panx","de") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+ner_converter = NerConverter()\
+    .setInputCols(["document", "token", "ner"])\
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_simulst_base_finetuned_panx","de")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val ner_converter = new NerConverter()
+    .setInputCols(Array("document", "token", "ner"))
+    .setOutputCol("ner_chunk")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_SimulSt").predict("""PUT YOUR STRING HERE""")
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_simulst_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/SimulSt/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_simulst_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_simulst_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..d77b912caa4a50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_simulst_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_simulst_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from SimulSt +author: John Snow Labs +name: xlmroberta_ner_simulst_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_simulst_base_finetuned_panx_pipeline` is a German model originally trained by SimulSt. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_simulst_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718095656737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_simulst_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718095656737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_simulst_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_simulst_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_simulst_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|853.8 MB| + +## References + +https://huggingface.co/SimulSt/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline_xx.md new file mode 100644 index 00000000000000..fb4469d2c6590f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline pipeline XlmRoBertaForTokenClassification from transformersbook +author: John Snow Labs +name: xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline +date: 2024-06-11 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline` is a Multilingual model originally trained by transformersbook. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline_xx_5.4.0_3.0_1718095833355.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline_xx_5.4.0_3.0_1718095833355.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_transformersbook_base_finetuned_panx_all_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|861.0 MB| + +## References + +https://huggingface.co/transformersbook/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_transformersbook_base_finetuned_panx_all_xx.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_transformersbook_base_finetuned_panx_all_xx.md new file mode 100644 index 00000000000000..9d5eaded52605e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_transformersbook_base_finetuned_panx_all_xx.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Multilingual XLMRobertaForTokenClassification Base Cased model (from transformersbook) +author: John Snow Labs +name: xlmroberta_ner_transformersbook_base_finetuned_panx_all +date: 2024-06-11 +tags: [xx, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-all` is a Multilingual model originally trained by `transformersbook`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_transformersbook_base_finetuned_panx_all_xx_5.4.0_3.0_1718095750213.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_transformersbook_base_finetuned_panx_all_xx_5.4.0_3.0_1718095750213.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_transformersbook_base_finetuned_panx_all","xx") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+ner_converter = NerConverter()\
+    .setInputCols(["document", "token", "ner"])\
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_transformersbook_base_finetuned_panx_all","xx")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val ner_converter = new NerConverter()
+    .setInputCols(Array("document", "token", "ner"))
+    .setOutputCol("ner_chunk")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("xx.ner.xlmr_roberta.wikiann.base_finetuned").predict("""PUT YOUR STRING HERE""")
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_transformersbook_base_finetuned_panx_all| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|861.0 MB| + +## References + +References + +- https://huggingface.co/transformersbook/xlm-roberta-base-finetuned-panx-all +- https://learning.oreilly.com/library/view/natural-language-processing/9781098103231/ +- https://github.com/nlp-with-transformers/notebooks/blob/main/04_multilingual-ner.ipynb +- https://paperswithcode.com/sota?task=Token+Classification&dataset=wikiann \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xliu128_base_finetuned_panx_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xliu128_base_finetuned_panx_de.md new file mode 100644 index 00000000000000..3f70bb46ae1f48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xliu128_base_finetuned_panx_de.md @@ -0,0 +1,113 @@ +--- +layout: model +title: German XLMRobertaForTokenClassification Base Cased model (from xliu128) +author: John Snow Labs +name: xlmroberta_ner_xliu128_base_finetuned_panx +date: 2024-06-11 +tags: [de, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-de` is a German model originally trained by `xliu128`. 
+ +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xliu128_base_finetuned_panx_de_5.4.0_3.0_1718096617019.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xliu128_base_finetuned_panx_de_5.4.0_3.0_1718096617019.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+tokenizer = Tokenizer() \
+    .setInputCols("document") \
+    .setOutputCol("token")
+
+token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_xliu128_base_finetuned_panx","de") \
+    .setInputCols(["document", "token"]) \
+    .setOutputCol("ner")
+
+ner_converter = NerConverter()\
+    .setInputCols(["document", "token", "ner"])\
+    .setOutputCol("ner_chunk")
+
+pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter])
+
+data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_xliu128_base_finetuned_panx","de")
+    .setInputCols(Array("document", "token"))
+    .setOutputCol("ner")
+
+val ner_converter = new NerConverter()
+    .setInputCols(Array("document", "token", "ner"))
+    .setOutputCol("ner_chunk")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter))
+
+val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("de.ner.xlmr_roberta.xtreme.base_finetuned.by_xliu128").predict("""PUT YOUR STRING HERE""")
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_xliu128_base_finetuned_panx| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|853.8 MB| + +## References + +References + +- https://huggingface.co/xliu128/xlm-roberta-base-finetuned-panx-de +- https://paperswithcode.com/sota?task=Token+Classification&dataset=xtreme \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xliu128_base_finetuned_panx_pipeline_de.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xliu128_base_finetuned_panx_pipeline_de.md new file mode 100644 index 00000000000000..337ab6cdc6eec8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xliu128_base_finetuned_panx_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German xlmroberta_ner_xliu128_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from xliu128 +author: John Snow Labs +name: xlmroberta_ner_xliu128_base_finetuned_panx_pipeline +date: 2024-06-11 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_xliu128_base_finetuned_panx_pipeline` is a German model originally trained by xliu128. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xliu128_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718096703642.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xliu128_base_finetuned_panx_pipeline_de_5.4.0_3.0_1718096703642.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_xliu128_base_finetuned_panx_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_xliu128_base_finetuned_panx_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlmroberta_ner_xliu128_base_finetuned_panx_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|de|
+|Size:|853.8 MB|
+
+## References
+
+https://huggingface.co/xliu128/xlm-roberta-base-finetuned-panx-de
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_ha.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_ha.md
new file mode 100644
index 00000000000000..7f004522828d77
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_ha.md
@@ -0,0 +1,117 @@
+---
+layout: model
+title: Hausa Named Entity Recognition (from mbeukman)
+author: John Snow Labs
+name: xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa
+date: 2024-06-11
+tags: [xlm_roberta, ner, token_classification, ha, open_source, onnx]
+task: Named Entity Recognition
+language: ha
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained Named Entity Recognition model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-ner-hausa` is a Hausa model originally trained by `mbeukman`.
+ +## Predicted Entities + +`PER`, `ORG`, `LOC`, `DATE` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_ha_5.4.0_3.0_1718097060825.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_ha_5.4.0_3.0_1718097060825.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa","ha") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["Ina son Spark NLP"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa","ha")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("Ina son Spark NLP").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+{:.nlu-block}
+```python
+import nlu
+nlu.load("ha.ner.xlmr_roberta.base_finetuned_hausa.by_mbeukman").predict("""Ina son Spark NLP""")
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|ha| +|Size:|774.7 MB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-ner-hausa +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://www.apache.org/licenses/LICENSE-2.0 +- https://github.com/Michael-Beukman/NERTransfer +- htt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline_ha.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline_ha.md new file mode 100644 index 00000000000000..0c834d88f02f89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline_ha.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Hausa xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline +date: 2024-06-11 +tags: [ha, open_source, pipeline, onnx] +task: Named Entity Recognition +language: ha +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline` is a Hausa model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline_ha_5.4.0_3.0_1718097242413.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline_ha_5.4.0_3.0_1718097242413.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline", lang = "ha") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline", lang = "ha") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_xlm_roberta_base_finetuned_ner_hausa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ha| +|Size:|774.7 MB| + +## References + +https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-ner-hausa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline_yo.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline_yo.md new file mode 100644 index 00000000000000..66de4b4ae4d697 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline_yo.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Yoruba xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline pipeline XlmRoBertaForTokenClassification from mbeukman +author: John Snow Labs +name: xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline +date: 2024-06-11 +tags: [yo, open_source, pipeline, onnx] +task: Named Entity Recognition +language: yo +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline` is a Yoruba model originally trained by mbeukman. 
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline_yo_5.4.0_3.0_1718096869540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline_yo_5.4.0_3.0_1718096869540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline", lang = "yo") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline", lang = "yo") +val annotations = pipeline.transform(df) + +``` +
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_pipeline|
+|Type:|pipeline|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|yo|
+|Size:|772.8 MB|
+
+## References
+
+https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-ner-yoruba
+
+## Included Models
+
+- DocumentAssembler
+- TokenizerModel
+- XlmRoBertaForTokenClassification
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_yo.md b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_yo.md
new file mode 100644
index 00000000000000..b1192123208396
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-06-11-xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_yo.md
@@ -0,0 +1,111 @@
+---
+layout: model
+title: Yoruba Named Entity Recognition (from mbeukman)
+author: John Snow Labs
+name: xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba
+date: 2024-06-11
+tags: [xlm_roberta, ner, token_classification, yo, open_source, onnx]
+task: Named Entity Recognition
+language: yo
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: XlmRoBertaForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained Named Entity Recognition model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-ner-yoruba` is a Yoruba model originally trained by `mbeukman`.
+ +## Predicted Entities + +`PER`, `ORG`, `LOC`, `DATE` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_yo_5.4.0_3.0_1718096687333.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba_yo_5.4.0_3.0_1718096687333.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+documentAssembler = DocumentAssembler() \
+    .setInputCol("text") \
+    .setOutputCol("document")
+
+sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\
+    .setInputCols(["document"])\
+    .setOutputCol("sentence")
+
+tokenizer = Tokenizer() \
+    .setInputCols("sentence") \
+    .setOutputCol("token")
+
+tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba","yo") \
+    .setInputCols(["sentence", "token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier])
+
+data = spark.createDataFrame([["Mo nifẹ Spark NLP"]]).toDF("text")
+
+result = pipeline.fit(data).transform(data)
+```
+```scala
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")
+    .setInputCols(Array("document"))
+    .setOutputCol("sentence")
+
+val tokenizer = new Tokenizer()
+    .setInputCols(Array("sentence"))
+    .setOutputCol("token")
+
+val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba","yo")
+    .setInputCols(Array("sentence", "token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDetector, tokenizer, tokenClassifier))
+
+val data = Seq("Mo nifẹ Spark NLP").toDF("text")
+
+val result = pipeline.fit(data).transform(data)
+```
+
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_xlm_roberta_base_finetuned_ner_yoruba| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|yo| +|Size:|772.8 MB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-ner-yoruba +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://www.apache.org/licenses/LICENSE-2.0 +- https://github.com/Michael-Beukman/NERTransfer +- ht \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-12-sent_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-06-12-sent_roberta_base_en.md new file mode 100644 index 00000000000000..61565f34a07c3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-12-sent_roberta_base_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: RoBERTa Base Sentence Embeddings(sent_roberta_base) +author: John Snow Labs +name: sent_roberta_base +date: 2024-06-12 +tags: [sentence_embeddings, en, english, roberta, open_source, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.4 +supported: true +engine: onnx +annotator: RoBertaSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained model on English language using a masked language modeling (MLM) objective. It was introduced in this paper and first released in this repository. This model is case-sensitive: it makes a difference between english and English. + +RoBERTa is a transformers model pretrained on a large corpus of English data in a self-supervised fashion. This means it was pretrained on the raw texts only, with no humans labeling them in any way (which is why it can use lots of publicly available data) with an automatic process to generate inputs and labels from those texts. 
+ +More precisely, it was pretrained with the masked language modeling (MLM) objective. Taking a sentence, the model randomly masks 15% of the words in the input, then runs the entire masked sentence through the model and has to predict the masked words. This is different from traditional recurrent neural networks (RNNs) that usually see the words one after the other, or from autoregressive models like GPT which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence. + +This way, the model learns an inner representation of the English language that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a standard classifier using the features produced by the RoBERTa model as inputs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_roberta_base_en_5.4.0_3.4_1718213024958.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_roberta_base_en_5.4.0_3.4_1718213024958.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +embeddings = RoBertaSentenceEmbeddings.pretrained("sent_roberta_base", "en") \ + .setInputCols("sentence") \ + .setOutputCol("embeddings") +``` +```scala +val embeddings = RoBertaSentenceEmbeddings.pretrained("sent_roberta_base", "en") + .setInputCols("sentence") + .setOutputCol("embeddings") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_roberta_base| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[sentence_embeddings]| +|Language:|en| +|Size:|297.8 MB| + +## References + +https://huggingface.co/FacebookAI/roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_english_sec10k_embed_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_english_sec10k_embed_en.md new file mode 100644 index 00000000000000..2d5095ff91165a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_english_sec10k_embed_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_english_sec10k_embed BGEEmbeddings from pavanmantha +author: John Snow Labs +name: bge_base_english_sec10k_embed +date: 2024-06-13 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_sec10k_embed` is an English model originally trained by pavanmantha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_sec10k_embed_en_5.4.0_3.0_1718289495528.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_sec10k_embed_en_5.4.0_3.0_1718289495528.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_english_sec10k_embed","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_english_sec10k_embed","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_sec10k_embed| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/pavanmantha/bge-base-en-sec10k-embed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_english_sec10k_embed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_english_sec10k_embed_pipeline_en.md new file mode 100644 index 00000000000000..67d6f050d959fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_english_sec10k_embed_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_english_sec10k_embed_pipeline pipeline BGEEmbeddings from pavanmantha +author: John Snow Labs +name: bge_base_english_sec10k_embed_pipeline +date: 2024-06-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_english_sec10k_embed_pipeline` is an English model originally trained by pavanmantha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_english_sec10k_embed_pipeline_en_5.4.0_3.0_1718289529223.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_english_sec10k_embed_pipeline_en_5.4.0_3.0_1718289529223.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_english_sec10k_embed_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_english_sec10k_embed_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_english_sec10k_embed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/pavanmantha/bge-base-en-sec10k-embed + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_anikulkar_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_anikulkar_en.md new file mode 100644 index 00000000000000..d728c3137729ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_anikulkar_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_anikulkar BGEEmbeddings from anikulkar +author: John Snow Labs +name: bge_base_financial_matryoshka_anikulkar +date: 2024-06-13 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_anikulkar` is an English model originally trained by anikulkar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_anikulkar_en_5.4.0_3.0_1718289693625.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_anikulkar_en_5.4.0_3.0_1718289693625.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_anikulkar","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_anikulkar","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_anikulkar| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/anikulkar/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_anikulkar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_anikulkar_pipeline_en.md new file mode 100644 index 00000000000000..5b6a4ede49033f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_anikulkar_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_anikulkar_pipeline pipeline BGEEmbeddings from anikulkar +author: John Snow Labs +name: bge_base_financial_matryoshka_anikulkar_pipeline +date: 2024-06-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_anikulkar_pipeline` is an English model originally trained by anikulkar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_anikulkar_pipeline_en_5.4.0_3.0_1718289728318.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_anikulkar_pipeline_en_5.4.0_3.0_1718289728318.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_anikulkar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_anikulkar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_anikulkar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/anikulkar/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_hritikmore_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_hritikmore_en.md new file mode 100644 index 00000000000000..977e8a487ff50e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_hritikmore_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_hritikmore BGEEmbeddings from Hritikmore +author: John Snow Labs +name: bge_base_financial_matryoshka_hritikmore +date: 2024-06-13 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_hritikmore` is an English model originally trained by Hritikmore. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_hritikmore_en_5.4.0_3.0_1718290095984.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_hritikmore_en_5.4.0_3.0_1718290095984.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_hritikmore","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_hritikmore","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_hritikmore| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Hritikmore/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_hritikmore_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_hritikmore_pipeline_en.md new file mode 100644 index 00000000000000..d726b895f09fb3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_hritikmore_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_hritikmore_pipeline pipeline BGEEmbeddings from Hritikmore +author: John Snow Labs +name: bge_base_financial_matryoshka_hritikmore_pipeline +date: 2024-06-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_hritikmore_pipeline` is an English model originally trained by Hritikmore. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_hritikmore_pipeline_en_5.4.0_3.0_1718290130520.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_hritikmore_pipeline_en_5.4.0_3.0_1718290130520.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_hritikmore_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_hritikmore_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_hritikmore_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/Hritikmore/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_thetayne_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_thetayne_en.md new file mode 100644 index 00000000000000..de37420e20f1b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_thetayne_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_thetayne BGEEmbeddings from thetayne +author: John Snow Labs +name: bge_base_financial_matryoshka_thetayne +date: 2024-06-13 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_thetayne` is an English model originally trained by thetayne. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_thetayne_en_5.4.0_3.0_1718290300674.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_thetayne_en_5.4.0_3.0_1718290300674.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_thetayne","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka_thetayne","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_thetayne| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/thetayne/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_thetayne_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_thetayne_pipeline_en.md new file mode 100644 index 00000000000000..9475b0fd716f4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_financial_matryoshka_thetayne_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_thetayne_pipeline pipeline BGEEmbeddings from thetayne +author: John Snow Labs +name: bge_base_financial_matryoshka_thetayne_pipeline +date: 2024-06-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_financial_matryoshka_thetayne_pipeline` is an English model originally trained by thetayne. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_thetayne_pipeline_en_5.4.0_3.0_1718290335477.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_thetayne_pipeline_en_5.4.0_3.0_1718290335477.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_thetayne_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_thetayne_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_thetayne_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/thetayne/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v7_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v7_en.md new file mode 100644 index 00000000000000..5dd5caa03e6f1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v7_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v7 BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v7 +date: 2024-06-13 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v7` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v7_en_5.4.0_3.0_1718289608530.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v7_en_5.4.0_3.0_1718289608530.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v7","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v7","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v7| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|381.5 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v7 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v7_pipeline_en.md new file mode 100644 index 00000000000000..dafe2e0d83dea4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v7_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v7_pipeline pipeline BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v7_pipeline +date: 2024-06-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v7_pipeline` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v7_pipeline_en_5.4.0_3.0_1718289645988.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v7_pipeline_en_5.4.0_3.0_1718289645988.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_securiti_dataset_1_v7_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_securiti_dataset_1_v7_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.6 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v7 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v8_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v8_en.md new file mode 100644 index 00000000000000..75250ec6d622da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v8_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v8 BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v8 +date: 2024-06-13 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v8` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v8_en_5.4.0_3.0_1718289899891.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v8_en_5.4.0_3.0_1718289899891.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v8","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_securiti_dataset_1_v8","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v8| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|382.1 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v8_pipeline_en.md new file mode 100644 index 00000000000000..0c5f37f1adad55 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-bge_base_securiti_dataset_1_v8_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_securiti_dataset_1_v8_pipeline pipeline BGEEmbeddings from MugheesAwan11 +author: John Snow Labs +name: bge_base_securiti_dataset_1_v8_pipeline +date: 2024-06-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `bge_base_securiti_dataset_1_v8_pipeline` is an English model originally trained by MugheesAwan11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v8_pipeline_en_5.4.0_3.0_1718289937694.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_securiti_dataset_1_v8_pipeline_en_5.4.0_3.0_1718289937694.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_securiti_dataset_1_v8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_securiti_dataset_1_v8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_securiti_dataset_1_v8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|382.1 MB| + +## References + +https://huggingface.co/MugheesAwan11/bge-base-securiti-dataset-1-v8 + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_bionlp2004_en.md b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_bionlp2004_en.md new file mode 100644 index 00000000000000..8e370e593c436a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_bionlp2004_en.md @@ -0,0 +1,113 @@ +--- +layout: model +title: English XLMRobertaForTokenClassification Base Cased model (from tner) +author: John Snow Labs +name: xlmroberta_ner_base_bionlp2004 +date: 2024-06-13 +tags: [en, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-bionlp2004` is an English model originally trained by `tner`.
+ +## Predicted Entities + +`protein`, `dna`, `cell line`, `rna`, `cell type` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_bionlp2004_en_5.4.0_3.0_1718291003301.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_bionlp2004_en_5.4.0_3.0_1718291003301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_bionlp2004","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_bionlp2004","en") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.xlmr_roberta.base").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_bionlp2004| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|783.1 MB| + +## References + +References + +- https://huggingface.co/tner/xlm-roberta-base-bionlp2004 +- https://github.com/asahi417/tner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_arman_fa.md b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_arman_fa.md new file mode 100644 index 00000000000000..75ee608d4d033d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_arman_fa.md @@ -0,0 +1,112 @@ +--- +layout: model +title: Persian XLMRobertaForTokenClassification Base Cased model (from BK-V) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_arman +date: 2024-06-13 +tags: [fa, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: fa +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-arman-fa` is a Persian model originally trained by `BK-V`. 
+ +## Predicted Entities + +`pers`, `event`, `org`, `loc`, `pro`, `fac` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_arman_fa_5.4.0_3.0_1718290853102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_arman_fa_5.4.0_3.0_1718290853102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_arman","fa") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_arman","fa") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("fa.ner.xlmr_roberta.arman_xtreme.base_finetuned").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_arman| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|fa| +|Size:|841.0 MB| + +## References + +References + +- https://huggingface.co/BK-V/xlm-roberta-base-finetuned-arman-fa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_arman_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_arman_pipeline_fa.md new file mode 100644 index 00000000000000..2830610a916848 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_arman_pipeline_fa.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Persian xlmroberta_ner_base_finetuned_arman_pipeline pipeline XlmRoBertaForTokenClassification from BK-V +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_arman_pipeline +date: 2024-06-13 +tags: [fa, open_source, pipeline, onnx] +task: Named Entity Recognition +language: fa +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlmroberta_ner_base_finetuned_arman_pipeline` is a Persian model originally trained by BK-V.
+ +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_arman_pipeline_fa_5.4.0_3.0_1718290936653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_arman_pipeline_fa_5.4.0_3.0_1718290936653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_base_finetuned_arman_pipeline", lang = "fa") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_base_finetuned_arman_pipeline", lang = "fa") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_arman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|841.1 MB| + +## References + +https://huggingface.co/BK-V/xlm-roberta-base-finetuned-arman-fa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_ner_kinyarwand_rw.md b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_ner_kinyarwand_rw.md new file mode 100644 index 00000000000000..280bd435ddd73f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_ner_kinyarwand_rw.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Kinyarwanda XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_ner_kinyarwand +date: 2024-06-13 +tags: [rw, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: rw +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-ner-kinyarwanda` is a Kinyarwanda model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`DATE`, `PER`, `ORG`, `LOC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_ner_kinyarwand_rw_5.4.0_3.0_1718290999388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_ner_kinyarwand_rw_5.4.0_3.0_1718290999388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_ner_kinyarwand","rw") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_ner_kinyarwand","rw") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("rw.ner.xlmr_roberta.base_finetuned_kinyarwand.by_mbeukman").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_ner_kinyarwand| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|rw| +|Size:|775.2 MB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-ner-kinyarwanda +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_ner_wolof_wo.md b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_ner_wolof_wo.md new file mode 100644 index 00000000000000..b37c68b784aaf1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-06-13-xlmroberta_ner_base_finetuned_ner_wolof_wo.md @@ -0,0 +1,115 @@ +--- +layout: model +title: Wolof XLMRobertaForTokenClassification Base Cased model (from mbeukman) +author: John Snow Labs +name: xlmroberta_ner_base_finetuned_ner_wolof +date: 2024-06-13 +tags: [wo, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: wo +edition: Spark NLP 5.4.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-ner-wolof` is a Wolof model originally trained by `mbeukman`. 
+ +## Predicted Entities + +`DATE`, `PER`, `ORG`, `LOC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_ner_wolof_wo_5.4.0_3.0_1718290974709.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_base_finetuned_ner_wolof_wo_5.4.0_3.0_1718290974709.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_ner_wolof","wo") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_base_finetuned_ner_wolof","wo") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token", "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("wo.ner.xlmr_roberta.base_finetuned").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_base_finetuned_ner_wolof| +|Compatibility:|Spark NLP 5.4.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|wo| +|Size:|772.3 MB| + +## References + +References + +- https://huggingface.co/mbeukman/xlm-roberta-base-finetuned-ner-wolof +- https://arxiv.org/abs/2103.11811 +- https://github.com/Michael-Beukman/NERTransfer +- https://github.com/masakhane-io/masakhane-ner \ No newline at end of file diff --git a/docs/_posts/akrztrk/2024-04-22-mpnet_embeddings_biolord_2023_en.md b/docs/_posts/akrztrk/2024-04-22-mpnet_embeddings_biolord_2023_en.md new file mode 100644 index 00000000000000..b0f57730b904a6 --- /dev/null +++ b/docs/_posts/akrztrk/2024-04-22-mpnet_embeddings_biolord_2023_en.md @@ -0,0 +1,85 @@ +--- +layout: model +title: English BioLORD-2023 MPNetEmbeddings from FremyCompany +author: John Snow Labs +name: mpnet_embeddings_biolord_2023 +date: 2024-04-22 +tags: [mpnet, en, embeddings, biolord, open_source, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.2.2 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MPNetEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `mpnet_embeddings_biolord_2023` is an English model originally trained by `FremyCompany`.
+ +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mpnet_embeddings_biolord_2023_en_5.2.2_3.0_1713822166758.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mpnet_embeddings_biolord_2023_en_5.2.2_3.0_1713822166758.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("documents") + +embeddings = MPNetEmbeddings.pretrained("mpnet_embeddings_biolord_2023","en")\ + .setInputCols(["documents"])\ + .setOutputCol("mpnet_embeddings") + +pipeline = Pipeline().setStages([document_assembler, embeddings]) + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("documents") + +val embeddings = MPNetEmbeddings + .pretrained("mpnet_embeddings_biolord_2023", "en") + .setInputCols(Array("documents")) + .setOutputCol("mpnet_embeddings") + +val pipeline = new Pipeline().setStages(Array(document_assembler, embeddings)) + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mpnet_embeddings_biolord_2023| +|Compatibility:|Spark NLP 5.2.2+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[MPNet]| +|Language:|en| +|Size:|407.0 MB| + +## References + +https://huggingface.co/FremyCompany/BioLORD-2023 \ No newline at end of file