SPARKNLP-942: MPNet Classifiers #14147

Merged
2 changes: 2 additions & 0 deletions docs/en/annotators.md
@@ -147,6 +147,8 @@ Additionally, these transformers are available.
{% include templates/anno_table_entry.md path="./transformers" name="LongformerForTokenClassification" summary="LongformerForTokenClassification can load Longformer Models with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks."%}
{% include templates/anno_table_entry.md path="./transformers" name="MarianTransformer" summary="Marian is an efficient, free Neural Machine Translation framework written in pure C++ with minimal dependencies."%}
{% include templates/anno_table_entry.md path="./transformers" name="MPNetEmbeddings" summary="Sentence embeddings using MPNet."%}
{% include templates/anno_table_entry.md path="./transformers" name="MPNetForQuestionAnswering" summary="MPNet Models with a span classification head on top for extractive question-answering tasks like SQuAD."%}
{% include templates/anno_table_entry.md path="./transformers" name="MPNetForSequenceClassification" summary="MPNet Models with a sequence classification/regression head on top e.g. for multi-class document classification tasks."%}
{% include templates/anno_table_entry.md path="./transformers" name="OpenAICompletion" summary="Transformer that makes a request for OpenAI Completion API for each executor."%}
{% include templates/anno_table_entry.md path="./transformers" name="RoBertaEmbeddings" summary="RoBERTa: A Robustly Optimized BERT Pretraining Approach"%}
{% include templates/anno_table_entry.md path="./transformers" name="RoBertaForQuestionAnswering" summary="RoBertaForQuestionAnswering can load RoBERTa Models with a span classification head on top for extractive question-answering tasks like SQuAD."%}
121 changes: 121 additions & 0 deletions docs/en/transformer_entries/MPNetForQuestionAnswering.md
@@ -0,0 +1,121 @@
{%- capture title -%}
MPNetForQuestionAnswering
{%- endcapture -%}

{%- capture description -%}
MPNetForQuestionAnswering can load MPNet Models with a span classification head on top for
extractive question-answering tasks like SQuAD (a linear layer on top of the hidden-states
output to compute span start logits and span end logits).
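
As a rough illustration of what a span classification head's output is used for, the sketch below picks the answer span from per-token start and end logits. It is a minimal sketch under stated assumptions, not Spark NLP's actual decoding: the `best_span` helper, the `max_answer_len` parameter, and all logit values are hypothetical.

```python
from typing import List, Tuple


def best_span(start_logits: List[float],
              end_logits: List[float],
              max_answer_len: int = 30) -> Tuple[int, int]:
    """Return the (start, end) token indices with the highest combined score.

    Hypothetical helper: only spans with start <= end and at most
    max_answer_len tokens are considered.
    """
    best = (0, 0)
    best_score = float("-inf")
    for s, s_logit in enumerate(start_logits):
        # The end token must not come before the start token.
        for e in range(s, min(s + max_answer_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


# Made-up logits for the context tokens, purely for illustration.
tokens = ["My", "name", "is", "Clara", "and", "I", "live", "in", "Berkeley", "."]
start_logits = [0.1, 0.2, 0.0, 4.0, 0.1, 0.0, 0.2, 0.1, 1.5, 0.0]
end_logits = [0.0, 0.1, 0.2, 3.5, 0.0, 0.1, 0.0, 0.2, 2.0, 0.1]

start, end = best_span(start_logits, end_logits)
answer = " ".join(tokens[start:end + 1])
print(answer)
```

With these made-up values the highest-scoring span is the single token `Clara`. Restricting the search to `start <= end` is what keeps the result a valid span even when the best start and best end logits, taken independently, would point in the wrong order.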

Pretrained models can be loaded with `pretrained` of the companion object:

```scala
val spanClassifier = MPNetForQuestionAnswering.pretrained()
.setInputCols(Array("document_question", "document_context"))
.setOutputCol("answer")
```

If no name is provided, the default model is `"mpnet_base_question_answering_squad2"`.

For available pretrained models, see the
[Models Hub](https://sparknlp.org/models?task=Question+Answering).

To see which models are compatible and how to import them, see
https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For more extended
examples, see
[MPNetForQuestionAnsweringTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/MPNetForQuestionAnsweringTestSpec.scala).
{%- endcapture -%}

{%- capture input_anno -%}
DOCUMENT, DOCUMENT
{%- endcapture -%}

{%- capture output_anno -%}
CHUNK
{%- endcapture -%}

{%- capture python_example -%}
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Start a Spark session with Spark NLP loaded
spark = sparknlp.start()

# Turn the raw question and context columns into DOCUMENT annotations
documentAssembler = MultiDocumentAssembler() \
    .setInputCols(["question", "context"]) \
    .setOutputCols(["document_question", "document_context"])

spanClassifier = MPNetForQuestionAnswering.pretrained() \
    .setInputCols(["document_question", "document_context"]) \
    .setOutputCol("answer") \
    .setCaseSensitive(False)

pipeline = Pipeline().setStages([
    documentAssembler,
    spanClassifier
])

data = spark.createDataFrame([["What's my name?", "My name is Clara and I live in Berkeley."]]).toDF("question", "context")
result = pipeline.fit(data).transform(data)
result.select("answer.result").show(truncate=False)
+---------------------+
|result               |
+---------------------+
|[Clara]              |
+---------------------+
{%- endcapture -%}

{%- capture scala_example -%}
import spark.implicits._
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline

// Turn the raw question and context columns into DOCUMENT annotations
val document = new MultiDocumentAssembler()
  .setInputCols("question", "context")
  .setOutputCols("document_question", "document_context")

val questionAnswering = MPNetForQuestionAnswering.pretrained()
  .setInputCols(Array("document_question", "document_context"))
  .setOutputCol("answer")
  .setCaseSensitive(true)

val pipeline = new Pipeline().setStages(Array(
  document,
  questionAnswering
))

// One row with two columns, so the row is passed as a tuple
val data = Seq(("What's my name?", "My name is Clara and I live in Berkeley.")).toDF("question", "context")
val result = pipeline.fit(data).transform(data)

result.select("answer.result").show(false)
+---------------------+
|result               |
+---------------------+
|[Clara]              |
+---------------------+

{%- endcapture -%}

{%- capture api_link -%}
[MPNetForQuestionAnswering](/api/com/johnsnowlabs/nlp/annotators/classifier/dl/MPNetForQuestionAnswering)
{%- endcapture -%}

{%- capture python_api_link -%}
[MPNetForQuestionAnswering](/api/python/reference/autosummary/sparknlp/annotator/classifier_dl/mpnet_for_question_answering/index.html#sparknlp.annotator.classifier_dl.mpnet_for_question_answering.MPNetForQuestionAnswering)
{%- endcapture -%}

{%- capture source_link -%}
[MPNetForQuestionAnswering](https://github.com/JohnSnowLabs/spark-nlp/tree/master/src/main/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/MPNetForQuestionAnswering.scala)
{%- endcapture -%}

{% include templates/anno_template.md
title=title
description=description
input_anno=input_anno
output_anno=output_anno
python_example=python_example
scala_example=scala_example
api_link=api_link
python_api_link=python_api_link
source_link=source_link
%}
139 changes: 139 additions & 0 deletions docs/en/transformer_entries/MPNetForSequenceClassification.md
@@ -0,0 +1,139 @@
{%- capture title -%}
MPNetForSequenceClassification
{%- endcapture -%}

{%- capture description -%}
MPNetForSequenceClassification can load MPNet Models with a sequence classification/regression
head on top (a linear layer on top of the pooled output) e.g. for multi-class document
classification tasks.
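
To make the idea of a linear layer on top of the pooled output more concrete, the sketch below maps a pooled vector to class logits, normalizes them with softmax, and picks the most likely label. It is only an illustration under stated assumptions: the `linear` and `softmax` helpers, the two-dimensional pooled vector, the weights, and the bias are all hypothetical and are not taken from any real model.

```python
import math
from typing import List


def linear(pooled: List[float],
           weights: List[List[float]],
           bias: List[float]) -> List[float]:
    """Compute one logit per class: weights @ pooled + bias."""
    return [sum(w * x for w, x in zip(row, pooled)) + b
            for row, b in zip(weights, bias)]


def softmax(logits: List[float]) -> List[float]:
    """Turn logits into probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical values, purely for illustration.
labels = ["TRANSPORT/CAR", "TRANSPORT/MOVEMENT", "FOOD"]
pooled = [1.0, 0.0]                              # pooled sentence representation
weights = [[2.0, 0.0], [0.5, 0.5], [0.0, 2.0]]   # one row per class
bias = [0.0, 0.0, 0.0]

logits = linear(pooled, weights, bias)
probs = softmax(logits)
predicted = labels[probs.index(max(probs))]
print(predicted)
```

For a regression head, the same linear layer would typically have a single output that is used directly, without the softmax step.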

Note that currently, only SetFit models can be imported.

Pretrained models can be loaded with `pretrained` of the companion object:

```scala
val sequenceClassifier = MPNetForSequenceClassification.pretrained()
.setInputCols("token", "document")
.setOutputCol("label")
```

If no name is provided, the default model is `"mpnet_sequence_classifier_ukr_message"`.

For available pretrained models, see the
[Models Hub](https://sparknlp.org/models?task=Text+Classification).

To see which models are compatible and how to import them, see
https://github.com/JohnSnowLabs/spark-nlp/discussions/5669. For more extended
examples, see
[MPNetForSequenceClassificationTestSpec](https://github.com/JohnSnowLabs/spark-nlp/blob/master/src/test/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/MPNetForSequenceClassificationTestSpec.scala).
{%- endcapture -%}

{%- capture input_anno -%}
DOCUMENT, TOKEN
{%- endcapture -%}

{%- capture output_anno -%}
CATEGORY
{%- endcapture -%}

{%- capture python_example -%}
import sparknlp
from sparknlp.base import *
from sparknlp.annotator import *
from pyspark.ml import Pipeline

# Start a Spark session with Spark NLP loaded
spark = sparknlp.start()

document = DocumentAssembler() \
    .setInputCol("text") \
    .setOutputCol("document")

tokenizer = Tokenizer() \
    .setInputCols(["document"]) \
    .setOutputCol("token")

sequenceClassifier = MPNetForSequenceClassification \
    .pretrained() \
    .setInputCols(["document", "token"]) \
    .setOutputCol("label")

data = spark.createDataFrame([
    ["I love driving my car."],
    ["The next bus will arrive in 20 minutes."],
    ["pineapple on pizza is the worst 🤮"],
]).toDF("text")

pipeline = Pipeline().setStages([document, tokenizer, sequenceClassifier])
pipelineModel = pipeline.fit(data)
results = pipelineModel.transform(data)
results.select("label.result").show()
+--------------------+
|              result|
+--------------------+
|     [TRANSPORT/CAR]|
|[TRANSPORT/MOVEMENT]|
|              [FOOD]|
+--------------------+
{%- endcapture -%}

{%- capture scala_example -%}
import com.johnsnowlabs.nlp.base._
import com.johnsnowlabs.nlp.annotator._
import org.apache.spark.ml.Pipeline
import spark.implicits._

val document = new DocumentAssembler()
  .setInputCol("text")
  .setOutputCol("document")

val tokenizer = new Tokenizer()
  .setInputCols(Array("document"))
  .setOutputCol("token")

val sequenceClassifier = MPNetForSequenceClassification
  .pretrained()
  .setInputCols(Array("document", "token"))
  .setOutputCol("label")

val texts = Seq(
  "I love driving my car.",
  "The next bus will arrive in 20 minutes.",
  "pineapple on pizza is the worst 🤮")
val data = texts.toDF("text")

val pipeline = new Pipeline().setStages(Array(document, tokenizer, sequenceClassifier))
val pipelineModel = pipeline.fit(data)
val results = pipelineModel.transform(data)

results.select("label.result").show()
+--------------------+
|              result|
+--------------------+
|     [TRANSPORT/CAR]|
|[TRANSPORT/MOVEMENT]|
|              [FOOD]|
+--------------------+

{%- endcapture -%}

{%- capture api_link -%}
[MPNetForSequenceClassification](/api/com/johnsnowlabs/nlp/annotators/classifier/dl/MPNetForSequenceClassification)
{%- endcapture -%}

{%- capture python_api_link -%}
[MPNetForSequenceClassification](/api/python/reference/autosummary/sparknlp/annotator/classifier_dl/mpnet_for_sequence_classification/index.html#sparknlp.annotator.classifier_dl.mpnet_for_sequence_classification.MPNetForSequenceClassification)
{%- endcapture -%}

{%- capture source_link -%}
[MPNetForSequenceClassification](https://github.com/JohnSnowLabs/spark-nlp/tree/master/src/main/scala/com/johnsnowlabs/nlp/annotators/classifier/dl/MPNetForSequenceClassification.scala)
{%- endcapture -%}

{% include templates/anno_template.md
title=title
description=description
input_anno=input_anno
output_anno=output_anno
python_example=python_example
scala_example=scala_example
api_link=api_link
python_api_link=python_api_link
source_link=source_link
%}