Extended semantic segmentation to image segmentation #27039
Conversation
The documentation is not available anymore as the PR was closed or merged.

```py
results = panoptic_segmentation(Image.open(image))
results
```

As you can see below, every pixel gets classified and there are multiple instances for car again.
How can we see in this output that every pixel is classified?
I fixed this sentence.
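For illustration, here is a minimal sketch of one way to check that claim, assuming `results` is the list of dicts returned by the Transformers image-segmentation pipeline, each entry holding a `label` and a binary PIL `mask` (names taken from the snippet above, the check itself is hypothetical):

```py
import numpy as np

# Stack all segment masks and verify that every pixel is claimed by some segment
masks = np.stack([np.array(r["mask"]) for r in results])  # shape (num_segments, H, W)
covered = (masks > 0).any(axis=0)
print(f"{covered.mean():.1%} of pixels belong to some segment")

# Several entries sharing the "car" label correspond to separate car instances
print([r["label"] for r in results])
```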
<div class="flex justify-center">
  <img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/segmentation-comparison.png" alt="Segmentation Maps Compared"/>
</div>
I'd maybe use the same order you used in the exposition: Reference, Semantic Segmentation, Instance Segmentation, Panoptic Segmentation.
The Instance Segmentation Output appears to contain more classes than "car" and "person", but the model output above didn't. Perhaps we could make it consistent?
Surprisingly, that building is classified as a car, and this is one of the best (maybe even the best) instance segmentation models on the Hub (Mask2Former). I'd rather not modify it?
Thanks for adding this!
+1 to all of @pcuenca's comments
I agree with the comments from @pcuenca and @amyeroberts. We probably should also add a couple of headers.
Right now, the right-side navigation looks like this:
## Load SceneParse150 dataset
## Preprocess
## Evaluate
## Train
## Inference
I would suggest the following structure:
## Types of segmentation
## Fine-tune a semantic segmentation model
### Load SceneParse150 dataset
### Preprocess
### Evaluate
### Train
### Inference
Also, in the inference example at the end of the fine-tuning section, we can probably leave only the example of doing inference manually, since we already show inference examples with a pipeline at the beginning of the doc.
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
I addressed all the comments. Sorry, I deprioritized it for a bit.
Thanks! Just one small nit
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
## Types of Segmentation

Semantic segmentation assigns a label or class to every single pixel in an image. Let's take a look at a semantic segmentation model output. It will assign the same class to every instance of an object it comes across in an image, for example, all cats will be labeled as "cat" instead of "cat-1", "cat-2".

We can use transformers' image segmentation pipeline to quickly infer a semantic segmentation model. Let's take a look at the example image.
Suggested change:
- We can use transformers' image segmentation pipeline to quickly infer a semantic segmentation model. Let's take a look at the example image.
+ We can use Transformers' image segmentation pipeline to quickly infer with a semantic segmentation model called [SegFormer](model_doc/segformer). Let's take a look at the example image.
Not sure the link here will work
I think it would
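To make the pipeline example discussed above concrete, here is a minimal sketch; the checkpoint and image URL are illustrative placeholders rather than the exact ones used in the guide:

```py
from transformers import pipeline
from PIL import Image
import requests

# Illustrative image URL -- substitute the example image from the guide
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/segmentation_input.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# SegFormer checkpoint chosen for illustration; any semantic segmentation checkpoint works
semantic_segmentation = pipeline(
    "image-segmentation", "nvidia/segformer-b1-finetuned-cityscapes-1024-1024"
)
results = semantic_segmentation(image)

# Every instance of a class shares one mask, e.g. a single "car" entry covering all cars
print([r["label"] for r in results])
```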
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/semantic_segmentation_output.png" alt="Semantic Segmentation Output"/>
</div>

In instance segmentation, the goal is not to classify every pixel, but to predict a mask for **every instance of an object** in a given image. We will use [facebook/mask2former-swin-large-cityscapes-instance](https://huggingface.co/facebook/mask2former-swin-large-cityscapes-instance) for this.
Suggested change:
- In instance segmentation, the goal is not to classify every pixel, but to predict a mask for **every instance of an object** in a given image. We will use [facebook/mask2former-swin-large-cityscapes-instance](https://huggingface.co/facebook/mask2former-swin-large-cityscapes-instance) for this.
+ In instance segmentation, the goal is not to classify every pixel, but to predict a mask for **every instance of a class** in a given image. We will use [facebook/mask2former-swin-large-cityscapes-instance](https://huggingface.co/facebook/mask2former-swin-large-cityscapes-instance) for this.
I would add here that instance segmentation is very similar to object detection: you want to get a set of instances out of your image. The only difference is that object detection predicts a bounding box per instance, whereas instance segmentation predicts a binary mask per instance.
That's a really nice way to build intuition for instance segmentation!
Thanks! I addressed it
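Building on that intuition, here is a short sketch of the instance path, using the checkpoint named in the paragraph above and assuming `image` has been loaded as in the earlier semantic segmentation sketch:

```py
from transformers import pipeline

instance_segmentation = pipeline(
    "image-segmentation", "facebook/mask2former-swin-large-cityscapes-instance"
)
results = instance_segmentation(image)

# Like object detection, we get one entry per detected instance, but each entry
# carries a binary mask instead of a bounding box, so "car" may appear several times
print([(r["label"], r["score"]) for r in results])
```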
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/instance_segmentation_output.png" alt="Semantic Segmentation Output"/>
</div>

Panoptic segmentation combines semantic segmentation and instance segmentation, where every pixel is classified, and there are multiple masks for each instance of a class. We can use [facebook/mask2former-swin-large-cityscapes-panoptic](https://huggingface.co/facebook/mask2former-swin-large-cityscapes-panoptic) for this.
Panoptic segmentation combines semantic segmentation and instance segmentation, where every pixel is classified, and there are multiple masks for each instance of a class. We can use [facebook/mask2former-swin-large-cityscapes-panoptic](https://huggingface.co/facebook/mask2former-swin-large-cityscapes-panoptic) for this.
Panoptic segmentation technically assigns 2 labels per pixel: a semantic category and an instance ID.
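As a hedged sketch of how that shows up in practice, using the panoptic checkpoint mentioned in the paragraph above and assuming the same pipeline API and `image` as before:

```py
from collections import Counter
from transformers import pipeline

panoptic_segmentation = pipeline(
    "image-segmentation", "facebook/mask2former-swin-large-cityscapes-panoptic"
)
results = panoptic_segmentation(image)

# Each segment has a semantic category, and categories such as "car" repeat once per
# instance -- roughly the "semantic category + instance ID" pairing described above
print(Counter(r["label"] for r in results))
```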
Thanks for iterating on the guide! LGTM, only one minor nit: the notebook login and pip install section appears twice in the guide.
Before you begin, make sure you have all the necessary libraries installed:

```bash
pip install -q datasets transformers evaluate
```

We encourage you to log in to your Hugging Face account so you can upload and share your model with the community. When prompted, enter your token to log in:

```py
>>> from huggingface_hub import notebook_login

>>> notebook_login()
```
This paragraph is repeated later in the fine-tuning section. It's probably best to have this information only once. I would suggest leaving the library installation instructions here (as we need the libraries installed for the inference examples to work), but the notebook login makes more sense in the fine-tuning section.
Thanks a lot for letting me know, addressed this!
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
@MKhalusova can you merge if you approve? 👉 👈 🥹
Maybe rename file/URL to `image_segmentation.md`, for consistency with the contents. (Also in the yaml, of course) :)
Done!
We need green CI checks before we can merge. It looks like the
@MKhalusova it's my fault, I suggested renaming the file for consistency. I opened this PR to @merveenoyan's repo, which fixes the problem locally. It also fixes an issue with I also saw a
It's not a big deal, we can just go back to @merveenoyan's version before the rename if that's simpler.
Regarding redirects. For example,
@MKhalusova according to Mishig's response, we need to merge before it turns red, and then it will be green, so maybe you can make the call in this case.
There are two different things here.
Given the increased complexity and that @MKhalusova said we generally try to avoid renames, I'd suggest we remove the rename and keep the same filename it had before. Sorry for introducing noise!
Can this be merged by someone with write access?
I can merge it :)
Thanks for iterating!
This PR extends the semantic segmentation guide to cover two other segmentation types (except for the big fine-tuning part) and compares them. cc @NielsRogge as discussed