Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AV1 samplebuilder support (WIP) #264

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Conversation

jech
Copy link
Member

@jech jech commented Apr 15, 2024

  • Make AV1Packet implement Depacketizer
  • Implement AppendToSample for AV1Packet

An attempt to implement the required hooks for the samplebuilder
to work with AV1. This approach is likely to be both faster and
simpler than the one in #238.

COMPLETELY UNTESTED, DO NOT COMMIT.

@alexpokotilo
Copy link

alexpokotilo commented Apr 15, 2024

  1. func (p *AV1Packet) IsPartitionHead(payload []byte) bool is called before func (p *AV1Packet) Unmarshal(payload []byte) ([]byte, error) { and hence you cannot use !p.Z && p.N and should use something like

    if len(payload) == 0 {
    return false
    }
    return (payload[0] & byte(0b10000000)) == 0

I was not able to test further as IsPartitionHead always false

jech added 2 commits April 15, 2024 19:13
This adds IsPartitionHead and IsPartitionTail to AV1Packet.
AV1 RTP payload cannot be concatenated without first recovering
the omitted OBU length.
@jech jech force-pushed the av1-samplebuilder branch from 3ee215c to 89c2b75 Compare April 15, 2024 17:14
@jech
Copy link
Member Author

jech commented Apr 15, 2024

Fixed. However, as I've mentioned, this is completely untested, and I'd be very surprised if it worked. The question is whether you agree that this could work in principle, since it is both more efficient and simpler than what you propose.

Copy link

codecov bot commented Apr 15, 2024

Codecov Report

Attention: Patch coverage is 0% with 31 lines in your changes are missing coverage. Please review.

Project coverage is 84.70%. Comparing base (74a9dc7) to head (89c2b75).

Files Patch % Lines
codecs/av1_packet.go 0.00% 31 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #264      +/-   ##
==========================================
- Coverage   86.21%   84.70%   -1.52%     
==========================================
  Files          22       22              
  Lines        1734     1765      +31     
==========================================
  Hits         1495     1495              
- Misses        203      234      +31     
  Partials       36       36              
Flag Coverage Δ
go 84.70% <0.00%> (-1.52%) ⬇️
wasm 84.24% <0.00%> (-1.51%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@alexpokotilo
Copy link

Fixed. However, as I've mentioned, this is completely untested, and I'd be very surprised if it worked. The question is whether you agree that this could work in principle, since it is both more efficient and simpler than what you propose.

AppendToSample implementation is wrong. There are maybe several OBUS in AV1Packet and AppendToSample should run the similar loop as func (p *AV1Packet) Unmarshal

But there is much simplier approach much better than my and yours: just return

OBULen+OBU list right from Unmarshal
I checked and nobody cares about func (p *AV1Packet) Unmarshal. Who use AV1Packet without samplebuilder, they use AV1Packet for it's OBUelements map. They don't care about payload format at all.

All my fix was to workaround current useless av1 format. your AppendToSample introduce to keep current format but to produce right format after Unmarshal. Shouldn't we just return all we need from Unmarshal. Yes, we need to allocate for sample but your implementation do the same, but for only one OBU. This is wrong. There maybe multiple OBUs in Av1Packet.
I'd bet to return correct av1 bitstreama directly from Unmarshal

@jech
Copy link
Member Author

jech commented Apr 15, 2024

AppendToSample implementation is wrong.

Interesting.

There are maybe several OBUS in AV1Packet

That's right, and my implementation builds a sample as a sequence of LE128-prefixed OBUs. Are you suggesting a different representation?

I checked and nobody cares about func (p *AV1Packet) Unmarshal.

https://github.com/jech/galene/blob/master/codecs/codecs.go#L48

This code does the equivalent of AV1Packer.Unmarshal in an efficient manner. I had to write it because Unmarshal was written by people who don't care about efficiency.

Shouldn't we just return all we need from Unmarshal.

Unmarshal is a low-leve function that should run in constant time and perform no allocations. Check the H264 version of Unmarshal, it's done right, it doesn't try to parse the list of NALUs. Please also read the comments in #265 and #266.

@jech
Copy link
Member Author

jech commented Apr 15, 2024

@alexpokotilo Another point is that you cannot reassemble fragmented OBUs in Unmarshal, since the packets haven't been reordered at this point: it's the samplebuilder's (or the jitter buffer's) job to reorder packets. So there's really no alternative to doing the parsing later than at Unmarshal time.

@alexpokotilo
Copy link

alexpokotilo commented Apr 16, 2024

AppendToSample implementation is wrong.

Interesting.

There are maybe several OBUS in AV1Packet

That's right, and my implementation builds a sample as a sequence of LE128-prefixed OBUs. Are you suggesting a different representation?

@jech, maybe my problem is that I see AppendToSample implementation but don't see its usage in your branch. Could you please finish your version so I can just use it and check it really support all cases.
Now I see a part of job. I don't understand how AppendToSample do the same OBUs processing as Unmarshal because you didn't add AppendToSample call to sample builder. If you call it in loop till all data is processed then ok, I got the point. If you call it without loop like you pointed in your #238 (comment) example I don't get the point then. Again speaking about remained fragment of last OBU. I don't see how you are going to process it.

Long story short. You are right and my version is not ideal. I don't like it either.
When I did it I tried to close the gap in AV1Packet implementation. I had to use AV1Frame to correctly process last remained OBU as AV1Packet drops OBUElements on unmarshal and don't save last not complete OBU. For example H264Packet save fragmented nal for further processing. AV1Packet doesn't do that so your AppendToSample will not work.

From performance pov it doesn't matter where you process OBUs: in unmarshal method or in AppendToSample right next to it.
I can see AV1Packet is used only in samplebuffer and some tests.
if we use AppendToSample or return processed data from unmarshal the total performance will be the same. You will not be able to eliminate allocation in unmarshal if you add fragmented obu support there. Take a look H264Packet. If you have fuaBuffer from previous packet then "p.fuaBuffer = append(p.fuaBuffer, payload[fuaHeaderSize:]...)" performaed. and this is allocation.

Now AV1Packet just drop OBUElements on every unmarshal. it doesn't care about previous fragments. I had to use AV1Frame to combine OBUs from AV1Packets as AV1Packet doesn't care about it.

What you propose is :

  • add new interface AppendToSample
  • add check of it into sample buffer
  • call this AppendToSample in loop.
    And all of this because you don't want allocations in AV1Packets's unmarshal. Maybe you are right and unmarshal shouldn't do allocations but look at H264Packet unmarshal. It does allocation. and it handles tail from previous unsharshal itself. So it's not just av1 problem after all. To be honest I don't see any problem with this approash at all. If you want to check just rtp headers you can just process packet payload or we can add new method to all Packets like unmarshalHeader. we can use it in unmarshal as well.

Don't get me wrong please. I admit my initial approach is wrong and I'm starving to test your version but it doesn't exist.

I still don't see a problem in adding tail support into AV1Packets the same way H264Packet do and return necessary format from it without adding new interface AppendToSample. at the end of the day you will have to allocate somewhere. Why not in unmarshal ? What so special in this method so that you don't want to do the job there? these Packet structures are used mainly in samplebuilder. You don't add any gain moving processing in new method AppendToSample.

If you complain about unmarshal-only approach only from design POV - I'm sorry I don't completely confident to speak from design POV. I'm not Packet approach designer. But If you think your approach gain something to samplebuffer users - please complete it and I will create performance test to check where you are right or not.

I think it's time to complete AppendToSample so I can do tests and check whether it works or not. Now I cannot and don't see how I can help to prove your points or argue with them.

My main problem with AppendToSample approach is that I don't get what is so special with av1 so we need to handle it different way from h264? do we have tails in h264? yes, we do. How we process them? in unmarshal. Do we do allocation in h264's unmarshals? Yes, in case we have previous tail. Does h264' unmarshal return ready to append buffer? Yes, it is.

You want to have lightweight header parser without all body processing - go ahead but this is a different story.
I don't see how you improve performance by moving processing into AppendToSample and call it right after unmarshal. What samplebuffer users gain from this ?

Do I agree that AV1Packet made in different way than others Packets? Yes.
Should we add tail support in it to not rely on Av1Frame for assembling? 100%
Should we add AppendToSample to offload unmarshal? For what ?
Should we remove OBUElements processing? Well yes, but Av1Frame users will not be happy right now. this is legasy, We can add flag like @Sean-Der proposed in one of your pulls.

This is all I can say about it.

@alexpokotilo
Copy link

alexpokotilo commented Apr 16, 2024

@alexpokotilo Another point is that you cannot reassemble fragmented OBUs in Unmarshal, since the packets haven't been reordered at this point: it's the samplebuilder's (or the jitter buffer's) job to reorder packets. So there's really no alternative to doing the parsing later than at Unmarshal time.

Please look at H264Packet. it does process tails from previous unmarshal call relying on the fact that samplebuffer call unmarshal in correct order. This is what I see in the code at least. Check func (p *H264Packet) Unmarshal(payload []byte) ([]byte, error) and it's p.fuaBuffer processing. If unmarshal didn't rely on packet order this approach would not work for h264.

@jech
Copy link
Member Author

jech commented Apr 16, 2024

Please look at H264Packet. it does process tails from previous unmarshal call relying on the fact that samplebuffer call unmarshal in correct order.

Yes, and that's wrong, for at least three reasons:

  • people expect Unmarshal and Marshal should be (at least logically) inverses; if you defragment at Unmarshal time, you loose this property, which causes subtle bugs;
  • there is no place where it is documented that Unmarshal must be colled in sequential orde (the VP8 and VP9 formats don't require it), so the requirement causes subtle bugs;
  • there are applications that want to call Unmarshal just to check header flags, these applications do not want to pay the whole price of parsing the OBU or NALU list.

Alex, let us please do the right thing in AV1, and let's fix H.264. I've started here:

at the end of the day you will have to allocate somewhere. Why not in unmarshal ? What so special in this method

Because Unmarshal is not optional: if you want to work with a packet, you must call Unmarshal. Any other methods are optional, you only call them if you need the functionality.

Pion is designed to be suitable for many different applications, from full-featured multimedia applications (that do encoding, decoding and so on) to fast media servers (that want to push massive amounts of packets around while paying as little overhead as possible. We must endeavour to make sure that Pion remains widely usable, and doesn't become optimised for one specific class of applications.

@alexpokotilo
Copy link

Please look at H264Packet. it does process tails from previous unmarshal call relying on the fact that samplebuffer call unmarshal in correct order.

Yes, and that's wrong, for at least three reasons:

* people expect Unmarshal and Marshal should be (at least logically) inverses; if you defragment at Unmarshal time, you loose this property, which causes subtle bugs;

* there is no place where it is documented that Unmarshal must be colled in sequential orde (the VP8 and VP9 formats don't require it), so the requirement causes subtle bugs;

* there are applications that want to call Unmarshal just to check header flags, these applications do not want to pay the whole price of parsing the OBU or NALU list.

So maybe add Another method to help these application and keep current interface to others ?
I respect your passion and willing to change things you consider not right. I just don't see anything wrong with the way it is for h264 and others. I vote to fix AV1Packet. If you can convince maintainers then you are right. I don't see any reason for this. But I could be wrong

@lebedyncrs
Copy link

@jech any plans to continue on this one? If not can you please sum up what needs to be done? maybe somebody take over

@jech
Copy link
Member Author

jech commented Sep 11, 2024

Yes, I'm planning to work on it. I'm actually planning a fairly comprehensive rework of the packet code:

  • add a new interface, so that the zero-alloc mode has feature-parity with the (currently default) wasteful mode;
  • modify the sample-builder to work with the zero-alloc mode;
  • add support for the new interface to the AV1 codec (zero-alloc mode only, since I'm planning to deprecate the wasteful mode);
  • convince Sean to make zero-alloc the default (incompatible change).

Unfortunately, I won't be able to work on that before the end of October.

@lebedyncrs
Copy link

Yes, I'm planning to work on it. I'm actually planning a fairly comprehensive rework of the packet code:

  • add a new interface, so that the zero-alloc mode has feature-parity with the (currently default) wasteful mode;
  • modify the sample-builder to work with the zero-alloc mode;
  • add support for the new interface to the AV1 codec (zero-alloc mode only, since I'm planning to deprecate the wasteful mode);
  • convince Sean to make zero-alloc the default (incompatible change).

Unfortunately, I won't be able to work on that before the end of October.

Sounds exciting, please let me know if you would need help with testing. Does this PR is going to solve #273 ?

@joeturki
Copy link
Member

joeturki commented Jan 8, 2025

Hello, I'm going to start working on AV1 support next, can I adopt this? :)

@jech
Copy link
Member Author

jech commented Jan 8, 2025

No, but we can work together.

@joeturki
Copy link
Member

joeturki commented Jan 8, 2025

I meant asking if I can continue your work; I have time to work on it, and I do agree with your approach here :) can you share with me with me what you found with this experiential and the rest of your vision? fixing other codecs seems interesting but we must do it in a breaking way.

@jech
Copy link
Member Author

jech commented Jan 8, 2025

Currently, the samplebuilder does that:

    buf, err := s.depacketizer.Unmarshal(packet.Payload)
    data = append(data, buf)

At first sight, that looks good, but in practice the Depacketizer interface is implemented by the various packet types, which is wrong, since it confuses, say, XXXPacket.Unmarshal and Depacketizer.Unmarshal. But from the point of the library user the two functions have different usage patterns:

  • XXXPacket.Unmarshal is a stateless function (we shouldn't have any state in XXXPacket), that may be called in any order (you don't expect to be required to call XXXPacket.Unmarshal in sequential order, for example in receive order);
  • Depacketizer.Unmarshal is a stateful function, that encodes the dependencies between packets in the Depacketizer.

An example of the confusion is here: https://github.com/pion/rtp/blob/master/codecs/h264_packet.go#L186. The H264Packet contains state (fuaBuffer) which the Unmarshal method uses to carry dependencies between packets. The effect is that if the H264packet.Unmarshal method is called non-sequentially (say, in reception order rather than in seqno order), the data will become corrupted. We'll hit the same issue with AV1. (VP8 and VP9 have simple packetisation rules that don't require state.)

The solution is to add a method GetDepacketizer to XXXPacket, and pass a Depacketizer rather than an XXXPacket to the SampleBuilder function. The XXXPacket.Unmarshal functions need to be made stateless and efficient, which is what I've started doing when the NoAlloc flag is set.

Finally, as long as we're breaking compatibility, we should replace the Depacketizer.Unmarshal function with

func (d *Depacketizer) AppendData(data) error

which does the equivalent of

data = append(data, d.Unmarshal())

This way, the SampleBuilder can call AppendData instead of Unmarshal, avoiding one allocation.

@joeturki
Copy link
Member

joeturki commented Jan 8, 2025

XXXPacket.Unmarshal is a stateless function (we shouldn't have any state in XXXPacket), that may be called in any order (you don't expect to be required to call XXXPacket.Unmarshal in sequential order, for example in receive order);

As a user of the library this confused me with the h264 library.

The solution is to add a method GetDepacketizer to XXXPacket, and pass a Depacketizer rather than an XXXPacket to the SampleBuilder function. The XXXPacket.Unmarshal functions need to be made stateless and efficient, which is what I've started doing when the NoAlloc flag is set.
Finally, as long as we're breaking compatibility, we should replace the Depacketizer.Unmarshal function with

I don't think we can break it now, maybe we add a new interface and make it work with the new interface and the old Unmarshal / Marshal. This way we can add to av1 and even other codecs (without removing the Unmarshal / Marshal, just soft deprecate) Until we change that in the next major version in the future.

What do you think?

@jech
Copy link
Member Author

jech commented Jan 8, 2025

I think we should first implement the right thing on a separate branch, without regard to backwards compatibility. Once we're agreed that's the right thing, we see how much nasty hacks we can accept in order to preserve compatibility.

@joeturki
Copy link
Member

joeturki commented Jan 8, 2025

I think we should first implement the right thing on a separate branch, without regard to backwards compatibility. Once we're agreed that's the right thing, we see how much nasty hacks we can accept in order to preserve compatibility.

I'm down for this and I have the time and the energy to do both, but first I think I will have to ship it to this repo, I think your approach is good, and we can do type check. But do you mean i should ship it with marshal / unmarshal, And we make the new Depacketizer interface for the new repo?

@jech
Copy link
Member Author

jech commented Jan 8, 2025

I think that for now we can work with the current rtp.Depacketizer interface, we just need to decouple XXXPacket from XXXDepacketizer for all codecs with non-trivial depacketisation rules.

The AppendData optimisation we can do at a later stage.

@joeturki
Copy link
Member

joeturki commented Jan 8, 2025

maybe we can just update the Depacketizer interface to this and leave the Unmarshal function for now but with soft deprecate and remove it by the next major release?

edit: Never mind i realized this will break user provided Depacketizer.

type Depacketizer interface {  
    // soft deprecated, just for end users.
    Unmarshal(packet []byte) ([]byte, error)  

    // what pion will use.
    AppendData(dest *[]byte, packet []byte) error  

    IsPartitionHead(payload []byte) bool  

    IsPartitionTail(marker bool, payload []byte) bool  
}  

what do you think @jech ?

@joeturki
Copy link
Member

joeturki commented Jan 8, 2025

I think that for now we can work with the current rtp.Depacketizer interface, we just need to decouple XXXPacket from XXXDepacketizer for all codecs with non-trivial depacketisation rules.

Okay i see, so do i make a PR for Unmarshal for av1? I will add an option to disable parsing like you did with h264.

@jech
Copy link
Member Author

jech commented Jan 8, 2025

I think we definitely should not continue with the broken approach.

I would start with adding the following method to all of the XXXPacket types:

func (p *H264Packet) GetDepacketizer rtp.Depacketizer {
    return p.Clone()
}

Then either modify all of the examples that use the samplebuilder to probe the packet for the GetDepacketizer method, or else modify the SampleBuilder itself to call GetDepacketizer before starting depacketisation. Not sure which is the right approach.

Once that is done, we can start adding non-broken depacketizers to all of the XXXPacket types.

@joeturki
Copy link
Member

joeturki commented Jan 8, 2025

Why continue with the broken approach? Wouldn't it be better to start with doing the decoupling for H264Packet?

The only reason is that I don't want to break anything for the end user :) and I want the current users to be able to use av1 with this package. But i think we should do this "rewrite" by the next major Pion release.

Maybe the two interfaces approach is good? I want to ship av1 and fix all the reported bugs related to av1 by this week.

If you found a good way to do all of this without breaking any changes, just provide your design approach and i will ship the implementation in no time :)

@jech
Copy link
Member Author

jech commented Jan 8, 2025

If you found a good way to do all of this without breaking any changes, just provide your design approach

Sorry, I edited my prvious message just after you replied.

@joeturki
Copy link
Member

joeturki commented Jan 8, 2025

Once that is done, we can start adding non-broken depacketizers to all of the XXXPacket types.

Okay I think this is good, the only issue if users provide their own Depacketizer is this a thing? I can do a quick code search.

@jech
Copy link
Member Author

jech commented Jan 8, 2025

The point I'm making is that while we should strive not to break compatibility for existing codecs, we should not implement the broken approach for AV1: if we add a new non-broken API, then we should leverage AV1 to encourage people to switch to the new API.

@joeturki
Copy link
Member

joeturki commented Jan 8, 2025

Okay, So I'm not really sure, I think your approach is obviously better for av1, but I'm not sure if we can break changes, I think we should post it to the slack and see what people see. then i will see what route we go, Again i believe i can do the implementation itself in no time, but I just need other contributors to agree :)

What do you think?

@jech
Copy link
Member Author

jech commented Jan 8, 2025

In my experience, getting others to agree is easier when you have working code to show them.

@joeturki
Copy link
Member

joeturki commented Jan 8, 2025

Well, my point is that I don't want to waste time that can be spent on other issues :)

but let's say we can break changes what do you think the final design should look like?

@jech
Copy link
Member Author

jech commented Jan 8, 2025

sb := depacketizer.New(... codecs.AV1Packet{}.GetDepacketizer(), ...)
var p AV1Packet
for {
    n, err := track.Read(buf)
    p.Unmarshal(buf, n)
    depacketizer.Push(&p)
    for {
        ...
        depacketizer.PopWithTimestamp()
        ..
}

The only change wrt. what is done now is the call to depacketizer.New, which requires the caller to call GetDepacketizer by themselves. For now, we keep that optional for existing codecs, but require it for AV1, so that people will naturally migrate to the new API when they implement AV1 support.

So short term plan:

  • add GetDepacketizer to all existing codecs;
  • modify all examples to probe for GetDepacketizer;
  • implement AV1 with required GetDepacketizer.

In order to make GetDepacketizer compulsory for AV1, you simply don't implement the PartitionHead methods on AV1Packet, only on AV1Depacketizer.

Medium term plan:

  • add a working GetDepacketizer for H.264 when NoAlloc is true.

Long term plan:

  • make GetDepacketizer compulsory for all codecs;
  • make NoAlloc the default, and remove the method;
  • switch to v5.

@jech
Copy link
Member Author

jech commented Jan 8, 2025

I think we should post it to the slack

I think it's a good idea to send a pointer to this discussion to Slack. I'll try to check if there's any discussion on Slack, but no promises, I'm developing Slack-phobia. What's wrong with mailing lists?

Grumble, unsearchable proprietary crap, grumble, no proper archive, grumble, and now get off my loan ;-)

@joeturki
Copy link
Member

joeturki commented Jan 8, 2025

Okay i posted it to the slack, we will see by tomorrow, I'm going to work on other issues today :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants