Bug 1356054 (Open), opened 7 years ago, updated 2 years ago

Standardise units used by EncodedFrame

Categories

(Core :: Audio/Video: Recording, enhancement, P3)

enhancement

Tracking


Tracking Status
firefox55 --- affected

People

(Reporter: bryce, Unassigned)

References

Details

EncodedFrame[1] contains time and duration members. At the time of writing, it appears that the timestamp is set by the Opus and VP8 track encoders, as well as by tests, and is always in microseconds[2].

The duration member of EncodedFrame is used in similar cases[3]; however, the units used here are inconsistent. The VP8 encoder sets the duration in microseconds, and the Opus encoder in samples. The comment on the underlying member, mDuration, suggests that it should contain the duration in samples.

Right now the downstream code that uses the duration[4] expects this behaviour: downstream VP8 code treats the duration as microseconds, while the Ogg code treats it as samples.

I would like to see the duration standardised to a single unit. This would make EncodedFrame easier to understand, and would also allow for more straightforward reasoning about adding the time and duration on a frame to get an end time.

I suggest we standardise this to microseconds. However, I would appreciate feedback on whether this will provide sufficient granularity for the Ogg case, or if we need to keep storing the exact number of samples as we currently do.

[1]: https://dxr.mozilla.org/mozilla-central/source/dom/media/encoder/EncodedFrameContainer.h#40
[2]: https://dxr.mozilla.org/mozilla-central/search?q=%2Bfunction-ref%3Amozilla%3A%3AEncodedFrame%3A%3ASetTimeStamp%28uint64_t%29
[3]: https://dxr.mozilla.org/mozilla-central/search?q=%2Bfunction-ref%3Amozilla%3A%3AEncodedFrame%3A%3ASetDuration%28uint64_t%29
[4]: https://dxr.mozilla.org/mozilla-central/search?q=%2Bcallers%3A%22mozilla%3A%3AEncodedFrame%3A%3AGetDuration%28%29+const%22
Andreas, what are your thoughts on this?
Flags: needinfo?(pehrson)
Another thing we could do is store the value as a rational; e.g. we could keep storing the sample-based unit.
(In reply to Bryce Van Dyk (:SingingTree) from comment #0)
> I suggest we standardise this to use microseconds.

That might introduce rounding errors when converting frames to microseconds, which is what MSG tried to move away from. I think Paul has the context on the MSG refactoring.
Flags: needinfo?(padenot)
Yes, we've had our fair share of issues when we used time units rather than sample count + sample rate. I think it's reasonable to continue doing what we do for audio.

It's a bit harder for video because, unlike audio, the time between two frames is not fixed, so we can't simply refer to a number of events and a rate for those events (i.e. sample count + sample rate). What is the approach taken by other software that needs to deal with this problem?
Flags: needinfo?(padenot) → needinfo?(bvandyk)
MediaData[1] stores more information to accommodate this. There are members for both microsecond-based times and codec-specific times. The duration is exposed in microseconds, but there is also a member for the number of frames.

The use case I'm looking at is ensuring sync of the data being written by the MediaEncoder. Since the EncodedFrames given to the encoder have durations written on them, it's possible to reason about when the next expected audio or video frame will arrive. I can do this right now by treating video durations as usecs and converting from samples to usecs for audio, but it feels a little wonky. Would exposing both a sample-based duration and a usec-based duration (similar to MediaData) be an agreeable way to address this?

More generally, I think EncodedFrame could be replaced by MediaData, to allow for more consistent data structures across media stacks. But perhaps that's a discussion for another bug.

[1]: https://dxr.mozilla.org/mozilla-central/source/dom/media/MediaData.h?q=file%3AMediaData.h&redirect_type=single#275
Flags: needinfo?(bvandyk)
Rank: 25
Priority: -- → P2
(In reply to Bryce Van Dyk (:SingingTree) from comment #5)
> More generally, I think EncodedFrame could be replaced by MediaData, to
> allow for more consistent data structures across media stacks. But perhaps
> that's a discussion for another bug.

Yes. The two separate classes are probably there for historical (i.e. not good) reasons, IIRC.
Is it possible to use the audio sample rate as the base for video? That would let us avoid a conversion, as this is how we get the frames from MediaStreamGraph. I don't know the reason for using usecs for video; I have assumed it's codec- or container-related. :bechen might know this better.
Flags: needinfo?(pehrson)
Mass change P2->P3 to align with new Mozilla triage process.
Priority: P2 → P3
Severity: normal → S3