Bug 1563675 (Open) - Opened 5 years ago, Updated 2 years ago

Support low latency encoding in MediaRecorder

Categories

(Core :: Audio/Video: Recording, enhancement, P3)

Version: 67 Branch
Type: enhancement


People

(Reporter: quae, Unassigned)

Details

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:67.0) Gecko/20100101 Firefox/67.0

Actual results:

Currently, using MediaRecorder with a stream from getUserMedia gives output every ~300 ms.

Expected results:

I'd like to get a realtime(ish) Opus stream of a user's microphone.
Could you add support for low latency encoding?

Type: defect → enhancement
Component: Untriaged → Audio/Video
Product: Firefox → Core

While the MediaRecorder spec doesn't really say how to treat latency, it seems reasonable that one could affect it by calling requestData() every so often, or setting a low enough timeslice.
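For reference, a minimal sketch of both approaches from script (assuming a getUserMedia microphone stream; the actual delivery interval still depends on how much the muxer buffers internally, as the following comments explain):

    // Minimal sketch: drive data delivery with a timeslice or with requestData().
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const recorder = new MediaRecorder(stream);
    recorder.ondataavailable = (event) => {
      if (event.data.size > 0) {
        console.log('chunk of', event.data.size, 'bytes');
      }
    };
    // Either let the timeslice drive delivery...
    recorder.start(100); // ask for data roughly every 100 ms
    // ...or poll explicitly with requestData():
    // recorder.start();
    // setInterval(() => recorder.requestData(), 100);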

Let's keep this bug around to track that work.


In the meantime, we do have a workaround of sorts. If you're encoding both video and audio (video/webm) and set a timeslice, you'll get data at roughly the interval of the timeslice.

Our webm muxer doesn't pass data on to the gathered blob until it has finished a cluster, and per spec recommendation we wait for a key frame before finishing a cluster (so the next cluster starts on one), which can take a while. To reduce latency while we don't support passing partial clusters to the blob, we issue keyframes at the interval of the timeslice. Do note that frequent keyframes and small clusters mean more overhead, though.
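A sketch of that workaround (the mime type and interval are illustrative, and handleChunk is a hypothetical consumer):

    // Sketch: record video+audio as video/webm with a timeslice, so keyframes
    // (and hence cluster boundaries) are issued at roughly that interval.
    const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
    const recorder = new MediaRecorder(stream, { mimeType: 'video/webm' });
    recorder.ondataavailable = (event) => handleChunk(event.data); // hypothetical consumer
    recorder.start(250); // 250 ms timeslice; smaller values mean more overhead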


If you're only encoding audio, I just fixed a bug that lets you get webm when you specify audio/webm. With audio/webm you get data at most every second. Before that fix, or if you don't set a mime type, you get audio/ogg, where we flush data once libogg considers a page full, which seems to happen when it reaches 4 kB.

If you get data every 300 ms and you're encoding only audio, I suppose your bitrate is around 100 kbps? You should get lower latency if you could increase the bitrate (which you can affect with the audioBitsPerSecond MediaRecorderOptions member), though it could be flaky if the Opus bitrate varies with content.
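As a sanity check on that estimate: a full 4 kB ogg page holds 4096 × 8 = 32768 bits, and 32768 bits per ~300 ms is about 109 kbps, consistent with the figure above. A sketch of raising the bitrate (the values are illustrative):

    // Sketch: a higher bitrate fills the muxer's buffer faster, so encoded
    // chunks are delivered more often.
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const recorder = new MediaRecorder(stream, {
      mimeType: 'audio/webm',       // with audio/webm, data comes at most every second
      audioBitsPerSecond: 256000,   // illustrative; actual rate may vary with content
    });
    recorder.ondataavailable = (event) => console.log('chunk', event.data.size);
    recorder.start(100);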

Status: UNCONFIRMED → NEW
Component: Audio/Video → Audio/Video: Recording
Ever confirmed: true
Priority: -- → P3

> If you get data every 300ms and you're encoding only audio I suppose your bitrate is around 100kbps? You should get lower latency if you could increase the bitrate

But I want a low bitrate stream....

Is there a way you could expose the opus encoder options more directly? From the wikipedia article (https://en.wikipedia.org/wiki/Opus_(audio_format)#Features):

> SILK supports frame sizes of 10, 20, 40 and 60 ms. CELT supports frame sizes of 2.5, 5, 10 and 20 ms. Thus, hybrid mode only supports frame sizes of 10 and 20 ms; frames shorter than 10 ms will always use CELT mode. A typical Opus packet contains a single frame, but packets of up to 120 ms are produced by combining multiple frames per packet.

(In reply to Daurnimator from comment #2)

> > If you get data every 300ms and you're encoding only audio I suppose your bitrate is around 100kbps? You should get lower latency if you could increase the bitrate
>
> But I want a low bitrate stream....
>
> Is there a way you could expose the opus encoder options more directly?

There's nothing like that in the spec.

As mentioned in comment 1, we could improve how we gather up data on requestData() and with small timeslices. Patches welcome.

> There's nothing like that in the spec.

Are you not free to implement MediaRecorder for any content type?
https://tools.ietf.org/html/rfc7587#section-6.1 defines the audio/opus type with parameters including maxplaybackrate, maxptime, cbr and useinbandfec.

> we could improve how we gather up data on requestData() and with small timeslices. Patches welcome.

I wouldn't really know where to start; I'd need plenty of hand-holding if I attempt this.

(In reply to Daurnimator from comment #4)

> > There's nothing like that in the spec.
>
> Are you not free to implement MediaRecorder for any content type?
> https://tools.ietf.org/html/rfc7587#section-6.1 defines the audio/opus type with parameters including maxplaybackrate, maxptime, cbr and useinbandfec.

Right. That can be done: parsing parameters from the mime type. But it seems like a different issue from this bug.
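Purely to illustrate what that could look like (this syntax is hypothetical and not in the MediaRecorder spec; the parameter names come from RFC 7587):

    // Hypothetical, unsupported sketch: RFC 7587 parameters embedded in the
    // mime type. Nothing in the MediaRecorder spec defines this, and today
    // the constructor would throw NotSupportedError for such a type.
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    const recorder = new MediaRecorder(stream, {
      mimeType: 'audio/opus; maxptime=20; cbr=1; useinbandfec=0',
    });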

> > we could improve how we gather up data on requestData() and with small timeslices. Patches welcome.
>
> I wouldn't really know where to start; I'd need plenty of hand-holding if I attempt this.

If you're willing, here are some resources:
In general, contributing code: https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Introduction
Building from scratch: https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Build_Instructions

When you've built, you can run our MediaRecorder tests with:

    ./mach mochitest dom/media/test/test_mediarecorder*
    ./mach wpt testing/web-platform/tests/mediacapture-record

As for what I meant we could do in comment 1, there are two parts:

  1. Make requestData(), and data requests based on the timeslice, explicitly extract data into the blob before pushing it.
  2. Make the container writers able to flush data before they've finished writing a certain amount of data. For webm, the EbmlComposer currently only outputs full clusters; it could probably be convinced to output data directly after each block instead, but its internal state will need to be refactored to accommodate this. I'd say this is pretty fiddly to get right. There's the OggWriter as well, where something similar could probably be done. The work should result in them being streamable, essentially (see the sketch after this list).
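For part 2, a rough JS sketch of the intended shape (names are hypothetical; the real code is Gecko C++, and the EbmlComposer/OggWriter internals will differ):

    // Hypothetical sketch of a "streamable" container writer: bytes go out as
    // soon as each block is framed, instead of buffering a whole cluster/page.
    class StreamableWriter {
      constructor(onOutput) {
        this.onOutput = onOutput; // callback that receives finished bytes
      }
      writeBlock(encodedFrame) {
        const framed = this.frameBlock(encodedFrame);
        this.onOutput(framed); // flush immediately rather than per-cluster
      }
      frameBlock(frame) {
        // Container-specific framing would go here (e.g. a webm SimpleBlock).
        return frame; // placeholder for real muxing logic
      }
    }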

> Right. That can be done -- parsing parameters from the mime type. But seems like a different issue from this bug.

Wouldn't support of maxptime effectively solve this issue?


But yes, your approach would too.

Though for my specific use case I don't need/want a container, so it really seems to be getting in the way.

(In reply to Daurnimator from comment #6)

> > Right. That can be done -- parsing parameters from the mime type. But seems like a different issue from this bug.
>
> Wouldn't support of maxptime effectively solve this issue?

It would produce a higher number of packets: Opus packets that must still be muxed before reaching content.

Our ogg muxer waits for a page to become full (of packets). Our webm muxer waits for a second's worth of content before finishing a cluster, no matter what.

So no.

What if I don't want the stream muxed?

Then work that out with the W3C working group, because this spec doesn't define a containerless mode.

See Also: → 1706229
Severity: normal → S3