Open Bug 1819637 Opened 1 year ago Updated 10 months ago

Shutdown hanging at principals-quota-manager

Categories

(Core :: Storage: Quota Manager, defect, P3)

defect

Tracking

()

People

(Reporter: mak, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 obsolete file)

Most crashes from https://bugzilla.mozilla.org/show_bug.cgi?id=1700584 are crashing at "principals-quota-manager" step, and very likely at the first request:
https://searchfox.org/mozilla-central/rev/a2d875255c67e39b28d59a0996e8b54fa2d7564e/toolkit/components/cleardata/PrincipalsCollector.sys.mjs#90-91

Maybe there's cases where Services.qms.listOrigins callback is not invoked and an exception not thrown?

(In reply to Marco Bonardo [:mak] from comment #0)

Maybe there's cases where Services.qms.listOrigins callback is not invoked and an exception not thrown?

Could be? Jan, IIUC if we race and SetCallback runs after FireCallback we won't execute the callback ? I think in other cases we store a flag if a callback has been fired already and execute the callback in the setter in case.

Flags: needinfo?(jvarga)

I don't think so, people were asking about this in the past. See this comment https://searchfox.org/mozilla-central/rev/dcf64fc565e3749119bd57202e2ab06533155d16/dom/quota/StorageManager.cpp#254

Could it be that listOrigins takes long time to complete ?

Flags: needinfo?(jvarga)

(In reply to Jan Varga [:janv] from comment #2)

I don't think so, people were asking about this in the past. See this comment https://searchfox.org/mozilla-central/rev/dcf64fc565e3749119bd57202e2ab06533155d16/dom/quota/StorageManager.cpp#254

I can see it being safe there as we are executeting C++ inside the same runnable. But can we be sure that JS will not split our execution into two runnables here:

Services.qms.listOrigins().callback = <function>

?

Flags: needinfo?(jvarga)

(In reply to Jan Varga [:janv] from comment #2)

Could it be that listOrigins takes long time to complete ?

That's the other possibility yes, either it takes too long (tens of seconds), or the callback is not invoked.

(In reply to Marco Bonardo [:mak] from comment #4)

(In reply to Jan Varga [:janv] from comment #2)

Could it be that listOrigins takes long time to complete ?

That's the other possibility yes, either it takes too long (tens of seconds), or the callback is not invoked.

I just looked at one single example but there I cannot see any activity on the background thread(s).

So talking shortly with smaug on matrix he confirmed me that the entire JS Services.qms.listOrigins().callback = <function> should be executed in one runnable (unless we have some event loop spinning inside listOrigins).

But he made me also notice that if we potentially need to create a new background actor in GetOrCreateForCurrentThread there seem to be steps in that process that might not make us propagate errors, like here and in this use of SendInitBackground.

(In reply to Jens Stutte [:jstutte] from comment #6)

So talking shortly with smaug on matrix he confirmed me that the entire JS Services.qms.listOrigins().callback = <function> should be executed in one runnable (unless we have some event loop spinning inside listOrigins).

But he made me also notice that if we potentially need to create a new background actor in GetOrCreateForCurrentThread there seem to be steps in that process that might not make us propagate errors, like here and in this use of SendInitBackground.

The first case is not very likely and the second would crash the app immediately when diagnostic assertions are enabled.

(In reply to Jens Stutte [:jstutte] from comment #5)

(In reply to Marco Bonardo [:mak] from comment #4)

(In reply to Jan Varga [:janv] from comment #2)

Could it be that listOrigins takes long time to complete ?

That's the other possibility yes, either it takes too long (tens of seconds), or the callback is not invoked.

I just looked at one single example but there I cannot see any activity on the background thread(s).

I see profile-change-teardown as shutdown phase there, at that point quota manager is very likely already in shutdown state.
So if any new request comes to parent it probably refuses to allocate a new actor for it and the child side just never receives a response.
See https://searchfox.org/mozilla-central/rev/0d9d9d644b06d039e119242f6f12af21d763e4eb/dom/quota/ActorsParent.cpp#8048

Flags: needinfo?(jvarga)

So we are using the QuotaManagerService inside a chrome script in the parent process, IIUC. Could it be that the QuotaManagerService has no synchronized shutdown blockers/checks with the QuotaManager itself ? It seems we can end up using it as long as it has not been released by all references even in late shutdown?

Assignee: nobody → jstutte
Status: NEW → ASSIGNED
Severity: -- → S3
Priority: -- → P3
Attachment #9337478 - Attachment is obsolete: true

(In reply to Marco Bonardo [:mak] from comment #0)

Most crashes from https://bugzilla.mozilla.org/show_bug.cgi?id=1700584 are crashing at "principals-quota-manager" step, and very likely at the first request:
https://searchfox.org/mozilla-central/rev/a2d875255c67e39b28d59a0996e8b54fa2d7564e/toolkit/components/cleardata/PrincipalsCollector.sys.mjs#90-91

Looking at recent crashes, it seems to me that this is no longer true (or I misread something, it's been a while) ?

Assignee: jstutte → nobody
Status: ASSIGNED → NEW

I must note we'll always have pathologic cases with slow mechanical disks.

Looking at major version > 115 in the last 3 months:

Top crash at 16% with 429 reports is still "principals-quota-manager":
{"phase":"Places Clients shutdown","conditions":[{"name":"sanitize.js: Sanitize","state":{"progress":{"cache":"cleared","cookies":"cleared","history":"blocking","historyProgress":{"step":"principals-quota-manager"}

Second, at 10% with 277 reports is offlineApps:
{"phase":"Places Clients shutdown","conditions":[{"name":"sanitize.js: Sanitize","state":{"progress":{"cache":"cleared","cookies":"cleared","offlineApps":"blocking"

Third, at 2% with 70 reports is "clearing browsing history"
{"phase":"Places Clients shutdown","conditions":[{"name":"sanitize.js: Sanitize","state":{"progress":{"clearHonoringExceptions":true,"cache":"cleared","cookies":"cleared","history":"blocking","historyProgress":{"step":"clearing browsing history"}

So, "principals-quota-manager" is still the worst case.

You need to log in before you can comment on or make changes to this bug.