Open Bug 1683401 Opened 4 years ago Updated 1 year ago

LSNG can break, throwing NS_ERROR_FILE_CORRUPTED errors when the shadow and ls-archive databases have dynamic corruption

Categories

(Core :: Storage: localStorage & sessionStorage, defect, P3)

defect

Tracking

()

People

(Reporter: asuth, Unassigned)

References

Details

Attachments

(2 files)

mconley just dropped by the Workers & Storage channel with a problem where mana was throwing NS_ERROR_FILE_CORRUPTED errors when accessing LocalStorage and clearing the storage for that origin wasn't fixing things. I had him run PRAGMA integrity_check against both webappsstore.sqlite and storage/ls-archive.sqlite and both failed, with the ls-archive rows apparently transplanted from webappsstore. He saved both files off for future consultation and deleted both files and now things are working.

I'm surmising that our corruption detection is insufficient when there's (dynamic) corruption in the ls-archive file logic.

I'll attach the integrity check runs as separate attachments.

Flags: needinfo?(jvarga)

I'm making this block the shipping meta bug and adding a see also to the similar but different legacy LS variant of this bug, bug 1341070.

Blocks: 1599979
See Also: → 1341070
See Also: → 1704440
See Also: → 1590640

Assuming that this happens here, I have difficulties to read what is actually happening in the NS_ERROR_FILE_CORRUPTED case now.

From what I see in QM_TRY telemetry we now classify this as a warning:

Warning stacks:

Clients Sessions Hits Anchor Stack
5 6 6 dom/quota/ActorsParent.cpp:QuotaManager::EnsureStorageIsInitialized dom/quota/ActorsParent.cpp#6002:NS_ERROR_FILE_CORRUPTED

Not sure if this is what we wanted or if this should be an error. The only cases I see are on beta, FWIW.

The other errors at the same function in bug 1704433 have different error codes, it seems.

We don't see it in telemetry because this is not part of storage/temporary storage initialization, it's a problem in LSNG when it tries to migrate data from the old archived database to a new per origin database.
This is something we haven't done yet, I mean we've been focusing only on storage/temporary storage failures in telemetry. It's probably time to add more contexts and get data for this. I'll try to add a new context for this.

Flags: needinfo?(jvarga)

We can get NS_ERROR_FILE_CORRUPTED on this line. There are other Execute calls below which can fail with the same error as well.

(In reply to Jan Varga [:janv] from comment #6)

We can get NS_ERROR_FILE_CORRUPTED on this line. There are other Execute calls below which can fail with the same error as well.

Thanks, then I was far off...

(In reply to Jan Varga [:janv] from comment #5)

I'll try to add a new context for this.

Yes, that seems useful.

Hi Jan, I assume the patch to add the context for having telemetry is trivial, can you do this asap? Thanks!

Flags: needinfo?(jvarga)

The shadow writes (writes to webappsstore.sqlite) have been disabled long time ago and LSNG has been enabled by default long time ago as well, so I think we can start planning a removal of the archive database. I believe enough time has passed, so any data which was supposed to be migrated from the archive has been migrated in the end.

Flags: needinfo?(jvarga)
You need to log in before you can comment on or make changes to this bug.