Open Bug 1654824 Opened 4 years ago Updated 3 years ago

Explain why tscrollx is slower on try (compared to m-c) on linux64-shippable-qr

Categories

(Testing :: Talos, defect, P3)

Tracking

(Not tracked)

People

(Reporter: kats, Unassigned)

References

Details

(Whiteboard: [perf:workflow])

https://treeherder.mozilla.org/perf.html#/graphs?highlightAlerts=1&series=mozilla-central,1939354,1,1&series=try,1913247,1,1&timerange=1209600

If you just look at tscrollx on linux64-shippable-qr over the last 14 days of try and m-c, you'd expect them to be pretty close, since probably only a few of the try pushes actually affect the test. And yet when you look at the graph above there's a pretty persistent gap between the two. This is quite confusing to me as I don't see why this should be the case. I looked at a few other tests (e.g. tp5o_scroll, displaylist_mutate) and none of those show this difference.

It certainly makes it annoying to use try pushes to determine whether a particular set of changes will regress performance, because the comparison against m-c gives false positives unless you account for tscrollx on try already being consistently worse.

I think one of those mozilla-central-only differences is causing this (and it needs fixing). We've had bugs in the past that were only reproducible on m-c for some reason, and we still don't understand why.

Severity: -- → S3
Priority: -- → P2
Whiteboard: [perftest:triage]

We'll look into this as part of the perf workflow work. Perhaps there's something we can do to make it simpler to do performance testing on try (new feature in ./mach try maybe).

Whiteboard: [perftest:triage] → [perf:workflow]

tscrollx appears to not be running anymore, since around Aug 6. Do you know if that's intentional?

Flags: needinfo?(jmaher)

Hm, looks like it happened in https://bugzilla.mozilla.org/show_bug.cgi?id=1651311 due to high-frequency failures on tart. I think it would have been better to just disable tart instead of the entire svgr suite, but I don't know if the failures would have just moved to a different test.

Flags: needinfo?(jmaher)

:kats, if you want, you can test the removal of just the tart test on try, and if it looks good we could re-enable the svgr suite with just that test removed.

Flags: needinfo?(kats)

Yup, I'll see if I can get that done. Moving discussion over to bug 1651311 to avoid the distraction on this bug.

Flags: needinfo?(kats)

For another example of unexpectedly-inconsistent results: https://treeherder.mozilla.org/perfherder/graphs?series=mozilla-central,2819175,1,1&series=autoland,2814048,1,1&timerange=1209600 shows a11yr opt e10s stylo webrender on linux64-shippable-qr giving results that are consistently around 5% better on autoland than on central.

I suspect investigating this will consume a lot of time. I'd be interested in exploring what we're trying to achieve by comparing between try/autoland/mozilla-central. A more reliable comparison would be to push multiple patches to try (for example, a baseline push and a push with the change applied) and use the Perfherder compare view. I imagine we could do more to promote and document this workflow. Keeping this open but dropping priority. Please feel free to challenge my thoughts here.
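To illustrate why a try-vs-try comparison is more reliable than try-vs-central, here is a minimal Python sketch. It is not how Perfherder computes its compare view; the helper names (`percent_change`, `welch_t`) and all replicate values are made up for illustration, and "lower is better" is an assumption. The point is only that a fixed offset between branches shows up as a regression when the baseline comes from m-c, and disappears when the baseline is another try push.

```python
from math import sqrt
from statistics import mean, stdev


def percent_change(base, new):
    """Change of the mean of `new` relative to the mean of `base`, in percent."""
    return (mean(new) - mean(base)) / mean(base) * 100.0


def welch_t(base, new):
    """Welch's t statistic for two samples with possibly unequal variances."""
    se = sqrt(stdev(base) ** 2 / len(base) + stdev(new) ** 2 / len(new))
    return (mean(new) - mean(base)) / se


# Hypothetical replicate values (arbitrary units, lower assumed better).
# These numbers are invented; they are not real tscrollx results.
mc_base = [2.00, 2.02, 1.98, 2.01, 1.99]    # mozilla-central, no patch
try_base = [2.20, 2.22, 2.18, 2.21, 2.19]   # try push, no patch
try_patch = [2.21, 2.19, 2.22, 2.20, 2.18]  # try push, patch applied

# Baseline from m-c: the persistent branch offset looks like a regression.
print("vs m-c: %+.1f%% (t = %.1f)" % (
    percent_change(mc_base, try_patch), welch_t(mc_base, try_patch)))

# Baseline from try: the offset cancels and only the patch's effect remains.
print("vs try: %+.1f%% (t = %.1f)" % (
    percent_change(try_base, try_patch), welch_t(try_base, try_patch)))
```

With these invented numbers, the patch has no effect, yet the comparison against the m-c baseline reports a large change, while the comparison against the try baseline reports essentially none.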

Priority: P2 → P3