Explain why tscrollx is slower on try (compared to m-c) on linux64-shippable-qr
Categories
(Testing :: Talos, defect, P3)
Tracking
(Not tracked)
People
(Reporter: kats, Unassigned)
References
Details
(Whiteboard: [perf:workflow])
If you just look at tscrollx on linux64-shippable-qr over the last 14 days of try and m-c, you'd expect them to be pretty close, since probably only a few of the try pushes actually affect the test. And yet when you look at the graph above there's a pretty persistent gap between the two. This is quite confusing to me as I don't see why this should be the case. I looked at a few other tests (e.g. tp5o_scroll, displaylist_mutate) and none of those show this difference.
It certainly makes it annoying when doing try pushes to determine if a particular set of changes will regress performance, because it gives false positives if you don't account for the fact that tscrollx on try is consistently worse already.
Comment 1•4 years ago
I think this is caused by one of those mozilla-central-only differences (which needs fixing). We've had bugs in the past that were only reproducible on m-c for some reason, and we still don't understand why.
Comment 2•4 years ago
We'll look into this as part of the perf workflow work. Perhaps there's something we can do to make it simpler to do performance testing on try (new feature in ./mach try maybe).
Comment 3•4 years ago (Reporter)
tscrollx appears to not be running anymore, since around Aug 6. Do you know if that's intentional?
Comment 4•4 years ago (Reporter)
Hm, looks like it happened in https://bugzilla.mozilla.org/show_bug.cgi?id=1651311 due to high-frequency failures on tart. I think it would have been better to disable just tart instead of the entire svgr suite, but I don't know if the failures would have just moved to a different test.
Comment 5•4 years ago
:kats, if you want, you can test the removal of just the tart test on try, and if it looks good we could re-enable the suite with just that test removed.
Comment 6•4 years ago (Reporter)
Yup, I'll see if I can get that done. Moving discussion over to bug 1651311 to avoid the distraction on this bug.
Comment 7•4 years ago
For another example of unexpectedly inconsistent results: https://treeherder.mozilla.org/perfherder/graphs?series=mozilla-central,2819175,1,1&series=autoland,2814048,1,1&timerange=1209600 shows a11yr opt e10s stylo webrender on linux64-shippable-qr giving results that are consistently around 5% better on autoland than on central.
Comment 8•3 years ago
I suspect investigating this will consume a lot of time. I'd be interested in exploring what we're trying to achieve by comparing between try/autoland/mozilla-central. A more reliable comparison would be to push both a baseline and the patched revision to try and use the Perfherder compare view. I imagine we could do more to promote and document this workflow. Keeping this open but dropping priority. Please feel free to challenge my thoughts here.
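To illustrate why a same-branch baseline matters, here is a rough sketch of the comparison, assuming we already have replicate values for each push. The function name `compare_runs` and all of the numbers are hypothetical, and this is not Perfherder's actual implementation; it only computes a percent change in the mean and a Welch t-statistic.

```python
from math import sqrt
from statistics import mean, stdev


def compare_runs(base, patched):
    """Return (percent change in mean, Welch t-statistic) for two replicate sets."""
    base_mean, new_mean = mean(base), mean(patched)
    diff = new_mean - base_mean
    pct_change = diff / base_mean * 100.0
    # Standard error of the difference, allowing unequal variances (Welch).
    std_err = sqrt(stdev(base) ** 2 / len(base) + stdev(patched) ** 2 / len(patched))
    if std_err == 0:
        t_stat = 0.0 if diff == 0 else float("inf") * (1 if diff > 0 else -1)
    else:
        t_stat = diff / std_err
    return pct_change, t_stat


# Hypothetical replicates (made-up values, purely illustrative).
central_base = [1.90, 1.92, 1.88, 1.91, 1.89]  # baseline taken from m-c
try_base = [2.10, 2.12, 2.08, 2.11, 2.09]      # same baseline pushed to try
try_patched = [2.11, 2.09, 2.12, 2.10, 2.08]   # patched revision on try

print("try vs try:", compare_runs(try_base, try_patched))
print("m-c vs try:", compare_runs(central_base, try_patched))
```

With a persistent branch offset like the one described in this bug, the cross-branch comparison would flag a large change even for a patch that does nothing, while the try-vs-try comparison would not.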