Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Markdown] Decide what to do about class="summary" and class="seoSummary" #3923

Closed
wbamberg opened this issue Apr 7, 2021 · 12 comments
Closed
Labels
needs triage Triage needed by staff and/or partners. Automatically applied when an issue is opened.

Comments

@wbamberg
Copy link
Collaborator

wbamberg commented Apr 7, 2021

seoSummary and summary are classes that get applied to prose content in MDN pages. They're used to help calculate a thing we'll call a "page summary". The calculation is a bit complicated (https://github.com/mdn/yari/blob/main/kumascript/src/info.js#L202), but in general, it's something like:

  • if span.seoSummary exists in the page, then its content is taken as the page summary
  • otherwise, if any .summary element exists in the page, then its content is taken as the page summary
  • otherwise, the summary is taken from the first p element in the document that does not live in a note or warning div.

So essentially these classes are used to provide a custom page summary, and if they are not given we default to using the first paragraph.

What's the page summary used for?

It's used in (at least) a couple of ways:

  • as described in https://developer.mozilla.org/en-US/docs/MDN/Guidelines/CSS_style_guide#.seosummary (although the .summary part is not documented) it gets used as the value of a <meta> tag in the page <head>, to provide a description that is embedded in search engine results and other embedding contexts.

  • it's available to KumaScript, and seems to be used in the following macros:

    • InterfaceOverview
    • Index
    • HTMLRef
    • GlossaryList
    • GlossaryDisambiguation
    • HTMLRefTable
    • LearnBox
    • SubpagesWithSummaries
    • LandingPageListSubpages

How widely are summary and seoSummary used?

Across all en-US, I count:

  • 1322 uses of the summary class
  • 2285 uses of the seoSummary class

Across just JavaScript:

  • 24 uses of the summary class
  • 105 uses of the seoSummary class

How should we handle them in Markdown-land?

The most obvious option is: stop supporting them, and just always use the first non-note paragraph for the page summary. It would be good to know if we would be regressing many pages by doing this.

Many pages just mark up the first paragraph with summary or seoSummary, so removing those classes would have no actual impact. For example:

<p class="summary">Learning some HTML, CSS, and JavaScript is useful if you want to become a web developer. Beyond mechanical use, it's important to learn how to use these technologies <strong>responsibly</strong> so that all readers might use your creations on the web. To help you achieve this, this module will cover general best practices (which are demonstrated throughout the <a href="/en-US/docs/Learn/HTML">HTML</a>, <a href="/en-US/docs/Learn/CSS">CSS</a>, and <a href="/en-US/docs/Learn/JavaScript">JavaScript</a> topics), <a href="/en-US/docs/Learn/Tools_and_testing/Cross_browser_testing">cross browser testing</a>, and some tips on enforcing accessibility from the start. We'll cover accessibility in special detail.</p>

However, some actually do use it to override the default:

<p><span class="seoSummary">This guide introduces the video codecs you're most likely to encounter or consider using on the web, summaries of their capabilities and any compatibility and utility concerns, and advice to help you choose the right codec for your project's video.</span></p>

It would I think be worthwhile to:

  • analyse how many of the thousands of uses of these classes are actually no-ops
  • remove the no-ops
  • make a plan for the remainder.

We should also do an analysis of how KS macros are using the page summary, and whether they would be broken by removing the ability to override the default.

If we did decide it's important to let pages override the "first paragraph rule", we could consider having an optional front matter key for this, and we might also consider whether this is the same as the "short description" that we've talked about before as a component of our pages.

For now, we could limit this exercise to the JavaScript docs.

( @escattone , we talked about this issue yesterday)

@wbamberg wbamberg added the needs triage Triage needed by staff and/or partners. Automatically applied when an issue is opened. label Apr 7, 2021
@wbamberg
Copy link
Collaborator Author

wbamberg commented Apr 7, 2021

I've done a bit more analysis of the JS docs here.

Usage of seoSummary in JavaScript

Of 105 uses of seoSummary:

  • 76 match the first paragraph, so the class is a no-op, and as far as I can tell we could just remove the class and see no effect
  • in 22 cases the seoSummary is used to truncate the first paragraph, selecting the first part of it.
  • leaving 5 cases where the seoSummary is selecting a different text entirely

Using seoSummary to truncate the first paragraph is an interesting case, if we want to limit the size of the page summary. We might want to say the page summary is just the first sentence, as we did for short descriptions? Note also though that in many of the 78 pages that are marking up the whole paragraph with seoSummary are thereby generating long summaries, such as:

<p><span class="seoSummary">The <strong><code>export</code></strong> statement is used
when creating JavaScript modules to export live bindings to functions, objects, or
primitive values from the module so they can be used by other programs with the
{{jsxref("Statements/import", "import")}} statement. Bindings that are exported can
still be modified locally; when imported, although they can only be read by the
importing module the value updates whenever it is updated by the exporting
module.</span></p>

So it's not like we are doing this consistently at all, at the moment.

Usage of summary in JavaScript

Of 22 uses of summary:

<p class="summary">Thus from the above example of class <code>C</code> and <code>D</code>,
it seems that <code>new.target</code> points to the class definition of class which is
initialized. For example, when <code>d</code> was initialized using
<code>new D()</code>, the class definition of <code>D</code> was printed; and similarly,
in case of <code>c</code>, the class <code>C</code> was printed.</p>

(but note that in this case the bit marked up by the summary is not the bit in the <meta> tag: view-source:https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/new.target. So there's something I don't understand here about how these <meta> tags are built...)

Next steps

I'd say next steps for JS would be:

  • remove all the uses of summary and seoSummary that are no-ops
  • decide whether the use of these classes to truncate the first paragraph is worth it
  • deal with the few cases where a quite different text is selected, perhaps by changing the content.
@wbamberg wbamberg added this to To do in MDN to Markdown Apr 7, 2021
@Ryuno-Ki
Copy link
Collaborator

Ryuno-Ki commented Apr 7, 2021

It might make sense to share the scripts you've used for the analysis, so others could easier replicate the results (or look for flawed logic if something slipped your attention).

@wbamberg
Copy link
Collaborator Author

wbamberg commented Apr 7, 2021

They're pretty rough, and just variations of something like https://gist.github.com/wbamberg/20d0d7a99f38ea9c188b6003a653661c. They just find ".html" files, scrape off the front matter, then use jsdom on the result. Then it's easy to query the doc for things like classes. To check whether the node identified by seoSummary matched the first paragraph I looked at the textContent.

I did do a bunch of manual checking the sources, to see if it made sense. But I'd be very happy to accept improvement suggestions!

@chrisdavidmills
Copy link
Contributor

I agree with your approach here, @wbamberg . the summary class is no longer documented, I believe because it is no longer used for anything? I'm sure I remember removing the documentation because I was told it was no longer used...anyway, I think we can totally deal with this by just making the first paragraph the summary, and rewriting cases where it is a bit long or otherwise doesn't work for this purpose.

@wbamberg
Copy link
Collaborator Author

wbamberg commented Apr 8, 2021

I believe because it is no longer used for anything?

It definitely seems to be used here: kumascript/src/info.js and I have tested that it can affect the value of the <meta> tag. But yeah, I think we are in agreement about the approach to take here.

@Ryuno-Ki
Copy link
Collaborator

So if there is an agreement … what would be the next steps?
It feels like this issue is „done” and some others should be created with concrete steps?

At least one for documentation.
Perhaps another one for cleaning existing pages …
How to spot pages where the paragraph is not apt?

@wbamberg
Copy link
Collaborator Author

wbamberg commented Apr 12, 2021

Yes, there are a couple of pieces of this.

  • agreement on where we want to end up: it seems we do have agreement on this, so we can update the "MDN Markdown" spec to talk about the summary
  • agreement on how to get there from here. The rest of this comment is about that :).

For JS: the immediate goal is to convert the JS docs. For JS this is small enough that we can just go through removing all the summary and seoSummary classes. I filed #4042 for this.

For everything else: I think we can choose:

  • just remove the summary and seoSummary classes in the conversion, accept that this will break a few pages, and deal with it as bug fixes
  • spend the time checking all the pages that would be affected and fixing them if necessary
  • take a sample of say 100 pages that would be affected, and then deciding if it's worth fixing all of them.

I did a bit more digging, and made this table:

  number of pages no summary summary matches summary is substring summary is different % not a no-op
games 73 6 31 4 31 48
glossary 538 441 44 52 1 10
learn 362 142 215 5 34 11
mdn 88 36 26 10 16 30
mozilla 1307 1203 85 8 6 1
plugins 27 16 11 0 0 0
related 13 0 7 0 6 46
tools 154 130 14 0 10 6
web/api 5926 3998 1311 534 83 10
web/css 946 729 143 67 7 8
web/html 239 74 83 75 7 34
web/http 292 229 46 13 4 6
web/javascript 922 796 100 23 3 3
web/svg 382 326 35 20 1 5
web/other 491 290 135 31 34 13
             
Totals 11760 8416 2286 842 243 9

(https://docs.google.com/spreadsheets/d/16iSsUHpdDgOQnb1ndBYfvUKJdDFjOse2ONDmtr_nlDo/edit?usp=sharing). The numbers are probably a bit rough - there is some heuristics in deciding which is the "first paragraph" which is supposed to roughly match up with https://github.com/mdn/yari/blob/main/kumascript/src/info.js#L202. But it is close enough I think.

This has 5 columns:

  1. number of pages: total number of pages
  2. no summary: number of pages with no summary or seoSummary
  3. summary matches: number of pages where the summary or seoSummary matches the first paragraph anyway (so is a no-op)
  4. summary is substring: number of pages where summary or seoSummary selects the first part of the first paragraph
  5. summary is different: number of pages where summary or seoSummary selects a completely different text.

Cases 4 and 5 are those where these classes have an effect, and this tells us that it's about 1000 pages, or a little under 10%.

My experience of MDN is that almost all first paragraphs are short, so I think in general breaking the "substring" cases is not likely to make the docs much worse.

If we did want to look at all these, it would be easy to generate the whole list of pages that would need looking at, and file a bug for it. And it would be quite easy for different people to work on this in parallel, although there would be some overhead of managing things so people didn't tread on each others' toes. It does take some careful thought sometimes in how to rework the pages, it's not quite mechanical.

@hamishwillee
Copy link
Collaborator

hamishwillee commented Apr 12, 2021

@wbamberg Great analysis. I think that the "summary is different" cases are likely to need some work, and IMO it might be a good idea to generate lists of those and start looking at them before switching over (in an ideal world a list showing both the current and resulting text for each title).

FWIW I prefer docs where the first sentence/para is a "summary". Usually as a reader it aids in quickly determining relevance (though as a writer it can force you to write "unnaturally").

P.S. This might be a good task for our bigger volunteer pool. I am also happy to help.

@wbamberg
Copy link
Collaborator Author

@wbamberg Great analysis. I think that the "summary is different" cases are likely to need some work, and IMO it might be a good idea to generate lists of those and start looking at them before switching over (in an ideal world a list showing both the current and resulting text for each title).

FWIW I prefer docs where the first sentence/para is a "summary". Usually as a reader it aids in quickly determining relevance (though as a writer it can force you to write "unnaturally").

Yes, me too.

P.S. This might be a good task for our bigger volunteer pool. I am also happy to help.

OK, then I think I will start by filing an issue like this for the JS docs, which is our first priority, and maybe we can get help. Very happy for you to help either by fixing docs and/or reviewing and helping to guide volunteers. I'll try to get a detailed list out tomorrow.

@chrisdavidmills
Copy link
Contributor

All sounds good to me.

I do wonder whether it is worth creating a sample of the titles and summaries for 100 sample pages that have had these classes removed, and seeing how many of the summaries read weirdly or wrongly as a result, to give us an idea of the impact of going with @wbamberg 's first option (just remove the summary and seoSummary classes in the conversion, accept that this will break a few pages, and deal with it as bug fixes).

@hamishwillee
Copy link
Collaborator

FWIW I had a look at the "type 4 fixes" fixed in #4085 using the ListSummaries macro and the manually fixed ones look a lot better better than they would have otherwise, and more useful/succinct than their unchanged peers.

@wbamberg
Copy link
Collaborator Author

I've updated #3350 (comment) with converter guidance for summary and seoSummary:

  • if the first prose paragraph of the document matches the text selected by summary and seoSummary, just strip out those attributes, otherwise raise an error.

This will not be a problem for the JS docs but we will have content work to do in the others.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
needs triage Triage needed by staff and/or partners. Automatically applied when an issue is opened.
4 participants