The Language of Gutenberg

The following was co-presented at WordCamp Zaragoza with Matías Ventura and rehashed for an informal presentation with Jorge Costa on the state of Gutenberg with the local Oporto WordPress community.

Introduction

Gutenberg is the future all-encompassing editor of WordPress, starting with WordPress 5.0. This future promises more than just a replacement of the current TinyMCE-powered editor; rather, it seeks to redefine the term “content” beyond the implementation details of WordPress. In short, the end-goal of Gutenberg is to edit anything from a post’s content (post_content) to all the elements visually surrounding it on a page, and indeed edit the templates that a website comprises.

[Figure: Above all, an editor for rich content]

In order to achieve all this, Gutenberg introduces the concept of the block as the primary unit of content and customization. On the surface, blocks are anything but new. Unsurprisingly, well-known kinds of content translate to blocks: a paragraph, an image, a button. The reader unfamiliar with Gutenberg as a product is advised to watch the excellent walkthrough from the 2017 State of the Word.

This presentation aims to explain the introduction of blocks as a natural progression of WordPress given its history, goals, and principles; it aims to present blocks as more than a post hoc add-on to WordPress and instead as a conceptual and technical paradigm shifter.

By leaning on analogy and abstract reasoning, the following intends to be accessible to non-technical readers.

History

There is a long history of rich content in WordPress that reaches far beyond pure text. From the introduction of the TinyMCE WYSIWYG editor in 2.0 (2005), there has arguably been the notion that content is only bound by HTML. Later, the project would see the inclusion of widgets (plugin, 2006) and shortcodes (2.5, 2008), both now staples of WordPress content management.

More interesting, perhaps, is the thread of post formats. Functional support for them was introduced in 3.1 (2011), meaning that themes could use them to render content differently (e.g., an “image”-type entry would perhaps be rendered with no title) and that plugin developers were free to experiment in other directions. However, WordPress was eyeing a farther target, and in 2013 the project was prepared to include a new Post Formats UI in its upcoming release (3.6).

[Figure: The Post Formats UI in 2013]

Despite the immediate Tumblr parallels, this effort was about proposing a new way of thinking about how content is created, and not just rendered, based on what it is. It would complement that year’s default WordPress theme, Twenty Thirteen, a remarkable showcase of what post formats could mean for themes. In the end, the UI never made it to core, and post formats never blossomed into a more powerful solution to content semantics.

Principles

From the start, certain principles could be established for Gutenberg. They naturally align with the philosophy of WordPress and its emphasis on openness and honoring content foremost.

Backwards compatibility and graceful degradation. One’s existing content should never be lost or generally affected by the switch to a new editor. The newer tool should understand the content natively or at the very least leave it undisturbed.
Portability. One’s content shouldn’t be tied to anything. This means that content shouldn’t be tied to any runtime, be it Gutenberg or WordPress in general, and that a good relationship with other editors (mobile apps, MarsEdit, etc.) should be developed.
No commitment when adopting Gutenberg. This is a corollary of the previous point. Adopting Gutenberg for early testing or curiosity shouldn’t be an irreversible action for one’s content. Disabling Gutenberg shouldn’t result in unreasonably altered content. The same should later apply to third-party blocks. There is a strong desire to fight lock-in, which is not new but is pervasive in a world of proprietary software and software-as-a-service.
Incremental development. Gutenberg aims to fundamentally reshape WordPress, and this cannot happen overnight when WordPress powers 30% of the Web and serves millions of users. On the subject, I strongly recommend my colleague Matías Ventura’s The Ship of Theseus.

HTML is the format

Considering these principles, some parallels emerged with HTML:

HTML is the format of the Web, and WordPress is the Web.
It is open, portable, and arguably one of the digital formats most likely to endure. It is human-readable and understood in virtually all programming ecosystems.
It is semantic, meant to describe documents with versatility, thus focusing on content for the benefit of the user.

HTML has come a long way in terms of expressive power. Now, modern HTML documents can really be about the user’s content, unencumbered by presentation details or dynamic behaviors, which are delegated to increasingly capable complementary technologies such as CSS and JavaScript. With some caveats, this could make HTML itself the best candidate for the format powering the new semantic block-based editor.

Ultimately, choosing HTML means that — as with a painting or a sculpture — the editor’s final artefact is the canonical format of the content, not a byproduct thereof.

If a little simile is allowed, consider the printing press. In letterpress, a finished page is assembled from individual characters, a test print made in a galley, and then locked into a chase to create a fully formed page. Once printed, there’s no need to know whether it was set via individual letters, type slugs from a linotype machine, or even one giant plate.

This is true for content blocks. They are the way in which the user creates their content, but they no longer matter once the content is finished. That is, until it needs to be edited. Imagine if the printing press were able to print a page while also including in the page the instructions to regenerate the set of movable type required to print it. What we are doing with blocks could be compared to printing invisible marks in the margins so that the printer can make adjustments to an already printed page without needing to set the page again from scratch.

HTML Comments

Enter HTML comments. Indeed, these are the native invisible marks of HTML: understood by all browsers and other user agents, and ignored by default.

Their introduction allows the problem of content recognition to be broken down into two stages. In this layered approach, the first stage is to identify at a glance what is what. This is where HTML comments are particularly helpful: here, a paragraph; there, a display of a site’s latest posts. Any work beyond this essential recognition is delegated to stage two, which is block-specific analysis.

Equally importantly, by using HTML comments, the structure of content is never affected. Contrast this with the scenario where any block markup would have to be encapsulated in <div> tags or be constrained to a single top-level aggregator element. Comment-demarcated blocks determine their own structure as long as it is valid HTML.
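To make stage one concrete, here is a minimal sketch in Python of what "identifying at a glance what is what" could look like, using the comment delimiter syntax that appears later in this piece. It is an illustration only: the name parse_blocks and the returned dictionary shape are my own assumptions, the regular expression is a simplification, and the sketch ignores nested blocks and any content sitting outside delimiters. Gutenberg's actual parser is defined by a formal grammar, not by this code.

```python
import json
import re

# Matches an opening, self-closing ("void"), or closing block delimiter.
# Simplified on purpose: no nesting, and attributes must be a JSON object.
DELIMITER = re.compile(
    r"<!--\s+(?P<closer>/)?wp:(?P<name>[a-z][a-z0-9_-]*(?:/[a-z][a-z0-9_-]*)?)\s+"
    r"(?:(?P<attrs>\{.*?\})\s+)?(?P<void>/)?-->",
    re.DOTALL,
)


def parse_blocks(document):
    """Stage one: split serialized content into a flat list of blocks.

    Each block is a dict with a name, the attributes found in the opening
    comment, and the raw HTML between the delimiters. Stage two (reading
    attributes out of that HTML) is left to each block type.
    """
    blocks = []
    open_block = None  # block whose closing delimiter has not been seen yet
    cursor = 0         # position right after the last delimiter
    for match in DELIMITER.finditer(document):
        name = match.group("name")
        attrs = json.loads(match.group("attrs")) if match.group("attrs") else {}
        if match.group("closer"):
            # Closing delimiter: everything since the opener is the content.
            if open_block is not None and open_block["name"] == name:
                open_block["html"] = document[cursor:match.start()].strip()
                blocks.append(open_block)
                open_block = None
        elif match.group("void"):
            # Self-closing delimiter: a block with no static content.
            blocks.append({"name": name, "attrs": attrs, "html": ""})
        else:
            open_block = {"name": name, "attrs": attrs, "html": ""}
        cursor = match.end()
    return blocks
```

Note that nothing here inspects the markup inside a block: the comments alone are enough to tell a paragraph from a list of latest posts, which is the whole point of the first stage.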

Blocks: Higher-order than HTML

I’ve alluded to a caveat of HTML. Indeed, HTML is semantic, but it has limited expressive power. It can express overall structure — body, sections, paragraphs — but can’t tell a citation from a pull-quote. Fortunately, the standard does evolve: HTML5 introduces the concept of figure and its accessory caption. Still, in the end, no generic HTML will be enough to express the nuance between a film poster and an actor’s portrait, or any other domain-specific language.

Blocks carry arbitrary meaning (poster, portrait, haiku, koan) and thus enhance HTML. Since blocks are reified by HTML comments, they may be thought of as meta-HTML. In the end, they are rendered as pure HTML for the page visitor and the demarcations disappear (they are currently even stripped out by the WordPress server). Different blocks may yield the same markup, as would be the case between film poster and actor portrait blocks, but the content editor and the presentation engine both know the distinct nature of each block.

Block Attributes

Content sources

The essential meaning of a block is enhanced by attributes. For instance, a poster’s attributes may be: its image source; the film it represents; and whether it is a featured work, in which case the poster will be displayed more prominently (via wider alignment and/or by becoming a post’s featured image). Thus, the notion of attribute encompasses content, metadata, and presentation preferences.

Attributes may be sourced differently. To simplify, one of the sources is the block’s content: in the case of a paragraph block, the attribute text would be the markup inside the <p> tag of the block. Another source is the block’s opening HTML comment, useful to encode data that may be cumbersome to source from content: in the case of a paragraph block, the attribute dropCap is a boolean recording whether the rendered paragraph should be led by a drop cap. See the Gutenberg handbook for a complete reference of attribute sources.

[Figure: Toggling the drop cap attribute]

Comment attributes are stored as a JSON object:

<!-- wp:paragraph {"dropCap":true} -->
<p class="has-drop-cap">Si quelqu'un se propose comme problème …</p>
<!-- /wp:paragraph -->

Note that the content also bears the mark of the drop cap setting (class="has-drop-cap") and there is redundancy of information. Indeed, once the attribute is toggled, the editor updates the content of the paragraph block so that a visual change can be introduced. However, a benefit of comment attributes is the ability to more confidently migrate to a different content format (e.g., adopting a different class name, or switching to a hypothetical new HTML construct like <p drop-cap>…</p>) by first reading the comment data and then validating the content.
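The migration path described above (read the comment data first, then validate the content) can be sketched as follows. This is a hedged illustration in Python, not Gutenberg's implementation: the function names source_paragraph, render_paragraph and migrate_paragraph are hypothetical, and the validation shown (regenerate the markup from the attributes and compare it with what is stored) is a simplified stand-in for real content validation.

```python
import re

# A paragraph block's content is assumed to be a single <p> element.
P_TAG = re.compile(r"^<p(?P<tag_attrs>[^>]*)>(?P<text>.*)</p>$", re.DOTALL)


def source_paragraph(comment_attrs, html):
    """Combine both attribute sources of a paragraph block."""
    match = P_TAG.match(html.strip())
    if match is None:
        raise ValueError("content is not a single <p> element")
    return {
        "text": match.group("text"),                            # sourced from content
        "dropCap": bool(comment_attrs.get("dropCap", False)),   # sourced from comment
    }


def render_paragraph(attributes, drop_cap_class="has-drop-cap"):
    """Regenerate the paragraph markup from attributes alone."""
    class_attr = f' class="{drop_cap_class}"' if attributes["dropCap"] else ""
    return f"<p{class_attr}>{attributes['text']}</p>"


def migrate_paragraph(comment_attrs, html, new_class):
    """Re-render a paragraph with a new drop cap class name.

    The comment is trusted for the dropCap setting; the stored markup is
    then validated against it before anything is rewritten.
    """
    attributes = source_paragraph(comment_attrs, html)
    if render_paragraph(attributes) != html.strip():
        raise ValueError("stored markup does not match the block's attributes")
    return render_paragraph(attributes, drop_cap_class=new_class)
```

For example, migrate_paragraph({"dropCap": True}, '<p class="has-drop-cap">…</p>', "drop-cap") would yield the same paragraph with the new class, whereas a paragraph whose comment and markup disagree is rejected rather than silently rewritten.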

Dynamic blocks

Looking at the figure in § HTML Comments once more, there is a block in the middle that stands out, as it seemingly has no content:

<!-- wp:latest-posts {"categories":"1","postsToShow":3,"layout":"grid"} /-->

The Latest Posts block is dynamic by nature, displaying a site’s most recent posts upon page request. Its attributes determine which posts to show (categories), how many (postsToShow), and whether as a list or a grid (layout). It should be obvious that it doesn’t need static content within its boundaries.

A seasoned WordPress artisan will notice that, in this scenario, the block is very much equivalent to a shortcode:

[latestposts categories="1" postsToShow="3" layout="grid"]

The main difference from shortcodes is that dynamic blocks are entirely equal to any other block, both within the editing context (whereas stock shortcodes offer no rich-editing mode) and in storage. What sets them apart from static blocks is only that, during front-end page render, dynamic blocks define a render_callback whose output the server will use as block content. One very interesting consequence of this equivalency is that dynamic blocks may, in fact, have static content:

<!-- wp:latest-posts {"categories":"1","postsToShow":3,"layout":"grid"} -->
https://example.org/category/random/feed/
<!-- /wp:latest-posts -->

In this example iteration of Latest Posts, the content is the URL for the RSS feed of the corresponding category view as specified in the attributes. When would this be of any use? In any context in which post_content rendering is impractical or undesired. For instance, in newsletters: they are offline channels of communication for which a soon-to-be-outdated list of recent posts might make little sense. By comparison, shortcodes offer no such mechanism, and are notorious for making their way into newsletters unprocessed.
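The rendering rule for dynamic blocks, together with the static fallback just described, can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: RENDER_CALLBACKS, render_block and the can_run_callbacks flag are hypothetical, and render_latest_posts fabricates placeholder titles where a real server would query the site's posts. The callback signature (attributes, then stored content) is only loosely modeled on render_callback.

```python
def render_latest_posts(attrs, static_content):
    """Hypothetical callback; a real one would query the site's recent posts."""
    count = attrs.get("postsToShow", 5)
    layout = attrs.get("layout", "list")
    items = "".join(f"<li>Post {n}</li>" for n in range(1, count + 1))
    return f'<ul class="latest-posts is-{layout}">{items}</ul>'


# Only dynamic block types register a callback; static blocks have none.
RENDER_CALLBACKS = {"latest-posts": render_latest_posts}


def render_block(block, can_run_callbacks=True):
    """Return the markup to output for one parsed block.

    Dynamic blocks are replaced by their callback's output. Where callbacks
    cannot run (e.g. a newsletter export), the stored static content is
    used as is, exactly as for any static block.
    """
    callback = RENDER_CALLBACKS.get(block["name"])
    if callback is not None and can_run_callbacks:
        return callback(block["attrs"], block["html"])
    return block["html"]
```

With the Latest Posts example above, a regular page render would output the generated list, while a context that cannot run callbacks would fall back to the feed URL stored between the delimiters.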

Beyond the content

Blocks aren’t limited to the aforementioned attribute sources. Notably, attributes may be sourced from post meta, thus allowing blocks to automatically operate within the context of the post they are dropped into. Meta support is crucial to help authors transition from the current era of WordPress based on CPTs, custom fields and metaboxes, to an entirely block-based paradigm.
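As a rough sketch of what meta-sourced attributes imply, consider a block type whose attribute definitions each name their source. The Python below is illustrative only: resolve_attributes, the schema layout, the film poster block and the film_director meta key are all hypothetical, and only two sources (the opening comment and post meta) are modeled.

```python
def resolve_attributes(schema, comment_attrs, post_meta):
    """Resolve a block's attributes from their declared sources.

    schema maps attribute names to definitions. A definition with
    {"source": "meta", "meta": KEY} reads KEY from the post's meta;
    any other definition reads the opening comment's JSON object.
    """
    resolved = {}
    for name, definition in schema.items():
        default = definition.get("default")
        if definition.get("source") == "meta":
            resolved[name] = post_meta.get(definition["meta"], default)
        else:
            resolved[name] = comment_attrs.get(name, default)
    return resolved


# Hypothetical schema for a film poster block.
POSTER_SCHEMA = {
    "featured": {"default": False},
    "director": {"source": "meta", "meta": "film_director"},
}
```

The same serialized block dropped into two different posts would resolve to a different director in each, which is what lets it operate within the context of its post without any change to post_content.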

Other sources may be implemented, whether in Gutenberg core or by way of extensions. A source expected in 2018 is site options, paving the way for blocks such as Site Title and full-page customization.

[Figure: The logic flow of the editor (How Little Blocks Work, May 2017)]

Foundation

In the end, what is here referred to as the language of Gutenberg became very early on a conceptual foundation for the editor.

This has allowed development to take place in layers. The parsing stack is itself layered: block recognition is performed by a parser defined by a formal grammar, and attribute sourcing is performed afterwards according to each block type’s specification. The editing logic then rests on the block-filled application state derived from the serialized content, and so on.

More interestingly, the foundation has proven robust enough to accommodate scenarios that may not have been foreseen, or that would otherwise have been deferred to a later phase of the project, well after WordPress 5.0. Examples include block nesting, content validation, hybrid (static + dynamic) blocks, reusable blocks across posts, and the ability to evolve with the HTML standards. All this comes in addition to the original goals, which include backwards compatibility and the ability to test and then uninstall Gutenberg with no content loss.

Because the language layer is built as its own component, there is also room for hot-swapping: e.g., adopting a new parser, or altogether changing the persistence model if it were so justified.

Opportunities

By thinking in terms of blocks, the editor can reason more deeply about content.

An example of this is post-format auto-detection. Indeed, simple blueprints or heuristics can be defined with which the editor can suggest a post format to the user: if the user starts a new post and adds a video block with a little paragraph block underneath, they will see a suggestion to set the post’s format to video.
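The kind of blueprint described above could be as simple as the following Python sketch. It is a hypothetical heuristic, not a Gutenberg feature: suggest_post_format, the BLUEPRINTS table, the block dictionary shape and the 40-word threshold for "a little paragraph" are all assumptions chosen for illustration.

```python
# Lead block name -> post format to suggest (an illustrative subset).
BLUEPRINTS = {
    "video": "video",
    "image": "image",
    "gallery": "gallery",
    "quote": "quote",
    "audio": "audio",
}


def suggest_post_format(blocks, max_caption_words=40):
    """Suggest a post format from the shape of a post, or None.

    The blueprint: one lead block of a recognized type, optionally
    followed by a single short paragraph acting as a caption.
    """
    if not blocks:
        return None
    lead, rest = blocks[0], blocks[1:]
    post_format = BLUEPRINTS.get(lead["name"])
    if post_format is None or len(rest) > 1:
        return None
    if rest:
        caption = rest[0]
        if caption["name"] != "paragraph":
            return None
        if len(caption.get("text", "").split()) > max_caption_words:
            return None
    return post_format
```

A post made of a video block and a one-line paragraph would get the suggestion "video"; a long article that merely opens with a video would get none.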

More valuable opportunities arise from the shift in granularity, from post to block, that comes with Gutenberg. Take real-time collaboration: it is quickly achievable with emerging technologies such as WebRTC, with no need for sophisticated concurrency control, if we simply move from post-level locking to block-level locking. With that feature, two users could share an editing session and see each other’s changes, with the sole restriction that they work on different blocks at any given time. Further, granularity also favors incremental adoption: even when working in a longer pre-Gutenberg post, blocks may be added in the middle of “classic” (i.e., TinyMCE-wrapped) content.
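To illustrate how little machinery block-level locking needs, here is a minimal in-memory sketch in Python. It is an assumption-laden toy: the BlockLocks class, its method names and the block identifiers are hypothetical, and a real collaborative session would also have to synchronize this table between peers (over WebRTC or otherwise), which the sketch does not attempt.

```python
class BlockLocks:
    """Tracks which user is currently editing which block."""

    def __init__(self):
        self._owners = {}  # block id -> user holding the lock

    def acquire(self, block_id, user):
        """Grant the lock if the block is free or already held by this user."""
        owner = self._owners.setdefault(block_id, user)
        return owner == user

    def release(self, block_id, user):
        """Free the block, but only if this user actually holds it."""
        if self._owners.get(block_id) == user:
            del self._owners[block_id]

    def owner_of(self, block_id):
        """Return the user editing the block, or None if it is free."""
        return self._owners.get(block_id)
```

Two users in the same session each acquire the block they are typing in; a second user selecting an already locked block is simply refused until the first one moves on, while the rest of the post stays editable.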

Finally, most blocks are by no means bound to the post. A hypothetical post author block is necessarily bound, but a paragraph is just a paragraph, and this holds for images or layout blocks (e.g., columns). Thus, blocks have their place in a broader context: they belong in page-layout editing or full site customization, replacing many widgets, for instance. Indeed, Gutenberg’s next stages of development, after 5.0, envision expanding beyond post content thanks to those agnostic blocks, as well as new context-specific blocks, such as site title, header menu, etc.

Coda

The fact that the language of the new editor tends to be overlooked is a great testament to its merits. It sits happily at the bottom of the application, joining the great content-agnostic agent that is WordPress — and its filterable life cycle — together with the rich semantic editing experience we are building. Only seldom does the core team’s work overlap with it.

Author: Miguel Fonseca

Engineer at Automattic. Linguist, cyclist, Lindy Hopper, tree climber, and headbanger.