Tags: microformats

Tuesday, December 12th, 2023

Apple Annie’s Weblog · Diving into indiewebify.me & microformats, a series

Here’s a good walkthrough of adding microformats to your site, starting with h-card and moving on to h-entry.

Saturday, September 21st, 2019

Going offline with microformats

For the offline page on my website, I’ve been using a mixture of the Cache API and the localStorage API. My service worker script uses the Cache API to store copies of pages for offline retrieval. But I used the localStorage API to store metadata about the page—title, description, and so on. Then, my offline page would rifle through the pages stored in a cache, and retrieve the corresponding metadata from localStorage.

It all worked fine, but as soon as I read Remy’s post about the forehead-slappingly brilliant technique he’s using, I knew I’d be switching my code over. Instead of using localStorage—or any other browser API—to store and retrieve metadata, he uses the pages themselves! Using the Cache API, you can examine the contents of the pages you’ve stored, and get at whatever information you need:

I realised I didn’t need to store anything. HTML is the API.

Refactoring the code for my offline page felt good for a couple of reasons. First of all, I was able to remove a dependency—localStorage—and simplify the JavaScript. That always feels good. But the other reason for the warm fuzzies is that I was able to use data instead of metadata.

Many years ago, Cory Doctorow wrote a piece called Metacrap. In it, he enumerates the many issues with metadata—data about data. The source of many problems is when the metadata is stored separately from the data it describes. The data may get updated, without a corresponding update happening to the metadata. Metadata tends to rot because it’s invisible—out of sight and out of mind.

In fact, that’s always been at the heart of one of the core principles behind microformats. Instead of duplicating information—once as data and again as metadata—repurpose the visible data; mark it up so its meta-information is directly attached to the information itself.

So if you have a person’s contact details on a web page, rather than repeating that information somewhere else—in the head of the document, say—you could instead attach some kind of marker to indicate which bits of the visible information are contact details. In the case of microformats, that’s done with class attributes. You can mark up a page that already has your contact information with classes from the h-card microformat.
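
Here’s a rough sketch of what that can look like (the class names come from the h-card vocabulary; the details themselves are placeholders):

<div class="h-card">
  <a class="p-name u-url" href="https://example.com">Jane Doe</a>
  <a class="u-email" href="mailto:jane@example.com">jane@example.com</a>
  <span class="p-locality">Brighton</span>
</div>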

Here on my website, I’ve marked up my blog posts, articles, and links using the h-entry microformat. These classes explicitly mark up the content to say “this is the title”, “this is the content”, and so on. This makes it easier for other people to repurpose my content. If, for example, I reply to a post on someone else’s website, and ping them with a webmention, they can retrieve my post and know which bit is the title, which bit is the content, and so on.
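
Stripped right down, the pattern looks something like this (the class names come from the h-entry vocabulary; the content is placeholder):

<article class="h-entry">
  <h1 class="p-name">Title of the post</h1>
  <time class="dt-published" datetime="2019-09-21T12:00:00+01:00">September 21st, 2019</time>
  <div class="e-content">
    <p>The content of the post…</p>
  </div>
</article>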

When I read Remy’s post about using the Cache API to retrieve information directly from cached pages, I knew I wouldn’t have to do much work. Because all of my posts are already marked up with h-entry classes, I could use those hooks to create a nice offline page.

The markup for my offline page looks like this:

<h1>Offline</h1>
<p>Sorry. It looks like the network connection isn’t working right now.</p>
<div id="history">
</div>

I’ll populate that “history” div with information from a cache called “pages” that I’ve created using the Cache API in my service worker.

I’m going to use async/await to do this because there are lots of steps that rely on the completion of the step before. “Open this cache, then get the keys of that cache, then loop through the pages, then…” All of those thens would lead to some serious indentation without async/await.

The async function doesn’t strictly need a name, but giving it one makes the code easier to follow. I’m calling this one listPages, just like Remy is doing. I’m making the listPages function execute immediately:

(async function listPages() {
...
})();

Now for the code to go inside that immediately-invoked function.

I create an array called browsingHistory that I’ll populate with the data I’ll use for that “history” div.

const browsingHistory = [];

I’m going to be parsing web pages later on, so I’m going to need a DOM parser. I give it the imaginative name of …parser.

const parser = new DOMParser();

Time to open up my “pages” cache. This is the first await statement. When the cache is opened, this promise will resolve and I’ll have access to this cache using the variable …cache (again with the imaginative naming).

const cache = await caches.open('pages');

Now I get the keys of the cache—that’s a list of all the page requests in there. This is the second await. Once the keys have been retrieved, I’ll have a variable that’s got a list of all those pages. You’ll never guess what I’m calling the variable that stores the keys of the cache. That’s right …keys!

const keys = await cache.keys();

Time to get looping. I’m getting each request in the list of keys using a for/of loop:

for (const request of keys) {
...
}

Inside the loop, I pull the page out of the cache using the match() method of the Cache API. I’ll store what I get back in a variable called response. As with everything involving the Cache API, this is asynchronous so I need to use the await keyword here.

const response = await cache.match(request);

I’m not interested in the headers of the response. I’m specifically looking for the HTML itself. I can get at that using the text() method. Again, it’s asynchronous and I want this promise to resolve before doing anything else, so I use the await keyword. When the promise resolves, I’ll have a variable called html that contains the body of the response.

const html = await response.text();

Now I can use that DOM parser I created earlier. I’ve got a string of text in the html variable. I can generate a Document Object Model from that string using the parseFromString() method. This isn’t asynchronous so there’s no need for the await keyword.

const dom = parser.parseFromString(html, 'text/html');

Now I’ve got a DOM, which I have creatively stored in a variable called …dom.

I can poke at it using DOM methods like querySelector. I can test to see if this particular page has an h-entry on it by looking for an element with a class attribute containing the value “h-entry”:

if (dom.querySelector('.h-entry h1.p-name')) {
...
}

In this particular case, I’m also checking to see if the h1 element of the page is the title of the h-entry. That’s so that index pages (like my home page) won’t get past this if statement.

Inside the if statement, I’m going to store the data I retrieve from the DOM. I’ll save the data into an object called …data!

const data = new Object;

Well, the first piece of data isn’t actually in the markup: it’s the URL of the page. I can get that from the request variable in my for loop.

data.url = request.url;

I’m going to store the timestamp for this h-entry. I can get that from the datetime attribute of the time element marked up with a class of dt-published.

data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));

While I’m at it, I’m going to grab the human-readable date from the innerText property of that same time.dt-published element.

data.published = dom.querySelector('.h-entry .dt-published').innerText;

The title of the h-entry is in the innerText of the element with a class of p-name.

data.title = dom.querySelector('.h-entry .p-name').innerText;

At this point, I am actually going to use some metacrap instead of the visible h-entry content. I don’t output a description of the post anywhere in the body of the page, but I do put it in the head in a meta element. I’ll grab that now.
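
In the head of the page, that meta element looks like this:

<meta name="description" content="…">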

data.description = dom.querySelector('meta[name="description"]').getAttribute('content');

Alright. I’ve got a URL, a timestamp, a publication date, a title, and a description, all retrieved from the HTML. I’ll stick all of that data into my browsingHistory array.

browsingHistory.push(data);

My if statement and my for/of loop are finished at this point. Here’s how the whole loop looks:

for (const request of keys) {
  const response = await cache.match(request);
  const html = await response.text();
  const dom = parser.parseFromString(html, 'text/html');
  if (dom.querySelector('.h-entry h1.p-name')) {
    const data = new Object;
    data.url = request.url;
    data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));
    data.published = dom.querySelector('.h-entry .dt-published').innerText;
    data.title = dom.querySelector('.h-entry .p-name').innerText;
    data.description = dom.querySelector('meta[name="description"]').getAttribute('content');
    browsingHistory.push(data);
  }
}

That’s the data collection part of the code. Now I’m going to take all that yummy information and output it onto the page.

First of all, I want to make sure that the browsingHistory array isn’t empty. There’s no point going any further if it is.

if (browsingHistory.length) {
...
}

Within this if statement, I can do what I want with the data I’ve put into the browsingHistory array.

I’m going to arrange the data by date published. I’m not sure if this is the right thing to do. Maybe it makes more sense to show the pages in the order in which you last visited them. I may end up removing this at some point, but for now, here’s how I sort the browsingHistory array according to the timestamp property of each item within it:

browsingHistory.sort( (a,b) => {
  return b.timestamp - a.timestamp;
});

Now I’m going to concatenate some strings. This is the string of HTML text that will eventually be put into the “history” div. I’m storing the markup in a string called …markup (my imagination knows no bounds).

let markup = '<p>But you still have something to read:</p>';

I’m going to add a chunk of markup for each item of data.

browsingHistory.forEach( data => {
  markup += `
<h2><a href="${ data.url }">${ data.title }</a></h2>
<p>${ data.description }</p>
<p class="meta">${ data.published }</p>
`;
});

With my markup assembled, I can now insert it into the “history” part of my offline page. I’m using the handy insertAdjacentHTML() method to do this.

document.getElementById('history').insertAdjacentHTML('beforeend', markup);

Here’s what my finished JavaScript looks like:

<script>
(async function listPages() {
  const browsingHistory = [];
  const parser = new DOMParser();
  const cache = await caches.open('pages');
  const keys = await cache.keys();
  for (const request of keys) {
    const response = await cache.match(request);
    const html = await response.text();
    const dom = parser.parseFromString(html, 'text/html');
    if (dom.querySelector('.h-entry h1.p-name')) {
      const data = new Object;
      data.url = request.url;
      data.timestamp = new Date(dom.querySelector('.h-entry .dt-published').getAttribute('datetime'));
      data.published = dom.querySelector('.h-entry .dt-published').innerText;
      data.title = dom.querySelector('.h-entry .p-name').innerText;
      data.description = dom.querySelector('meta[name="description"]').getAttribute('content');
      browsingHistory.push(data);
    }
  }
  if (browsingHistory.length) {
    browsingHistory.sort( (a,b) => {
      return b.timestamp - a.timestamp;
    });
    let markup = '<p>But you still have something to read:</p>';
    browsingHistory.forEach( data => {
      markup += `
<h2><a href="${ data.url }">${ data.title }</a></h2>
<p>${ data.description }</p>
<p class="meta">${ data.published }</p>
`;
    });
    document.getElementById('history').insertAdjacentHTML('beforeend', markup);
  }
})();
</script>

I’m pretty happy with that. It’s not too long but it’s still quite readable (I hope). It shows that the Cache API and the h-entry microformat are a match made in heaven.

If you’ve got an offline strategy for your website, and you’re using h-entry to mark up your content, feel free to use that code.

If you don’t have an offline strategy for your website, there’s a book for that.

Monday, December 3rd, 2018

Voxxed Thessaloniki 2018 - Opening Keynote - Taking Back The Web - YouTube

Here’s the talk I gave recently about indie web building blocks.

There’s fifteen minutes of Q&A starting around the 35 minute mark. People asked some great questions!

Saturday, November 10th, 2018

Webmentions at Indie Web Camp Berlin

I was in Berlin for most of last week, and every day was packed with activity:

By the time I got back to Brighton, my brain was full …just in time for FF Conf.

All of the events were very different, but equally enjoyable. It was also quite nice to just attend events without speaking at them.

Indie Web Camp Berlin was terrific. There was an excellent turnout, and once again, I found that the format was just right: a day of discussions (BarCamp style) followed by a day of doing (coding, designing, hacking). I got very inspired on the first day, so I was raring to go on the second.

What I like to do on the second day is try to complete two tasks; one that’s fairly straightforward, and one that’s a bit tougher. That way, when it comes time to demo at the end of the day, even if I haven’t managed to complete the tougher one, I’ll still be able to demo the simpler one.

In this case, the tougher one was also tricky to demo. It involved a lot of invisible behind-the-scenes plumbing. I was tweaking my webmention endpoint (stop sniggering—tweaking your endpoint is no laughing matter).

Up until now, I could handle straightforward webmentions, and I could handle updates (if I receive more than one webmention from the same link, I check it each time). But I needed to also handle deletions.

The spec is quite clear on this. A 404 isn’t enough to trigger a deletion—that might be a temporary state. But a status of 410 Gone indicates that a resource was once here but has since been deliberately removed. In that situation, any stored webmentions for that link should also be removed.
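
Here’s a rough sketch of what that check could look like. This isn’t my actual endpoint code: reverifyWebmention and removeWebmentionsFor are hypothetical names, and I’m assuming an environment where fetch is available.

async function reverifyWebmention(sourceURL, targetURL) {
  // Re-fetch the source of a previously stored webmention.
  const response = await fetch(sourceURL);
  if (response.status === 410) {
    // 410 Gone: the source has been deliberately removed,
    // so any webmentions stored for it should be removed too.
    await removeWebmentionsFor(sourceURL, targetURL);
  }
  // A 404, on the other hand, might only be temporary, so it doesn't trigger a deletion.
}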

Anyway, I think I got it working, but it’s tricky to test and even trickier to demo. “Not to worry”, I thought, “I’ve always got my simpler task.”

For that, I chose to add a little map to my homepage showing the last location I published something from. I’ve been geotagging all my content for years (journal entries, notes, links, articles), but not really doing anything with that data. This is a first step to doing something interesting with many years of location data.

I’ve got it working now, but the demo gods really weren’t with me at Indie Web Camp. Both of my demos failed. The webmention demo failed quite embarrassingly.

As well as handling deletions, I also wanted to handle updates where a URL that once linked to a post of mine no longer does. Just to be clear, the URL still exists—it’s not 404 or 410—but it has been updated to remove the original link back to one of my posts. I know this sounds like another very theoretical situation, but I’ve actually got an example of it on my very first webmention test post from five years ago. Believe it or not, there’s an escort agency in Nottingham that’s using webmention as a vector for spam. They post something that does link to my test post, send a webmention, and then remove the link to my test post. I almost admire their dedication.
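
Conceptually, foiling that is simple: every time a webmention arrives, re-fetch the source and check that the link to my post is still there. Something like this sketch, which isn’t my actual endpoint code (removeWebmentionsFor is a hypothetical helper, and I’m assuming fetch and DOMParser are available):

async function checkSourceStillLinks(sourceURL, targetURL) {
  const response = await fetch(sourceURL);
  const html = await response.text();
  const dom = new DOMParser().parseFromString(html, 'text/html');
  // Does the source page still contain a link to my post?
  const stillLinked = [...dom.querySelectorAll('a[href]')]
    .some( anchor => anchor.getAttribute('href') === targetURL );
  if (!stillLinked) {
    // The page still exists, but the link is gone, so remove the stored webmention.
    await removeWebmentionsFor(sourceURL, targetURL);
  }
}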

I thought I had updated my code to handle it. Alas, when it came time to demo this, I was using someone else’s computer, and in my attempt to right-click and copy the URL of the spam link …I accidentally triggered it. In front of a room full of people. It was mildly NSFW, but more worryingly, a potential Code Of Conduct violation. I’m very sorry about that.

Apart from the humiliating demo, I thoroughly enjoyed Indie Web Camp, and I’m going to keep adjusting my webmention endpoint. There was a terrific discussion around the ethical implications of storing webmentions, led by Sebastian, based on his epic post from earlier this year.

We established early in the discussion that we weren’t going to try to solve legal questions—like GDPR “compliance”, which varies depending on which lawyer you talk to—but rather try to figure out what the right thing to do is.

Earlier that day, during the introductions, I quite happily showed webmentions in action on my site. I pointed out that my last blog post had received a response from another site, and because that response was marked up as an h-entry, I displayed it in full on my site. I thought this was all hunky-dory, but now this discussion around privacy made me question some inferences I was making:

  1. By receiving a webmention in the first place, I was inferring a willingness for the link to be made public. That’s not necessarily true, as someone pointed out: a CMS could be automatically sending webmentions, which the author might be unaware of.
  2. If the linking post is marked up in h-entry, I was inferring a willingness for the content to be republished. Again, not necessarily true.

That second inference of mine—that publishing in a particular format somehow grants permissions—actually has an interesting precedent: Google AMP. Simply by including the Google AMP script on a web page, you are implicitly giving Google permission to store a complete copy of that page and serve it from their servers instead of sending people to your site. No terms and conditions. No checkbox ticked. No “I agree” button pressed.

Just sayin’.

Anyway, when it comes to my own processing of webmentions, I’m going to take some of the suggestions from the discussion on board. There are certain signals I could be looking for in the linking post:

  • Does it include a link to a licence?
  • Is there a restrictive robots.txt file?
  • Are there meta declarations that say noindex?

Each one of these could help to infer whether or not I should be publishing a webmention. I quickly realised that what we’re talking about here is an algorithm.

Despite its current usage to mean “magic”, an algorithm is a recipe. It’s a series of steps that contribute to a decision point. The problem is that, in the case of silos like Facebook or Instagram, the algorithms are secret (which probably contributes to their aura of magical thinking). If I’m going to write an algorithm that handles other people’s information, I don’t want to make that mistake. Whatever steps I end up codifying in my webmention endpoint, I’ll be sure to document them publicly.
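
Purely as an illustration, a first pass at codifying those signals might look something like this. None of it is my actual endpoint code: shouldPublish is a made-up function, and the checks are deliberately crude.

async function shouldPublish(sourceURL, dom) {
  // A noindex meta declaration suggests the author doesn't want the content reused.
  if (dom.querySelector('meta[name="robots"][content*="noindex"]')) {
    return false;
  }
  // A restrictive robots.txt file is another signal worth respecting.
  const robots = await fetch(new URL('/robots.txt', sourceURL));
  if (robots.ok && (await robots.text()).includes('Disallow: /')) {
    return false;
  }
  // A link to a licence could be factored in here too.
  return true;
}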

Saturday, November 3rd, 2018

2018-11-03, 21:54 - sonniesedge.co.uk

Day one of Indie Web Camp Berlin is done, and it was great! Here’s Charlie’s recap of the sessions she attended.

Monday, August 20th, 2018

Playing with the Indieweb

A good half-hour presentation by Stephen Rushe on the building blocks of the indie web. You can watch the video or look through the slides.

I’ve recently been exploring the world of the IndieWeb, and owning my own content rather than being reliant on the continued existence of “silos” to maintain it. This has led me to discover the varied eco-system of IndieWeb, such as IndieAuth, Microformats, Micropub, Webmentions, Microsub, POSSE, and PESOS.

Monday, September 25th, 2017

The Decentralized Social Web

Excellent presentation slides on all things Indie Web.

Tuesday, July 18th, 2017

Reflections on Two Years of Indieweb

Alex Kearney looks back on two years of owning her own data.

With a fully functional site up and running, I focused on my own needs and developed features to support how I wanted to use my site. In hind-sight, that’s probably the most indie thing I could’ve done, and how I should’ve started my indieweb adventure.

This really resonates with me.

One of the motivating features for joining the indieweb was the ability to keep and curate the content I create over time.

Terrific post!

Here’s to two more years.

Sunday, June 18th, 2017

Microformats : Meaningful HTML

A great one-page intro to microformats (h-card in particular), complete with a parser that exports JSON. Bookmark this for future reference.

Friday, March 17th, 2017

Amber Wilson: Markup-Masterclass

Yesterday was a good day with Amber. She’s been marking up her CV and it was the perfect opportunity to take a deep dive into HTML.

Tuesday, June 28th, 2016

<A>

The opening keynote from the inaugural HTML Special held before CSS Day 2016 in Amsterdam.

It’s all starting to come together.

The world exploded into a whirling network of kinships, where everything pointed to everything else, everything explained everything else.
— Umberto Eco, Foucault’s Pendulum

A is for Apophenia

Apophenia is the name for that tendency in humans to see patterns where none exist, to draw connections, to make links.

Every conspiracy theory is an example of apophenia. But you don’t have to be a conspiracy theorist to experience it. We do it all the time. We see shapes in the constellations in the night sky. We see faces in just about everything.

Stars

Today, I would like to engage the apopheniac in you.

A is for Anchor

An anchor is an odd device to represent a link. I can even remember seeing anchor symbols used in the interfaces of rich text editors. If it wasn’t an anchor, it was a chain. I suppose that was meant to represent a link …because chains have links.

Anyway, why A? Why Anchor?

It goes back to how the A element was originally used. When I say originally, I mean originally. Let’s look at the first A element in the first web page at the first URL.

<A NAME=0 HREF="WhatIs.html">
hypermedia
</A>

This looks pretty familiar to us today. There’s an A Element with opening and closing tags, some text in between, and an HREF attribute for the destination. But there’s also a NAME attribute. This has since been deprecated—now we can just use an ID attribute on any element. The idea was that A elements could be used to create destinations for inbound links. They were, if you like, anchors within a page that other pages could tether themselves to. Each anchor is given a unique identifier (unique within the page, that is). Here, the identifier is simply the number zero, because this page was created by a programmer and in the mind of a programmer, counting begins with zero.
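
Any other page could then link straight to that anchor by adding the identifier as a fragment, something like this made-up example:

<A HREF="TheProject.html#0">hypermedia</A>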

This use of the A element—using NAME attributes to create in-page anchors—never really took off. But the other attribute, the HREF attribute, that spread like wildfire. It’s short for hypertext reference, and in this particular instance, the reference is to another page in the same directory on the same server. It’s a page about hypertext.

Hypertext is text which is not constrained to be linear.

The term was first coined by Ted Nelson. He didn’t just talk about text either. He also coined the term hypermedia. He coined lots of interesting words. He talked about things being deeply intertwingled. He also coined the term teledildonics, but that is not directly relevant to hypertext or hypermedia.

If hypertext is text which is not constrained to be linear, how did we ever manage with good old-fashioned non-hyper text and non-hyper media? We used archives.

A is for Archive

In 1965, JCR Licklider was commissioned to write a report on Libraries Of The Future. He—and his company Bolt, Beranek and Newman—would prove to be instrumental in creating our modern interconnected hypermedia world. More on that later. For now, let’s look at this report. It has two parts:

  1. Concepts and Problems of Man’s Interaction with the Body of Recorded Knowledge
  2. Explorations in the Use of Computers in Information Storage, Organization, and Retrieval.

I love the scope of that first part, looking at the body of recorded knowledge.

It’s interesting that he talks about knowledge, not information, not data, but knowledge. How does data become information? How does information become knowledge?

The Library Of Babel

The Library Of Babel is a short story by Jorge Luis Borges, who I think of as the poet laureate of hypertext. He imagines a vast library that is filled with data, but frustratingly short on knowledge …because this library contains not only all the books ever written, but all the books that could ever possibly be written, with every possible permutation.

Here’s how it works:

The universe (which others call the Library) is composed of an indefinite, perhaps infinite number of hexagonal galleries… The arrangement of the galleries is always the same: Twenty bookshelves, five to each side, line four of the hexagon’s six sides… each bookshelf holds thirty-two books identical in format; each book contains four hundred ten pages; each page, forty lines; each line, approximately eighty black letters.

Let’s figure out how many books are in the Library of Babel. First, we need to know how much data each book holds.

  • There are 80 symbols (or letters) per line,
  • 40 lines per page, and
  • 410 pages in each book.

Multiplying 80 by 40 gives us 3200, the number of symbols on each page. Multiply that by 410 and we get a total number of 1,312,000 symbols in each book.

We have two other pieces of information to work with. Borges tells us:

The orthographic symbols are twenty-five in number.

That’s 22 letters, the comma, the period, and the space.

Here’s the crucial bit of information that ensures that the library has boundaries:

In the vast library there are no two identical books.

Knowing that, we can calculate the number of books in the library. It’s the number of symbols (25) raised to the power of the number of symbols in each book (1,312,000).

25 to the power of 1,312,000 expressed in base ten is 10 to the power of 1,834,097. Remember that’s just the number of books: a figure that’s over 1,834,097 digits long. That number wouldn’t fit inside one book in the library (which, if you remember only holds 1,312,000 symbols).
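
If you want to check those sums for yourself, a couple of lines of JavaScript will do it:

// Back-of-the-envelope check of Borges's numbers.
const symbolsPerBook = 80 * 40 * 410;            // 1,312,000 symbols in each book
const digits = symbolsPerBook * Math.log10(25);  // ≈ 1,834,097.3
// So 25 to the power of 1,312,000 is roughly 10 to the power of 1,834,097.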

So the number of books in the Library of Babel is not infinite …but it is really, really, really big. To give you some idea of just how big 10 to the power of 1,834,097 is, it has been calculated that the observable universe contains approximately 10 to the power of 80 atoms. There are more books in the Library of Babel than there are atoms in the universe.

And yet, thanks to the World Wide Web, you can theoretically peruse every single one of them.

Jonathan Basile has created libraryofbabel.info—an online representation of Borges’s creation. It contains all possible text. You can browse by hexagon, then shelf, then row, then book, then page. Or you can search for a specific piece of text, because—of course—that text must be in there somewhere.

The very words I am speaking now must be somewhere in the library.

There’s also a plug-in for Chrome so you can highlight any piece of text on the web, and then find its corresponding page in the Library of Babel.

This library has every possible piece of data …but it’s sorely lacking in information (although the online version helps).

A is for All

Having all the data isn’t enough. It needs to be organised—turned into information—for us to make use of it (and hopefully further transform that information into knowledge).

There have been many attempts to organise information. When those attempts are limited to a subset of data—instead of trying to create a Library of Babel—then they can be quite successful.

Carl Linnaeus organised the natural world using a naming convention for describing species—binomial nomenclature.

Charles Messier catalogued astronomical objects.

Melvil Dewey created the Dewey Decimal System to help librarians organise their collections. But this was a proprietary system, not an open standard. So the Belgian librarian Paul Otlet devised a Universal Decimal Classification system. I think it’s fair to think of him as the father of information architecture.

The crazy old uncle of information architecture would be bishop John Wilkins, who in 1668 wrote An Essay towards a Real Character and a Philosophical Language. Centuries later, this would pique the interest of Borges who described Wilkins’s madcap idea in an essay called The Analytical Language of John Wilkins. The idea was that the world could be classified into sounds.

He divided the universe in forty categories or classes, these being further subdivided into differences, which was then subdivided into species. He assigned to each class a monosyllable of two letters; to each difference, a consonant; to each species, a vowel. For example: de, which means an element; deb, the first of the elements, fire; deba, a part of the element fire, a flame.

It didn’t scale well.

Gottfried Wilhelm Leibniz—inventor of calculus and nemesis to Newton—had a similar idea to Wilkins, but whereas Wilkins was trying to classify information using sounds, Leibniz wanted to use symbols: characteristica universalis. His leap of genius was to realise that if you could do this—represent the world with symbols—then you could perform calculations on those symbols. He described the conceptual framework for performing such operations as a calculus ratiocinator. Centuries later, Norbert Wiener, the creator of cybernetics, would say:

The general idea of a computing machine is nothing but a mechanization of Leibniz’s calculus ratiocinator.

A is for Ada

Leibniz’s calculus ratiocinator was an idea, rather than a real machine. Charles Babbage was a Victorian inventor and entrepreneur who was given seed funding by the British government to create his Difference Engine: a machine for computing logarithmic tables …a computer, if you will.

The Difference Engine

The Wozniak to Babbage’s Jobs was Ada Lovelace, the daughter of the notorious Lord Byron. Ada’s mother did everything in her power to steer her daughter away from following in her father’s footsteps of becoming a poet. Instead Ada immersed herself in the world of mathematics. It was through mathematics that Ada hit upon the same insight as Leibniz—if we can perform calculations upon symbols, and those symbols don’t have to just represent numbers, then we can perform calculations on anything …maybe even poetry.

In enabling mechanism to combine together general symbols in successions of unlimited variety and extent, a uniting link is established between the operations of matter and the abstract mental processes of the most abstract branch of mathematical science.

The Difference Engine was abandoned in favour of an even more ambitious project: The Analytical Engine. Finally, the world would get a mechanical machine capable of performing calculations on symbols that could represent concepts or thoughts—a thinking machine!

As with so many start-ups, Lovelace and Babbage never quite managed to make it to market. But at least they did genuinely have an incredible journey before going broke.

A is for Alan

The Analytical Engine, like the calculus ratiocinator, remained an idea. It would be another century before we’d get a real nuts’n’bolts computing machine, thanks to the brilliant Alan Turing.

But as well as having a hand in creating the world’s first physical computer, he also presented us with an imaginary machine. This theoretical machine was described as having:

…an infinite tape marked out into squares, on each of which a symbol could be printed.

Like Leibniz’s calculus ratiocinator, a Turing machine would operate on symbols. How many symbols are we talking about here? Remember the Library Of Babel had twenty-five symbols to work with.

Two. Two symbols. One and zero. On and off. True and false. With an infinitely long piece of tape and an infinite amount of time, two symbols are enough to calculate literally anything.

This reduction of the world into its smallest pieces was the brainchild of Claude Shannon. He coined the term "bit" to describe this indivisible unit of information.

By the way, Turing himself referred to his theoretical machine as an automatic machine, or a-machine. A is for A-machine.

A is for Atlantic

There’s another imaginary machine that serves as a wonderful conceptual prototype for working with hypertext and hypermedia. In 1945 Vannevar Bush published an article in the Atlantic Monthly called As We May Think. In this article, Bush describes a machine called the Memex, a contraction of Memory Index.

Memex

The Memex is built into a desk. Screens and switches on the surface of the desk allow the user to interact with huge amounts of information stored within the desk on microfilm. This would provide an "enlarged intimate supplement to one’s memory."

Because everyone’s mind is different, no two people would use the Memex in quite the same way. Bush described these individualistic approaches to linking concepts together as associative trails:

Wholly new forms of encyclopedias will appear, ready made with a mesh of associative trails running through them, ready to be dropped into the memex and there amplified.

The information stored in the Memex is the same for everyone, but the associative trails created by the user in navigating this information are unique. Bush also proposed that these associative trails could be shared. Users of the Memex could follow the breadcrumbs left by others.

This is hypertext.

A is for Augmentation

So many ideas about hypertext confined to the imagination! When do we get a working demo?

Ladies and gentlemen, on December 9th, 1968, we get the mother of all demos, courtesy of Douglas Engelbart. Six years previously, he had set out his goals for human-computer interaction in a work entitled Augmenting Human Intellect: A Conceptual Framework. To achieve this aim of augmenting humanity, he created a working demo of his oNLine System, abbreviated to NLS.

Oh, and for this demo he just happened to invent video conferencing, the graphical user interface, and the mouse. He also implemented hypertext.

With Bush’s Memex and Engelbart’s NLS, concepts could be linked together, turning them from text storage devices to hypertext storage devices. But for hypertext to reveal its true power, we need a network greater than anything seen up to this point. Towards the end of the mother of all demos, we hear the first rumblings of just such a network.

A is for Arpa

ARPA stands for the Advanced Research Projects Agency.

This government agency turned to our old friend JCR Licklider, author of Libraries Of The Future. His company—Bolt, Beranek and Newman—set about designing a communications system that used this new-fangled packet switching that Leonard Kleinrock was so excited about. With packet switching, information was broken down into discrete chunks, routed around a network independent of each other, and then re-assembled at the destination.

At the same time, this idea of packet switching was independently discovered by Paul Baran at the Rand Corporation, who was trying to find a resilient network architecture capable of surviving nuclear attack.

The packet-switching idea was put to the test with the creation of a new network called the ARPANET. The very first message sent over the ARPANET was at 10:30pm on October 29th, 1969. It was the command: LOG IN.

The message was sent and… the system crashed after the first two characters. Fittingly, then, the real first message sent over this proto-internet was "LO".

But they fixed the bugs and they kept working on making the system better as it grew bigger. It morphed from being a single network, the ARPANET, into being a network of networks, or an Inter-network, soon shortened to simply Internet.

For this internet to work, it was essential that all the individual networks connecting to it were using the same protocols to communicate. That’s what really makes the internet the internet—regardless of what kind of hardware is being used, there’s an agreement on how to switch those packets around. That’s really all a protocol is: an agreement. They’re more like treaties than code.

Bob Kahn, Vint Cerf, Jon Postel

Bob Kahn and Vint Cerf were the statesmen crafting the internet’s protocols, and Jon Postel was the diplomat ensuring adoption went smoothly. They wanted a robust network, resilient not to nuclear attack, but to any kind of top-down control. They set out to create a protocol that would work for a network with no centre.

Together they crafted TCP/IP: the Transmission Control Protocol and Internet Protocol. TCP/IP is a deliberately dumb set of protocols. The protocols care not a whit for the contents of the packets being switched around the network. It’s a simple low-level agreement. They used to joke that you should be able to implement TCP/IP using two tin cans and a piece of string.

You can then create more complex protocols on top of this simple, low-level, dumb foundation. You can create protocols for sending and receiving email, protocols for telnet, gopher, file transfer protocols that sit atop TCP/IP. Best of all, you don’t need to ask anyone for permission. If you want to create a new protocol today, you can just go ahead and do it. All you need is rough consensus and running code. It turns out that running code isn’t the hard part. The hard part is convincing people to use your protocol. Otherwise you don’t benefit from Metcalfe’s Law:

The value of a network is proportional to the square of the number of connected users of the system.

Think about it. The first person to own a telephone had a completely useless object. As soon as one other person had a telephone, it suddenly became exponentially more useful. That was the challenge facing the creator of a new protocol at the start of the 90s: that protocol was HTTP—the HyperText Transfer Protocol.

HTTP is one part of a three-part stack: HTTP is the protocol, URLs are the identifiers, and HTML is the format. Together they form the World Wide Web project.

The project was the brainchild of a young computer scientist named Tim Berners-Lee. This wasn’t his first attempt at creating a hypertext system.

In the 1980s, he created a system called ENQUIRE. It was named after a Victorian book of manners called Enquire Within Upon Everything, which I always thought would be a great name for the web.

Enquire didn’t work out in the end, but it would influence the design of the World Wide Web project.

Another influence on the design of the system was the place where Tim Berners-Lee was working. CERN—the European Centre for Nuclear Research. It’s an amazing place. The greatest experiment in the history of our species is being conducted beneath the border between Switzerland and France. In the 16-mile ring of the Large Hadron Collider at CERN, human beings are recreating the conditions from the start of our universe. Protons are smashed together at velocities approaching the speed of light. It’s a truly awe-inspiring endeavour.

CERN

When I visited CERN, I expected to be blown away by the science, and I was. I also expected to be blown away by simply being at the birthplace of the web, and I was. But what I wasn’t expecting was to be blown away by how things get done at CERN. There is very little hierarchy. People from all kinds of backgrounds—from Nobel prize winning physicists to students on a Summer internship—collaborate on experiments for pure scientific research.

Trying to manage the flow of information in this collaborative but chaotic place was the challenge that Tim Berners-Lee was trying to solve. You can’t just mandate a particular operating system or piece of software—people at CERN can and will use whatever they want. In the same way that the internet is a network of networks, what CERN needed was some way of allowing all these different computers with different operating systems to share information with each other.

Tim Berners-Lee submitted a paper to his supervisor, Mike Sendall. It had the uninspiring title Information Management: A Proposal. Well, Mike Sendall must have seen some potential, because he scrawled across the top:

Vague, but exciting.

The proposal described:

…a solution based on a distributed hypertext system.

Tim Berners-Lee was very familiar with previous hypertext systems. All of these fed into his project:

  • Vannevar Bush’s Memex,
  • Douglas Engelbart’s oNline System,
  • Ted Nelson’s Xanadu, although it was still vapourware at this point,
  • Apple’s Hypercard system for the Mac,
  • and his own Enquire project.

But just creating the code wasn’t enough. He—and his colleague and collaborator, Robert Cailliau—needed to convince the scientists at CERN to use this technology. To start with, they needed a catchy name.

For a while, they floated the idea of calling it the Mesh.

Then they kicked around the idea of calling it The Information Mine. But Tim Berners-Lee wasn’t keen on this one. He knew that, whatever name they chose, it would end up getting abbreviated, and he was worried it would look a bit egotistical.

And so they settled on World Wide Web. You have to admire the chutzpah of calling it World Wide Web when, at that point, it only existed on one person’s computer.

Sure enough, it did end up getting abbreviated. Except in this case, there are actually more syllables in the so-called abbreviation—WWW—than there are in the full name.

WWW

They even made a logo. Graphic designers they are not. But there is a reason for the green colour of the Ws. Robert Cailliau is a synaesthete—he “hears” the W sound as the colour green.

And so the web was born. Good job!

Where the web was born

Learning from the lessons of TCP/IP, Tim Berners-Lee made sure to keep the individual parts of the system as simple as possible (but no simpler). The World Wide Web didn’t succeed because the technology was the best; far from it. It succeeded because the technology was just simple enough—but also powerful enough—for people to get started with straight away.

Take HTML, for example. There was no official Version 1 specification for this hypertext markup language.

Instead there was a document called simply HTML Tags, presumably written by Tim Berners-Lee. This document listed the entirety of HTML, which was a grand total of 21 elements.

Most of those elements weren’t even invented by Tim Berners-Lee. Instead, he borrowed the vocabulary already being used by scientists at CERN. They were used to writing documents in GML, which is supposed to stand for Generalised Markup Language, but was coincidentally created by three people whose last names were Goldfarb, Mosher, and Lorie: G, M, and L.

There was one element that was completely new to HTML:

A

This one single element is what enables the HT part of HTML. With this element, and its href attribute, anyone could link to anything on the web. It is brilliant in its simplicity.

Pleased with the way their project was progressing, Tim Berners-Lee and Robert Cailliau submitted a proposal to present their World Wide Web at a hypertext conference.

They were rejected. Hypertext experts thought the World Wide Web was stupid.

As Ted Nelson put it:

Today’s one-way hypertext—the World Wide Web—is far too shallow. The Xanadu project foresaw world-wide hypertext decades ago, and endeavored to create a much deeper system. The Web, however, took over with a very shallow structure.

He’s not wrong. Hypertext on the web is shallow. It is stupid. Like TCP/IP, it is not smart.

The thing is, if you’re hoping to get mass adoption, being smart is a bug. Being stupid is a feature.

Just about every other hypertext system embodied the idea of two-way linking. There was an awareness at both ends of the link. If the resource being linked to were to move or change, the link could be updated. It’s robust, but it’s complicated.

On the World Wide Web, by contrast, links only work in one direction. If the resource being linked to ends up moving or changing, well, tough luck. The result is link rot. That’s the price we pay for a very simple hypertext system.

But, now that the web has been around a couple of decades, there is a sort-of, kind-of implementation of two-way linking.

It uses the humble rel attribute.

Rel is short for relationship. The value inside the rel attribute describes the relationship of the linked resource to the current document.

<a href="…" rel="…">

Some rel values were officially canonised in HTML.

rel="prev" means that the linked resource has the relationship of being the previous document to the current document.

<a href="…" rel="prev">

rel="next" means that the linked resource has the relationship of being the next document after the current document.

<a href="…" rel="next">

rel="author" means that the linked resource has the relationship of representing the author of the current document.

<a href="…" rel="author">

You get the idea.

Incidentally, there was once a corresponding rev attribute that described the reverse relationship. In other words, the relationship of the current document to the linked resource.

<a href="…" rev="…">

So you could simultaneously say the current document has a relationship of being the previous document to the linked resource and that the linked resource has a relationship of being the next document to the current document.

<a href="…" rev="prev" rel="next">

Confused? Well, that’s why the rev attribute was eventually dropped from HTML. It was just too complicated.

There was an attempt to use the rev attribute in an early microformat called vote-links. This was proposed way back in 2004. Blogs were hot. Political blogs were very hot indeed: Bush Jr.—no relation to Vannevar—was running for re-election in the States, and John Kerry was the challenger. There was much blogging and gnashing of teeth.

The problem was the rise of PageRank, the algorithm that drove Google’s search engine. The fundamental premise of PageRank was that linking to something counts as an endorsement. But there were many bloggers linking to articles that they disagreed with very strongly.

By using a rev value of “vote-for”, authors could explicitly say that this document is a vote for the resource being linked to.

Or by using a rev value of “vote-against” they could make it clear that this document is a vote against the linked resource.

It never really took off because, as I said, the rev attribute was just too hard to grok.

And that’s okay. The whole point of microformats is that they are the very embodiment of the motto of the Internet Engineering Task Force: rough consensus and running code. There just wasn’t enough take-up of vote-links for it to thrive.

Another early idea—that actually preceded the official creation of microformats—was XFN, which stood for XHTML Friends Network: the most Web 1.0 format name ever.

XFN built on existing behaviour. Bloggers would often have a list of links in their sidebar pointing to other bloggers they had some kind of relationship with. If you assume that a URL can represent a person, then the rel attribute is perfect for encoding that relationship information.

I can link to a friend’s website and say that the person represented by the linked resource has a relationship of being a friend to the person represented by the current document: me!

<a href="…" rel="friend">

Or I can link to a colleague’s website and say that they have a relationship of being a colleague to me.

<a href="…" rel="colleague">

And because rel values—like class values—can be space-separated, I can combine rel values into one attribute. I can link to someone and say that they are both a friend and a colleague.

<a href="…" rel="friend colleague">

I still have XFN values in the sidebar of my blog, but again, it never really took off.

Except for one value, that seems at first glance to be completely pointless:

A rel value of "me": the linked resource has a relationship of being …me?

<a href="…" rel="me">

A is for Adactio

My website is adactio.com. I love my website. Even though it isn’t a physical thing, I think it might be my most prized possession.

It’s a place for me to think and a place for me to link.

But my web presence isn’t limited to adactio.com—I have profiles on lots of different sites: Twitter, Facebook, Instagram, Medium, Dribbble, Flickr, Tumblr, CodePen …the list goes on.

None of those places are as important to me as my own website, but they are representations of me.

I link out to these profiles from my own website using that rel value of "me". That’s me on Twitter. That’s me on Github. That’s me on Flickr.

<a rel="me" href="https://twitter.com/adactio">
<a rel="me" href="https://github.com/adactio">
<a rel="me" href="https://flickr.com/adactio">

Nothing unusual there. These are regular one-way hyperlinks.

What’s interesting is that many of those profiles on other websites provide a URL field where I can enter my own website. These third-party profiles then link back to my website also using a rel value of "me".

<a rel="me nofollow" href="https://adactio.com">

(They also use a rel value of "nofollow" to discourage spammers. The phrase "nofollow" makes absolutely no sense as a rel value—you can’t have a relationship of "nofollow" to anything—but it was invented by Google. We don’t get to argue with the 900-pound Google gorilla.)

Anyway, the result of having these reciprocal links, both using rel="me", is that we’ve kinda, sorta got two-way linking on the World Wide Web.

But …so what?

A is for Authentication

Some of those third-party profiles I’m linking to—Twitter, Github, Flickr—have something in common. They allow third-party authentication using OAuth.

If I can log into my Twitter, or Github, or Flickr profile using OAuth, and those profiles have two-way links with my website, then I can “borrow” that authentication flow for my own site.

That’s the idea behind IndieAuth. I enter the URL of my own website. It finds the links from there to my other profiles using rel="me". Then I can choose which one of those profiles I want to authenticate against. Once I’ve authenticated with that service, I’ve also authenticated my own website.
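
The discovery step is simple enough to sketch. This is just an illustration of the principle rather than how any particular IndieAuth service is implemented: findProfiles is a made-up function, and I’m assuming fetch and DOMParser are available.

async function findProfiles(url) {
  // Fetch the user's home page and gather the rel="me" links:
  // those are the profiles they could choose to authenticate against.
  const response = await fetch(url);
  const html = await response.text();
  const dom = new DOMParser().parseFromString(html, 'text/html');
  return [...dom.querySelectorAll('a[rel~="me"]')].map( anchor => anchor.href );
}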

Just by adding a short rel value to some links, I can use my website as a log-in.

My website also has a small write API. The API is called micropub. By combining IndieAuth and micropub, I can log into somebody else’s posting interface using my website, and then use that interface to post to my own website.
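
At its simplest, a micropub request is just a form-encoded POST with an access token. Here’s a bare-bones sketch; the endpoint URL and token are placeholders.

const token = '…'; // an access token obtained via IndieAuth
fetch('https://example.com/micropub', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ' + token,
    'Content-Type': 'application/x-www-form-urlencoded'
  },
  body: new URLSearchParams({
    h: 'entry',
    content: 'Hello, indie web!'
  })
});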

These building blocks—microformats, IndieAuth, micropub—have emerged from a small but dedicated group of people.

We gather together at fun little events called Indie Web Camps.

An Indie Web Camp takes place over two days. The first day is split into discussions of technology and design. The second is all about implementing what we’ve discussed on the first day. I’m always amazed by how much gets done when you’re in the same space as like-minded people.

Still, the technologies being discussed and implemented aren’t the real focus of Indie Web Camp. The core tenet of Indie Web Camp is an idea. It’s a very simple idea, that at one point would have been uncontroversial. That idea is that you should have your own website.

But who’s got time for that? Especially when it’s so much easier to write and share and link using platforms like Twitter, Facebook, and Medium. That’s a good point. A lot of the time at an Indie Web Camp is spent reverse-engineering what those services are doing so well, and applying them to personal websites.

Also, we don’t necessarily want to stop using those services. After all, that’s where the people are (for now). Instead we want to figure out ways of making use of these services, while still keeping the canonical versions of what we create under our own control.

There’s a fantastic little Indie Web service called Brid.gy that allows you to not only post from your own site out to third party networks, but also receive replies, and likes, and retweets back at your own site, using another Indie Web building block called webmention.

In a way, webmentions allow a kind of two-way linking. I can cross-post something from adactio.com to Twitter, or Instagram, or Facebook. Then when someone replies on Twitter, or Instagram, or Facebook, I get notified with a ping back to my own site.
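
Under the hood, that ping is about as simple as it gets: a form-encoded POST with a source URL and a target URL. Here’s a sketch with placeholder URLs.

fetch('https://example.com/webmention', {
  method: 'POST',
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
  body: new URLSearchParams({
    source: 'https://example.com/a-reply',       // the page doing the linking
    target: 'https://example.com/original-post'  // the page being linked to
  })
});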

All of this is possible because I’ve verified the identity of those third-party profiles using nothing more than a simple rel attribute on a hyperlink.

The Indie Web uses a grab-bag of deliberately simple technologies that combine to create something so much more powerful than the sum of its parts …just like the web itself.

We often talk about places like Facebook, or Instagram, or Medium as walled gardens. Walled gardens are as old as the web itself. AOL was a well-cultivated walled garden back in the day.

To The Garden

There’s nothing wrong with walled gardens. They’re safe spaces. They take care of your enjoyment and entertainment, so you don’t have to.

But they’re also a bit boring. I certainly don’t relish the idea of spending my days within the boundaries of someone else’s vision.

There’s a different kind of garden. It takes its name from another short story by Borges.

The Garden of Forking Paths. It is uncontrolled. It is full of possibilities. It’s a bit scary. It takes more dedication to explore. You might get lost. But is that so bad? When was the last time you were truly lost on the World Wide Web, when you clicked through link after link—no cheating by opening new tabs, now—until you ended up somewhere, blinking and asking yourself “what was I looking for?”

I would like us all to spend more time in the garden of forking paths. I would like us all to continue to grow this garden of forking paths. Add your own website to this garden of forking paths. Use it to make more links.

On your website, you can link to this thing over here and that thing over there, and in doing so create an entirely new forking path.

Remember, the web, like the internet, has no centre. In theory I could start from any single A element, and by following all the forking paths, traverse the entire World Wide Web.

That opening hyperlink could be on your own website. One single A element can be the portal to an entire universe of knowledge.

Tuesday, June 7th, 2016

PURL: A Portable Content Store - Not Enough Neon

I need to wrap my head around the details of this approach, but it sounds like it might be something I could do here on my site (where I feel nervous about my current dependency on a database).

Friday, January 29th, 2016

Taking part in the IndieWeb

The slides from Calum’s presentation at Front-end London.

Wednesday, September 3rd, 2014

Hello, Again — Craig Mod

Craig has redesigned and pulled various bits of his writing from around the web into his own site, prompting some thoughts on the indie web.

Monday, August 4th, 2014

Open standards for contact details and calendar events | Technology at GDS

I’ve been suggesting h-event and h-card as open standards for UK government sites.

Wednesday, April 23rd, 2014

The Indieweb | Parallel Transport

or: how I learnt to stop worrying and love the blog.

This is a really nice introduction to the basics of the Indie Web …with nice illustrations too.

Sunday, March 9th, 2014

Learning about, and deploying IndieWeb tools | Dan Gillmor

Well, this is pretty nifty: Dan Gillmor is at Indie Web Camp in San Francisco and he’s already got some code up and running on his site.

Y’know, I’m not missing South by Southwest in the slightest this year …but I’m really missing Indie Web Camp.

Monday, February 3rd, 2014

rel="source"

Aral and his trusty sidekick Victor have taken up residency for a while at the Clearleft office in Middle Street while they work on their very exciting project. It’s nice having them around.

I got chatting to Aral about a markup pattern that’s become fairly prevalent since the rise of Github: linking to the source code for a website or project. You know, like when you see “fork me on Github” links.

We were talking about how it would be nice to have some machine-readable way of explicitly marking up those kind of links, whether they’re in the head of the document, or visible in the body. Sounds like a job for the rel attribute, I thought.

The rel attribute describes the relationship of the linked document to the current document. You can use it on the link element (in the head of your document) and the a element (in the body). The example that everyone is familiar with is rel="stylesheet" when linking off to a CSS file—the linked document has the relationship of being a stylesheet for the current document.

The rel attribute could theoretically take a space-separated list of any values, just like the class attribute. In practice, there’s much more value in having everyone agree on which rel values should be used.

There used to be a page on the WHATWG site for listing rel values, but it tended to stagnate. So now the official registry for rel values is on the microformats wiki. That’s where you can see which values are recommended for use today and you can also brainstorm new ideas.

The benefit of having one centralised registry for this is that you can see if someone else has had the same idea as you. Then you can come to an agreement on which value to use, so that everyone’s using the same vocabulary instead of just making stuff up.

It doesn’t look like there’s an existing value for the use case of linking to a document’s (or a project’s) source code, so I’ve proposed rel="source".
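
In practice that might look something like this, either in the head of a document or in the body (the URL is a placeholder):

<link rel="source" href="https://github.com/example/project">

<a rel="source" href="https://github.com/example/project">Fork me on Github</a>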

Now I should document some use cases of people linking their site to its source code. It might be that wikis qualify as another use case: every “edit” button points to the source of the document in wiki markup.

If you have any thoughts on this pattern, or examples to add, please feel free to add them.