Fast & flexible content snippets in Gatsby

Published 2019-05-30.

When I started this blog I chose Gatsby because it satisfied my desire for a fast, statically hosted website. I found Gatsby just in time: I had just started writing my own static site generator from scratch. With Gatsby I could focus on the content instead of the platform.

Or so I thought. But (enticed by the prospect of a free t-shirt) I dipped my toes into the vibrant open source community surrounding Gatsby and instantly found myself sucked in. So even though my blog has two posts and barely any traffic, I have devoted hours to tweaking and improving it. I want to share some of my work so that others (perhaps with more popular websites) might benefit.

The problem

Many sites display an excerpt of content on listing pages in order to give visitors a preview of the article. This snippet is your opportunity to entice people to click on the article and read it. The easiest way to generate a snippet is to just take the first x characters of the post.

This is well-supported via the excerpt GraphQL field provided by gatsby-transformer-remark and it’s what gatsby-starter-blog does (unless you specify a description property in your frontmatter, which is a bit awkward and doesn’t support Markdown).
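As a rough illustration of the “first x characters” idea (this is not the gatsby-transformer-remark implementation), a plain truncation helper might look like the sketch below. The function name `naiveSnippet` and the 140-character default are assumptions chosen for this example.

```javascript
// Hypothetical helper: take roughly the first `limit` characters of plain text,
// cutting at the last word boundary so that a word is never split in half.
function naiveSnippet(text, limit = 140) {
    const normalised = text.replace(/\s+/g, ' ').trim();
    if (normalised.length <= limit) {
        return normalised;
    }

    const cut = normalised.slice(0, limit);
    const lastSpace = cut.lastIndexOf(' ');
    // Fall back to a hard cut if the first `limit` characters contain no space
    const body = lastSpace > 0 ? cut.slice(0, lastSpace) : cut;
    return body + '…';
}

const post = 'When I started this blog I chose Gatsby. The real point of the article only shows up several paragraphs later.';
console.log(naiveSnippet(post, 40));
```

Whatever the exact cut-off, the snippet always comes from the top of the post, and that is the limitation the rest of this article is about.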

Many articles start with a few introductory paragraphs to provide context or make the article seem more relatable. But does this introduction best represent the meat of the article? Wouldn’t it be nice to let authors choose which paragraphs are used to generate the preview snippet? I will first discuss the obvious way to solve this problem, then I will show you a better way.

Another important snippet is your page’s meta description. Search engines will (under the right circumstances) show this within search results, so again this snippet represents a key opportunity to attract readers.

A naïve solution

My original solution was to delineate my excerpt paragraphs using gatsby-remark-custom-blocks. I then wrote a helper function that uses Cheerio to search the rendered HTML for my custom block. On my index page (and in the SEO description for each article page) I passed the article body HTML to this helper to extract the snippet.

const cheerio = require('cheerio');

// Returns the contents of the excerpt custom block, or null if the article has none.
// Pass stripHtml = true to get plain text (e.g. for a meta description).
function excerpt(html, stripHtml) {
    const $ = cheerio.load(html);
    const $excerpt = $('.custom-block.excerpt .custom-block-body');

    if ($excerpt.length < 1) {
        return null;
    }

    // Replace links inside the excerpt with spans, keeping their contents
    $excerpt.find('a').each((index, element) => {
        const $element = $(element);
        const span = $('<span></span>').html($element.html());
        $element.replaceWith(span);
    });

    return stripHtml ? $excerpt.text() : $excerpt.html();
}

At the time this seemed like a perfectly adequate solution, and if Gatsby were just a static-site generator then it would be. But as I learned more about Gatsby I realised this approach had a significant flaw.

The flaw

Gatsby uses React SSR to generate static HTML files for each page. These static pages allow the website to load incredibly quickly and be useable even by people with JavaScript disabled. But more importantly, they are absolutely crucial for SEO. Some search engines might claim to execute JavaScript, but if you read “Does Google Execute JavaScript?” and “Is Bing Really Rendering & Indexing JavaScript?” you will see that you probably don’t want to rely on this behaviour.

But Gatsby websites aren’t just static sites. Assuming JavaScript is enabled, once the page is loaded Gatsby will asynchronously load the site’s React app and replace the static content with it. This enables subsequent page loads to simply load the content instead of reloading the entire structure of the page. It also allows Gatsby to preload pages when the user hovers over the link, drastically improving perceived performance.

And this is where the flaw of the naïve approach becomes apparent. For subsequent page loads the client downloads the entire result of the GraphQL query needed to generate the page and then runs the React component locally to generate the displayed HTML. Because the naïve approach extracts the snippet inside the React component, the query has to include the full HTML of every article, so to display a listing page the client must download the full body of every article on it.

With my blog’s modest two articles (at the time of writing) this results in the path---index-***.json file being ~170KiB.
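To see why the payload grows, here is a small sketch that compares the serialised size of query results carrying each article's full HTML against results carrying only a snippet. The data is synthetic and the field names (`html`, `snippet`) are placeholders, so the numbers it prints say nothing about my actual blog.

```javascript
// Synthetic stand-ins for two articles; real article bodies would look different.
const articles = [1, 2].map((n) => ({
    title: `Article ${n}`,
    snippet: '<p>A short, hand-picked paragraph.</p>',
    html: '<p>Lorem ipsum dolor sit amet.</p>'.repeat(2000),
}));

// Size in bytes of the JSON the client would have to download
function payloadBytes(data) {
    return Buffer.byteLength(JSON.stringify(data), 'utf8');
}

// Naïve approach: the listing page queries the whole body of every article
const withFullHtml = articles.map(({ title, html }) => ({ title, html }));

// Alternative: only a precomputed snippet is included in the query result
const withSnippetOnly = articles.map(({ title, snippet }) => ({ title, snippet }));

console.log('full HTML bytes:', payloadBytes(withFullHtml));
console.log('snippet-only bytes:', payloadBytes(withSnippetOnly));
```

The first figure grows with the length of every article on the page, while the second grows only with the length of the snippets.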

The solution

The solution to this is clear: we need to move the excerpt generation code out of the page component and into GraphQL. That way when the client downloads the GraphQL query results it will only receive the excerpts and not the entire content of each article.

This means creating a plugin. The simple option would probably be to make a sub-plugin for gatsby-transformer-remark. But this didn’t sit right with me. Nothing about “query some HTML with a CSS selector” suggests a dependency on Remark. It might be that most Gatsby users use Remark, but surely not all of them. If I’m writing a plugin I’d like for anyone to be able to use it.

Making a source-/transformer-agnostic plugin turned out to be very challenging, and I’m not entirely satisfied with the way I ended up achieving it. But I did achieve it, and you can view the source code if you want to see exactly how.

Results

I am proud to present gatsby-plugin-excerpts, which I believe is an elegant way to generate snippets and excerpts from your content. If you’re interested in learning how to use this on your site, please see the documentation, especially the example.

While I’ve only talked about using this technique in conjunction with gatsby-remark-custom-blocks (because this is the technique that I personally use), gatsby-plugin-excerpts is quite flexible and can be used to generate excerpts using any CSS selector.

Thanks to this approach my generated path---index-***.json file has shrunk from ~170KiB to ~4KiB, and that’s with only two blog posts. Larger websites could realise significant absolute savings.

Joshua Walsh's Blog