The Journey of Optimizing for SEO

The thousand-obstacle course I ran when optimizing my pages so that Google Search would crawl and index them (including this blog post!)

8 min read

Overview

SEO stands for Search Engine Optimization: the process of configuring various things in your website so that search engines favor it over others when ranking pages and prioritize it in search results.

Components

<meta name="robots" />

Some pages should not be indexable, such as those with sensitive data: user accounts, financial pages, or administrative data. There are ways to politely ask the robots to ignore these pages, but beware, they might ignore your request.

You can request crawlers not to index this page by using the following meta tag: <meta name="robots" content="option" />

While not part of any specification, it is a de-facto standard method for communicating with search bots, web crawlers, and similar user agents.

Major Crawlers

Most major crawlers, like Googlebot, Bing's and Yahoo's crawlers, and other ethical search engines, respect your choices in these robots directives. But nothing prevents an in-house crawler from ignoring them, so any security concerns still have to be handled properly even if you don't expect your site to be searchable.

There are several options you can specify for the content attribute, the most common being noindex and nofollow.
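
For example, a minimal sketch that keeps a page out of search results entirely and asks crawlers not to follow its links combines those two standard directives:

<meta name="robots" content="noindex, nofollow" />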

Crawler Access

These meta tags only take effect if the crawler decides to crawl the webpage in the first place. If it is blocked by robots.txt, as detailed below, it will never even see the meta tag, let alone follow its instructions.

robots.txt

It is a text file, placed in the root directory of a website, that tells robots (such as search engine indexers) how to behave by instructing them not to crawl certain paths on the site.

Malicious Crawlers

While using this file can prevent pages from appearing in search engine results, it does not secure websites against attackers. On the contrary, it can unintentionally help them: robots.txt is publicly accessible, and by adding your sensitive page paths to it, you are showing their locations to potential attackers.

Also be aware that some robots, such as malware robots and email address harvesters, will ignore your robots.txt file.

A robots.txt file can specify some fields like so:

User-Agent: *
Disallow: /sensitive

User-Agent: Googlebot
Allow: /
Disallow: /google

The above robots file notes the following conditions: all crawlers are asked not to crawl anything under /sensitive, while Googlebot matches its own, more specific group instead and may crawl everything except paths under /google.

HTTPS Connection

This does not affect whether Google can crawl or index your page, but it does affect how highly your page ranks among search results.

It's a very simple change, but an HTTPS page will rank higher than an equivalent non-HTTPS page.

Canonical Links

Canonical links are created with the <link> tag:

<link rel="canonical" href="<canonical_link>" />

Canonical links provide a way for Google and other search engines to know which page is the "original" copy, so they can skip indexing duplicated pages. For example, this page can also be accessed at luny.dev, but that gets redirected to www.luny.dev, which again gets redirected to the https version of the page.

So, in short, there are 3 versions of this webpage, but only the one at exactly https://www.luny.dev is the main copy, which will be the "canonical link" here. You may still serve the other pages over HTTP instead of HTTPS, just make sure each of them has a link tag pointing back to the HTTPS version.

This is because Google doesn't want to index low-effort, duplicated pages. Canonical links help tell Google that there's only one original, high-effort copy to index instead.
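
Concretely, every non-canonical variant of a page should carry the same tag pointing at the single canonical URL; the path below is made up for illustration:

<link rel="canonical" href="https://www.luny.dev/blog/seo-journey" />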

Redirects

301 or 302 redirect responses are extremely strong indicators that a URL is not the final version of the webpage, and they will stop Google from indexing the redirecting pages before it even starts.

Redirected Pages

It's fine to have redirecting pages in your app; it's just that Google won't index those, only their targets, the pages they redirect towards.

For example, with A redirecting to B and B redirecting to C: when Google tries to index A, it will tell you that both A and B are redirects and won't be indexed, but C will be.

If your page is redirecting by mistake, make sure to fix it, as Google refuses to index any redirecting pages.

Meta Tags

Meta tags provide search engines with basic information about your site to display:

<title>Title</title>

<meta name="description" content="Description" />

Ideally, the description should be around 150-180 characters. The meta description tag does not directly affect rankings, but it gives your site a larger area on the search results page.

The title tag, which the HTML standards expect on every page, plays a critical role in SEO rankings. Most large search engines look for this tag.

Semantic HTML and Accessibility

The web is accessible by default when using plain HTML5 elements. But React developers will find a way to reimplement basic features using <div>, destroying any hope of good accessibility. Believe it or not, accessibility, something most web developers don't even think about, strongly affects SEO.

A clear structure with semantic HTML like <header>, <nav>, and heading tags in a proper hierarchy plays a major role in how search engines view your page.

Transcripts and alt texts should also be provided for media like images and videos.
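
As a rough sketch of what such a structure looks like (the markup below is illustrative, not taken from this site):

<header>
  <nav>
    <a href="/">Home</a>
    <a href="/blog">Blog</a>
  </nav>
</header>
<main>
  <article>
    <h1>The Journey of Optimizing for SEO</h1>
    <h2>Overview</h2>
    <p>SEO is Search Engine Optimization...</p>
    <img src="/content/seo/overview.webp" alt="Diagram of how a crawler discovers and indexes pages" />
  </article>
</main>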

Core Web Vitals

I think any web developer who cares enough about speed should already know PageSpeed Insights, or Google Chrome's Lighthouse tool.

These provide a lot of information about several core factors of SEO, the Core Web Vitals: Largest Contentful Paint (LCP), Cumulative Layout Shift (CLS), and interaction responsiveness (formerly FID, now INP).

Google provides a ton of resources on how to improve these factors on Google Developers.

There are certain optimizations you can apply; for example, here is what I have done on this webpage:

  1. Removing LaTeX support if the webpage does not have any maths in the first place.
  2. Removing fonts for other languages depending on the locale.
  3. Putting a high fetch priority on images that may cause CLS (see the sketch after this list).
  4. Putting explicit width and height on images where possible.
  5. Shrinking images to proper formats like webp (or png) at just enough resolution to serve them. We don't need a 4K image to display in a 32x32 square.
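
As a minimal sketch of points 3 and 4 above (fetchpriority is a relatively new HTML hint for load priority; the image path and dimensions here are made up for illustration):

<img
  src="/content/seo/hero-banner.webp"
  alt="Hero banner for the post"
  width="1200"
  height="630"
  fetchpriority="high"
/>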

Image Optimization

There are a lot of ways to optimize image delivery, and images are often the core of a website (for eCommerce, how would your buyers decide to buy if you can't show them what they're going to get?).

One more thing that I think would help is specifying sizes on image source sets, which greatly helps with the Cumulative Layout Shift (CLS) mentioned above.
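
A rough sketch of what that looks like in markup (the size variants and breakpoints below are made up for illustration):

<img
  src="/content/dictionary/search-screen.webp"
  srcset="/content/dictionary/search-screen-480.webp 480w,
          /content/dictionary/search-screen-960.webp 960w"
  sizes="(max-width: 600px) 480px, 960px"
  width="960"
  height="540"
  alt="Search screen of the dictionary"
/>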

180k    public/content/dictionary/search-screen.jpeg
344k    public/content/dictionary/search-screen.png
131k    public/content/dictionary/search-screen.webp

For example, for the same image, webp compression reduces the size to about a third of what it would have been with png! While jpeg still provides a huge edge over png, webp is simply superior here: it produces an even smaller file while keeping the quality, and it also supports lossless compression.

In the same vein, I had a folder of 4 images saved as png, totalling about 900KB; after compressing them all down to webp, it now only takes up 240KB.

You can check out the tool here: CWebP for Web Developers, or you can use ffmpeg.