The Journey of Optimizing for SEO

The thousand-obstacle course I ran when optimizing my pages so that Google Search would crawl and index them (including this blog post!)

8 min read

Overview

SEO stands for Search Engine Optimization: the process of configuring various things in your website so that search engines favor it over others when ranking pages and prioritize it in search results.

Components

<meta name="robots" />

Some pages should not be indexable, such as those with sensitive data: user accounts, financial pages, or administrative data. There are ways to politely ask the robots to ignore these pages, but beware, they might ignore your request.

You can request crawlers not to index this page by using the following meta tag: <meta name="robots" content="option" />

While not part of any specification, it is a de-facto standard method for communicating with search bots, web crawlers, and similar user agents.

Major Crawlers

Most major crawlers, like Googlebot, Bing's and Yahoo's crawlers, and other ethical search engines, respect your choices in these robots directives. But nothing prevents an in-house crawler from ignoring them, so any security concerns still have to be handled properly even if you don't expect your site to be searchable.

There are several options you can specify for the content attribute, the most common being noindex and nofollow.
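
For example, a minimal sketch that keeps a page out of search results entirely and asks crawlers not to follow its links combines those two standard directives:

<meta name="robots" content="noindex, nofollow" />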

Crawler Access

These meta tags only take effect if the crawler decides to crawl the webpage in the first place. If it is blocked by robots.txt, as detailed below, it will never even see the meta tag, let alone follow its instructions.

robots.txt

It is a text file, placed in the root directory of a website, that tells robots (such as search engine indexers) how to behave by instructing them not to crawl certain paths on the site.

Malicious Crawlers

While using this file can prevent pages from appearing in search engine results, it does not secure websites against attackers. On the contrary, it can unintentionally help them: robots.txt is publicly accessible, and by adding your sensitive page paths to it, you are showing their locations to potential attackers.

Also be aware that some robots, such as malware robots and email address harvesters, will ignore your robots.txt file.

A robots.txt file can specify some fields like so:

User-Agent: *
Disallow: /sensitive

User-Agent: Googlebot
Allow: /
Disallow: /google

The above robots file notes the following conditions: all crawlers are asked not to crawl anything under /sensitive, while Googlebot matches its own, more specific group instead and may crawl everything except paths under /google.

HTTPS Connection

This does not affect whether Google can crawl or index your page, but it does affect how highly your page ranks among search results.

It's a very simple change, but an HTTPS page will rank higher than an equivalent non-HTTPS page.

Canonical Links

Canonical links are created with the <link> tag:

<link rel="canonical" href="<canonical_link>" />

Canonical links provide a way for Google and other search engines to know which page is the "original" copy, so they can skip indexing duplicated pages. For example, this page can also be accessed at luny.dev, but that gets redirected to www.luny.dev, which again gets redirected to the https version of the page.

So, in short, there are 3 versions of this webpage, but only the one at exactly https://www.luny.dev is the main copy, which will be the "canonical link" here. You may still serve the other pages over HTTP instead of HTTPS, just make sure each of them has a link tag pointing back to the HTTPS version.

This is because Google doesn't want to index low-effort, duplicated pages. Canonical links help tell Google that there's only one original, high-effort copy to index instead.
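
Concretely, every non-canonical variant of a page should carry the same tag pointing at the single canonical URL; the path below is made up for illustration:

<link rel="canonical" href="https://www.luny.dev/blog/seo-journey" />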

Redirects

301 or 302 redirect responses are extremely strong indicators that a URL is not the final version of the webpage, and they will stop Google from indexing the redirecting pages before it even starts.

Redirected Pages

It's fine to have redirecting pages in your app; it's just that Google won't index those, only their targets, the pages they redirect towards.

For example, with A redirecting to B and B redirecting to C: when Google tries to index A, it will tell you that both A and B are redirects and won't be indexed, but C will be.

If your page is redirecting by mistake, make sure to fix it, as Google refuses to index any redirecting pages.

Meta Tags

Meta tags provide search engines with basic information about your site to display:

<title>Title</title>

<meta name="description" content="Description" />

Ideally, the description should be around 150-180 characters. The meta description tag does not directly affect rankings, but it gives your site a larger area on the search results page.

The title tag, which the HTML standards expect on every page, plays a critical role in SEO rankings. Most large search engines look for this tag.

Semantic HTML and Accessibility

The web is accessible by default when using plain HTML5 elements. But React developers will find a way to reimplement basic features using <div>, destroying any hope of good accessibility. Believe it or not, accessibility, something most web developers don't even think about, strongly affects SEO.

A clear structure with semantic HTML like <header>, <nav>, and heading tags in a proper hierarchy plays a major role in how search engines view your page.

Transcripts and alt texts should also be provided for media like images and videos.
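
As a rough sketch of what such a structure looks like (the markup below is illustrative, not taken from this site):

<header>
  <nav>
    <a href="/">Home</a>
    <a href="/blog">Blog</a>
  </nav>
</header>
<main>
  <article>
    <h1>The Journey of Optimizing for SEO</h1>
    <h2>Overview</h2>
    <p>SEO is Search Engine Optimization...</p>
    <img src="/content/seo/overview.webp" alt="Diagram of how a crawler discovers and indexes pages" />
  </article>
</main>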

Core Web Vitals

I think any web developer who cares enough about speed should already know PageSpeed Insights, or Google Chrome's Lighthouse tool.

These provide a lot of information about several core factors of SEO, the Core Web Vitals: Largest Contentful Paint (LCP), Cumulative Layout Shift (CLS), and interaction responsiveness (formerly FID, now INP).

Google provides a ton of resources on how to improve these factors on Google Developers.

There are certain optimizations you can apply; for example, here is what I have done on this webpage:

  1. Removing LaTeX support if the webpage does not have any maths in the first place.
  2. Removing fonts for other languages depending on the locale.
  3. Putting a high fetch priority on images that may cause CLS (see the sketch after this list).
  4. Putting explicit width and height on images where possible.
  5. Shrinking images to proper formats like webp (or png) at just enough resolution to serve them. We don't need a 4K image to display in a 32x32 square.
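
As a minimal sketch of points 3 and 4 above (fetchpriority is a relatively new HTML hint for load priority; the image path and dimensions here are made up for illustration):

<img
  src="/content/seo/hero-banner.webp"
  alt="Hero banner for the post"
  width="1200"
  height="630"
  fetchpriority="high"
/>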

Image Optimization

There are a lot of ways to optimize image delivery, and images are often the core of a website (for eCommerce, how would your buyers decide to buy if you can't show them what they're going to get?).

One more thing that I think would help is specifying sizes on image source sets, which greatly helps with the Cumulative Layout Shift (CLS) mentioned above.
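
A rough sketch of what that looks like in markup (the size variants and breakpoints below are made up for illustration):

<img
  src="/content/dictionary/search-screen.webp"
  srcset="/content/dictionary/search-screen-480.webp 480w,
          /content/dictionary/search-screen-960.webp 960w"
  sizes="(max-width: 600px) 480px, 960px"
  width="960"
  height="540"
  alt="Search screen of the dictionary"
/>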

180k    public/content/dictionary/search-screen.jpeg
344k    public/content/dictionary/search-screen.png
131k    public/content/dictionary/search-screen.webp

For example, for the same image, webp compression reduces the size to about a third of what it would have been with png! While jpeg still provides a huge edge over png, webp is simply superior here: it produces an even smaller file while keeping the quality, and it also supports lossless compression.

In the same vein, I had a folder of 4 images saved as png, totalling about 900KB; after compressing them all down to webp, it now only takes up 240KB.

You can check out the tool here: CWebP for Web Developers, or you can use ffmpeg.