Google Sheets Link Checker


In my previous role I spent a lot of time doing site transition, simply put the basic principles of properly doing a site transition is to make sure that there are no broken pages / URLs to be found on a website (regardless of them being indexed or not). Without going into detail about all the different types of site transitions there are, I am going to show you how I used a Google Sheet and a custom Google App Script function to speed up the process and increase the quality of a website transition.

Last week I release an introduction to Google App Scripts guide and would suggest checking it out before continuing on with this guide!

Now if you are familiar with Google App Scripts you may know that it is primarily based on JavaScript ( v 1.6 ), and if you are familiar with JavaScript you will know that you can make HTTP request to websites with only a few lines of code. In this guide I will be showing you how you can make HTTP requests to a website in order to check the current response status of a given url. For those of you that have crawled websites using tools such as Screaming Frog, you will have noticed that for every url crawled, a response code(status code) is returned. Now these response codes are used to notify the requesting server if a request has been successful or not. These are called HTTP status codes, the most common are 200 for success, 301 for permanent redirect, and 404 for page not found. During a site transition the main objective is to find broken links and redirect them to non-broken URLs. The easiest way to find broken links is by testing their status codes, if a link is broken it will return a status code of 404.

Note: I have come across a few websites that have their web servers setup in such a way that instead of returning a 404 status code, the url redirects (302) to a page with the characters “404” on it while returning a HTTP status code of 200, this is considered a “soft 404” in Googles eyes. Hopefully the website you are working on does not do this, but if it does, feel free to contact me and see how I got around that issue.

Before we start I will mention that in order to make the most out of this, you are going to need to compile the largest list of URLs for a website that you can. This means going beyond just the sitemap, the reason being is that most sitemaps are dynamically created upon request. In other words if a page or url gets delete from a website it too will be deleted from the sitemap, meaning if the sitemap is your only source of truth your not going to find any broken links. Hint: Google is your friend.

Lets get into it!

Prerequisites:

  • Google Account
  • Google Sheet
  • List of URLs

Step 1 - Create your Google Sheet




Step 2 - Add the list of URLs into the sheet.


Depending on the source of your list, you may have duplicate URLs. Two ways you can clean the list is by using the quick function “Data> Remove Duplicates” Or using the built in function “unique()”.



Step 3 - Adding our custom function “retrieveStatusCode()”




Step 4 - Calling our custom function from the Google Sheet








Step 5 - Copy and paste values only (this makes the list easier to sort)




Step 6 - Applying a filter, this allows us to sort or filter by status code






That’s it! Hopefully you have now seen another way you can use Google App Scripts to automate and speed up your day!

Feel free to contact me on LinkedIn if you have any comments or suggestions!