Salvaging linkrot with the Wayback Machine
While making some updates to the site, I did a 404 scan of my link blog and the results were��� less than awesome. So I decided to work some Eleventy magic to recover from them.
# Step 1: Log the 404s to a fileI make ample use of Eleventy���s global data files, but 404s didn���t feel like something I needed to have as part of the data cascade. Instead, I���m logging them to a YAML file in my ./_cache folder. For simplicity, they get logged like this:
https://path.to/original/page/that-is...I chose YAML as it���s about as bare-bones as you can get when it comes to file formats and is pretty easy to work with in the context of Eleventy.
# Step 2: Add an Eleventy data file to my links folderIf you���re not familiar, Eleventy allows you to create directory-level data files that can be used to augment file-level data. I was originally using it to define the layout and permalink front matter variables for all the links using the JSON option, but as a JavaScript file, directory-level data becomes even more powerful.
Setting up your data file is relatively straightforward using module.exports:
module.exports ={layout:"layouts/link.html",
permalink:"/notebook/{{ page.filePathStem }}/",
eleventyComputed:{
custom_property:(data)=>{
return some_value_based_on_data;
}
}
};
Here I���m defining two static values (layout and permalink) and a computed value (the hypothetical custom_property).
# Step 3: Consult the 404 logAs I mentioned, the 404 logging happens separately and results in updates to _cache/404.yml. To make use of all this in the Eleventy data file, I need to set up a few things at the top of the file:
const fs =require('fs');const yaml =require('js-yaml');
const cached404s = yaml.load(fs.readFileSync('_cache/404s.yml'));
Here I���m bringing in Node���s File System and JS-YAML. Then I am loading the YAML file into memory as cached404s, leveraging those utilities.
Next up is defining a helper function to search cached404s for a match:
functionis404ing(url){return( url in cached404s );
}
This function takes the URL as an argument and returns true or false. Making use of this in the eleventyComputed section is straightforward:
module.exports ={layout:"layouts/link.html",
permalink:"/notebook/{{ page.filePathStem }}/",
eleventyComputed:{
is_404:(data)=>{
returnis404ing(data.ref_url);
}
}
};
In my case, ref_url is the front matter field storing the URL I���m linking to from my link blog, so I return the value of passing that to is404ing() as is_404.
# Step 4: Lean on the Wayback MachineThe next thing I want to do is generate a link that has a good chance of working for my readers. Thankfully the Wayback Machine has a predictable URL structure for entries and it���s pretty good about handgun redirects to the most temporally-proximate snapshot when you give it a date to work from. Knowing that, I set up another helper function:
functionarchived(data){let archive_url ='https://web.archive.org/web/{{ DATE }}/{{ URL }}';
let month = data.date.getUTCMonth() 1;
month = month <10?"0" month : month;
let day = data.date.getDay();
day = day <10?"0" day : day;
archive_url = archive_url
.replace('{{ DATE }}',`${data.date.getUTCFullYear()}${month}${day}`)
.replace('{{ URL }}', data.ref_url );
return archive_url;
}
Note: I know this isn���t the most elegant/efficient code, I wanted to show step-by-step what���s happening here.
This function takes the data object as an argument and composes a URL that points to a snapshot of the given page (data.ref_url) at the time I saved the link (data.date). The data.date value is already a JavaScript date, so it���s pretty easy to turn it into the format the Wayback Machine expects (YYYYMMDD). In the end, this method returns a URL that looks something like this:
https://web.archive.org/web/20150102/http://andregarzia.com/posts/en/whatsappdoesntunderstandtheweb/
With that helper in place, I can make use of it within eleventyComputed:
module.exports ={layout:"layouts/link.html",
permalink:"/notebook/{{ page.filePathStem }}/",
eleventyComputed:{
is_404:(data)=>{
returnis404ing(data.ref_url);
},
archived:(data)=>{
returnis404ing(data.ref_url)?archived(data):false;
}
}
};
Now every link in my link blog will have an is_404 value that is true or false and an archived value that is either a valid Wayback Machine URL (if the page is 404-ing) or false.
# Step 5: Using these in the my templateI use Nunjucks for most of my site���s templating, but you can make use of these computed properties in any supporting templating language. Knowing if a linked URL is 404-ing allows me to
display the title without a link,display the source without a link, andprovide additional copy about the link���s 404 status and provide the Wayback Machine link instead.I am only going to share code with you for that final bit as it should give you enough of a sense of how you can use these properties in the other contexts too.
{%if is_404 %}<p>This link is 404-ing{%if archived %}, but
<arel="bookmark"href="://{{ archived }}">you can view an
archived version on the Wayback Machinea>{%endif%}.
p>
{%endif%}
Here you can see I am injecting a footer into the markup when the entry is 404-ing. Within that footer, I note the link���s status. Then I inject some additional text to point to the Wayback Machine���s archive of the page. It���s worth noting that I am being overly cautious here and only injecting the link if post.data.archived is truthy. This will ensure that the link won���t be shown if something fails in my code or I change how I am implementing the archived property.
# Crossing my fingersRelying on an unverified URL, even one at the Wayback Machine, is risky, but so far this approach seems to be working. If you���ve got a link blog suffering from link rot, you might consider setting up something similar. Hopefully this will help jumpstart that project for you.