I’ve powered through about 500 broken links on the backend of the site here, and I see a couple of common issues so far:
- Major media outlets, as I mentioned earlier, have sucked at keeping their links current. Most of them do use some kind of redirect logic on the backend, but links to big outlets like CNN and CBS have a 50/50 chance of resolving, and all the links I’ve found to the Baltimore Sun are completely broken.
- A quirk in how I originally built the site and how it’s structured now means that 1/4 of the internal images I uploaded and linked to (everything before about 2014 or so) are technically broken, although WordPress actually does a great job of redirecting to the right file. I’ve found a way to update these quickly, which is a blessing, because…
- …when I ported the site over from hand-coded pages, I missed a whole swath of links that pointed back into the site. These now need to be individually hunted down and updated. This will represent roughly 75% of the time I spend on this project.
- At some point something happened internally with the HTML parser that replaced < and > with &lt; and &gt; in random places. Those are the character entity references for the two characters. Because < and > make up a huge chunk of HTML markup, this can be a gigantic problem: the HTML won’t parse and you (the reader) are looking at gobbledygook. I’m going through and trying to find the pages where this happens and fix it.
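For the curious, here is a rough sketch of how the escaped-bracket problem could be hunted down in bulk. It isn't what the plugin or WordPress does; it's just a small Python illustration, and the names `TAG_NAMES`, `find_escaped_tags`, and `repair_escaped_tags` are made up for this example. It assumes the damage only matters where an escaped bracket sits around a recognizable tag name, so a legitimate &lt; in ordinary text (like "1 &lt; 2") is left alone.

```python
import re

# Hypothetical short list of tag names to look for; extend as needed.
TAG_NAMES = "a|p|b|i|em|strong|img|br|div|span|ul|ol|li|blockquote|h[1-6]"

# A tag whose opening or closing bracket may have been turned into an entity,
# e.g. &lt;a href="..."&gt; or &lt;/p> or <em&gt;.
TAG = re.compile(
    r"(?:&lt;|<)(/?(?:" + TAG_NAMES + r")\b[^<>]*?)(?:&gt;|>)",
    re.IGNORECASE,
)

def find_escaped_tags(html):
    """Return the tags that have at least one escaped bracket."""
    return [
        m.group(0)
        for m in TAG.finditer(html)
        if m.group(0).startswith("&lt;") or m.group(0).endswith("&gt;")
    ]

def repair_escaped_tags(html):
    """Rewrite matched tags with real < and > brackets."""
    return TAG.sub(r"<\1>", html)

sample = 'See &lt;a href="https://example.com/"&gt;this&lt;/a> when 1 &lt; 2.'
print(find_escaped_tags(sample))
print(repair_escaped_tags(sample))
```

A pattern like this can still misfire (say, on a post that deliberately shows escaped HTML as an example), so it's safer to review what `find_escaped_tags` reports before letting `repair_escaped_tags` loose on a page.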
On the whole, this plugin is awesome, and it’s doing an excellent job of automating the process: it suggests a date-coded Wayback Machine link captured as close to the original post’s date as it can find, which is pretty slick. 1600+ links to go…
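If you're wondering what a date-coded archive link looks like, here's a minimal Python sketch. I don't know how the plugin builds its suggestions internally, so treat this as an illustration only: the function names and the example URL are hypothetical. It relies on the Wayback Machine's URL pattern, where a timestamp sits in the path and the archive serves the capture nearest to that date, plus the Internet Archive's availability endpoint, which reports the closest snapshot as JSON.

```python
from datetime import date
from urllib.parse import urlencode

def wayback_url(original_url, post_date):
    """Build a Wayback Machine link dated to the day the post was published."""
    stamp = post_date.strftime("%Y%m%d")
    return f"https://web.archive.org/web/{stamp}/{original_url}"

def availability_query(original_url, post_date):
    """Build the availability lookup URL for the snapshot closest to post_date."""
    params = {"url": original_url, "timestamp": post_date.strftime("%Y%m%d")}
    return "https://archive.org/wayback/available?" + urlencode(params)

# Hypothetical dead link from a post published on 14 March 2009.
print(wayback_url("http://www.cnn.com/story.html", date(2009, 3, 14)))
# prints https://web.archive.org/web/20090314/http://www.cnn.com/story.html
print(availability_query("http://www.cnn.com/story.html", date(2009, 3, 14)))
```

Neither function touches the network; they only build the URLs, so whether a snapshot actually exists for a given link still has to be checked by fetching them.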