Broken links suck. It's incredibly frustrating to read a great article that links to an external resource that covers a subtopic in detail only to find the link is broken. In that moment I curse whatever developer or webmaster of the external site didn't think that creating a 301 redirect was worth the effort. I end up going to the root domain to hunt for the article by topic, hopefully the link text or URL slug gives enough topic or keyword clues to find it. Sometimes the external resource is gone completely and then I'm off to Archive.org to try to find a cached copy.
You know what's worse than a broken external link for your Drupal site? A broken internal link.
A broken internal link is a slap in the face for user experience. You didn't create a 301 redirect from the old URL to a relevant new URL. You didn't update the link you have control over. Do you care about your reader's experience at all? I want to help you prevent broken internal links in your Drupal sites and show you a process to automate link updates. Don't slap your users, follow this process!
What are internal links
Internal links are simply links which are to URLs on the same domain or website. For example, this is an internal link to our Drupal CMS guides. The link is to a page on the Daymuse.com domain from a page on the Daymuse.com domain. We have full control over both the link href (an anchor attribute which creates a hyperlink, the link destination URL) and linking page. The opposite, external links, are simply links to a different domain. You don't have any control over the link destination.
Why internal links are important in Drupal
Inbound links (external links on other sites to your site) are important to build domain authority for SEO, improving search rankings. Similarly, internal links help create content relationships within your site and give context to content through their link text.
Sidenote: Search engines account for internal link anchor text. Ideally, you don't want anchor text to simply say "click here", "more", or some other non-contextual, non-descriptive anchor text.
How to create internal links in Drupal body text
The primary way you'll create Drupal internal links is through body text on pages. More than likely, your WYSIWYG offers a link button that'll pop-up a dialog to input a link URL. This gives you a few methods to create internal links.
Absolute internal links
If you create a link with a fully qualified URL (the complete http(s)://subdomain.domain.extension/page-name), that's an absolute link. No part of the link is "variable", its destination will forever be the exact URL you've input. If any part of the destination URL were to change in the future, your absolute links will become broken. You'll need to manually update them.
Relative internal links
You can create links that are relative to the linking page. There's a few methods of creating these relative internal links.
If you were to simply add the href of "page-name-2" to a link from a page at this URL:
http(s)://subdomain.domain.extension/topic/page-name
The href would be relative to the current location and so the link will actually go to:
http(s)://subdomain.domain.extension/topic/page-name-2
The link destination will retain the depth, and target a sibling page ("page-name-2" within /topic/).
You could also link to a page on the current domain from the top level by using a leading forward slash. If your link href was set to "/page-name-2" in the above example, with a leading forward slash, your destination would become:
http(s)://subdomain.domain.extension/page-name-2
The forward slash causes the hierarchy to collapse, leading from the top level domain: "/topic" is dropped.
There's some additional advanced methods of creating internal links in HTML documents, but this should give you an idea of how they work. In essence, they create variables so that your links will work when certain conditions change. If your domain name were to change but your content hierarchy remained the same, your relative links would still have destinations that match your hierarchy and become relative to the new domain name. This gives you more flexibility than absolute internal links, but it's still not the best method.
Canonical internal links
Drupal's internal URL scheme creates canonical locations for content. These are the "non-pretty" URLs. Drupal's node system is reflected in these internal URLs:
http://www.domain.com/node/<nid>
In this scheme, the <nid> represents the internal node ID (a unique identifying integer) for a particular piece of content. This works similarly for Drupal's taxonomy and term system:
http://www.domain.com/taxonomy/term/<tid>
The <tid> is the term ID, identifying a taxonomy term. Using our example above, this is the canonical URL to our Drupal CMS guides:
http://www.daymuse.com/taxonomy/term/23
The "Drupal CMS guides" term ID is 23. If you inspect or hover the actual link above, you'll see that the URL is pretty, human readable:
http://www.daymuse.com/guide/drupal
We've setup Drupal to have an alias for "/guide/drupal" to display our "Drupal CMS guides" term. Drupal creates these pretty aliases by default ("clean URLs").
However, this is the literal HTML code that Drupal stores for that body text internal link:
<a href="/taxonomy/term/23">Drupal CMS guides</a>
This is a relative internal link to a canonical URL. Our WYSIWYG is configured to translate these internal canonical URLs to their pretty aliases on the fly. This takes care of multiple issues:
- If we were to decide to change our domain or store this blog post on a subdomain, the relative link would automatically reflect the domain change
- If we were to alter the alias from "/guide/drupal" for this particular term to "/guides/cms/drupal", the canonical URL would point to the appropriate pretty URL
This works the same way for nodes: if we link to their canonical URL and their alias is updated, Drupal will automatically translate the canonical URL to the new pretty URL on the fly. Tracking down the canonical URL manually for each internal link can be a pain, though. We're getting to automating the internal link creation process, but first let's make sure we understand why it's important.
Why creating Drupal canonical internal links is important
As your Drupal website develops, you'll undoubtedly create more content: more nodes, more terms. You'll also come up with new URL strategies and ways to classify your content through the pieces of your URLs. Perhaps you'll decide to add a portion of the date within the URL, maybe a category the content resides in, or you just want to manually alter a piece of content's URL to make it shorter or better reflect the contents. This will happen over time. When it does, you don't want to either have to go back and update all your internal links to reflect the change, or even worse, cause broken links and leave them broken. Broken internal links will degrade your search engine performance two ways:
- You'll lose the internal link anchor text context and relationship between pages
- Search engines will devalue your content due to having broken links
Don't cause broken internal links. You have full control over fixing them and implementing a strategy to avoid them in Drupal. Let's automate the process.
Drupal Internal Link modules
What we want to do to automate internal link updates is simply make sure that all of our internal link destinations (hrefs) are to canonical URLs wherever possible. You can do this by tracking down the internal path and ID of every term or node you want to link to. That requires a lot of extra effort. There's a better way. Thankfully, Drupal's community of contributed modules comes to the rescue.
Drupal WYSIWYG Module and editor plugin (TinyMCE, etc)
If you're using the WYSIWYG module plus one of the editor plugins it supports, your best bet is the LinkIt module. LinkIt provides a button in the WYSIWYG and a dialog to search for content on your site to link to. Once you click the LinkIt button in your WYSIWYG, it's just a matter of following the prompts to create internal links:
LinkIt will automatically create your link with the relative canonical URL. You don't have to hunt down the content ID or fuss with the HTML of the anchor.
Drupal CKEditor Module and CKEditor Link Module
All of our Drupal projects now utilize the CKEditor module for a WYSIWYG. This is the direction Drupal 8 has gone and so we try to make sure the eventual progression will be seamless for our Drupal services clients.
The CKEditor module has an extension or submodule that performs similar work to LinkIt, the CKEditor Link module. This allows you to use the lovely CKEditor WYSIWYG to create internal paths quickly. We love it.
After following the module instructions to setup and configure CKEditor Link to work with CKEditor, you'll have a simple process to creating internal, canonical links. This is our internal link workflow using CKEditor and CKEditor Link modules:
- Select the anchor text you wish to use in the WYSIWYG
- Click the CKEditor Link button
- Make sure the "Link type" in the dialog is "Internal path"
- Start typing the title of our content in the autocomplete text input, select the right suggestion
This process creates a link to the internal canonical Drupal path to the content.
If you ever change your domain or alter the destination page's alias pattern, this internal link will still work. Combining CKEditor and CKEditor Link modules is our favorite way to handle internal path and link creation within Drupal.
Checking for broken internal links
Now that you have a process to create proper internal links (if you haven't, make changes now), you may need to check for any broken internal links. Drupal offers an excellent module for broken link checking, but first let's look at Google's Webmaster Tools. The free tool offers excellent insight for broken links.
Using Google Webmaster Tools to check for broken internal links
You do have a Google Webmaster Tools account, right? If you don't, sign-up. If you're a manager or webmaster, ask your developer to follow these steps to check for broken links. It's quick, easy, and free. This is a great way to fix search issues, raise your ranking, and properly distribute your internal traffic. Don't slap your users with broken internal links.
After logging into the Google Webmaster Tools (GWT) and selecting your domain, find your broken internal links this way:
- Expand the "Crawl" menu
- Select "Crawl Errors"
- Select the "Not found" tab
This will provide you with a list of URLs with problems. You're interested in those that aren't found (404 errors). The URLs listed tell you which URLs Google has stumbled across and received a 404 error. These are great candidates for creating 301 redirects to the right, or at least relevant, content on your site. What you really want to check for though is where these broken URLs are linked from. Are they your own internal pages? GWT to the rescue again! Click any of the URLs listed:
The pop-up dialog will indicate the broken URL and where it is linked from. Go to those pages and fix your broken internal links.
Automate broken link checking in Drupal with Link Checker module
If you've got a large Drupal site, manually checking GWT for broken links may not be the most effective way to find them. Drupal's Link Checker module will automate this process for you. Setup the module and it will scan your new and existing content for links, follow them, and identify any problems with them through their HTTP status code. Fix any 403s or 404s! The Drupal Link Checker module works with both internal and external links.
Why are internal broken links bad for your Drupal site?
Search engine spiders or crawlers are very busy workers. There's an entire web worth of pages to crawl and re-crawl. When they run into a broken link, they may stop where they are and move onto the next page. Broken links are a negative signal to search crawlers. Don't give them a reason to devalue your page.
I'm sure you've run into a broken link or two in your life while reading a page on some website. I bet it may have tempted you to stop where you are on the page and try to find a more authoritative, updated page about what you were reading. If the page owner hasn't checked their content over time to verify the links work, what's to say the content is still accurate? Not only do broken links discourage search engines, but they discourage users. It's especially embarrassing where the links are internal because the webmaster has control over both the destination page and the linking page.
Broken internal links can easily be avoided. Create a workflow to utilize canonical, relative internal links. Automate your internal linking process as much as possible by following the steps above. Identify any existing broken links with GWT or Drupal's Link Checker module and setup a process to routinely check for broken internal links. Make sure to create 301 redirects if you update existing content paths to prevent external sites from linking to now nonexistent paths. Don't slap your users with broken links, they're quick to fix!
This post was inspired by reading Mike Gifford's spring cleaning tips for Drupal sites, be sure to show your Drupal site some love with routine maintenance! Did you like this article or know someone that needs some link workflow help? Share it! If you're still having trouble getting your links in order, you can hire our Drupal team!