The S - XL of Content Inventories

When getting to grips with a new client or project, an important first step is understanding exactly what they currently have published online, usually on their website. Quantifying and analysing an existing digital footprint via a content inventory will often reveal a lot about the brand’s online history and culture.

These insights come courtesy of examining and analysing ‘content components’ such as existing assets, content structure, formats, and language. You will also discover a lot about ‘people components’, such as the governance and workflow in place, and content ownership within the organisation.

This powerful information-gathering process is generally known as the content inventory. The inventory is used for a variety of reasons; it could be as part of a regular housekeeping programme, to kick off the discovery stage of a new project, or even to discover information about a competitor.


What Does a Basic Content Inventory Look Like?

Imagine a very detailed spreadsheet that lists all of the assets which exist under a specific domain name; this is your essential inventory template. It should contain every URL on the site, including PDFs, XML files, Word documents, Excel spreadsheets, video, audio, and any other content formats found there. The screenshot below shows a basic sample content inventory.

Content inventory spreadsheet

This basic inventory helps us understand the volume of each content type, and begin to assess the overall quality and consistency of the content.
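As a rough illustration of how such a spreadsheet can be summarised, the Python sketch below models each inventory row as a small record and counts assets by content type. The field names (`url`, `content_type`, `title`) and the sample rows are assumptions made for this example, not a prescribed template.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class InventoryRow:
    """One line of the inventory spreadsheet (hypothetical columns)."""
    url: str
    content_type: str  # e.g. "HTML", "PDF", "Video"
    title: str = ""


def count_by_type(rows):
    """Return how many assets exist for each content type."""
    return Counter(row.content_type for row in rows)


# Made-up sample rows, for illustration only.
sample_rows = [
    InventoryRow("https://example.com/", "HTML", "Home"),
    InventoryRow("https://example.com/about", "HTML", "About us"),
    InventoryRow("https://example.com/files/brochure.pdf", "PDF"),
]

for content_type, total in count_by_type(sample_rows).most_common():
    print(f"{content_type}: {total}")
```

In practice the rows would come from a crawl export rather than being typed in by hand.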

Including Quality and Performance Benchmarks

To bring your inventory to life you can include additional analytics and traffic data to see how content is performing. Data from Google Search Console will also give you an initial idea of performance from a search engine perspective, through metrics such as impressions, clicks and CTR.
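For reference, CTR (click-through rate) is simply clicks divided by impressions. The short Python sketch below shows the arithmetic; the paths and figures are invented for the example and do not come from a real Search Console export.

```python
def click_through_rate(clicks, impressions):
    """Return CTR as a percentage, or 0.0 when there were no impressions."""
    if impressions == 0:
        return 0.0
    return 100.0 * clicks / impressions


# Hypothetical (clicks, impressions) pairs keyed by page path.
search_data = {
    "/services": (40, 1000),
    "/blog/old-post": (0, 0),
}

for path, (clicks, impressions) in search_data.items():
    print(path, f"{click_through_rate(clicks, impressions):.1f}%")
```

Guarding against zero impressions matters here, because an inventory will usually include pages that never appeared in search results at all.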

To enrich our insights even more, we also include quality benchmarks such as readability and accessibility. At this point the inventory starts to form the backbone of a content audit, because the combined information lets us assess content on many different, valuable levels.

Size Matters

Depending on the approach you use and the size of website you are dealing with, a content inventory can be a straightforward job or a bit of a logistical challenge. The inventories we’ve completed in the past for client websites range from fewer than 50 to more than 50,000 pages. What they have in common is the level of detail we want to gather; the principles of the inventory remain the same, regardless of the scale. What marks out the larger inventories is the amount of planning which needs to go into the inventory beforehand. For example, the site may need to be chunked out into manageable areas and sections.

A day spent planning and testing your inventory can definitely be time well spent. That’s because very large websites can be difficult to scan in one single session. The demands these large scans place on a machine’s memory can be quite intense, and on the human side of things, the sheer volume can often be too overwhelming to assess as a job lot. We get around this issue by:

  • Running inventories on virtual machines packed with loads of RAM
  • Breaking the website into distinct chunks to scan separately

A spreadsheet with 50,000 lines of data is simply too large and clunky to analyse in one piece. Ultimately we want to break the data out by section and take a granular and flexible approach - this is one of the fundamentals of good analysis. After all, you may want to eventually deliver portions of this inventory to individual content owners.
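One simple way to break the data out by section is to group URLs by their first path segment. The Python sketch below assumes that the top-level folder in the URL reflects the site section, which will not hold for every website; the function names are illustrative.

```python
from collections import defaultdict
from urllib.parse import urlparse


def section_of(url):
    """Return the first path segment of a URL, or "(root)" for the homepage."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[0] if segments else "(root)"


def group_by_section(urls):
    """Bucket URLs by their top-level section, keeping the original order."""
    sections = defaultdict(list)
    for url in urls:
        sections[section_of(url)].append(url)
    return dict(sections)


# Made-up URLs, for illustration only.
urls = [
    "https://example.com/",
    "https://example.com/blog/first-post",
    "https://example.com/blog/second-post",
    "https://example.com/services/seo",
]

for section, members in group_by_section(urls).items():
    print(f"{section}: {len(members)} URL(s)")
```

Each group can then be saved as its own sheet or file and handed to the relevant content owner.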

Tools and Methodologies

In the past, a basic content inventory for a small website would most likely have been done by hand, gathering information by clicking around the site and noting the pages and their content makeup. It’s easy to see how problematic this approach can be. Websites often contain orphan pages (“dangling” pages) which don’t feature in the main navigation but are still accessible via search engines. A manual methodology would be likely to miss these.

Which tools to use?

With that risk in mind, we use crawling tools to do the dirty work for us, as they can cover a site in a fraction of the time it would take us manually. Tools such as Screaming Frog are powerful and can pull a wealth of data from a website. However, it’s important to know that while crawl tools are usually excellent at delivering a detailed inventory, they aren’t perfect. A crawler discovers pages by following links, so it can miss orphan pages that nothing else on the site links to. Because of this, we always cross-reference the crawl results with Google Analytics data to identify any gaps in the list of URLs. In fact, you can even connect Screaming Frog to Google Analytics to give you a richer initial inventory to work from.
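The cross-referencing step boils down to a set comparison. The Python sketch below assumes you have exported one list of crawled URLs and one list of page paths from analytics; the function names and the trailing-slash normalisation are assumptions for this example, not features of either tool.

```python
from urllib.parse import urlparse


def normalise(url_or_path):
    """Reduce a full URL or a bare path to a path without a trailing slash."""
    path = urlparse(url_or_path).path.rstrip("/")
    return path or "/"


def missing_from_crawl(crawled_urls, analytics_paths):
    """Return paths that received traffic but never appeared in the crawl."""
    crawled = {normalise(u) for u in crawled_urls}
    visited = {normalise(p) for p in analytics_paths}
    return sorted(visited - crawled)


# Made-up exports, for illustration only.
crawled_urls = ["https://example.com/", "https://example.com/about/"]
analytics_paths = ["/", "/about", "/old-campaign"]

print(missing_from_crawl(crawled_urls, analytics_paths))
```

Anything this returns is a candidate orphan page and is worth checking by hand before it is added to the inventory.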

Why not just use Google Analytics?

GA will only give you URLs which have generated traffic, i.e. pages that have been visited. If a group of pages has not received any traffic in the time frame you are looking at, those pages will be missed, and your data will therefore be incomplete.

Another option is to get a download of all URLs and files from the CMS you are using. While this can provide a comprehensive output, it’s not always possible due to access issues or timeframes. Plus, it’s unlikely you will be able to query Google Analytics data at the same time, as we can with Screaming Frog.

Now I Have It, What Do I Do?

An inventory lets you focus on a wide variety of issues. It is particularly good for surfacing technical issues which may affect your SEO and the overall user experience, such as duplicate URLs, 404 errors and 302 redirects, all of which should be handled appropriately to avoid unnecessary penalties within the organic search results.
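As a minimal sketch of how these issues might be pulled out of a crawl export, the Python below takes (URL, status code) pairs and lists the 404s, the 302s and likely duplicates. The rule used to spot duplicates (same host and path once the trailing slash is ignored) is a simplifying assumption for this example; a real audit would usually compare page content as well.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def canonical_key(url):
    """Hypothetical normalisation: ignore scheme, host case and trailing slash.

    Query strings are ignored too, so URLs that differ only by their
    parameters are treated as the same address.
    """
    parts = urlsplit(url)
    return (parts.netloc.lower(), parts.path.rstrip("/") or "/")


def flag_issues(rows):
    """rows is a list of (url, status_code) pairs taken from a crawl export."""
    not_found = [url for url, status in rows if status == 404]
    temporary_redirects = [url for url, status in rows if status == 302]

    # Group URLs that collapse to the same canonical key.
    by_key = defaultdict(list)
    for url, _ in rows:
        by_key[canonical_key(url)].append(url)
    duplicates = [urls for urls in by_key.values() if len(urls) > 1]

    return {"404": not_found, "302": temporary_redirects, "duplicates": duplicates}


# Made-up crawl rows, for illustration only.
rows = [
    ("https://example.com/about", 200),
    ("https://example.com/about/", 200),
    ("https://example.com/old", 404),
    ("https://example.com/promo", 302),
]

for issue, found in flag_issues(rows).items():
    print(issue, found)
```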

As previously mentioned, the inventory also forms the backbone of the content audits we conduct.

Finally, a content inventory can help provide organisations with a realistic picture of any wider governance issues, especially when combined with a content audit. In fact, it can be a much-needed wake-up call regarding the sheer volume and age of content being offered online, and the lack of a clear system for publishing and retiring content. Regular auditing will help you keep on top of recurring publishing and technical issues, and ensure your content stays in tip-top condition.

Interested in understanding more about the content you have online? Talk to our team about content inventories, audits, and governance today.


Sinead Clandillon

Head of Content

The author

Sinead is Webfactory’s content strategist and she also manages the online visibility team, comprising search, analytics and social.
