Latent Semantic Indexing: Why It Won’t Benefit Your SEO

Lachlan Perry

Coming from a digital marketing and web hosting background, Lachlan is keen to share the insight and knowledge he has learnt over his years in the industry with those who want to see their business succeed. On weekends, you can find him enjoying good music and even better food.

If you love SEO or want to learn more about how to create high-quality, user-focused content that ranks, you’ve more than likely come across the term Latent Semantic Indexing (LSI).

It is a term floated by influencers and SEO “gurus” who suggest that the application of “LSI keywords” will greatly influence the results of your organic SEO campaigns and provide higher rankings. 

Latent Semantic Indexing has a long history of being misunderstood. In this article, we’ll dispel the myths and fight the misinformation surrounding LSI, and explain why it’s nothing more than a fancy term used to sell SEO courses and eBooks on “how to write better content for better rankings”.

Before we assess whether LSI has any effect on your SEO, it’s important to understand the science behind it and how the term came about.

What Is Latent Semantic Indexing?

We’re glad you asked.

Latent semantic indexing, commonly referred to as latent semantic analysis, is a mathematical technique that aims to uncover the hidden relationships between words and concepts in order to improve the accuracy of information retrieval, using a method called singular value decomposition (SVD).

Using SVD, an information retrieval system can scan unstructured text and identify relationships or common traits amongst the concepts and context it contains.

Before techniques like SVD, computers and retrieval systems weren’t able to determine the relationship between certain sets of words and would treat them individually, with no sense of whether they were contextually related.
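To make this concrete, here is a minimal sketch of latent semantic analysis in Python using scikit-learn. The tiny document set and the two-component factorisation are purely illustrative – an intuition aid, not anything a search engine would actually run.

```python
# A minimal sketch of latent semantic analysis (LSA/LSI) with scikit-learn.
# The documents and the number of components are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "the dog barked at the mailman",
    "a puppy barked loudly",
    "hot dogs and burgers at the barbecue",
    "grilling sausages and burgers outside",
]

# Build a term-document matrix, then factor it with SVD to expose
# latent "topics" that group related terms and documents together.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

svd = TruncatedSVD(n_components=2, random_state=0)
doc_topics = svd.fit_transform(X)

# Documents about dogs cluster together in the reduced space, as do the
# barbecue-food documents, even when they share few exact words.
print(doc_topics)
```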

Its history began in the late 80s, when Susan Dumais, a renowned information-retrieval researcher who later joined Microsoft, along with fellow inventors George Furnas and Scott Deerwester, coined the term and set out to simplify an information retrieval process that was otherwise “tedious” and still required “users or providers of information to specify explicit relationships and links between data objects or text objects”.

The original patent dates back to 1992; Google, by contrast, has been making use of natural-language processing for almost a decade now, reflected in many of its algorithm updates.

What LSI Set Out To Understand & Solve

If you’re not yet following, I’ll simplify it even further for you. 

Some words have multiple, different meanings (polysemy), and some different words mean the same thing as each other (synonymy).

Before LSI, retrieval systems had a fundamental flaw: because some words carry different meanings, matching on the individual words alone could not reliably return the best information to the user.

It is all about understanding the relationships between words and the concepts they form – think of the term “hotdog”.

The individual words “hot” and “dog” are vastly different, but put together they create an entirely new concept.

The issue that arises when discussing whether Google might be using LSI is that LSI was patented before the modern web existed and was intended for small data sets – nothing on the enormous scale of the web we use today.

We know that Google has focused on natural language processing and machine learning for the better part of the last decade, with the likes of Hummingbird, BERT and RankBrain, which were designed to understand enormous data sets such as its index of the world wide web.

What Are LSI Keywords?

Before we begin this section, it’s important to make a very clear distinction between these two concepts.

Latent Semantic Indexing (LSI) is a real, patented information-retrieval concept coined by the researchers mentioned above – LSI keywords are not.

There is no evidence to suggest that Google is using LSI as part of their ranking algorithm or to try and identify pages of “higher value” – despite what this featured snippet says. 

A featured snippet from Google about "when Google started using LSI"

The year is now 2022; LSI is, by any measure, old, patented technology, and it likely has no bearing on how Google handles synonyms, polysemous words and semantically related terms.

Google clearly does look for semantically related words – the accuracy of its search engine shows as much – but suggesting this is based on a 30-year-old patent, without any evidence, is a little hard to believe.

Using “LSI keywords” sounds like you’re applying the scientific approach of a patent that sought to innovate information retrieval, understanding and evaluation, but in reality you’re simply using synonyms and related phrases to pad out your keywords – which isn’t scientific in nature and doesn’t actually make use of latent semantic indexing.

Is there any harm in adding synonyms and related keywords to your content? Absolutely not. 

We use synonyms and related words to enrich content and make it exciting and engaging for readers, particularly in long-form, evergreen content – but this is not applying LSI, and this is largely where the misconception lies.

The widely repeated claim that using this technique will improve your search rankings is no more than a farce pushed by SEO gurus, LSI “activists” and content marketers to sell you their expensive courses.

In fact, if you conduct a Google search for “latent semantic indexing”, you’ll see that almost all of the top results include a section on LSI keywords and how to use them for your website copy to achieve better SEO results. 

The reality is that LSI keywords aren’t a real thing. 

Don’t just take it from me, take it from Google’s John Mueller too.

Latent Semantic Indexing & Its Effect On Your SEO

To make a long story short, there is no evidence that Google uses latent semantic indexing to better understand the context of a page, or that it uses LSI to decide where you should rank for a keyword because you happen to be using synonyms.

There has long been debate about the importance of LSI and its effect on how well your SEO campaigns perform, but Google itself has never published a patent, or any related filing, describing the use of LSI in its ranking algorithm.

There are LSI patents that appear on Google Patents, but none of them indicates that latent semantic indexing plays a vital role in how Google looks for synonyms and related words. They do talk about semantics and phrase co-occurrence, but as far as LSI goes, there isn’t much to go on.

If you’re looking to overhaul your content strategy, you might have looked at LSI keyword generators or tools that promise to find semantically related keywords.

As we mentioned before, there is no science behind the use of “LSI keywords”, given that LSI is highly unlikely to be responsible for how Google understands synonyms and related keywords.

The problem with these tools is that they don’t provide any information about how they generate their keyword ideas or what technology is used to determine that “yes, these are LSI keywords”.

It is thought that these tools operate on phrase-based indexing rather than latent semantic indexing, and this is generally where the confusion lies.

This may be because phrase-based indexing relies on phrases that are “valid” or “good”, taking into consideration how frequently those phrases are used and whether they are related to each other.

When we think about the keywords we want to rank for, spinning off as many synonyms as we can gives us the impression that we’re ticking all the boxes, and this may be all these “LSI tools” are really doing – but again, it is not LSI.

The phrase-based indexing patent, invented by Anna Lynn Patterson, describes a system that identifies phrases in documents on the web and indexes those documents according to the phrases they contain.

When a user submits a search query, the search engine attempts to return the most relevant results from its repository of information, while also looking for phrases in the user’s search term.

The retrieval system then ranks the results it shows the user, using those phrases to influence the ranking order (i.e. which result is most relevant).

It’s a very interesting patent, and given that several related patents are assigned to Google, it may well be something Google is actively using.
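As a rough illustration only – not Google’s actual implementation, which we can’t know – here is a toy Python sketch of the idea: index documents by the short phrases they contain, then rank results by how many query phrases each document shares.

```python
# A toy illustration of phrase-based indexing: build an inverted index of
# two-word phrases, then score documents by shared phrases with the query.
# Documents and queries are made up for the example.
from collections import defaultdict

def phrases(text: str) -> set[str]:
    """Return the set of two-word phrases (bigrams) in the text."""
    words = text.lower().split()
    return {" ".join(pair) for pair in zip(words, words[1:])}

documents = {
    "doc1": "the prime minister spoke at parliament house today",
    "doc2": "parliament house hosts the federal budget announcement",
    "doc3": "hot dog stands near the stadium sell quickly",
}

# Inverted index: phrase -> set of documents containing it.
index = defaultdict(set)
for doc_id, text in documents.items():
    for phrase in phrases(text):
        index[phrase].add(doc_id)

def search(query: str) -> list[tuple[str, int]]:
    """Rank documents by how many query phrases they contain."""
    scores = defaultdict(int)
    for phrase in phrases(query):
        for doc_id in index.get(phrase, set()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda item: -item[1])

print(search("who spoke at parliament house"))
```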

Forget LSI – Here’s What You Should Do Instead

The usual advice for using LSI keywords is that they “help Google better understand the context of your pages”, but the reality is that there are far more effective ways to do this.

Even in 2022, Google’s machine learning isn’t perfect, but it most certainly isn’t relying on a patented technology that is older than I am.

It still needs a little assistance understanding what is on your page, and the best way to help is through structured data.

Focus On Structured Data

Structured data, commonly known as schema markup, is organised information that search engine crawlers such as Google’s and Bing’s digest to better understand the context of your web pages.

So if you do want to implement some semantic technology, structured data is something Google both understands and uses.

There are plenty of ways to create structured data for your web pages, and the full list of schema types is available on schema.org.

For eCommerce websites, consider using review rating markup for the products you sell, as well as listing their price and stock availability.
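As a hedged example of what this can look like, here is a small Python sketch that builds Product markup as JSON-LD using standard schema.org properties; the product name, rating and price are invented for illustration, and the output would be embedded in the page’s HTML.

```python
# A minimal sketch of generating Product structured data (JSON-LD) for an
# eCommerce page. The product details below are placeholders.
import json

product_schema = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Example Running Shoe",
    "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.6",
        "reviewCount": "128",
    },
    "offers": {
        "@type": "Offer",
        "price": "149.95",
        "priceCurrency": "AUD",
        "availability": "https://schema.org/InStock",
    },
}

# Embed the output in the page inside a <script type="application/ld+json"> tag.
print(json.dumps(product_schema, indent=2))
```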

When you implement schema markup on your web pages, you can sometimes trigger rich results. Rich results enhance how your website appears in the search results, making it more enticing for users to click compared with your competitors.

Google shows rich results at its own discretion, so don’t be disheartened if they don’t appear for your website – the important thing is that you’re helping its crawlers understand what your web pages are about.

An image of a rich result for nike.com in Google search results.

Make Use Of Word Vectors & Related Content

Word vectors are an important leap forward in analysing relationships between words and sentences, giving machines more contextual information about a word’s meaning and its relationship to surrounding words than ever before.

We know that Google uses machine learning, as evidenced by the rollout of RankBrain, which embeds collections of words and written language into mathematical representations – called vectors (or context vectors) – that computers and machines are able to work with.

This indicates that Google may also be using word vectors as part of its algorithm to understand the meaning of words, which makes sense when talking about context.
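To illustrate the idea in miniature, here is a small Python sketch using hand-made three-dimensional vectors: words used in similar contexts get similar vectors, so their cosine similarity is high. Real models learn vectors with hundreds of dimensions from large corpora, so treat this purely as an intuition aid.

```python
# A toy illustration of word vectors: related words have a higher cosine
# similarity. The three-dimensional vectors below are invented for the
# example; real models learn them from large text corpora.
import numpy as np

vectors = {
    "car":    np.array([0.9, 0.1, 0.0]),
    "racing": np.array([0.8, 0.2, 0.1]),
    "recipe": np.array([0.0, 0.1, 0.9]),
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(vectors["car"], vectors["racing"]))   # high - related contexts
print(cosine(vectors["car"], vectors["recipe"]))   # low - unrelated contexts
```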

Context is important in the grand scheme of identifying intent from a user’s search, and as such, it’s important to use words that accurately convey the meaning or context of your pages.

When talking about related content, it’s important to distinguish that we are not talking about synonyms.

Huge knowledge bases such as Wikipedia and other encyclopaedias are incredibly helpful resources for finding contextual vocabulary that can help Google better understand the meaning of your content.

If you do want to make use of real semantic search, writing about attributes that are contextually related to your parent topic is a step in the right direction.

For example, if you’re writing content about car racing, you might head over to Wikipedia or another trusted knowledge base and use related terms about famous tracks and famous drivers, which clearly indicates to Google that you are talking about car racing and not some other kind of racing.

Diversifying your content with topically related words works with, rather than against, the way modern search engines interpret pages.

Take Note Of The Concept Of Co-occurrence

As discussed earlier with the phrase-based indexing patent, the concept of co-occurrence is becoming increasingly important as search engines try to understand how certain words relate to each other.

If Google does indeed use phrase-based indexing, then the significance of these related words and how frequently they are mentioned with one another can help Google better understand the context of your pages. 

This is more than just using synonyms – this is about using phrases related to your topic. If you have a blog post that talks about “Australian politics”, you would expect phrases like “parliament house” and “prime minister” to appear in the document.

As such, the appearance of some of these phrases can help predict the presence of other related phrases.
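Here is a small, purely illustrative Python sketch of that intuition: count how often topic phrases appear together in the same document. Phrases that repeatedly co-occur signal a shared topic; the documents and phrases below are made up for the example.

```python
# A toy sketch of phrase co-occurrence: count how often pairs of topic
# phrases appear in the same document. Documents and phrases are examples.
from collections import Counter
from itertools import combinations

documents = [
    "australian politics update from parliament house and the prime minister",
    "the prime minister addressed parliament house on the new budget",
    "club cricket results from the weekend",
]

topic_phrases = ["australian politics", "parliament house", "prime minister"]

co_occurrence = Counter()
for doc in documents:
    present = [p for p in topic_phrases if p in doc]
    for pair in combinations(sorted(present), 2):
        co_occurrence[pair] += 1

# Phrases that repeatedly show up together signal a shared topic.
print(co_occurrence.most_common())
```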

Google moved past “keyword stuffing” more than a decade ago, so be measured with the number of related phrases you include – you don’t want to dilute the quality of your content by repeating unnecessary or meaningless phrases over and over.

Pay Attention To Technical SEO

Core Web Vitals, or the “Page Experience Update”, rolled out in mid-2021, so if you haven’t already, now is the time to understand how to optimise your website properly.

There is no telling exactly how much weight this ranking signal carries, but it is widely believed to be lightweight in nature, much like the HTTPS (SSL) update.

John Mueller also offered some forward-looking advice for SEOs at SMX Virtual, stating that websites that are “technically better” will have a small advantage, but that overall, content is still king.

We don’t know exactly what 2023 will hold for SEOs, but keeping on top of your technical SEO is always sound advice. This includes checking things such as the following (a small sketch showing how a couple of these checks can be automated follows the list):

  • Ensuring your canonical URLs are set properly 
  • Setting noindex on low-quality or thin content pages that serve no purpose to the user
  • Checking that URL parameters or session IDs aren’t being used by mistake
  • Double-checking there are no crawling or indexing problems with your sitemap and robots.txt file
  • Triple-checking that Googlebot can render your pages on mobile as Google moves all websites to a mobile-first index.
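As a minimal sketch of automating a couple of the checks above, the Python snippet below fetches a single page and reports its canonical link and robots meta tag. The URL is a placeholder; a real audit would crawl many pages and also verify the sitemap and robots.txt.

```python
# A minimal sketch of two technical SEO checks for a single URL:
# read the canonical <link> and the robots <meta> tag.
import requests
from bs4 import BeautifulSoup

def check_page(url: str) -> None:
    html = requests.get(url, timeout=10).text
    soup = BeautifulSoup(html, "html.parser")

    canonical = soup.find("link", rel="canonical")
    robots = soup.find("meta", attrs={"name": "robots"})

    print("canonical:", canonical.get("href") if canonical else "missing")
    print("robots meta:", robots.get("content") if robots else "not set")

check_page("https://www.example.com/")  # placeholder URL
```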

The Wrap Up

LSI is a technology that dates back to the 80s and likely holds no real significance for how Google provides users with the most relevant search results.

There isn’t any harm in using synonyms throughout your content for the purpose of enrichment, but to suggest that the use of LSI is responsible for how Google uses semantically related words and synonyms is not necessarily helpful or correct.

Dispelling the misinformation regarding LSI is essential if we want to build trust amongst our readers and fellow digital marketers.

If you do want to make strides with semantic approaches, use methods that Google has documented and understands, such as structured data, related words, context and word vectors, as well as making sure the technical aspects of your site are up to scratch.

The continued spread of false information only erodes the value of what users receive, and that’s not what our job as SEOs is about.

