Welcome back to another insightful blog topic! Today, we will discuss Natural Language Processing (NLP) and its impact on SEO.
I know you have heard about this term before, but since it’s super complicated, you may have shut your mind on it.
If that’s not the case, you are one of those SEO geeks out there who have an earnest desire to know more about the technical side of SEO.
Interestingly, NLP shouldn’t be considered part of technical SEO because there is more an on-page and content expert can do with it than a technical SEO guy.
So, the purpose of this article isn’t to throw a string of technical terms at you and keep you guessing, but instead to help even a non-technical SEO person understand this new (but old) term and use it to make their websites rank higher on Google.
If you are a content guy like me who also happens to work closely with SEOs, then this one is directed at you.
I’ll give you some crucial answers so that next time you sit in a team meeting discussing NLP, you don’t let the SEO team dictate terms.
So, let’s do it.
If you already know what NLP is and how it has transformed, I recommend skipping to the section “When Did Google Start Using NLP in their Algorithm?”
NLP is a technology used in a variety of fields, including linguistics, computer science, and artificial intelligence, to make the interaction between computers and humans easier.
It works by processing and understanding a large quantity of user data to extract information that can be used for multiple purposes, such as translation, summarization, named entity recognition, relationship extraction, speech recognition, and topic segmentation.
To put this into the perspective of a search engine like Google, NLP helps the sophisticated algorithms to understand the real intent of the search query that’s entered as text or voice.
Models such as BERT learn the relationship between words through a training task called Masked Language Modeling, wherein a few words in a sentence are hidden and the model learns to predict them from the surrounding context.
Origin of NLP
Artificial Intelligence has made deep inroads into our lives in amazing ways.
The Google Assistant or Siri that you use on the smartphone is the simplest example I can think of.
NLP is a subfield of Artificial Intelligence, and its history dates back to the last years of the Second World War.
This was the time when bright minds started researching Machine Translation (MT).
The purpose of Machine Translation during those days was to convert one language into another (mostly Russian to English and vice versa.)
Rightly so because the war brought allies and enemies speaking different languages on the same battlefield.
However, NLP didn’t see many innovations after the war, and by the late 1960s it was almost dead.
It was around this period, in the mid-1960s, that the first-ever NLP chatbot was created at the Artificial Intelligence Laboratory of MIT.
The chatbot, named ELIZA, was created by Joseph Weizenbaum, and its best-known script was called DOCTOR.
Running that script, ELIZA played the role of a psychotherapist and responded to users by following a set of preset rules.
That means ELIZA is the unsung grandmother of most of those chatbots we see out there on the internet.
It was in the ’80s that NLP got a major breakthrough. With the increased popularity of computational grammar, which uses the science of reasoning to work out meaning while considering the user’s beliefs and intentions, NLP entered an era of revival.
Then in the ’90s, NLP-based grammar tools and practical tests became popular, and this carried the revival further.
During each of these phases, NLP used different rules or models to interpret and generate language.
Symbolic NLP: Symbolic NLP was used in the early days (1950s-1990s). This was the time when there were very few innovations in the field. This was mostly because of constraints of data processing. Preset rules were defined and this model tried to understand the language by applying the rules to every single data set it confronts.
Statistical NLP: The technological innovations of the ’80s gave birth to machine learning algorithms. Since this period also saw systematic improvements in the computational capabilities, NLP detached itself from the handwritten symbolic model and used statistical models. Specifically speaking about Google, these were the days when the number of links and the number of keywords alone decided the SERP rankings.
Neural Networks: Even though the statistical model was better than its predecessor, it required a lot of engineering resources to fulfill the task. That’s when neural networks became the new method. In search, they are combined with machine learning algorithms and semantic graphs to determine which pages are fit to rank in the top positions on Google. Neural network-based NLP became popular starting in 2015, and with it came better quality processing.
The neural network-based NLP model enabled Machine Learning to reach newer heights as it had better understanding, interpretation, and reasoning capabilities.
It even enabled tech giants like Google to generate answers for even unseen search queries with better accuracy and relevancy.
Neural Network-based NLP uses word embedding, sentence embedding, and sequence-to-sequence modeling for better quality results.
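If “word embedding” sounds abstract, here is a minimal sketch of the idea in Python: each word becomes a list of numbers, and words with similar meanings end up with similar numbers. The three-dimensional vectors below are made up by me for illustration; real models learn vectors with hundreds of dimensions from data.

```python
import math

# Hypothetical, hand-picked vectors. Real embeddings are learned, not typed in.
EMBEDDINGS = {
    "sofa": [0.90, 0.10, 0.00],
    "couch": [0.85, 0.15, 0.05],
    "football": [0.00, 0.20, 0.95],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sofa_vs_couch = cosine_similarity(EMBEDDINGS["sofa"], EMBEDDINGS["couch"])
sofa_vs_football = cosine_similarity(EMBEDDINGS["sofa"], EMBEDDINGS["football"])

print(f"sofa vs couch:    {sofa_vs_couch:.2f}")
print(f"sofa vs football: {sofa_vs_football:.2f}")
```

With these made-up numbers, “sofa” scores much closer to “couch” than to “football”, which is the kind of relationship an embedding is meant to capture.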
That means NLP isn’t something that Google invented. (:
Since SEOs hear mostly about NLP from Google, there is a misconception that the search engine giant is its father.
In fact, most of the language models that Google comes up with, such as BERT and LaMDA, have Neural Network-based NLP as their brains.
NLP vs. NLU vs. NLG: What’s the Difference?
When you explore NLP, you will certainly come across two other terms: NLU and NLG. The three of them may look similar, but they aren’t all the same.
If you are confused about these, you aren’t alone. Let me break them down for you and explain how they work together to help search engine bots understand users better.
NLP – Natural Language “Processing”
NLU – Natural Language “Understanding”
NLG – Natural Language “Generation”
NLP is a combination of NLU and NLG that gets search engines like Google to recognize and comprehend user queries in order to come up with relevant answers. Simply put,
NLP = NLU + NLG
Let’s take a general user query, for instance.
“Can I play football right now?”
So, here is how the NLP-powered bot answers this particular query.
While the idea here is to play football instantly, the search engine takes into account many concerns related to the action. To name one, the weather outside is an important consideration. Yes, if the weather isn’t right, playing football at the given moment is not possible.
So, the search engine needs access to structured data, including intent and entities to understand the query. What are they?
Intent is the action the user wants to perform while an entity is a noun that backs up the action. As per the above example – “play” is the intent and “football” is the entity.
However, a change in intent or entity can prompt different search results.
For example, you enter, “Can I watch a cricket match?”.
The answer will be positive because you can watch a cricket match played elsewhere on TV. The weather at your place isn’t going to stop you from watching a cricket match.
So, bringing it all together:
NLU comprehends the user query based on grammar and context.
NLP transforms the text into structured data while deciding on the intent and entities.
NLG generates text from the structured data to be understood by users.
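To make the three steps above a bit more concrete, here is a toy sketch in Python. It is not how Google does it. The word lists, the `understand` and `generate` function names, and the `weather_ok` flag are all my own assumptions, hard-coded purely to mirror the football and cricket examples.

```python
import re

# Hypothetical, hand-written vocabularies. A real system learns these from data.
INTENTS = {"play", "watch"}
ENTITIES = {"football", "cricket"}


def understand(query):
    """Understanding step: turn free text into structured data (intent + entity)."""
    words = re.findall(r"[a-z]+", query.lower())
    intent = next((w for w in words if w in INTENTS), None)
    entity = next((w for w in words if w in ENTITIES), None)
    return {"intent": intent, "entity": entity}


def generate(parsed, weather_ok):
    """Generation step: turn the structured data back into a readable answer."""
    if parsed["intent"] is None or parsed["entity"] is None:
        return "Sorry, I did not understand the question."
    # Only an outdoor action ("play") depends on the weather in this toy example.
    if parsed["intent"] == "play" and not weather_ok:
        return f"Not right now: the weather is not suitable to play {parsed['entity']}."
    return f"Yes, you can {parsed['intent']} {parsed['entity']}."


football = understand("Can I play football right now?")
cricket = understand("Can I watch a cricket match?")

print(football, "->", generate(football, weather_ok=False))
print(cricket, "->", generate(cricket, weather_ok=False))
```

Notice how changing the intent from “play” to “watch” changes the answer even though the weather input stays the same.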
So, what ultimately matters is providing the users with the information they are looking for and ensuring a seamless online experience. This is precisely why Google and other search engine giants leverage NLP.
When Did Google Start Using NLP in their Algorithm?
Google has been at the forefront of implementing the capabilities of NLP in search. The research team within the California-based company has been working on it since 2011, and as the neural network model grew in popularity in 2015, its research and development wing went into full swing to develop a language model based on NLP.
However, it wasn’t until 2019 that the search engine giant was able to make a breakthrough in search. BERT (Bidirectional Encoder Representations from Transformers), which Google had open-sourced in 2018, was rolled out in the search engine in October 2019. BERT is built on the Transformer, a neural network architecture introduced by Google researchers in 2017.
The announcement of BERT was huge, and Google said it would immediately affect about 10% of search queries. In 2021, two years after implementing BERT, Google made yet another announcement that BERT now powers 99% of all English search results.
After BERT, Google researchers published SMITH (Siamese Multi-depth Transformer-based Hierarchical Encoder) in 2020, another Transformer-based model that is designed to go beyond BERT in understanding long-form content. Note that SMITH was presented as a research model, and I’m not aware of Google confirming that it is used in live search.
Then in the same year, OpenAI (not Google) launched GPT-3 (Generative Pre-trained Transformer 3), a Transformer-based deep learning model trained to produce human-like text. It is the successor of GPT and GPT-2 and is considered far more capable than either. GPT-3 is not open source; it is offered through an API.
In 2021, Google made one more announcement that marked its intention to advance the research and development in the field of natural language processing. This time the search engine giant announced LaMDA (Language Model for Dialogue Applications), which, like BERT and GPT-3, is built on the Transformer architecture.
As the name suggests, LaMDA is capable of making natural conversations, as this model is trained on dialogues. Google is boastful about its ability to hold an open-ended conversation. (The often-quoted “1,000 times more powerful than BERT” claim was made about MUM, a different Google model announced in 2021, not about LaMDA.)
What this means is that LaMDA is trained to read and understand many words or even a whole paragraph, and it can understand the context by looking at how the words used are related and then predict the next words that should follow.
BERT Update and How It Works
Historically, language models could only read text input sequentially, either from left to right or from right to left, but not in both directions at once.
However, BERT is designed to read in both directions simultaneously. Using this bidirectional capability, BERT is pre-trained on two related NLP tasks:
- Masked Language Modelling
The Masked Language Model (MLM) works by predicting the hidden (masked) word in a sentence based on the context surrounding the hidden word.
- Next Sentence Prediction
The objective of the Next Sentence Prediction (NSP) training task is to predict whether two given sentences have a logical, sequential connection or whether they are randomly paired.
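Here is a toy Python sketch of the masked-word idea. To be clear, this is not BERT: there is no neural network here, only counting. The tiny corpus and the `predict_masked` function are my own assumptions. What it does illustrate is the key point that the guess uses the words on both sides of the mask.

```python
from collections import Counter

# Tiny hypothetical corpus. BERT is pre-trained on billions of words.
CORPUS = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat slept on the sofa",
    "a bird sat on the fence",
]


def predict_masked(sentence, corpus=CORPUS):
    """Guess the [MASK] token from the words on BOTH sides of it."""
    tokens = sentence.split()
    i = tokens.index("[MASK]")
    left = tokens[i - 1] if i > 0 else None
    right = tokens[i + 1] if i + 1 < len(tokens) else None

    votes = Counter()
    for line in corpus:
        words = line.split()
        for j, word in enumerate(words):
            prev_word = words[j - 1] if j > 0 else None
            next_word = words[j + 1] if j + 1 < len(words) else None
            # One vote per matching neighbour, so both directions count.
            votes[word] += int(left is not None and prev_word == left)
            votes[word] += int(right is not None and next_word == right)

    if not votes:
        return None
    best, score = votes.most_common(1)[0]
    return best if score > 0 else None


print(predict_masked("the cat [MASK] on the mat"))
```

In this corpus both “sat” and “slept” follow “cat”, but “sat” appears before “on” more often, so the right-hand context tips the vote.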
With the release of the BERT update, Natural Language Processing (NLP) has become the key ingredient for every SEO campaign.
Lately, if you have observed, Google has been focusing on launching broad core algorithm updates quite frequently.
Unlike the other updates, such as the recent Link Spam Update and the Page Experience update, a Broad Core Update has a bigger impact, and the implications are huge.
According to the official Google blog, if a website is hit by a broad core update, it doesn’t mean that the site has some SEO issues. The search engine giant recommends that such sites focus on improving content quality.
An official tweet that’s often copy-pasted after the launch of a broad core update says, “There’s no “fix” for pages that may perform less well other than to remain focused on building great content. Over time, it may be that your content may rise relative to other pages.”
What this likely means is that language models such as BERT, which are now part of the core algorithm of Google, keep improving based on the data users generate every second and help Google understand the type of content users want to consume.
Such signals could also be about the intent of the user who types in a long-tail search query or does a voice search.
While Google says the site with higher content relevancy gets the upper hand with broad core updates, it’s an indication that the search engine is keeping a closer eye on the search query and the intent of the content it ranks on the top positions.
Since the users’ satisfaction keeps Google’s doors open, the search engine giant is ensuring the users don’t have to hit the back button because of landing on an irrelevant page.
When Google launched the BERT Update in 2019, its impact wasn’t huge, with just 10% of search queries seeing the impact.
However, that was just the beginning of something big.
Being a machine learning model, BERT gets better over time as more new data flows in.
Talking about new datasets, Google has confirmed that 15% of search queries it encounters are new and used for the first time. This is more so with voice search, as people don’t use predictive search.
According to Google, BERT is now omnipresent in search and determines 99% of search results in the English language.
With more datasets generated over two years, BERT has become a better version of itself.
Its ability to understand the context of search queries and the relationship of stop words makes BERT more efficient.
BERT can be called the most significant development from Google towards the advancement of their linguistic AI capabilities.
How is Google NLP Shaping Up Search Results?
Google sees its future in NLP, and rightly so because understanding the user intent will keep the lights on for its business. What this also means is that webmasters and content developers have to focus on what the users really want.
Thinking of content purely from a keyword perspective is on its way out. What’s gaining more acceptance is content that’s specific and descriptive, and that answers the pressing questions on users’ minds while they do a Google search.
To put it in layman’s terms, Google is not looking for individual phrases within your content but rather tries to find the context and meaning of the sentences to determine if it’s better than the results that are already ranking on the top positions.
Interestingly, BERT is even capable of understanding the context of the links placed within an article, which once again makes quality backlinks an important part of the ranking.
As a matter of fact, optimizing page content for a single keyword is not the way forward. Instead, optimize it for related topics and make sure to add supporting content.
With that in mind, depending upon the kind of topic you are covering, make the content as informative as possible, and most importantly, make sure to answer the critical questions that users want answers to.
Impact of NLP on Backlinks
For sure, the quality of content and the depth in which the topic is covered matter a great deal, but that doesn’t mean that internal and external links are no longer important.
What NLP and BERT have done is give Google an upper hand in understanding the quality of links – both internal and external.
With NLP, Google is now able to determine whether the link structure and the placement are natural. It understands the anchor text and its contextual validity within the content.
What that means is if the sentiment around an anchor text is negative, the impact could be adverse. Adding to this, if the link is placed in a contextually irrelevant paragraph to get the benefit of backlink, Google is now equipped with the armory to ignore such backlinks.
This means you cannot manipulate the ranking factor by placing a link on any website. Google, with its NLP capabilities, will determine if the link is placed on a relevant site that publishes relevant content and within a naturally occurring context.
What Google is aiming at is to ensure that the links placed within a page provide a better user experience and give them access to additional information they are looking for.
For example, check out this natural and unnatural way of using anchor text.
Natural Anchor: If you are looking for the best sofa set for your house, check out the latest sofa designs offered by Bran Interiors.
Unnatural Anchor: There are times when you feel tired while driving; that’s when you really want the driving seat to transform into a sofa with the latest design that offers great comfort.
As you can see, the context of the second example is cooked up to place the anchor text. On top of that, the whole article may be talking about “5 Tips to Ensure an Awesome Driving Experience.”
With advancements made in NLP, Google can trace such attempts to manipulate the rankings. So, the next time you build backlinks, keep these factors in mind before shelling out your hard-earned money.
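Google doesn’t publish how it scores link context, so treat the following Python snippet as nothing more than a back-of-the-envelope illustration of “contextual relevance”. It measures how many words an anchor shares with the topic of the page it sits on. The stop-word list, the function names, and the first page title are all my own assumptions; only the driving article title comes from the example above.

```python
import re

# Small hypothetical stop-word list, just enough for this example.
STOP_WORDS = {"a", "an", "the", "to", "for", "of", "with", "that", "your", "how", "and"}


def content_words(text):
    """Lowercase words minus stop words, with a crude plural strip."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w.rstrip("s") for w in words if w not in STOP_WORDS}


def topical_overlap(anchor_text, page_topic):
    """Jaccard overlap between anchor words and page-topic words (0.0 to 1.0)."""
    a, b = content_words(anchor_text), content_words(page_topic)
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical title for the page carrying the natural anchor.
natural = topical_overlap(
    "latest sofa designs", "Choosing the Best Sofa Set for Your House"
)
unnatural = topical_overlap(
    "sofa with the latest design", "5 Tips to Ensure an Awesome Driving Experience"
)

print(f"natural anchor overlap:   {natural:.2f}")
print(f"unnatural anchor overlap: {unnatural:.2f}")
```

A real NLP model looks at meaning rather than exact word matches, but the intuition is the same: the sofa anchor has nothing in common with an article about driving.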
NLP & Syntax Analysis
Another aspect of Google’s NLP algorithm is its ability to do Syntax Analysis.
It’s a process wherein the engine tries to understand a piece of content by applying grammatical principles.
Basically, it tries to understand the grammatical significance of each word within the content and assigns a semantic structure to the text on a page.
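As a rough picture of what “grammatical significance of each word” means, here is a toy Python sketch that labels each word with a part of speech by simple dictionary lookup. The mini-lexicon and the `tag` function are my own assumptions. Real syntax analysis relies on trained parsers that also work out how the words depend on each other.

```python
# Hypothetical mini-lexicon. Real parsers use trained models, not lookup tables.
LEXICON = {
    "google": "NOUN",
    "content": "NOUN",
    "page": "NOUN",
    "understands": "VERB",
    "ranks": "VERB",
    "relevant": "ADJ",
    "the": "DET",
    "a": "DET",
    "on": "ADP",
}


def tag(sentence):
    """Pair every word with its part-of-speech label, or UNKNOWN if not listed."""
    return [(word, LEXICON.get(word.lower(), "UNKNOWN")) for word in sentence.split()]


for word, label in tag("Google understands the content on a page"):
    print(f"{word:12} {label}")
```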
Google NLP and Content Sentiment
Would you believe it if I told you that Google can now understand the sentiment represented by your content?
It’s true, and the emotion within the content you create plays a vital role in determining its ranking. Google’s Natural Language API can determine whether the content has a positive, negative, or neutral sentiment attached to it.
The NLP API does this by analyzing the text within a page and determining the kind of words used.
If it finds words that echo a positive sentiment, such as “excellent” or “must read”, it assigns a score that ranges from 0.25 to 1.0.
If the text uses more negative terms, such as “bad”, “fragile”, or “danger”, the API assigns a score ranging from -1.0 to -0.25, based on the overall negative emotion conveyed within the text.
Additionally, it can assess neutral statements the same way, with the score ranging from -0.25 to 0.25.
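The score ranges above boil down to a simple bucketing rule, sketched here in Python. The `sentiment_label` function is my own helper, not part of any Google API. Since the ranges touch at 0.25 and -0.25, I’m assuming those boundary values count as positive and negative respectively.

```python
def sentiment_label(score):
    """Map a sentiment score in [-1.0, 1.0] to the buckets described above."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1.0 and 1.0")
    if score >= 0.25:  # assumption: the 0.25 boundary counts as positive
        return "positive"
    if score <= -0.25:  # assumption: the -0.25 boundary counts as negative
        return "negative"
    return "neutral"


for score in (0.8, 0.1, -0.6):
    print(score, sentiment_label(score))
```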
SurferSEO analyzed 17,500 pages ranking in the top 10 positions to find out whether sentiment impacts SERP rankings and, if so, what kind of impact it has.
The data revealed that 87.71% of all the top 10 results for more than 1000 keywords had positive sentiment whereas pages with negative sentiment had only 12.03% share of top 10 rankings. Interestingly, neutral sentiment got just 0.26%.
This points to the importance of ensuring that your content has a positive sentiment in addition to making sure it’s contextually relevant and offers authoritative solutions to the user’s search queries.
NLP and Entity Recognition
Google is pushing more website owners to start implementing Structured Data Markup on their sites.
If you ever wondered why, the reason is to help its algorithm recognize the entities based on unique identifiers associated with each.
Recently, Google published a few case studies of websites that implemented the structured data to skyrocket their traffic.
In cases where the Schema or Structured data is missing, Google has trained its algorithm to identify entities within the content to help classify it.
One of the interesting case studies was that of Monster India, which saw a whopping 94% increase in traffic after implementing the JobPosting structured data.
With entity recognition working in tandem with NLP, Google is now segmenting websites based on their entities and on how well those entities help in satisfying user queries.
An entity is any object within the structured data that can be identified, classified, and categorized.
Examples of entities are:
What NLP does is extract this information from your pages and classify it under categories that best match your entities.
Once a user types in a query, Google then ranks these entities stored within its database after evaluating the relevance and context of the content.
The entity or structured data is used by Google’s algorithm to classify your content.
This is made possible through Natural Language Processing that does the job of identifying and assessing each entity for easy segmentation.
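If you have never seen structured data, here is a small Python sketch that builds JobPosting markup of the kind mentioned in the Monster India example. The property names come from the schema.org JobPosting vocabulary, but every value (job title, company, location, date) is a placeholder I made up. Check Google’s structured data documentation for the properties it actually requires.

```python
import json

# All values below are hypothetical placeholders.
job_posting = {
    "@context": "https://schema.org",
    "@type": "JobPosting",
    "title": "Content Writer",
    "description": "Write and edit SEO-focused blog articles.",
    "datePosted": "2024-01-15",
    "hiringOrganization": {"@type": "Organization", "name": "Example Co"},
    "jobLocation": {
        "@type": "Place",
        "address": {
            "@type": "PostalAddress",
            "addressLocality": "Chennai",
            "addressCountry": "IN",
        },
    },
}

# JSON-LD is embedded in a page inside a script tag of this type.
markup = (
    '<script type="application/ld+json">\n'
    + json.dumps(job_posting, indent=2)
    + "\n</script>"
)

print(markup)
```

Each `@type` in the markup names an entity (a job, an organization, a place), which is exactly the kind of unambiguous label that saves the algorithm from having to guess.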
If you want to see Google NLP in action, visit Google’s official NLP demo page and check with your own content.
How to Adapt Your SEO Strategy for Future NLP-Based Algorithms?
NLP is here to stay and as SEO professionals, you need to adapt your strategies by incorporating essential techniques that can help Google gauge the value of your content based on the query intent of the target audience.
Keyword Research 2.0
With NLP in the mainstream, we have to relook at the factors such as search volume, difficulty, etc., that normally decide which keyword to use for optimization.
What matters now is to look for the search intent.
The simplest way to check it is by doing a Google search for the keyword you are planning to target.
Even though the keyword may seem like it’s worth targeting, the real intent may be different from what you think.
One of the hardest-hit niches after the BERT update was affiliate marketing websites. With the content mostly talking about different products and services, such websites were ranking mostly for buyer intent keywords.
However, with BERT, the search engine started ranking product pages instead of affiliate sites as the intent of users is to buy rather than read about it.
Additionally, such websites mostly wrote about the pros alone and that really didn’t help the users with the buying decision.
Many of the affiliate sites are being paid for what is being written and if you own one, make sure to have impartial reviews as NLP-based algorithms of Google are also looking for the conclusiveness of the article.
For websites in other niches that saw a heavy loss in keyword rankings, I suggest looking at the competitors who outranked you.
Do an in-depth content audit and fill the missing subtopics in the content.
Something that we have observed at Stan Ventures is that if you have written about a trending topic and that content is not updated frequently, Google will push you down the rankings over time.
What this means is that you have to do topic research consistently in addition to keyword research to maintain the ranking positions.
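The content audit described above is, at its core, a comparison of lists. Here is a minimal Python sketch, assuming you have already noted down by hand the subtopics your page covers and the subtopics covered by the pages outranking you. The `missing_subtopics` function, the subtopic names, and the competitor labels are all hypothetical.

```python
def missing_subtopics(own, competitors):
    """Return subtopics covered by at least one competitor but not by your page."""
    own_set = {topic.lower() for topic in own}
    gaps = set()
    for topics in competitors.values():
        gaps |= {topic.lower() for topic in topics} - own_set
    return sorted(gaps)


# Hypothetical audit notes.
own_page = ["What is NLP", "BERT update"]
competitor_pages = {
    "competitor-a.example": ["What is NLP", "Sentiment analysis", "BERT update"],
    "competitor-b.example": ["Entity recognition", "sentiment analysis"],
}

print(missing_subtopics(own_page, competitor_pages))
```

The output is your to-do list: the subtopics to add the next time you refresh the page.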
Links Must Be Contextual and Serve a Definitive Purpose
We know that links are one of the most talked-about subjects within SEO. One reason for this is that Google’s PageRank algorithm weighs sites with quality backlinks higher than those with fewer of them.
However, with NLP Algorithms put into action, especially with BERT, links are being put into further scrutiny.
Of course, the relevance and authority of the site linking still matter but what Google is trying to achieve is to understand whether the link is serving the intended purpose.
Since the NLP algorithms analyze sentence by sentence, Google understands the complete meaning of the content.
This means that if the link placed is not helping users get more info or achieve a specific goal, the link will fail to pass link juice, despite being a dofollow, in-content backlink.
The same theory applies to internal links as well.
So, if you are doing link building for your website, make sure the websites you choose are relevant to your industry and that the content linking back contextually matches the page you are linking to.
Also, there are times when your anchor text may be used within a negative context. Keep such links from going live, because NLP gives Google a hint that the context is negative and such links can do more harm than good.
Competitor Analysis 2.0 (Intent Specific)
Another strategy that SEO professionals must adopt to incorporate NLP compatibility for the content is to do an in-depth competitor analysis.
Unlike the current competitor analysis that you do to check the keywords ranking for the top 5 competitors and the backlinks they have received, you must look into all sites that are ranking for the keywords you are targeting.
Remember the example of the affiliate site that we discussed in keyword research?
If Google finds the intent of a keyword is to buy a product, none of the competing affiliate sites will rank.
So, what I suggest is to do a Google search for the keywords you want to rank and do an analysis of the top three sites that are ranking to determine the kind of content that Google’s algorithm ranks.
In addition to updating your content with the additional keywords that the top ranking sites have used, try to cover the topic more in-depth with more information and data that cannot be replicated by others.
What this essentially means is Google’s NLP algorithms are trying to find a pattern within the content that users browse through most frequently. When you update the content by filling the missing dots, you can join the league of sites that have the probability to rank.
However, even this doesn’t guarantee top positions.
Once the gap is filled, make the content stand out by including additional info that others aren’t providing and follow the SEO best practices that you have been following to date.
EAT (Expertise, Authoritativeness, and Trustworthiness)
Last but not least, EAT is something that you must keep in mind if you are in a YMYL niche. Any financial, medical, or other content that can impact the life and livelihood of the users will have to pass through an additional layer of Google’s algorithm filters.
What this means is that even if you produce top-quality content, if you fall into the YMYL (Your Money or Your Life) niches, the current authority of the site, the expertise of the author, and the trustworthiness of the brand can heavily influence the rankings.
As we discussed above, when talking about NLP and Entities, Google understands your niche, the expertise of the website, and the authors using structured data, making it easy for its algorithms to evaluate your EAT.
That brings us to the end of the session. To summarize this voluminous post: NLP is a technology used by Google, like many other companies, to better equip its algorithms to understand the content within a page and the context surrounding it by identifying, classifying, and categorizing entities and their relevance to the search queries entered by the users.