Spam

31 January 2023

What is Spam?

Spam refers to a broad range of unwanted pop-ups, links, data and emails that we face in our daily interactions with the web. Spam’s namesake is the (now unpopular) luncheon meat that was often unwanted but ever present. Spam can be simply unwanted, but it can also be harmful, misleading and problematic for your website in a number of ways.

A quick history of spam

Internet spam became prevalent with email spam. Whether it’s newsletters, strange emails from “long lost relatives” with huge inheritances who are desperate to give you a few million pounds, or other unwanted advertisements, email spam became increasingly common from the 1980s onwards. Fortunately, most mail systems (Gmail, Outlook etc.) now have quite good built-in spam filters to get rid of the more egregious types.

From email spam, the spam phenomenon grew into pop-up ads on our desktop browsers. Many of these invasive ads have since been blocked by computer antivirus programs and ad blockers.

As search engines began to dominate the market, link spam, keyword stuffing and doorway pages took over. In a nutshell, unscrupulous SEOs cared less about the quality of content, and more about getting their website found, and used web spam to game search engines into giving artificially high rankings.

Spam tactics in SEO are known as black hat SEO, and the leading search engines, Google and Bing, have made it their mission to eradicate them. Google releases thousands of algorithm updates every year. A number of these are major updates, or even core algorithm updates (which is when Google updates its main algorithm), which pack a very big punch. Many of these updates are designed to make the search experience better (which, of course, often deals a blow to spam as a side effect, as by rewarding good content Google inadvertently penalises bad content), but many are also specifically aimed at devaluing, and sometimes even penalising, spam. In 2018 alone Google penalised more than 180 million websites for spam offences.

Although there have been a great many significant spam-targeting algorithm updates over the years, perhaps the three most famous were the Penguin update, the Panda update and the SpamBrain update.

In 2020, Google’s systems found 40 billion spammy pages every day, and they get better at it every year. In 2021, Google managed to identify six times more spam than it did in 2020 and, in 2022, SpamBrain detected five times more spam than in 2021, and 200 times more spam than in 2018!

The truth is that, for many years, spammers thought that they could stay one step ahead of Google. Although it is still possible to gain good results via spam (as is evidenced by the fact that Google is still catching so much spam), it now has a shelf life and, chances are, if you’re using web spam to boost your performance in search, you’ll get caught sooner or later. And that could be devastating for you. Put simply, the best way to avoid a spam-related manual action (penalty) or devaluation is to not spam in the first place.

Types of SEO-related spam

Google outlines its spam policies in its carefully named, wait for it… “Spam Policies” and “Search Essentials” guidelines. However, it also reserves the right to penalise anything not listed within its guidelines, but that it feels breaches the spirit of the guidelines.

The most common types of spam are:

Cloaking

When a website is cloaking, it’s presenting different content to search engines than it is to users. Google gives cloaking examples as:

“Showing a page about travel destinations to search engines while showing a page about discount drugs to users
Inserting text or keywords into a page only when the user agent that is requesting the page is a search engine, not a human visitor”
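If you want to sanity-check your own site for accidental cloaking (a misconfigured plugin or CDN rule can cause it), one rough approach is to request the same URL with a crawler-style user agent and a browser-style user agent, then compare the visible words. The sketch below is only an illustration: the function names and the two user-agent strings are our own assumptions, and it will only catch cloaking that keys off the user-agent header, not cloaking based on IP address.

```python
import re
import urllib.request
from html.parser import HTMLParser

# Example user-agent strings, chosen for illustration only.
CRAWLER_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def visible_words(html):
    """Return the set of lowercase words found in the page's text nodes."""
    parser = _TextExtractor()
    parser.feed(html)
    return set(re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower()))


def word_overlap(html_a, html_b):
    """Jaccard overlap of the two pages' word sets: 1.0 identical, 0.0 disjoint."""
    a, b = visible_words(html_a), visible_words(html_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def fetch_as(url, user_agent):
    """Fetch a URL while presenting the given User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

In use, you would call fetch_as twice for the same URL (once with CRAWLER_UA, once with BROWSER_UA) and pass both results to word_overlap. A score well below 1.0 is a prompt to look closer, not proof of cloaking, since personalisation and rotating ads also change page text.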

Doorway Pages

Doorway pages are pages that provide little user value, but instead seek to use all of the different functions on a page (URLs, title tags, heading tags, alt tags, content etc.) to rank for a narrow set of keywords. A website can have hundreds, thousands or even millions of doorway pages.

For example, if a website had a separate page for:

www.examplewebsite.com/hairdresser

www.examplewebsite.com/hairdressers

www.examplewebsite.com/hair-dresser

www.examplewebsite.com/hair-dressers

www.examplewebsite.com/hairdresser-london

www.examplewebsite.com/hairdressers-london

www.examplewebsite.com/hairdresser-new-york

www.examplewebsite.com/hairdressers-new-york

etc.

These would be doorway pages. They are almost identical to one another, and clearly one of their main uses is to target subtle variations of the same keywords.
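To spot this pattern in your own URL list, you could normalise each slug and group the URLs that collapse to the same key. The following is a minimal sketch under our own assumptions: the helper names are hypothetical, and the normalisation (drop hyphens, strip a trailing "s" from longer tokens) is deliberately naive, so it will mangle some genuine words.

```python
from collections import defaultdict


def slug_key(url):
    """Reduce a URL's last path segment to a crude canonical key."""
    slug = url.rstrip("/").rsplit("/", 1)[-1].lower()
    # Strip a trailing "s" from tokens longer than three letters (naive plural handling).
    tokens = [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in slug.split("-")]
    return "".join(tokens)


def group_variants(urls):
    """Return only the groups where two or more URLs share the same key."""
    groups = defaultdict(list)
    for url in urls:
        groups[slug_key(url)].append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}


urls = [
    "www.examplewebsite.com/hairdresser",
    "www.examplewebsite.com/hairdressers",
    "www.examplewebsite.com/hair-dresser",
    "www.examplewebsite.com/hair-dressers",
    "www.examplewebsite.com/hairdresser-london",
    "www.examplewebsite.com/hairdressers-london",
    "www.examplewebsite.com/hairdresser-new-york",
    "www.examplewebsite.com/hairdressers-new-york",
]
variant_groups = group_variants(urls)
```

Any group with several members is a candidate for consolidation into a single, genuinely useful page.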

Google gives a few examples of doorway pages:

“Having multiple websites with slight variations to the URL and home page to maximise their reach for any specific query
Having multiple domain names or pages targeted at specific regions or cities that funnel users to one page
Pages generated to funnel visitors into the actual usable or relevant portion of your site(s)
Substantially similar pages that are closer to search results than a clearly defined, browseable hierarchy”

Internal search pages (web pages automatically created by internal search engines) are a quick way to create thousands of doorway pages, sometimes intentionally and sometimes not, although the outcome is the same. This is because many internal search engines create a new and unique page every time a searcher looks for something that’s never been searched for before within that engine. That page will then be stored in the website’s database, ready to be surfaced the next time someone searches that same query. So, if “hairdresser london” has never been searched for before, the internal search engine will create the page

www.examplewebsite.com/hairdresser-london

and then, if someone searches the plural, “hairdressers London”, the internal search engine will then create the different page:

www.examplewebsite.com/hairdressers-london

In this way, popular websites with a large number of internal searches can quickly generate a huge number of doorway pages. To make matters worse, these pages, by nature of being dynamically generated, often have thin content, another of Google’s bugbears!

Some web publishers know that their internal search pages are doorway pages, while others use this strategy unknowingly. Other times doorway pages are very much intentional. For instance, you’ll often see trade websites with a huge list of local landing pages:

/plumber-london

/plumber-kensington

/plumber-chelsea

/plumber-wandsworth

/plumber-shoreditch

/plumber-islington

etc.

Again, these are doorway pages and subject to penalisation from Google.

Hacked content

Hacked content refers to unauthorised and malicious content placed on a website by hackers due to security vulnerabilities. This content not only provides poor search results (as it can mean that the content on a web page is compromised and stuffed full of low quality or junk material) but also poses significant risks to users.

For instance, hackers can find a good website that has a security vulnerability in a WordPress plugin. They may then exploit this vulnerability to inject hidden code and links, compromising the integrity of the website or even allowing them to capture sensitive user data, including credit card data. As a result, users who access the compromised site may unknowingly download malware, have their personal information stolen, or be redirected to phishing sites.

Even though hacked content is rarely intentional by the web publisher (it’s usually hackers that are responsible, not the site owners themselves!) Google will still penalise websites that have been hacked, until the web publisher can show that they’ve cleaned up the hack and removed the malware.

Hidden text and links

Hidden text and links is where a black hat SEO places content on a webpage that’s visible to search engines but invisible to users.

For example, a web publisher might repeat the same keyword one hundred times on a web page, but make the text tiny, or render it as white text on a white background. The search spiders will be able to read the text within the HTML of the page, but it will be all but invisible to the human eye.
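As a rough illustration of how the two tricks above could be flagged in an audit, the sketch below scans inline style attributes for a tiny font size or a text colour that exactly matches the background colour. The class and function names are our own, it only reads inline styles (not external stylesheets or scripts), and it compares colour values as plain strings, so “white” and “#fff” would not be treated as a match.

```python
from html.parser import HTMLParser

TINY_SIZES = {"0", "0px", "1px"}  # assumed cut-off for "tiny" text


def parse_style(style):
    """Turn 'color:#fff; font-size:1px' into a {property: value} dict."""
    props = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            props[name.strip().lower()] = value.strip().lower()
    return props


def hidden_reason(style):
    """Return a short reason if the inline style looks like hidden text, else None."""
    props = parse_style(style)
    if props.get("font-size") in TINY_SIZES:
        return "tiny font size"
    colour = props.get("color")
    if colour and colour == props.get("background-color"):
        return "text colour matches background"
    return None


class HiddenTextFinder(HTMLParser):
    """Records (tag, reason) for every element with a suspicious inline style."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        style = dict(attrs).get("style")
        if style:
            reason = hidden_reason(style)
            if reason:
                self.findings.append((tag, reason))
```

Feed the finder a page’s HTML and inspect its findings list. Anything it flags still needs a human look, because there are legitimate reasons to hide elements (accessibility helpers, for instance).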

Keyword stuffing

Keyword stuffing is the practice of excessively and unnaturally overloading a web page with keywords in an attempt to manipulate search rankings. For instance, imagine a website dedicated to online gaming that repeatedly lists popular game-related keywords, such as “best gaming headphones,” “top gaming keyboards,” and “cheap gaming laptops,” without providing any substantial content or relevant information. This technique aims to artificially inflate the site’s visibility in search results, even though the content lacks value and readability for users.

Keyword stuffing doesn’t just occur in the main content of a web page, though. Black hat SEOs will also stuff keywords into important on-page elements such as title tags, heading tags, alt attributes and noscript elements. Keyword stuffing is also seen when SEOs over-optimise anchor text. For instance, they may make sure that every link that points back to their website has the anchor text “Hairdressers London”.
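One simple way to put a number on this is keyword density: the share of a page’s words taken up by a given phrase. The sketch below is an assumption-laden illustration (the function name is ours, and Google publishes no official density threshold), but an unusually high figure is a reasonable prompt to re-read the copy as a user would.

```python
import re


def keyword_density(text, phrase):
    """Fraction of the words in `text` that belong to occurrences of `phrase`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    if not words or n == 0:
        return 0.0
    # Count every position where the phrase appears as a run of consecutive words.
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    return hits * n / len(words)


sample = (
    "Hairdressers London: the best hairdressers London has. "
    "Call our hairdressers London team."
)
density = keyword_density(sample, "hairdressers london")
```

Here the phrase accounts for 6 of the 12 words in the sample, so density comes out at 0.5, which no naturally written paragraph would get near.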

Link spam

Link spam is any practice that builds links with no user value. For instance:

Guest blogging link spam: This is where you pay a blogger to place an article on their site, and the main point of that article is not to be read by genuine users but to link back to your website.

Link networks: A network of websites that all link to one another to manipulate search rankings of the network as a whole. For instance, website A links to website B which links to website C which links to website D which links to website A.

Link farms: A link farm is a website that exists solely to generate links to other websites. Back in the day, link farms often looked like directories of varying quality, but very few users would ever use them like they would a genuine directory such as Yelp. Instead, they existed to manipulate search rankings to the websites that they linked to. These days, link farms tend to look a little more sophisticated than directories, and many blogs that sell links could be interpreted as link farms.

Comment section links: This is where a black hat SEO will go into the comment section of articles published on prominent websites and write a comment that includes a link back to their target website.

Paid links: In essence, these are any backlinks that a web publisher pays for and that the linking site does not mark as “sponsored” or “nofollow” (Google prefers them to be marked “sponsored” over “nofollow”). Google relies on links as a form of quality control. If a web publisher like, say, the New York Times, writes an article about beauty tips, and within that article links to a great hair salon, then Google will see that as a human recommendation from an authoritative source (the New York Times) to that salon. It will reward the salon with boosted rankings.

If, however, the New York Times has been financially incentivised to post that link, then it’s not a genuine quality endorsement and the linked-to site or service may well not be as good as the link seems to suggest. If you pay for links and they aren’t marked as such, this is seen as spam and is subject to devaluation or penalisation.

Note: it is, of course, sometimes very hard for Google to distinguish between when a link has been paid for or not, but it is getting better at making educated guesses. As a general rule of thumb, the safest form of link building remains those built via genuine online PR efforts.
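If you publish sponsored articles yourself, a quick check is to list every link in the article’s HTML that carries neither rel="sponsored" nor rel="nofollow". The sketch below is only an illustration with names of our own choosing. It cannot know which links were actually paid for, so it simply reports every unqualified link for a human to review.

```python
from html.parser import HTMLParser

QUALIFYING_REL = {"sponsored", "nofollow"}


class LinkRelAuditor(HTMLParser):
    """Collects (href, is_qualified) for every <a> element with an href."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel can hold several space-separated tokens, e.g. "noopener nofollow".
        rel_tokens = set((attr.get("rel") or "").lower().split())
        self.links.append((href, bool(rel_tokens & QUALIFYING_REL)))


def unqualified_links(html):
    """Return the hrefs of links that are marked neither sponsored nor nofollow."""
    auditor = LinkRelAuditor()
    auditor.feed(html)
    return [href for href, qualified in auditor.links if not qualified]
```

Running unqualified_links over a sponsored post gives you a short list to check against your commercial agreements before the article goes live.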

Machine-generated traffic

Google describes machine-generated traffic as follows:

“Machine-generated traffic consumes resources and interferes with our ability to best serve users. Examples of automated traffic include:

Sending automated queries to Google
Scraping results for rank-checking purposes or other types of automated access to Google Search conducted without express permission”

Malware and malicious behaviours

Google actively checks websites for hosting malware or unwanted software that harms users. Malware includes software designed to harm devices, steal sensitive information, or perform malicious activities, such as ransomware, spyware, or viruses. It’s usually (but not always) inserted into a site by hackers who have managed to exploit a vulnerability in your site, for example, in an outdated WordPress plugin, although of course sometimes thieves will intentionally set up a website and fill it with malware to steal users’ information.

Unwanted software negatively affects the browsing experience, often by displaying intrusive ads, redirecting users to unrelated websites, or collecting personal information without consent. Even if you are unaware that your site has been hacked, and are not responsible for the hack, Google can, and will, still penalise or devalue you for it.

Misleading functionality

Misleading functionality involves creating deceptive features or services on a website to manipulate search rankings. For example, a website might claim to offer a free credit score checker but, when users try the tool, redirect them to unrelated and potentially harmful advertisements. These fake generators or services attract users with their promised functionality but instead lead them to deceptive ads or attempt to gather their sensitive data. Google provides examples:

“A site with a fake generator that claims to provide app store credit but doesn’t actually provide the credit
A site that claims to provide certain functionality (for example, PDF merge, countdown timer, online dictionary service), but intentionally leads users to deceptive ads rather than providing the claimed services”

Scraped content

Scraped content refers to content taken from other sources and republished without substantial additional value. For instance, a website might aggregate news articles from reputable publishers without proper attribution or permission, and without adding anything of value to those articles. Instead of curating or adding unique insights, the site merely republishes the content, claiming it as its own, often resulting in duplicate or low-quality information.

Not only does this often result in low quality content and duplicate content being visible in the SERPs (as Google might accidentally show the same article twice in the same top 10 results, thus reducing the diversity of those listings), but it can also cause serious copyright and trademark infringement issues.

Sneaky redirects

Sneaky redirects are when a black hat SEO misleads users or search engines by redirecting them to different or unexpected content without their knowledge or consent.

For example, imagine a website promoting a popular game. When you go to a page on that website that claims to host the game, or to provide guides or cheats for it, you’re instead directed to adult content or an online gambling platform. While legitimate redirects are acceptable for site moves or consolidation, misleading redirects aim to deceive users, leading them to irrelevant or potentially harmful destinations.

Spammy automatically-generated content

Spammy automatically-generated content refers to content generated programmatically without adding value, solely for the purpose of manipulating search rankings.

For instance, a website might generate nonsensical blocks of text filled with keywords, often using automated translation tools to create variations in different languages. These auto-generated pages serve no meaningful purpose for users, contributing to cluttered search results and a poor user experience.

Thin affiliate pages

Thin affiliate pages lack original content and value, relying on copied product descriptions, reviews, or specifications without adding unique information or insights.

For example, a website might act as an affiliate for an online retailer but, instead of providing comprehensive product reviews, comparisons, or additional resources, simply copy and paste the manufacturer’s descriptions and specifications. Such pages do not offer users any valuable or distinct information, and often serve as mere conduits to generate affiliate revenue without contributing original insights.

User-generated spam

User-generated spam occurs when individuals or automated systems add spammy content through platforms intended for user contributions, such as forum posts, comments sections, or file uploads.

For example, imagine that the Guardian newspaper had less robust oversight of its comments section. Some users might repeatedly post unrelated advertisements, links to malicious websites, or nonsensical messages about purchasing Bitcoin, Viagra or some get rich quick scheme. Site owners must implement measures to prevent such abuse, such as proper comments section oversight (which, FYI, the Guardian has), to maintain a positive user experience.

Thin content

Thin content refers to web pages or sections of a website that lack substantial or meaningful content. It’s characterised by a minimal amount of text, often containing little to no unique information or value to the user. It is often found on doorway pages.

For instance, going back to that plumber example earlier, if there is a website with 100 different local landing pages for:

/plumber-london

/plumber-kensington

/plumber-chelsea

/plumber-wandsworth

and many more, each page might say something pretty useless, like:

“If you live in Kensington, you may need a plumber. If your central heating breaks down and you live in St Edward’s Square, or if your pipes burst and you live on the Gloucester Road, then a Kensington plumber is the thing you need!”

Thin content doesn’t always involve keyword stuffing like the above example, but it often does. Also, many websites have similar pages with no written content at all, just, for example, an optimised URL (/plumber-kensington), title tag (“Kensington Plumber”) and heading tag (“Kensington Plumber”). This lack of written content altogether also represents thin content. Even though Google does not list thin content within its spam policies, historically web spam related messages from Google (that can, but don’t always, appear in Search Console to confirm a penalty) have often cited thin content as the reason for penalisation.
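A crude first pass at finding pages like these is to count the words in each page’s visible body text and flag anything below a cut-off. The sketch below is an illustration only: the names are ours, and the 150-word default is an arbitrary assumption, as Google does not publish a minimum word count. Word count says nothing about quality either, so treat the output as a list of pages to review rather than a verdict.

```python
import re
from html.parser import HTMLParser

SKIPPED_TAGS = ("script", "style", "title")


class BodyTextExtractor(HTMLParser):
    """Collects text nodes, ignoring <script>, <style> and <title> contents."""

    def __init__(self):
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def visible_word_count(html):
    """Count the words in the page's visible text."""
    parser = BodyTextExtractor()
    parser.feed(html)
    return len(re.findall(r"[A-Za-z0-9']+", " ".join(parser.chunks)))


def is_thin(html, min_words=150):
    """True if the page has fewer visible words than the (assumed) cut-off."""
    return visible_word_count(html) < min_words


page = (
    "<html><head><title>Kensington Plumber</title></head>"
    "<body><h1>Kensington Plumber</h1></body></html>"
)
```

For the title-and-heading-only page above, visible_word_count(page) is 2, so is_thin(page) returns True.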

How to get over a spam-related penalty

Getting over a spam-related manual action, or a quality-related algorithmic devaluation, can be very time consuming. The most important thing to bear in mind is to always make sure that your content is written with the user as the priority, not the search engine. Make sure that your website’s content is always guided by E-E-A-T (Experience, Expertise, Authoritativeness and Trustworthiness) principles.

Google, in its guidelines, says that site owners should always ask themselves the following questions when publishing content:

“Content and quality questions:

Does the content provide original information, reporting, research, or analysis?
Does the content provide a substantial, complete, or comprehensive description of the topic?
Does the content provide insightful analysis or interesting information that is beyond the obvious?
If the content draws on other sources, does it avoid simply copying or rewriting those sources, and instead provide substantial additional value and originality?
Does the main heading or page title provide a descriptive, helpful summary of the content?
Does the main heading or page title avoid exaggerating or being shocking in nature?
Is this the sort of page you’d want to bookmark, share with a friend, or recommend?
Would you expect to see this content in or referenced by a printed magazine, encyclopedia, or book?
Does the content provide substantial value when compared to other pages in search results?
Does the content have any spelling or stylistic issues?
Is the content produced well, or does it appear sloppy or hastily produced?
Is the content mass-produced by or outsourced to a large number of creators, or spread across a large network of sites, so that individual pages or sites don’t get as much attention or care?

Expertise questions

Does the content present information in a way that makes you want to trust it, such as clear sourcing, evidence of the expertise involved, background about the author or the site that publishes it, such as through links to an author page or a site’s About page?
If someone researched the site producing the content, would they come away with an impression that it is well-trusted or widely-recognized as an authority on its topic?
Is this content written by an expert or enthusiast who demonstrably knows the topic well?
Does the content have any easily-verified factual errors?”

Adhering to the above is the best way to ensure that you recover from most penalties or devaluations. Remove the offending content and, if there’s a genuine user need for content on the same topic, rewrite it in line with the above questions. If the content was simply there to bulk out a doorway page or similar, remove it and keep it removed. Slowly you may see your site start to recover as Google shifts from viewing you as a spammy web publisher worthy of demotion to an expert in your field worthy of promotion.

Other advice

Understand the Penalty: The first step towards recovery is understanding the nature of the penalty. Google, for instance, issues manual or algorithmic penalties, which can result from actions such as keyword stuffing, hidden text, paid links, or thin content. It’s important to identify the specific reason behind the penalty, as it will help you formulate an appropriate recovery strategy.
Conduct a Comprehensive Audit: Perform a thorough audit of your website to identify any spam-related elements. Look for suspicious backlinks, duplicate content, keyword stuffing, or any other SEO practices that violate search engine guidelines. Several online tools can assist you in identifying and analysing these issues. Make a detailed list of the problematic areas to address during the recovery process.
Clean Up Your Act: Once you have identified the spam-related elements, it’s time to clean up your website. You may wish to consider removing or disavowing toxic backlinks that are causing the penalty. Be extremely careful here. Gary Illyes and John Mueller, at the time both senior personnel at Google, have said that the disavow tool hurts more web publishers than it helps. To quote Mueller: “When in doubt, leave disavow out”.
Since Penguin 4.0, Google usually does not penalise websites that engage in link spam, but instead simply ignores the spammy links and passes no link juice, or ranking benefit, through them. This can look like a penalty because, if your website was ranking well because of spammy links, it will drop in rankings when Google starts to ignore those links. But it is not being actively suppressed by Google; it’s just losing the link juice that it was getting via manipulative practices. Google does still penalise websites engaged in some of the more egregious or en masse forms of link spam. If you’ve received a message via Search Console confirming a manual action (penalty) related to link spam, rather than an algorithmic devaluation, you’ll want to seriously consider the disavow tool. In such an instance, we strongly recommend getting an SEO agency to do any disavowing for you, and make sure to vet them well and ensure that they have a lot of experience with its careful use. This is a very delicate thing to say, but we recommend using agencies that do not outsource their work to countries such as India or China. In our experience, the norm in such countries is to play fast and loose with the disavow tool! That doesn’t apply to all agencies there (there are some fantastic agencies that hail from that part of the world). But your chances of being lumped with someone who is less qualified or indeed overtly spammy would be amplified.
Request a Reconsideration: If you received a manual penalty notification, you can submit a reconsideration request to Google, once you’ve made the necessary changes. In your reconsideration request, clearly explain the actions you’ve taken to rectify the spam-related issues and provide any supporting documentation if available. Be patient as it may take time for search engines to review your request. And make sure that you really have rectified the problem, in its whole. Your site will be put under intense review by a member of Google’s quality team, and, if they feel you’ve kept some of the offending content etc., they won’t green light your request, which can add a lot of extra time to the reconsideration process.
Focus on Quality Content and User Experience: To regain the trust of search engines and users, focus on delivering high-quality content and improving user experience on your website. Create informative, engaging, and shareable content that adds value to your target audience. Enhance your website’s usability, mobile-friendliness, and page load speed. Make sure that it passes the Core Web Vitals assessment. By demonstrating your commitment to providing a positive user experience, you increase your chances of recovering from the penalty.
Build a Strong Backlink Profile: The only way to recover rankings lost from a link-based manual action or algorithmic devaluation is to build a strong and natural backlink profile. Backlinks should be earned by building amazing guides and resources on your site, so that other sites naturally want to link to them. Consider a strong online PR campaign, so that major news outlets, blogs, magazines and other authoritative sources cite you as an expert, speak about the interesting things you’re up to, and link to you in the process! High quality newspaper links are still some of the best out there in terms of authority building. Although many news outlets still blanket nofollow their outgoing links, these can still be really helpful. Whereas Google used to ignore nofollow links full stop, it has now confirmed that it treats them as hints. At Go Up, we’ve had cases where a website has rocketed up the rankings, and the only thing that has happened has been that it received a nofollow link from a major newspaper or magazine!
Monitor and Adapt: Once you’ve recovered from a spam-related penalty, it’s important to stay vigilant and monitor your website’s performance regularly. Keep an eye on your backlink profile, traffic patterns, and search engine rankings. Continuously adapt your SEO strategy to align with search engine guidelines and industry best practices. Remember, you can spend a lot of time building a house. But, if you build the foundations on sand, one day it’ll blow over and you’ll have to start again. If you build it on rock, it will stand the test of time. Base your SEO strategy on building the best possible content, that answers user questions and serves an intent. And then undertake a strong digital PR campaign to make sure that that content is talked about and linked to by authoritative sources, and you’ll be in the best possible shape!