Types of SEO-related spam
Google outlines its spam policies in its carefully named, wait for it… “Spam Policies” and “Search Essentials” guidelines. However, it also reserves the right to penalise anything not listed in those guidelines that it nonetheless feels breaches their spirit.
The most common types of spam are:
Cloaking
When a website is cloaking, it’s presenting different content to search engines than it shows to users. Google gives these examples of cloaking:
- “Showing a page about travel destinations to search engines while showing a page about discount drugs to users
- Inserting text or keywords into a page only when the user agent that is requesting the page is a search engine, not a human visitor”
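The user-agent trick in the second example can be sketched in a few lines. This is a hypothetical illustration of the pattern (the crawler tokens and page copy are invented for the example), not code from any real tool:

```python
# Hypothetical sketch of server-side cloaking: the response depends on
# which user agent requests the page. Tokens and copy are invented.

CRAWLER_TOKENS = ("googlebot", "bingbot")

def is_search_engine(user_agent: str) -> bool:
    """Crude user-agent sniffing, as a cloaking script might do it."""
    ua = user_agent.lower()
    return any(token in ua for token in CRAWLER_TOKENS)

def render_page(user_agent: str) -> str:
    # Crawlers get keyword-rich "travel" copy; humans get the real,
    # unrelated page -- exactly the mismatch Google describes above.
    if is_search_engine(user_agent):
        return "Top travel destinations: cheap flights, holiday deals..."
    return "Discount drugs - order now!"
```

Because the decision hinges on something only the requester supplies (the user-agent header), Google can detect it by occasionally crawling with a non-crawler user agent and comparing the two responses.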
Doorway pages
Doorway pages are pages that provide little user value, but instead seek to use all of the different elements on a page (URLs, title tags, heading tags, alt attributes, content etc.) to rank for a narrow set of keywords. A website can have hundreds, thousands or even millions of doorway pages.
For example, if a website had a separate page for:
These would be doorway pages. They are almost identical to one another, and clearly one of their main uses is to target subtle variations of the same keywords.
Google gives a few examples of doorway pages:
- “Having multiple websites with slight variations to the URL and home page to maximise their reach for any specific query
- Having multiple domain names or pages targeted at specific regions or cities that funnel users to one page
- Pages generated to funnel visitors into the actual usable or relevant portion of your site(s)
- Substantially similar pages that are closer to search results than a clearly defined, browseable hierarchy”
Internal search pages (web pages automatically created by a site’s internal search engine) are a quick way to create thousands of doorway pages, sometimes intentionally, other times unintentionally, but the outcome is the same. This is because many internal search engines create a new, unique page every time a searcher looks for something that’s never been searched for before within that engine. That page is then stored in the website’s database, ready to be surfaced the next time someone searches the same query. So, if “hairdressers london” has never been searched for before, the internal search engine will create the page
and then, if someone searches the capitalised variant, “hairdressers London”, the internal search engine will create a second, different page:
In this way, popular websites with a large number of internal searches can quickly generate a huge number of doorway pages. To make matters worse, these pages, by nature of being dynamically generated, often have thin content, another of Google’s bugbears!
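The mechanics are easy to see with a sketch. Assuming a hypothetical site at example.com whose internal search lives at /search?q=, every distinct query string mints a distinct, potentially indexable URL:

```python
from urllib.parse import quote_plus

# Hypothetical site whose internal search lives at /search?q=
BASE = "https://example.com/search?q="

def search_url(query: str) -> str:
    """Every distinct query string yields a distinct, storable URL."""
    return BASE + quote_plus(query)

# Two trivially different queries mint two separate pages:
search_url("hairdressers london")   # https://example.com/search?q=hairdressers+london
search_url("hairdressers London")   # https://example.com/search?q=hairdressers+London
```

A common mitigation is to stop these pages being indexed at all, for example by disallowing /search in robots.txt or adding a noindex robots meta tag to internal search results.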
Whereas some web publishers know that their internal search pages are doorway pages, others use this strategy unknowingly. Other times, doorway pages are very much intentional. For instance, you’ll often see trades websites with a huge list of local landing pages:
Again, these are doorway pages and subject to penalisation from Google.
Hacked content
Hacked content refers to unauthorised and malicious content placed on a website by hackers due to security vulnerabilities. This content not only provides poor search results (as it can mean that the content on a web page is compromised and stuffed full of low quality or junk material) but also poses significant risks to users.
For instance, hackers can find a good website that has a security vulnerability in a WordPress plugin. They may then exploit this vulnerability to inject hidden code and links, compromising the integrity of the website or even allowing them to capture sensitive user data, including credit card data. As a result, users who access the compromised site may unknowingly download malware, have their personal information stolen, or be redirected to phishing sites.
Even though hacked content is rarely intentional on the web publisher’s part (it’s usually hackers that are responsible, not the site owners themselves!), Google will still penalise websites that have been hacked until the web publisher can show that they’ve cleaned up the hack and removed the malware.
Hidden text and links
Hidden text and links are content that a black hat SEO places on a webpage so that it’s visible to search engines but invisible to users.
For example, a web publisher might repeat the same keyword one hundred times on a web page, but make the text tiny, or render it as white text on a white background. Search spiders can read the text within the HTML of the page, but it is all but invisible to the human eye.
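Because these tricks leave fingerprints in the page’s styling, they can be spotted heuristically. A minimal sketch (the patterns checked are illustrative, not exhaustive):

```python
def looks_hidden(style: str) -> bool:
    """Heuristic check of an inline style attribute for classic
    hidden-text tricks: white-on-white text, zero font size,
    off-screen positioning. Patterns are illustrative only."""
    s = style.replace(" ", "").lower()
    return (
        ("color:#fff" in s and "background:#fff" in s)
        or "font-size:0;" in s or s.endswith("font-size:0")
        or "text-indent:-9999px" in s
    )

looks_hidden("color: #fff; background: #fff")   # True
looks_hidden("color: #333; font-size: 14px")    # False
```

Real detection (Google’s or a crawler tool’s) renders the page and compares what is painted on screen with what is in the HTML, which catches far more variants than string matching can.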
Keyword stuffing
Keyword stuffing is the practice of excessively and unnaturally overloading a web page with keywords in an attempt to manipulate search rankings. For instance, imagine a website dedicated to online gaming that repeatedly lists popular game-related keywords, such as “best gaming headphones,” “top gaming keyboards,” and “cheap gaming laptops,” without providing any substantial content or relevant information. This technique aims to artificially inflate the site’s visibility in search results, even though the content lacks value and readability for users.
Keyword stuffing doesn’t just occur in the main content of a web page, though. Black hat SEOs will also stuff keywords into important on-page elements such as title tags, heading tags, alt attributes and noscripts. Keyword stuffing is also seen when SEOs over optimise anchor text. For instance, they may make sure that every link that points back to their website has the anchor text “Hairdressers London”.
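A crude way to quantify stuffing is keyword density: the share of a page’s words taken up by repetitions of one phrase. This sketch uses an invented metric and illustrative inputs, not anything Google has published:

```python
def keyword_density(text: str, phrase: str) -> float:
    """Share of the page's words taken up by repetitions of `phrase`.
    An invented, crude proxy -- not a Google formula."""
    words = text.lower().split()
    occurrences = text.lower().count(phrase.lower())
    return occurrences * len(phrase.split()) / max(len(words), 1)

stuffed = "best gaming headphones " * 20
natural = "a review comparing the best gaming headphones we tested this year"

keyword_density(stuffed, "best gaming headphones")   # 1.0
keyword_density(natural, "best gaming headphones")   # roughly 0.27
```

The point of the example is the gap between the two numbers, not any particular threshold: natural writing mentions a phrase a handful of times; stuffed copy is dominated by it.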
Link spam
Link spam is any practice that builds links with no user value. For instance:
Guest blogging link spam: Paying a guest blogger to place an article on their site where the main point of the article is not to be read by genuine users, but to link back to your website.
Link networks: A network of websites that all link to one another to manipulate search rankings of the network as a whole. For instance, website A links to website B which links to website C which links to website D which links to website A.
Link farms: A link farm is a website that exists solely to generate links to other websites. Back in the day, link farms often looked like directories of varying quality, but very few users would ever use them like they would a genuine directory such as Yelp. Instead, they existed to manipulate search rankings to the websites that they linked to. These days, link farms tend to look a little more sophisticated than directories, and many blogs that sell links could be interpreted as link farms.
Comment section links: This is where a black hat SEO goes into the comment sections of articles published on prominent websites and writes comments that contain a link back to their target website.
Paid links: In essence, these are any backlinks that a web publisher pays for and that the recipient does not mark as “sponsored” or “nofollow” (Google prefers them to be marked “sponsored” over “nofollow”). Google relies on links as a form of quality control. If a web publisher like, say, the New York Times writes an article about beauty tips, and within that article links to a great hair salon, then Google will see that as a human recommendation from an authoritative source (the New York Times) to that salon, and will reward the salon with boosted rankings.
If, however, the New York Times has been financially incentivised to post that link, then it’s not a genuine quality endorsement, and the linked-to site or service may well not be as good as the link suggests. If you pay for links and don’t mark them accordingly, this is seen as spam and subject to devaluation or penalisation.
Note: it is, of course, sometimes very hard for Google to distinguish between when a link has been paid for or not, but it is getting better at making educated guesses. As a general rule of thumb, the safest form of link building remains those built via genuine online PR efforts.
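A site owner can audit their own outbound links for missing rel markings. This sketch uses Python’s standard html.parser and flags any link carrying neither rel="sponsored" nor rel="nofollow"; treat it as an illustration rather than a complete audit tool (the URLs are invented):

```python
from html.parser import HTMLParser

class LinkRelAudit(HTMLParser):
    """Collects hrefs of links that carry neither rel="sponsored" nor
    rel="nofollow" -- i.e. links that pass full ranking credit."""

    def __init__(self):
        super().__init__()
        self.unmarked = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        d = dict(attrs)
        rel = set((d.get("rel") or "").lower().split())
        if d.get("href") and not rel & {"sponsored", "nofollow"}:
            self.unmarked.append(d["href"])

audit = LinkRelAudit()
audit.feed('<a href="https://salon.example" rel="sponsored">ad</a>'
           '<a href="https://other.example">editorial</a>')
audit.unmarked   # ['https://other.example']
```

An unmarked link isn’t automatically spam, of course; the audit simply surfaces the links you’d need to justify as genuine editorial endorsements.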
Machine-generated traffic
Google defines machine-generated traffic as:
“Machine-generated traffic consumes resources and interferes with our ability to best serve users. Examples of automated traffic include:
- Sending automated queries to Google
- Scraping results for rank-checking purposes or other types of automated access to Google Search conducted without express permission”
Malware and malicious behaviours
Google actively checks websites for hosting malware or unwanted software that harms users. Malware includes software designed to harm devices, steal sensitive information, or perform malicious activities, such as ransomware, spyware, or viruses. It’s usually (but not always) inserted into a site by hackers who have managed to exploit a vulnerability in your site, for example, in an outdated WordPress plugin, although of course sometimes thieves will intentionally set up a website and fill it with malware to steal users’ information.
Unwanted software negatively affects the browsing experience, often by displaying intrusive ads, redirecting users to unrelated websites, or collecting personal information without consent. Even if you are unaware that your site has been hacked, and are not responsible for the hack, Google can, and will still penalise or devalue you for it.
Misleading functionality
Misleading functionality involves creating deceptive features or services on a website to manipulate search rankings. For example, a website might claim to offer a free credit score checker but, when users try the tool, redirect them to unrelated and potentially harmful advertisements. These fake generators or services attract users with their promised functionality but instead lead them to deceptive ads or attempt to gather their sensitive data. Google provides these examples:
- “A site with a fake generator that claims to provide app store credit but doesn’t actually provide the credit
- A site that claims to provide certain functionality (for example, PDF merge, countdown timer, online dictionary service), but intentionally leads users to deceptive ads rather than providing the claimed services”
Scraped content
Scraped content refers to websites that primarily rely on content taken from other sources without providing substantial additional value. For instance, a website might aggregate news articles from reputable publishers without proper attribution or permission, and without adding anything extra of value to those articles. Instead of curating or adding unique insights, the site merely republishes the content, claiming it as its own, often resulting in duplicate or low-quality information.
Not only does this often result in low quality and duplicate content being visible in the SERPs (as Google might accidentally show the same article twice in the same top 10 results, thus reducing the diversity of those listings), but it can also cause serious copyright and trademark infringement issues.
Sneaky redirects
Sneaky redirects are when a black hat SEO misleads users or search engines by redirecting them to different or unexpected content without their knowledge or consent.
For example, imagine a website promoting a popular game. When you go to a page on that website that claims to host the game, or to provide guides or cheats for it, you’re instead redirected to adult content or an online gambling platform. While legitimate redirects are acceptable for site moves or consolidation, misleading redirects aim to deceive users, leading them to irrelevant or potentially harmful destinations.
Spammy automatically-generated content
Spammy automatically-generated content refers to content generated programmatically without adding value, solely for the purpose of manipulating search rankings.
For instance, a website that generates nonsensical blocks of text filled with keywords, often using automated translation tools to create variations in different languages. These auto-generated pages serve no meaningful purpose for users, contributing to cluttered search results and a poor user experience.
Thin affiliate pages
Thin affiliate pages lack original content and value, relying on copied product descriptions, reviews, or specifications without adding unique information or insights.
For example, a website might act as an affiliate for an online retailer but, instead of providing comprehensive product reviews, comparisons, or additional resources, simply copy and paste the manufacturer’s descriptions and specifications. Such pages do not offer users any valuable or distinct information, and often serve as mere conduits to generate affiliate revenue without contributing original insights. These would be considered thin affiliate pages.
User-generated spam
User-generated spam occurs when individuals or automated systems add spammy content through platforms intended for user contributions, such as forum posts, comments sections, or file uploads.
For example, imagine the Guardian newspaper had less robust comments section oversight. Some users might repeatedly post unrelated advertisements, links to malicious websites, or nonsensical messages about purchasing Bitcoin, Viagra or some get-rich-quick scheme. Site owners must implement measures to prevent such abuse, such as proper comments section moderation (which, for the record, the Guardian has), to maintain a positive user experience.
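The first line of defence against comment spam is usually automated filtering. A deliberately naive blocklist sketch (the markers are invented for the example; real systems layer many more signals on top):

```python
# Invented example markers; a real filter would combine many signals
# (links per comment, posting rate, commenter reputation, ML scoring...).
SPAM_MARKERS = ("viagra", "get rich quick", "buy bitcoin now")

def is_spam_comment(comment: str) -> bool:
    """Naive blocklist check run before a comment is published."""
    text = comment.lower()
    return any(marker in text for marker in SPAM_MARKERS)
```

Blocklists alone are trivially evaded (misspellings, spacing tricks), which is why production moderation systems combine them with rate limits, link counts and human review.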
Thin content
Thin content refers to web pages or sections of a website that lack substantial or meaningful content. It’s characterised by a minimal amount of text, often containing little to no unique information or value for the user. Thin content is often found on doorway pages.
For instance, going back to that plumber example earlier, if there is a website with 100 different local landing pages for:
and many more, each page might say something pretty useless, like:
“If you live in Kensington, you may need a plumber. If your central heating breaks down and you live in St Edward’s Square, or if your pipes burst and you live on the Gloucester Road, then a Kensington plumber is the thing you need!”
Thin content doesn’t always keyword stuff like the above example, but it often does. Many websites also have similar pages with no written content at all: just, for example, an optimised URL (/plumber-kensington), title tag (“Kensington Plumber”) and heading tag (“Kensington Plumber”). This lack of written content altogether also represents thin content. Even though Google does not list thin content within its spam policies, historically, web spam messages from Google (which can, but don’t always, appear in Search Console to confirm a penalty) have often cited thin content as the reason for penalisation.
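Both failure modes described here, very little text and near-identical pages, are easy to approximate in code. A sketch with illustrative thresholds (the 150-word floor and 0.8 similarity cut-off are invented, not Google figures):

```python
import difflib

def is_thin(page_text: str, min_words: int = 150) -> bool:
    """Flag pages whose body copy falls below an illustrative word floor."""
    return len(page_text.split()) < min_words

def near_duplicate(a: str, b: str, threshold: float = 0.8) -> bool:
    """Doorway-style local pages often differ only in the place name."""
    return difflib.SequenceMatcher(None, a, b).ratio() >= threshold

kensington = "If you live in Kensington, you may need a plumber."
chelsea = "If you live in Chelsea, you may need a plumber."

is_thin(kensington)                  # True -- nowhere near 150 words
near_duplicate(kensington, chelsea)  # True -- only the locality differs
```

Running checks like these across every local landing page of a site is a quick way to find the pages most likely to be read as thin or doorway content before Google does.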