The Sentiment Economy: What 1.17 Million URLs Reveal About Positive vs Negative Web Content

The internet feels negative. Doom-scrolling is a real phenomenon: a Stanford HAI study found that news sources publish nearly twice as much negative as positive content, and a 2023 Nature Human Behaviour study found that each additional negative word in a headline increases click-through rates by 2.3%. The algorithm rewards outrage, and the perception is that the web is a hostile place.

The data says otherwise. We classified 1,172,644 URLs by content sentiment — Good, Neutral, or Bad — as part of LLMSE's multi-dimensional website analysis. The headline finding: 83.5% of the web is positive. Just 0.32% is negative. For every URL classified as Bad, there are 260 classified as Good.

That gap between perception and reality has consequences. Advertisers waste $26.8 billion annually on programmatic inefficiencies, partly because imprecise sentiment assessment leads to over-blocking of brand-safe inventory. The GARM Brand Suitability Framework — the industry's shared language for content risk — was dissolved in August 2024 after X/Twitter filed an antitrust lawsuit, leaving no replacement standard. And 78% of brands cite brand safety as a top concern despite actual brand risk being at a record low of 1.5%.

This report maps the sentiment landscape of the web: where positivity concentrates, where the 0.32% hides, how sentiment correlates with quality, and what it all means for the $4.82 billion ad verification industry.

The Data

We analyzed 1,172,644 URLs with sentiment classifications in LLMSE's database as of February 26, 2026.

Sentiment      URLs  Share
Good        979,325  83.5%
Neutral     189,557  16.2%
Bad           3,762  0.32%

Three observations stand out:

  1. The web is overwhelmingly positive. 83.5% of classified content has positive sentiment — upbeat, constructive, or affirmatively toned. This aligns with Meta's transparency reports showing less than 1% of Facebook and Instagram content is removed for policy violations.
  2. Neutral content is the real second tier. 16.2% of URLs are neutral — informational, factual, or editorial content without strong sentiment signals. This is where reference material, documentation, and straight reporting live.
  3. Negative content is vanishingly rare. At 0.32%, Bad sentiment content is a statistical rounding error. The 3,762 negative URLs are outnumbered 260-to-1 by positive content.
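The shares and the 260-to-1 ratio follow directly from the counts in the table. A minimal Python sketch that recomputes them; the dictionary simply restates the table and assumes no other data:

```python
# Sentiment counts restated from the table above (LLMSE snapshot, February 26, 2026).
counts = {"Good": 979_325, "Neutral": 189_557, "Bad": 3_762}

total = sum(counts.values())

# Share of each sentiment class, as a percentage of all classified URLs.
shares = {label: 100 * n / total for label, n in counts.items()}

# Good-to-Bad ratio: positive URLs per negative URL.
good_to_bad = counts["Good"] / counts["Bad"]

print(f"Total classified: {total:,}")
for label, pct in shares.items():
    print(f"{label:<8} {pct:6.2f}%")
print(f"Good:Bad = {good_to_bad:.0f}:1")
```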

Where Positivity Lives: Sentiment by Category

The 83.5% average masks variation across content categories. Some industries are almost uniformly positive; others harbor disproportionate negativity.

The Most Positive Categories

Category              Matched URLs      Good   Good%   Bad   Bad%
Shopping                    20,435    18,666   91.3%    17   0.1%
Style and Fashion            5,699     5,100   89.5%    11   0.2%
Entertainment              151,136   133,969   88.6%   187   0.1%
Food and Drink              20,184    17,660   87.5%    23   0.1%
Travel                      13,317    11,638   87.4%    35   0.3%
Business and Industry      252,417   219,860   87.1%   328   0.1%
Internet and Telecom        32,707    28,503   87.1%   411   1.3%

Shopping leads at 91.3% positive — and with a staggering 1,098:1 Good-to-Bad ratio. For every negative shopping URL, there are over a thousand positive ones. This makes intuitive sense: e-commerce is fundamentally optimistic. Product pages, reviews, and promotional content are designed to persuade, not depress.

Food and Drink (87.5%), Style and Fashion (89.5%), and Entertainment (88.6%) follow the same pattern — consumer-facing categories where the commercial incentive is to present content positively.

The Most Negative Categories

Category              Matched URLs   Good%   Bad   Bad%  Good:Bad Ratio
Adult                       21,535   63.6%   680   3.2%            20:1
Disasters                    1,749   65.1%    39   2.2%            29:1
Crime                        2,726   69.7%    54   2.0%            35:1
Sensitive Topics             7,407   81.4%   108   1.5%            56:1
Internet and Telecom        32,707   87.1%   411   1.3%            69:1
News and Media              28,270   82.5%   211   0.7%           111:1

Adult content leads negative sentiment at 3.2% — ten times the web average. This is the single largest source of Bad URLs, contributing 680 of the 3,762 total (18.1%). Yet even Adult content is predominantly positive (63.6%) — the category is negative relative to the web, not in absolute terms.

Disasters (2.2%) and Crime (2.0%) follow predictably, since these categories cover inherently negative topics. But even here, the Good-to-Bad ratios are 29:1 and 35:1 respectively, which suggests that crime prevention sites, disaster relief organizations, and educational content vastly outnumber sensationalist coverage.

Internet and Telecom's 1.3% stands out. It is the fifth-most-negative category by percentage, and its Bad URLs are driven by Telecommunications (116), File Sharing (52), and Web Hosting (52). Complaint-oriented content, such as service outage reports, hosting reviews, and telecom consumer grievances, appears to concentrate negativity here.
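The negative-category table pairs a raw Bad count with a rounded Good percentage, so each Good:Bad ratio has to be reconstructed by first estimating the Good count. A minimal Python sketch of that arithmetic, which also ranks the categories by Bad share. The estimated Good counts are approximations, because Good% is displayed to one decimal:

```python
# Rows restated from the "Most Negative Categories" table:
# (category, matched URLs, Good % as displayed, Bad URL count)
rows = [
    ("Adult", 21_535, 63.6, 680),
    ("Disasters", 1_749, 65.1, 39),
    ("Crime", 2_726, 69.7, 54),
    ("Sensitive Topics", 7_407, 81.4, 108),
    ("Internet and Telecom", 32_707, 87.1, 411),
    ("News and Media", 28_270, 82.5, 211),
]


def summarize(rows):
    """Return (category, Bad %, estimated Good:Bad ratio), most negative first."""
    out = []
    for name, urls, good_pct, bad in rows:
        bad_pct = 100 * bad / urls
        # Good counts are not in the table, so estimate them from the rounded Good %.
        est_good = good_pct / 100 * urls
        out.append((name, bad_pct, est_good / bad))
    return sorted(out, key=lambda r: r[1], reverse=True)


for rank, (name, bad_pct, ratio) in enumerate(summarize(rows), start=1):
    print(f"{rank}. {name:<22} Bad {bad_pct:.1f}%   Good:Bad ~{ratio:.0f}:1")
```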

Where the 3,762 Bad URLs Hide

The 0.32% of negative content isn't randomly distributed. It clusters in specific subcategories:

Subcategory                                 Bad URLs  Share of All Bad
Adult > Photos                                   178  4.7%
Adult > Interracial                              139  3.7%
Internet and Telecom > Telecommunications        116  3.1%
Business and Industry > Business Services        114  3.0%
Adult > Videos                                   111  2.9%
Sensitive Topics > Spam or Harmful Content        81  2.2%
Agriculture > Crop Farming                        61  1.6%
Entertainment > Movies                            57  1.5%
Internet and Telecom > File Sharing               52  1.4%
Internet and Telecom > Web Hosting                52  1.4%
Computer and Electronics > Programming            48  1.3%

Adult content accounts for 18.1% of all Bad URLs across its subcategories (680 total), making it the single largest contributor. But the remaining 81.9% of negative content is spread across 28 other categories — no single non-Adult category exceeds 11%.
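Shares of the 3,762 Bad URLs are simple proportions. The sketch below recomputes them for the top-level categories whose Bad counts appear in this report's tables. Other categories are deliberately left out, so the dictionary is not a full breakdown:

```python
TOTAL_BAD = 3_762  # all Bad-sentiment URLs in the dataset

# Bad URL counts for the top-level categories that appear in this report's tables.
# Other categories are omitted, so this is not a full breakdown.
bad_by_category = {
    "Adult": 680,
    "Internet and Telecom": 411,
    "Business and Industry": 328,
    "News and Media": 211,
    "Entertainment": 187,
}


def share_of_bad(count, total=TOTAL_BAD):
    """Percentage of all Bad URLs contributed by one category or subcategory."""
    return 100 * count / total


for name, n in bad_by_category.items():
    print(f"{name:<22} {n:>4}  {share_of_bad(n):5.1f}% of all Bad")

top_three = sorted(bad_by_category.values(), reverse=True)[:3]
print(f"Top three combined: {share_of_bad(sum(top_three)):.1f}%")
```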

Telecom complaints (3.1%) and business services (3.0%) represent the consumer frustration layer of the web — sites where people document negative experiences with service providers.

The presence of Agriculture > Crop Farming (1.6%) is unexpected. Manual inspection reveals these are sites discussing crop failures, pest infestations, and agricultural market downturns — genuinely negative topics in a category that rarely appears in brand safety discussions.

Sentiment and Quality: Do Positive Sites Score Better?

We cross-referenced sentiment with four quality dimensions. The relationship between sentiment and quality is not straightforward.

EEAT (Expertise, Experience, Authoritativeness, Trustworthiness)

Sentiment       A      B      C      D      F     Total
Good         3.0%  22.8%  23.7%  44.4%   6.1%   531,815
Neutral      7.6%  11.4%  19.5%  60.6%   0.9%    90,154
Bad          2.5%  19.0%  50.7%  26.0%   1.8%     2,852

Bad-sentiment sites cluster in C-grade EEAT (50.7%), more than double the Good-sentiment rate (23.7%). This suggests that negative content tends to come from sites with moderate but not terrible expertise signals. Sites weak enough to score D or F on EEAT (thin content, no author credentials) may not have enough substance to register strong negative sentiment either: only 27.8% of Bad-sentiment sites score D or F, compared with 50.5% of Good-sentiment sites.

Neutral content has the highest A-grade rate (7.6%), about 2.5 times the Good rate (3.0%) and 3 times the Bad rate (2.5%). This is plausible: the most authoritative sites (universities, government agencies, reference databases) tend to present information neutrally rather than with positive or negative framing.
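The EEAT comparisons above are ratios of grade shares between sentiment groups, plus combined D+F shares. A short Python sketch restating the table; because the inputs are rounded percentages, the ratios carry some rounding error:

```python
# EEAT grade distributions (% of each sentiment group), restated from the table above.
eeat = {
    "Good":    {"A": 3.0, "B": 22.8, "C": 23.7, "D": 44.4, "F": 6.1},
    "Neutral": {"A": 7.6, "B": 11.4, "C": 19.5, "D": 60.6, "F": 0.9},
    "Bad":     {"A": 2.5, "B": 19.0, "C": 50.7, "D": 26.0, "F": 1.8},
}


def rate_ratio(grade, group, baseline):
    """How many times more often `group` lands in `grade` than `baseline` does."""
    return eeat[group][grade] / eeat[baseline][grade]


def low_grade_share(group):
    """Combined D+F share: the weakest expertise signals."""
    return eeat[group]["D"] + eeat[group]["F"]


print(f"C-grade, Bad vs Good:     {rate_ratio('C', 'Bad', 'Good'):.2f}x")
print(f"A-grade, Neutral vs Good: {rate_ratio('A', 'Neutral', 'Good'):.2f}x")
print(f"A-grade, Neutral vs Bad:  {rate_ratio('A', 'Neutral', 'Bad'):.2f}x")
for group in eeat:
    print(f"{group:<8} D+F = {low_grade_share(group):.1f}%")
```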

GARM Brand Safety

Sentiment   A (Safe)      B      C      D  F (Floor)    Total
Good           94.9%   2.8%   1.1%   0.0%       1.3%   62,193
Neutral        88.8%   5.2%   0.0%   1.7%       4.3%    8,939
Bad            77.2%   0.0%   4.5%   0.0%      18.3%      531

Sentiment and brand safety are strongly correlated — but not perfectly. 77.2% of Bad-sentiment URLs are still brand-safe (GARM A-grade). A URL can have negative sentiment (a complaint about a product, a critical review, a report on a scandal) and still fall outside all 11 GARM risk categories.

The critical insight: 18.3% of Bad-sentiment URLs hit the GARM floor (F-grade, universally unsafe for advertising), compared to just 1.3% of Good-sentiment and 4.3% of Neutral-sentiment URLs. Negative sentiment is 14x more likely to be GARM floor content than positive sentiment.
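The "14x" figure is a relative risk: the Bad-sentiment floor rate divided by the Good-sentiment floor rate. Reporting the absolute gap in percentage points alongside it is a common way to keep such ratios in perspective. A short Python sketch using the rounded rates from the table:

```python
# GARM F-grade (floor) rates by sentiment, restated from the table above (percent).
floor_rate = {"Good": 1.3, "Neutral": 4.3, "Bad": 18.3}


def relative_risk(group, baseline="Good"):
    """Ratio of two floor rates: how many times more often `group` hits the floor."""
    return floor_rate[group] / floor_rate[baseline]


def risk_difference(group, baseline="Good"):
    """Absolute gap between two floor rates, in percentage points."""
    return floor_rate[group] - floor_rate[baseline]


for group in ("Neutral", "Bad"):
    print(f"{group:<8} {relative_risk(group):4.1f}x the Good rate, "
          f"+{risk_difference(group):.1f} points")
```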

SEO Quality

Sentiment   A+B+C (Passing)  D+F (Failing)    Total
Good                   1.9%          98.1%  687,896
Neutral                2.6%          97.4%  124,797
Bad                    2.6%          97.4%    3,092

SEO quality is essentially independent of sentiment — all three groups have 97-98% failure rates. This is consistent with the broader finding from our State of Website SEO 2026 report: the vast majority of websites fail basic SEO regardless of their content quality or sentiment.

WCAG Accessibility

Sentiment   A (Pass)  F (Fail)   Total
Good           17.7%     27.0%  68,422
Neutral        14.1%     33.4%   8,928
Bad            18.5%     41.0%     524

Intermediate grades (B through D) are omitted, so rows do not sum to 100%.

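Negative-sentiment sites have the worst accessibility failure rate: 41.0% score F on WCAG checks, compared with 27.0% of positive-content sites, even though their A-grade rate (18.5%) is marginally the highest of the three groups. Sites producing negative content appear less likely to invest in accessibility infrastructure, though the Bad group is small (524 URLs).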
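The Bad-sentiment WCAG figure rests on only 524 URLs, so it is worth checking whether the gap to Good-sentiment sites could be sampling noise. One standard tool for this, swapped in here for illustration rather than taken from the original analysis, is the Wilson score interval for a proportion. The sketch assumes the URLs behave like independent draws, which a non-random crawl does not guarantee:

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion p_hat observed in n URLs."""
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# WCAG F-grade shares and group sizes, restated from the table above.
groups = {"Good": (0.270, 68_422), "Neutral": (0.334, 8_928), "Bad": (0.410, 524)}

for name, (p, n) in groups.items():
    lo, hi = wilson_interval(p, n)
    print(f"{name:<8} F rate {p:.1%}  95% interval {lo:.1%} to {hi:.1%}")
```

With these inputs, the Bad-sentiment interval (roughly 37% to 45%) stays well above the Good-sentiment interval (roughly 26.7% to 27.3%), so the gap is unlikely to be noise alone, even if its exact size is uncertain.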

The Language Factor: Where Negativity Speaks

Sentiment varies meaningfully by content language:

Language         URLs   Good%   Bad%
Czech           5,535   83.9%   4.4%
Chinese        36,881   75.5%   1.5%
Japanese       23,451   85.7%   1.2%
Turkish         7,411   75.3%   0.9%
Korean          6,191   81.7%   0.8%
Polish          8,926   83.5%   0.6%
Thai            2,238   90.5%   0.6%
French         33,453   85.2%   0.4%
Spanish        27,195   85.9%   0.4%
German         50,549   88.4%   0.2%
English       806,624   84.3%   0.1%
Indonesian      8,311   85.4%   0.1%

Czech leads negativity at 4.4%, roughly 40 times the English rate of about 0.11% (898 of 806,624 URLs), or 44 times if computed from the rounded 0.1% figure. This concentration presumably reflects specific content patterns in the Czech online ecosystem.

Chinese (1.5%) and Japanese (1.2%) have notably higher negativity rates than Western European languages such as French, Spanish, and German (0.2% to 0.4%). This may reflect cultural differences in how sentiment is expressed online, or differences in the types of content that dominate each language's web presence.

English has the lowest negativity rate (0.1%, tied with Indonesian) among major languages, likely reflecting the sheer volume of commercial, corporate, and institutional content in English that dilutes any negative signal. Even so, with 806,624 URLs, the English web's scale means that rare negative content still amounts to 898 URLs in absolute terms, nearly a quarter of all Bad URLs.

Thai is the most positive language at 90.5% Good sentiment, possibly consistent with cultural communication norms that favor positive framing, though Thai is also the smallest sample in the table (2,238 URLs).
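Because the English Bad rate is displayed rounded to 0.1%, multiples computed against it are inflated. A Python sketch that uses the exact English figure given in the text (898 of 806,624 URLs) instead; the other languages' rates are still the rounded table values, so the results are approximate:

```python
# English baseline from the text: 898 Bad URLs out of 806,624 English URLs.
ENGLISH_BAD_RATE = 898 / 806_624  # about 0.11%; the table rounds it to 0.1%

# Bad % per language as displayed (rounded) in the table above.
bad_pct = {"Czech": 4.4, "Chinese": 1.5, "Japanese": 1.2, "Turkish": 0.9, "Korean": 0.8}


def multiple_of_english(pct):
    """How many times the exact English Bad rate a displayed Bad % represents."""
    return (pct / 100) / ENGLISH_BAD_RATE


for lang, pct in bad_pct.items():
    naive = pct / 0.1  # same ratio against the rounded 0.1% English figure
    print(f"{lang:<9} ~{multiple_of_english(pct):4.1f}x English (vs {naive:.0f}x from rounded figures)")
```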

The Gender Non-Effect

Unlike most other dimensions, sentiment barely varies by gender targeting:

Target Gender       URLs   Good%   Neutral%   Bad%
Male             650,777   81.9%      17.8%   0.3%
Female           310,760   86.7%      13.1%   0.3%
All              198,520   84.1%      15.6%   0.3%

All three gender segments have identical Bad-sentiment rates (0.3%). Female-targeted content is slightly more positive (86.7% vs 81.9%), a 4.8 percentage point gap likely driven by the higher proportion of lifestyle, beauty, health, and shopping content in female-targeted categories. But negative content is equally rare across all audience segments.

The Infrastructure View

Web server choice shows minor sentiment variation:

Server           URLs   Good%   Bad%
Cloudflare    319,038   83.1%   0.3%
nginx         258,880   79.3%   0.5%
Apache        181,606   84.8%   0.4%
LiteSpeed      46,873   83.3%   0.3%

nginx has the highest Bad-sentiment rate (0.5%) and the lowest Good-sentiment rate (79.3%). The Bad rate is modest in absolute terms but about 67% higher than Cloudflare's, although both figures are rounded, so the true gap is uncertain. One plausible explanation is that nginx is widely self-hosted and therefore serves a wider range of content types, including sites that larger CDN providers might decline to serve. Apache's slightly higher positivity (84.8%) may reflect its concentration in established business and institutional sites.
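The "67% higher" comparison divides two rounded percentages, so it is sensitive to rounding. The sketch below bounds it, assuming each displayed rate was rounded to the nearest 0.1 point:

```python
def relative_increase(a, b):
    """Percent by which rate a exceeds rate b."""
    return 100 * (a - b) / b


# Displayed Bad % for nginx and Cloudflare; assume each was rounded to the nearest 0.1 point.
nginx, cloudflare = 0.5, 0.3
half_step = 0.05

point = relative_increase(nginx, cloudflare)
low = relative_increase(nginx - half_step, cloudflare + half_step)   # approximate lower bound
high = relative_increase(nginx + half_step, cloudflare - half_step)  # approximate upper bound
print(f"nginx vs Cloudflare: {point:.0f}% higher (compatible range {low:.0f}% to {high:.0f}%)")
```

The point estimate is about 67%, but the same displayed values are compatible with anything from roughly 29% to 120%.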

What This Means for Advertisers

1. The web is safer than the narrative suggests

At 83.5% positive and 0.32% negative, the data contradicts the doom-scrolling narrative. Research on news consumption explains the disconnect: Stanford HAI reports that negative content is shared 1.91x more often, and the Nature Human Behaviour study found that each negative word in a headline increases click-through rates by 2.3%. The algorithm amplifies negativity, but the underlying content is overwhelmingly positive. Advertisers over-blocking based on perceived risk are excluding brand-safe inventory at scale.

2. Sentiment is not brand safety

77.2% of Bad-sentiment URLs are still GARM brand-safe. A negative product review, a critical news article, or a complaint about a telecom provider all register as negative sentiment but typically fall outside every GARM risk category. Treating sentiment and safety as equivalent leads to massive over-blocking. DoubleVerify's 2025 data shows the actual brand suitability violation rate is 5.2% globally, and declining.

3. Negative content concentrates predictably

Adult (18.1% of all Bad URLs), Internet and Telecom (10.9%), and Business and Industry (8.7%) together account for 37.7% of all negative content. For advertisers, this means category-level filters remain the most efficient tool. Sentiment filtering alone would block the 77.2% of negative content that is brand-safe, while missing unsafe content with positive sentiment (1.3% of Good-sentiment URLs still hit the GARM floor), which requires a different signal entirely.
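The filtering tradeoff can be made concrete by treating the GARM sample as a rough confusion matrix. A Python sketch, assuming that "sentiment filtering" means blocking every Bad URL and nothing else, and that the rounded percentages in the GARM table can be converted back into approximate counts:

```python
# GARM sample by sentiment: (URLs with a GARM grade, % A-grade safe, % F-grade floor),
# restated from the GARM table above. Percentages are rounded, so counts are approximate.
garm = {
    "Good": (62_193, 94.9, 1.3),
    "Neutral": (8_939, 88.8, 4.3),
    "Bad": (531, 77.2, 18.3),
}


def approx(n, pct):
    """Approximate URL count implied by a rounded percentage of n."""
    return round(n * pct / 100)


# A sentiment-only filter blocks every Bad URL and nothing else.
blocked = garm["Bad"][0]
blocked_but_safe = approx(garm["Bad"][0], garm["Bad"][1])
floor_caught = approx(garm["Bad"][0], garm["Bad"][2])
floor_total = sum(approx(n, floor) for n, _safe, floor in garm.values())

print(f"Blocked: {blocked} URLs, ~{blocked_but_safe} of them GARM A-grade")
print(f"Floor URLs caught: ~{floor_caught} of ~{floor_total} ({floor_caught / floor_total:.1%})")
```

Under these assumptions, a Bad-only filter blocks 531 URLs, about 410 of which are GARM A-grade, while catching roughly 97 of about 1,290 floor URLs (under 8%); most floor content in this sample carries Good or Neutral sentiment.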

4. The GARM vacuum creates opportunity

With the GARM framework dissolved, advertisers lack a shared standard for content suitability. Forrester noted that the dissolution "laid bare the fragility of self-regulatory approaches." The $4.82 billion ad verification market (projected to reach $15.87 billion by 2033) needs data-driven frameworks that separate perception from reality. Tools that can distinguish a negative restaurant review (brand-safe) from hate speech (brand-unsafe) at scale are where the market is heading.

5. Language matters for international campaigns

Czech content is roughly 40 times more likely to be negative than English content; Chinese and Japanese are roughly 13 and 11 times more likely respectively. International advertisers need language-specific sentiment thresholds rather than global blocklists. A 0.1% negative rate in English is not the same risk profile as 4.4% in Czech.

Key Findings

  1. 83.5% of 1.17 million URLs are positive, 0.32% are negative. The web's sentiment distribution is 260:1 in favor of positive content.

  2. Shopping is the most positive category (91.3%) with a 1,098:1 Good-to-Bad ratio. Adult is the most negative (3.2%) but still majority positive (63.6%).

  3. 77.2% of Bad-sentiment URLs are brand-safe. Sentiment and brand safety are correlated but not equivalent — conflating them leads to massive over-blocking of safe inventory.

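  4. Negative sentiment correlates with brand-safety and accessibility risk, but not uniformly with lower quality. Bad-sentiment sites are 14x more likely to hit the GARM floor, and 41.0% fail WCAG accessibility (vs 27.0% for Good). On EEAT, 50.7% cluster in C grades (vs 23.7% for Good), yet only 27.8% score D or F (vs 50.5% for Good).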

  5. Language shows the widest variation in negative sentiment of any dimension analyzed. Czech (4.4% Bad) runs roughly 40x higher than English (about 0.11%). Chinese, Japanese, Turkish, and Korean are all at 0.8% or above.

  6. Gender targeting has no effect on negative sentiment. Male, female, and all-audience content have identical 0.3% negativity rates, though female-targeted content is 4.8 points more positive.

  7. The 3,762 Bad URLs cluster in Adult (18.1%), Internet and Telecom (10.9%), and Business and Industry (8.7%). These three categories account for over a third of all negative content in the dataset.

Methodology

This analysis covers 1,172,644 URLs with sentiment classifications in the LLMSE database as of February 26, 2026. Sentiment (Good, Neutral, Bad) is assigned during the LLM-based classification process alongside category, subcategory, demographics, and language.

Cross-references were computed using Redis sorted set intersections between the sentiment-{Good|Neutral|Bad} indices and category, quality grade (seo-{A-F}, eeat-{A-F}, wcag-{A-F}, garm-{A-F}, readability-{A-F}), gender (sex-{male|female|all}), language (lang-{Language}), and server (server-{Server}) indices. All intersections represent domains present in both the sentiment index and the cross-referenced dimension.
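The cross-tabulation step can be sketched in Python. This assumes key names of the form sentiment-Bad and garm-F (taken from the index patterns above) and a client object exposing a zintercard(numkeys, keys) method, as recent redis-py versions do for the Redis ZINTERCARD command (Redis 7.0+). The InMemoryIndex class and the toy domains are hypothetical stand-ins so the example runs without a server; this is not LLMSE's actual pipeline:

```python
from typing import Dict, Iterable, List, Set


class InMemoryIndex:
    """Hypothetical stand-in for a Redis client: each key maps to a set of members."""

    def __init__(self, data: Dict[str, Set[str]]):
        self.data = data

    def zintercard(self, numkeys: int, keys: List[str], limit: int = 0) -> int:
        # Mirrors ZINTERCARD: size of the intersection of the first `numkeys` keys.
        sets = [self.data.get(k, set()) for k in keys[:numkeys]]
        size = len(set.intersection(*sets)) if sets else 0
        return min(size, limit) if limit else size


def crosstab(client, sentiments: Iterable[str], dimension: str, grades: Iterable[str]):
    """Count members of sentiment-<S> that are also in <dimension>-<G>, for every (S, G)."""
    grades = list(grades)
    return {
        s: {g: client.zintercard(2, [f"sentiment-{s}", f"{dimension}-{g}"]) for g in grades}
        for s in sentiments
    }


def row_percentages(table):
    """Express each sentiment row as percentages of its own total, as the report's tables do."""
    result = {}
    for s, row in table.items():
        total = sum(row.values())
        result[s] = {g: (100 * n / total if total else 0.0) for g, n in row.items()}
    return result


# Toy data with hypothetical domains, only to show the shape of the output.
demo = InMemoryIndex({
    "sentiment-Good": {"a.example", "b.example", "c.example"},
    "sentiment-Bad": {"d.example"},
    "garm-A": {"a.example", "b.example", "d.example"},
    "garm-F": {"c.example"},
})
counts = crosstab(demo, ["Good", "Bad"], "garm", ["A", "F"])
print(counts)
print(row_percentages(counts))
```

Against a live server, crosstab could be passed a redis.Redis client instead. On servers older than Redis 7.0, ZINTERSTORE into a temporary key (its reply is the size of the resulting set) is a common substitute. The row totals produced this way correspond to the Total column in the tables above.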

Limitations: (1) Sentiment is classified by the LLM as part of the multi-dimensional analysis — it reflects the model's assessment of overall content tone, not human-annotated ground truth. (2) The 1.17M URL dataset is biased toward the commercial web (sites submitted for classification or discovered through crawling); it is not a random sample of all internet content. (3) "Bad" sentiment captures content with negative tone or harmful framing but does not distinguish between types of negativity (e.g., legitimate criticism vs. malicious content). (4) Category-level analysis counts each URL equally regardless of traffic or influence.

External statistics are sourced from ANA, DoubleVerify, Stanford HAI, Nature Human Behaviour, Meta transparency reports, Forrester, and other cited publications. These provide industry context but were not generated from LLMSE data.

Explore the Data

Browse sentiment-filtered results on LLMSE — search for s:Good, s:Neutral, or s:Bad using the advanced search. Cross-reference with categories, quality grades, and demographics using the filter system. The REST API provides programmatic access to all classification data including sentiment. Check any URL's sentiment with the comprehensive audit.


This analysis was conducted using LLMSE, which has classified over 1.4 million websites across SEO, EEAT, WCAG accessibility, readability, and GARM brand safety dimensions. All data reflects the database as of February 2026. To analyze your own site, visit llmse.ai/classify.