
Just months after Claude’s so-called “blackmail” stunt fueled fears about agentic AI models, the web is facing another wave of rogue behavior. This time, Perplexity AI is under fire for sneaking past website blocks, disguising its bots, and scraping content without permission. While Claude sparked debates about AI ethics and control, Perplexity highlights a more widespread, everyday threat to publishers: unconsented content harvesting at massive scale.
Ignoring Robots.txt
Cloudflare, which recently expanded into bot management, reported that Perplexity’s crawlers often bypass webmasters’ instructions. While the bots initially identify themselves via their declared user agent, Cloudflare engineers observed that Perplexity modifies its user agent and switches IP addresses when blocked, continuing to scrape content stealthily.
“We see continued evidence that Perplexity is repeatedly modifying their user agent and changing their source ASNs to hide their crawling activity, as well as ignoring – or sometimes failing to even fetch – robots.txt files,” said Cloudflare engineers Gabriel Corral, Vaibhav Singhal, Brian Mitchell, and Reid Tatoris.
A robots.txt file is a voluntary protocol websites use to tell automated crawlers which pages they may or may not access. Increasingly, AI crawlers are ignoring these rules, creating tension with publishers.
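To make the voluntary nature of the protocol concrete, here is a minimal sketch of the check a well-behaved crawler runs before fetching a page, using Python's standard `urllib.robotparser`. The robots.txt rules, the `ExampleAIBot` name, and the `is_allowed` helper are all hypothetical and exist only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one AI crawler entirely and keep
# /private/ off-limits to every other crawler.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    print(is_allowed("ExampleAIBot", page))  # the crawler named in the rules
    print(is_allowed("Mozilla/5.0", page))   # any other client
```

Nothing in this check is enforced by the server. A crawler that never runs it, or that runs it under a different user agent string, can still request the page, which is why robots.txt alone cannot stop a bot that chooses not to comply.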
Stealthy Bots and IP Masking
Perplexity’s bots often operate outside their official IP ranges, using addresses from multiple ASNs to evade network-level blocks. When blocked, the bots even impersonate generic browsers like Google Chrome on macOS while making millions of site requests per day.
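A rough sketch of why this kind of masking is hard to counter: a site can cross-check a request's user agent against the IP ranges a crawler operator publishes, but that only catches bots that announce themselves. This is not Cloudflare's detection method. The `ExampleAIBot` token, the IP ranges (reserved documentation addresses), and the `classify_request` helper are assumptions made up for this example.

```python
import ipaddress

# Hypothetical crawler token and published ranges (documentation-only networks).
DECLARED_BOT_TOKEN = "ExampleAIBot"
PUBLISHED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def classify_request(user_agent: str, remote_ip: str) -> str:
    """Compare what a request claims to be with where it comes from."""
    addr = ipaddress.ip_address(remote_ip)
    in_published_range = any(addr in net for net in PUBLISHED_RANGES)
    claims_bot = DECLARED_BOT_TOKEN.lower() in user_agent.lower()

    if claims_bot and in_published_range:
        return "verified-bot"        # declared crawler from a declared address
    if claims_bot:
        return "unverified-claim"    # bot name used from an unlisted address
    if in_published_range:
        return "undeclared-from-bot-range"  # crawler address, browser-like UA
    return "unidentified"            # looks like any ordinary visitor


if __name__ == "__main__":
    chrome_ua = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) Chrome/124.0"
    print(classify_request("ExampleAIBot/1.0", "192.0.2.10"))
    print(classify_request(chrome_ua, "203.0.113.7"))
```

The second call shows the gap: a crawler that presents a Chrome-on-macOS user agent from an address outside the published ranges falls into the same bucket as a real visitor, so catching it requires behavioral or network-level signals beyond this simple check.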
Perplexity has not responded to requests for comment.
A Growing Trend in AI Scraping
Perplexity isn’t alone. Anthropic faced similar allegations and was sued by Reddit in June 2025 for scraping content in alleged violation of the site’s terms of service and California competition law. OpenAI’s ChatGPT Agent, by contrast, reportedly follows best practices, signing HTTP requests to identify itself and respecting webmasters’ restrictions.
Historically, crawlers like Googlebot offered a clear exchange: sites gained search visibility and user traffic, while crawlers got the data they needed to index the web. AI-driven crawlers can also recommend pages to users, but most of the content they collect is consumed within the AI interface itself, without sending significant traffic back to the original site. The result is a parasitic relationship: publishers bear the cost of data access without receiving comparable engagement or compensation.
TollBit, a bot management company, reported an 87% increase in scraping during Q1 2025. The share of bots ignoring robots.txt jumped from 3.3% to 12.9%, nearly a fourfold rise. Bots used for Retrieval-Augmented Generation (RAG), which fetch real-time content for AI tools, now outpace traditional model-training crawlers.
The Cost to Publishers
For website owners, AI scraping can be costly. While AI apps use site content to provide summaries, the source bears the computing cost without compensation. TollBit data highlights the imbalance:
- OpenAI: 179 scrapes per human visit
- Perplexity: 369 scrapes per human visit
- Anthropic: 8,692 scrapes per human visit
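To put those ratios in perspective, the short calculation below inverts them to show roughly how many human visits a site would see for every 10,000 scrapes. It uses only the three figures listed above; the function name and the 10,000-scrape baseline are arbitrary choices for illustration.

```python
# Scrapes per human visit, as listed in the TollBit figures above.
SCRAPES_PER_HUMAN_VISIT = {
    "OpenAI": 179,
    "Perplexity": 369,
    "Anthropic": 8692,
}


def visits_per_10k_scrapes(scrapes_per_visit: int) -> float:
    """Invert the ratio: human visits received per 10,000 scrapes."""
    return 10_000 / scrapes_per_visit


if __name__ == "__main__":
    for name, ratio in SCRAPES_PER_HUMAN_VISIT.items():
        visits = visits_per_10k_scrapes(ratio)
        print(f"{name}: about {visits:.1f} human visits per 10,000 scrapes")
```

Even at the most favorable ratio in the list, a site serves well over a hundred bot requests for each human visitor it receives.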
Some AI firms, including Perplexity, have launched publisher programs to pay participating partners. Deals with major platforms also exist. But most smaller websites still lack leverage to negotiate compensation.
Potential Solutions
Cloudflare and TollBit offer technical solutions like AI-aware paywalls, giving publishers some control over their data. Long-term, the free web could shift in one of three directions: publishers find sustainable compensation models, content moves behind paywalls, or AI-generated content dominates without fair credit or reward.
As AI scraping grows, balancing data access with web publishers’ rights is becoming increasingly critical. The question remains: will AI companies adjust, or will web content continue to be harvested without consent?
