
Large language models and their associated bots are bad for the indie web in at least three ways: 1) their logistical consequences are bad for bandwidth, 2) their social consequences are bad for guides, and 3) their citational consequences are bad for surfability. These consequences are worth highlighting in light of how LLM-based chatbots have been used and endorsed on the indie web. The indie web may mean different things to different people, but if we’re thinking of it at all in terms of favoring small sites over corporate exploitation, then the indie web as a concept and a practice is fundamentally at odds with what LLMs are doing to the web. 

Crossposted to Pillowfort and my personal site hosted on Neocities. For off-site linking, I recommend using the version on Neocities. 


Part of the inspiration for this post comes from a thread at the 32-Bit Cafe, but what has sustained the motivation is my repeated encounters with how LLMs have been put forward. Chatbots keep being suggested as a form of coding assistance in pieces like Welcome to The Web We Lost, The Internet’s Hidden Creative Renaissance, and a certain website about HTML. Recently, the company that makes Firefox announced that it intends to join the corporate bandwagon by implementing all-new security hazards. By chance I found out that an indie web directory site has implemented bot-generated summaries. Then I found an upcoming indie web project and saw that it has accepted an LLM feature request from someone referring to LLMs and their ilk as “a basic need.”

Running into stuff like this, repeatedly, has motivated me to put together this post.

Note that in order to distinguish itself, this post will try to avoid the more heavily-trod ground in LLM criticism. That means no descriptions of the environmental impact, no warnings about the looming economic consequences of the investment bubble, and no artistic, aesthetic, or spiritual appeals about the loss of “soul” or “humanity.” As salient as those points may be, I expect you’ve already heard them before, and none of them are necessary to make the case that LLMs are bad for the indie web.

1) Bad For Bandwidth

LLMs are fed data from scraper bots that are notorious for overloading bandwidth, which means disrupting legitimate traffic from actual visitors and potentially driving up the cost of hosting. In extreme cases, they may even knock websites offline. Declaring your policies in a robots.txt file is not sufficient to stop them.
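
For anyone unfamiliar with the mechanism: robots.txt is just a plain-text file at the root of a site listing which crawlers are asked to stay out. A minimal sketch addressed to two of the better-known LLM-related crawlers (GPTBot is OpenAI’s, CCBot is Common Crawl’s) might look something like this:

    User-agent: GPTBot
    Disallow: /

    User-agent: CCBot
    Disallow: /

Compliance is entirely voluntary, though; plenty of scrapers simply ignore these directives or disguise their user agents, which is exactly the problem.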

At this point there have been countless posts about this, so for those new to this issue, here’s a selection for you:

On multiple occasions this problem has also impacted the IndieWeb wiki, which now has a dedicated page about LLM traffic.

Make no mistake, there is a distinct asymmetry at play here. Megacorporations can hammer the servers of smaller companies, hobbyist projects, public research efforts, and indie personal sites, but turnabout is not fair play. The disparity should be immediately noticeable to anyone acquainted with spurious DMCA takedowns or how Nintendo has responded to unauthorized emulators. Major IP holders get to be very fussy about policing whatever they claim as their turf, and yet now these megacorporations are being granted social license to run roughshod all over us, overloading bandwidth and chewing up the public web, regardless of permission or consent. They don’t care about consent. Consent is for paupers.

To be clear, this point is not an invitation to litigate the complexities of copyright law. This is a point about inequity of interference. Even if a given website is entirely in the public domain, it still wouldn’t be right for a megacorporation to scrape the thing so hard as to knock it offline. If indiscriminate scraping is a necessary condition of the industry, as the suits have claimed it is, then that means the fundamental logistics of the industry are bad for the logistics of the indie web.

2) Bad For Guides

Reliance on chatbots is bad for guides, by which I mean it undermines the living, breathing people who provide others with guidance. For many such people, developing the right frame of reference and maintaining motivation can be contingent upon connecting with and understanding their audience of learners. If those learners become more disconnected and elusive, then our guides will be the worse off for it.

Providing good guidance is not just about being knowledgeable, but about familiarizing yourself with the gap between what you know and what the learner knows, in order to identify a path between the two. Without a strong grasp of learner perspectives, a guide can end up creating a tutorial that falls short — the kind that says “it’s very simple” about something that is not simple or “it’s easy” about something that is not easy. This is the problem that Annie was parodying in How I, a non-developer, read the tutorial you, a developer, wrote for me.

See also the classic “draw the rest of the owl”:

A diagram that purports to tell you how to draw an owl. The first step is to draw two circles. Between steps one and two there is an absurd jump in complexity and detail, with the last step instructing you to simply “draw the rest of the f-ing owl.”

To mitigate this problem, what you need is plenty of exposure to beginner perspectives, and beginner perspectives are what every community stands to lose out on when people are encouraged to turn to chatbots instead. Chatbots end up absorbing people’s questions, obscuring them from living guides. In fact, avoiding interactions with real people can even be a part of the bots’ appeal, in that it means getting to dodge unpleasant social interactions with those who interact poorly with beginners.

When learners as a whole turn elsewhere, that loss can be demotivating to people who want to help. Plenty have spoken about how the expectation of chatbot use has undermined the sense of purpose behind writing reference materials. Take, for instance, the perspective of the culinary guides who are being discouraged from continuing to share their expertise:

When searching on Google for Chinese cooking traditions, a casual cook may be satisfied by the [Bot-Generated] Overview. But that may draw from The Woks of Life blog, a comprehensive English-language resource for Chinese cooking, according to Sarah Leung, one of its co-creators. Her family has spent years building out reference material on techniques, traditions and culture, she said. “[Bot] summaries have almost completely overtaken results about various Chinese ingredients, many of which had no information online in English before individual creators like us wrote about them.”

The shift has her questioning whether it’s worth publishing new reference guides at all. “In all likelihood, no one will ever discover those pages,” she said.

Believing that no one will ever discover your articles, tutorials, walkthroughs, or reference materials can make the whole effort feel pointless, and under these conditions, people are more inclined to withdraw.

3) Bad For Surfability

Turning to chatbots for answers can result in a web that’s increasingly disconnected and worse to browse. Good browsing comes from an abundance of link trails, and link trails are exactly what people are being cut off from discovering or creating when they rely on machine-generated summaries instead. This is especially detrimental for the part of the web that relies on links for surfability.

Surfability for the indie web can only come from a culture of links that allows you to click around. Reading one response post leads you to another. Opening a personal site leads you to a blogroll or a button wall. Finding a directory lets you discover a whole array of websites to explore. If exploring the indie web is what we want, as opposed to loading one single page as a novelty and then getting sucked back into a billionaire’s feed, then the indie web needs this handcrafted surfability.

Surfability is exactly what we stand to lose to LLMs because LLMs are notorious for separating people from sources. The LLM-based chatbots tested in a study by the Tow Center mistook the source of a quote more than half of the time, and that’s when they were directly prompted to find it. In practice, what’s more likely is that synthetic text won’t direct people to sources at all. Bot-generated “overviews” are reducing the click rate on search results, raising concerns about the prospect of less linking in our future. At scale, that would mean fewer trails and pathways to follow between different sites, replaced by more and more dead ends.

That looming possibility leads me to think of this segment from a video about plagiarism online:

Stephen Spinks’ column is extremely moving to read and genuinely important... and no one watching [the plagiarist’s] video had the chance to learn his name. [The plagiarist] made a lot of money repeatedly re-uploading a video about the erasure of queer people — and he did it by erasing queer people. [...]

Good writing about queer living is hard to find and easy to lose, and in obscurity, it becomes even easier to pretend it was yours. None of the money [the plagiarist] makes will go to the people who wrote the great lines his viewers enjoyed. They get to rot in the very obscurity he pretends to criticize.

—Harry Brewis, Plagiarism and You(Tube), “The Cost”

Compounding obscurity is one of the risks we face from an increasing reliance on chatbots. When people don’t get told where things come from in the first place, they miss out on the chance to cite them, which means missing out on the chance to link them, which results in pages with fewer links, which means fewer pathways available to surf the web — a web that becomes less of a web, increasingly threadbare, disconnected, and frayed.

Handcrafted Overview

LLMs and scraper bots are detrimental to the indie web in many ways. They are bad for bandwidth, bad for guides, and bad for surfability. This isn’t an exhaustive list of all their harms, just some of the ones most salient to the creation, maintenance, and exploration of personal websites. To the extent that the indie web aligns itself with collaborative values, small personal sites, and a DIY ethos of curiosity and exploration, it is conceptually at odds with extractive corporate technologies that sap our resources, obfuscate our guides, undermine link culture, and discourage us from sharing.



Note: guest comments are screened. This means that if you submit a comment without an account, it will not be published until manually approved.

