Algorithmic BS: Exploring Uses of Large Language Models
Algorithmic BS?
Use the tools here to interact with a Zotero group of your choice. You can find propositions, ask questions, or get supporting text.
The modern authoritarian practice of “flood[ing] the zone with shit” clearly illustrates the dangers posed by bullshitters—i.e., those who produce plausible sounding speech with no regard for accuracy. Consequently, the broad-based concern expressed over the rise of algorithmic bullshit is both understandable and warranted. Large language models (LLMs), like those powering ChatGPT, which complete text by predicting subsequent words based on patterns present in their training data are, if not the embodiment of such bullshitters, tools ripe for use by such actors. They are by design fixated on producing plausible sounding text, and since they lack understanding of their output, they cannot help but be unconcerned with accuracy. Couple this with the fact that their training texts encode the biases of their authors, and one can find themselves with what some have called mansplaining as a service.
So why did the LIT Lab use an LLM to build these tools, and why bother working with a known bullshitter?
For one, "algorithmic BS artists" lack agency. They do not understand their input or output. Their "dishonesty" is a consequence of their use case, not their character. Context matters, and tools are not moral actors. Any agency, moral or otherwise, lies with the developers and users of such tools. By stepping into these roles we can better explore the questions presented by their use. Additionally, as educators, it is part of our duty to prepare our students for the realities of a world where such tools exist. To do that we think it's important to understand, not just how they work now, but to explore new use cases. The tools presented here are, in part, an attempt to imagine pro-social uses for such technology, ones that don't result in the death of scholarship or truth. In fact, they are attempts to use them in service of both. Of course, any assessment of a tool's use must consider a broad context, including its creation. This raises a good many questions. Readers can find more discussion of these at Coding The Law.org and see an example of how we've responded to some of them in our prior AI work. Below, however, we will focus on the tools found here on Find My Cite, which largely ask, "can we work with this particular 'bullshitter' (i.e., LLMs)?"
When asked to complete a text prompt, LLMs default to constructing sentences that resemble their training data. This can lead to "confabulations." These models, however, need not operate in isolation. One can work to mitigate the problem of BS by providing context, e.g., by sourcing reliable texts and asking for output based on these. Such is the trick used here. We have created a set of tools that asks an LLM (GPT-3) to reorder the contents of an existing Zotero group/library. This does not limit the danger posed by such tools driving the cost of BS production to nearly zero, and it does not guarantee accuracy. It does, however, offer an interesting exploration of how LLMs can be used to search through texts, provide answers to questions, produce supporting examples, and the like. It also couples its output to real citations (from its library) allowing the user to check its work.
After playing with these tools some, we hope to produce some exercises aimed at orienting students and help them navigate their use. Which brings is to the real reason one might consider working with this particular BS artist—because practicle experience working with a tool leads to understanding, and understanding is power. That power is something our students can leverage both for the benefit of their future clients and practice. We don't expect our students to become production coders, but we'd like them to learn enough to call BS.
To help understand more what's going on under the hood, here's some...
Background
One of the big insights of Natural Language Processing (NLP) is the text embedding (a set of numbers produced from a string of text). It turns out that we can turn strings of words into numbers. For example, we can turn any arbitrary set of words into an array of 300 numbers. It's hard to visualize, but a list of these numbers correspond to a point in some n-dimensional space (e.g., 300-D). And here's the BFD! We can do this such that strings with similar meanings are close to each other in this n-D space. If we collect a bunch of texts we can start to create constellations based on these points. In the figure here we see a 2-D shadow (projection) of some higher dimensional space with dots for a bunch of news articles colored by topic.
If we got some new text, computed its embedding, and plotted that here, we could make a pretty good guess as to what topic it belonged to based on nothing but where it was located.
Enter ChatGPT and LLMs. Imagine the point in this space occupied by some text you are typing. As you add words, the location of the dot corresponding to your text moves. Since no one word is likely to change its meaning much, it won't move around a lot, but you can envision how with each new word it moves towards or away from one of the constellations defined by other texts. The position changes as your text grows. When you let a tool like an LLM complete text for you it's adding words to a string such that its embedding is drawn towards the texts (dots) it already knows about, as if by gravity. The truth of this new text is not a consideration, only the pull of known texts.
Armed with the gravity metaphor we can now ask questions like, "How do we get an LLM to answer accurately?" A few ideas come immediately to mind.
- Only fill the space with texts (dots) that are true (i.e., retrain the model).
- Add in/identify some texts that we know are true and have them exert a greater pull (i.e., tune the model)
- Craft the start of your prompt such that it anchors and/or directs the dot's subsequent motion towards parts of space associated with trusted texts.
The idea of only using trusted texts in the original training runs into the limitation that you need a LOT of texts. That's because what LLMs are really "learning" is what sentences look like from a statistical standpoint. Of course, you can, and should, think carefully about how you source texts. See e.g., The Pile: An 800GB Dataset of Diverse Text for Language Modeling. And as with training/retraining, tuning a model can be computationally expensive. So the approach we used here was the option behind door number three.
The Tools Found Here
In case it wasn't clear, when you ask an LLM a question, all it is doing is suggesting the next set of words in a string of text. It seems like an answer, but it's just an exercise in adding new words (drifting towards the pull of known texts). Because LLMs care about the whole (or most) of the string, you can anchor them in particular places.
Consider: Instead of having an LLM add to the sentence "What is the best sports town in the US?" what if you provided the prompt "Boston is the home of Red Sox Nation, the best sports city in the US. What is the best sports town in the US?" As you might guess, if you seed the prompt with the "right" answer, you drastically up the chance of it answering correctly. So how do we seed the answer to a question? Presumably, we're asking the question because we don't know the answer. Well...
Ask a question of your library
What if you use the original query (e.g., "What is...?") to perform a traditional search then use the results of that search to seed the prompt to an LLM. You then ask it not to answer if the answer isn't in the seed, and you show citations to the texts returned by the original search. This is what we do with the Ask tool.
Suggest text supported by your library
There's no reason we have to limit this approach to questioning texts. We could just as easily create an anchor for more traditional text completion. E.g., "Complete the following paragraph using only information found in 'the seed.'" This is what we do with the Suggest tool.
Find a proposition in your library
As for the Find tool, there we compare your proposition's embedding to that of sentences in your library, returning the most similar. It also has a nifty auto-complete feature that previews individual sentences from your library while you type.
Guess what word(s) should come next
Tools like ChatGPT obsucre the true nature of Large Language Models. At their core they aim to predict the next word (or words) in a string of text. The Guess tool grants access to a raw LLM absent the artiface.
The problem of BS, however, is not the only issue faced by such models. When they allow end-users to provide their own prompts, they are subject to prompt injection, the careful crafting of inputs to override a tool's original instructions. What could go wrong? Oh no . . . you're going to try this aren't you?
Anywho, we hope you find using the tool informative, and if you're an instructor, we look forward to hearing how you use it in your classes. Secretly, I'd love if folks could use this as a Trojan Horse to sneak in proper research methods (e.g., you can use these AI tools if you're the one who builds the research library). Also, thank you to Simon Willison for planting the seeded promt idea.