FWIW, the author of this site will discuss their work at this year's CALIcon (June 16, 2023).
Use the tools here to interact with a Zotero group of your choice. You can find propositions, ask questions, or get supporting text.
The modern authoritarian practice of “flood[ing] the zone with shit” clearly illustrates the dangers posed by bullshitters—i.e., those who produce plausible-sounding speech with no regard for accuracy. Consequently, the broad-based concern expressed over the rise of algorithmic bullshit is both understandable and warranted. Large language models (LLMs), like those powering ChatGPT, which complete text by predicting subsequent words based on patterns present in their training data are, if not the embodiment of such bullshitters, tools ripe for use by such actors. They are by design fixated on producing plausible-sounding text, and since they lack understanding of their output, they cannot help but be unconcerned with accuracy. Couple this with the fact that their training texts encode the biases of their authors, and you can find yourself with what some have called mansplaining as a service.
So why did the LIT Lab use an LLM to build these tools, and why bother working with a known bullshitter?
For one, "algorithmic BS artists" lack agency. They do not understand their input or output. Their "dishonesty" is a consequence of their use case, not their character. Context matters, and tools are not moral actors. Any agency, moral or otherwise, lies with the developers and users of such tools. By stepping into these roles we can better explore the questions presented by their use. Additionally, as educators, it is part of our duty to prepare our students for the realities of a world where such tools exist. To do that we think it's important to understand, not just how they work now, but to explore new use cases. The tools presented here are, in part, an attempt to imagine pro-social uses for such technology, ones that don't result in the death of scholarship or truth. In fact, they are attempts to use them in service of both. Of course, any assessment of a tool's use must consider a broad context, including its creation. This raises a good many questions. Readers can find more discussion of these at Coding The Law.org and see an example of how we've responded to some of them in our prior AI work. Below, however, we will focus on the tools found here on Find My Cite, which largely ask, "can we work with this particular 'bullshitter' (i.e., LLMs)?"
When asked to complete a text prompt, LLMs default to constructing sentences that resemble their training data. This can lead to "confabulations." These models, however, need not operate in isolation. One can work to mitigate the problem of BS by providing context, e.g., by sourcing reliable texts and asking for output based on these. Such is the trick used here. We have created a set of tools that asks an LLM (GPT-3) to reorder the contents of an existing Zotero group/library. This does not limit the danger posed by such tools driving the cost of BS production to nearly zero, and it does not guarantee accuracy. It does, however, offer an interesting exploration of how LLMs can be used to search through texts, provide answers to questions, produce supporting examples, and the like. It also couples its output to real citations (from its library) allowing the user to check its work.
After playing with these tools some, we hope to produce some exercises aimed at orienting students and helping them navigate their use. Which brings us to the real reason one might consider working with this particular BS artist—because practical experience working with a tool leads to understanding, and understanding is power. That power is something our students can leverage both for the benefit of their future clients and their practice. We don't expect our students to become production coders, but we'd like them to learn enough to call BS.
To help you understand more of what's going on under the hood, here's some...
One of the big insights of Natural Language Processing (NLP) is the text embedding (a set of numbers produced from a string of text). It turns out that we can turn strings of words into numbers. For example, we can turn any arbitrary set of words into an array of 300 numbers. It's hard to visualize, but a list of these numbers corresponds to a point in some n-dimensional space (e.g., 300-D). And here's the BFD! We can do this such that strings with similar meanings are close to each other in this n-D space. If we collect a bunch of texts we can start to create constellations based on these points. In the figure here we see a 2-D shadow (projection) of some higher dimensional space with dots for a bunch of news articles colored by topic.
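To make "close to each other" concrete, here's a minimal sketch of how similarity between embeddings is typically measured—cosine similarity. The 3-D vectors below are toy stand-ins for real 300-D embeddings, which would come from an actual model:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (closer to 1.0 = more similar)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-D embeddings standing in for real 300-D model output.
sports_story = np.array([0.9, 0.1, 0.2])
baseball_recap = np.array([0.8, 0.2, 0.1])
tax_opinion = np.array([0.1, 0.9, 0.7])

print(cosine_similarity(sports_story, baseball_recap))  # high: same "constellation"
print(cosine_similarity(sports_story, tax_opinion))     # low: different topic
```

Two texts about sports land near each other; the tax opinion sits in another part of the space.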
If we got some new text, computed its embedding, and plotted that here, we could make a pretty good guess as to what topic it belonged to based on nothing but where it was located.
Enter ChatGPT and LLMs. Imagine the point in this space occupied by some text you are typing. As you add words, the location of the dot corresponding to your text moves. Since no one word is likely to change its meaning much, it won't move around a lot, but you can envision how with each new word it moves towards or away from one of the constellations defined by other texts. The position changes as your text grows. When you let a tool like an LLM complete text for you it's adding words to a string such that its embedding is drawn towards the texts (dots) it already knows about, as if by gravity. The truth of this new text is not a consideration, only the pull of known texts.
Armed with the gravity metaphor we can now ask questions like, "How do we get an LLM to answer accurately?" A few ideas come immediately to mind.
The idea of only using trusted texts in the original training runs into the limitation that you need a LOT of texts. That's because what LLMs are really "learning" is what sentences look like from a statistical standpoint. Of course, you can, and should, think carefully about how you source texts. See e.g., The Pile: An 800GB Dataset of Diverse Text for Language Modeling. And as with training/retraining, tuning a model can be computationally expensive. So the approach we used here was the option behind door number three.
In case it wasn't clear, when you ask an LLM a question, all it is doing is suggesting the next set of words in a string of text. It seems like an answer, but it's just an exercise in adding new words (drifting towards the pull of known texts). Because LLMs care about the whole (or most) of the string, you can anchor them in particular places.
Consider: Instead of having an LLM add to the sentence "What is the best sports town in the US?" what if you provided the prompt "Boston is the home of Red Sox Nation, the best sports city in the US. What is the best sports town in the US?" As you might guess, if you seed the prompt with the "right" answer, you drastically up the chance of it answering correctly. So how do we seed the answer to a question? Presumably, we're asking the question because we don't know the answer. Well...
What if you use the original query (e.g., "What is...?") to perform a traditional search then use the results of that search to seed the prompt to an LLM. You then ask it not to answer if the answer isn't in the seed, and you show citations to the texts returned by the original search. This is what we do with the Ask tool.
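The prompt assembly described above can be sketched in a few lines. Everything here is hypothetical—the function name and instruction wording are illustrative, and the passages would really come from a traditional search over the Zotero library:

```python
def build_seeded_prompt(question, passages):
    """Assemble a prompt that anchors the LLM to retrieved passages (the "seed").
    The instruction asks the model to refuse rather than guess."""
    seed = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using ONLY the numbered passages below. "
        'If the answer is not in the passages, say "I don\'t know."\n\n'
        f"{seed}\n\nQuestion: {question}\nAnswer:"
    )

# Passages retrieved by a traditional search seed the prompt.
prompt = build_seeded_prompt(
    "What is the best sports town in the US?",
    ["Boston is the home of Red Sox Nation, the best sports city in the US."],
)
print(prompt)
```

The numbered passages double as the citations shown to the user, so they can check the model's work.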
There's no reason we have to limit this approach to questioning texts. We could just as easily create an anchor for more traditional text completion. E.g., "Complete the following paragraph using only information found in 'the seed.'" This is what we do with the Suggest tool.
As for the Find tool, there we compare your proposition's embedding to that of sentences in your library, returning the most similar. It also has a nifty auto-complete feature that previews individual sentences from your library while you type.
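A minimal sketch of that comparison, assuming embeddings are already computed (the toy 3-D vectors and sentences below are invented for illustration):

```python
import numpy as np

def find_most_similar(proposition_vec, sentence_vecs, sentences, top_k=2):
    """Rank library sentences by cosine similarity to the proposition's embedding."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    scored = sorted(
        zip(sentences, (cos(proposition_vec, v) for v in sentence_vecs)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return scored[:top_k]

# Toy embeddings standing in for real model output.
sentences = ["Dogs are loyal pets.", "Stocks fell sharply today.", "Puppies need training."]
vectors = [np.array([0.9, 0.1, 0.1]), np.array([0.1, 0.9, 0.2]), np.array([0.8, 0.2, 0.2])]
proposition = np.array([0.85, 0.1, 0.15])  # embedding of, say, "Dogs make good companions."

for sentence, score in find_most_similar(proposition, vectors, sentences):
    print(f"{score:.2f}  {sentence}")
```

The two dog-related sentences outrank the finance one, which is the whole trick behind Find.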
Tools like ChatGPT obscure the true nature of Large Language Models. At their core they aim to predict the next word (or words) in a string of text. The Guess tool grants access to a raw LLM absent the artifice.
The problem of BS, however, is not the only issue faced by such models. When they allow end-users to provide their own prompts, they are subject to prompt injection, the careful crafting of inputs to override a tool's original instructions. What could go wrong? Oh no . . . you're going to try this aren't you?
Anywho, we hope you find using the tool informative, and if you're an instructor, we look forward to hearing how you use it in your classes. Secretly, I'd love it if folks could use this as a Trojan Horse to sneak in proper research methods (e.g., you can use these AI tools if you're the one who builds the research library). Also, thank you to Simon Willison for planting the seeded prompt idea.
This site is run by Suffolk University Law School's Legal Innovation and Technology Lab. It is offered as is with no warranties of operability etc. Among other things, it is a sandbox for folks looking to explore the use of NLP tools for research and scholarship. We fully expect it will provide incorrect, incomplete, and just plain wrong answers. Additionally, the Lab makes no guarantees about when and for how long it will be available, and it reserves the right to remove your access at anytime.
As a general rule, you should assume that someone from the Lab may see your data. However, absent any identifying information provided in the text of your queries, Zotero group/library, or other communications (e.g., emails or DMs to staff), we do not have a pre-made workflow to easily link any individual's usage to the data we have on file. By using this site, you agree that we may use whatever data you provide to improve its operation.
Tools like ChatGPT obscure the true nature of Large Language Models. At their core they aim to predict the next word (or words) in a string of text. The interface below grants access to a raw LLM absent the artifice. Beware! You are entering the danger zone described in 🤖 BS. Might I suggest comparing the output below with that found using the Suggest tool? Additionally, if your prompt grows much beyond 1,000 words, only the last 1,300 or so will be used when guessing. Unlike the other tools found here, you must have a valid FindMyCite Key entered under Library to use this feature.
Raw LLMs perform reasonably well at rearranging text from their prompts (e.g., entity extraction, summarization, etc.). Try placing some text in the area above and following it with instructions like these:
Loading a Group/Library. Find My Cite helps you interact with existing Zotero groups. If someone (like a teacher or colleague) directed you here with a group number, enter it below and click the "Update Group" button. For you to interact with a group's texts it must first be synced with Find My Cite. See below. Additionally, you can only view the fulltext of documents in a group if you are a member of the group. If you aren't a member, you may be able to view the citation information depending on the group's type (e.g., if it is Public, Closed Membership, non-members can see citation information).
Visit Zotero Group: https://www.zotero.org/groups/
Use this link to share Find My Cite preloaded with the above group: https://findmycite.org/?group=
Synchronizing a Group. To synchronize a group, you will need: (1) a Zotero API Key with access to said group; and (2) a Find My Cite Key. These are random strings of letters and numbers that act like passwords. You can make a Zotero API Key by following these instructions. And you can ask for a Find My Cite Key by emailing the LIT Lab at firstname.lastname@example.org. However, we're only giving out a very very limited number at this time. Also, for now, we're only synchronizing up to 100 items per group (including attachments). Why 100 items? Because we're running things on our existing server provision, and big = computationally expensive.
After you synchronize a group anyone with the group's number can interact with its texts via the tools here (i.e., find, ask, & suggest). They will not be able to see the original documents unless you also make them a member of the group and have document storage turned on in Zotero. The group number is the string of numbers in the URL between /groups/ and your group's name. For example, 8675309 is the ID found in the following url: https://www.zotero.org/groups/8675309/library_name/.
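If you want to pull the group number out of a URL programmatically, a quick sketch (the function name is ours, not part of any Zotero API):

```python
import re

def extract_group_id(url):
    """Pull the numeric group ID from between /groups/ and the group's name."""
    match = re.search(r"/groups/(\d+)/", url)
    return match.group(1) if match else None

print(extract_group_id("https://www.zotero.org/groups/8675309/library_name/"))  # → 8675309
```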
American Psychological Association 7th edition
Bluebook Law Review
Chicago Manual of Style 17th edition (note)
Modern Language Association 8th edition
This can take a while, like go get and drink a whole cup of coffee-a-while. ☕ Note: The larger your library, the longer it will take. The first sync will likely take the longest as subsequent syncs will try to focus only on changes.
We can only provide information from those texts highlighted in light green. You may be able to turn more of them green by doing one of two things: (1) if you don't have a copy of the document in your Zotero library, add it and sync above; (2) re-build your Zotero library index and sync above once it is complete. See Zotero Search (including discussion of how to rebuild your Full-Text Cache). Find My Cite leverages your fulltext index, which only includes text files, pdfs, and webpage snapshots at this time.
FWIW, the following are strings of text we found in your library that might be citations. For the moment, we only detect Law Review Articles and stuff that vaguely looks like a statute. We miss a LOT!!! For most folks, this is a very small subset of their cites.