Reverb: Voices and Vibrations in Generative AI
Image created by the author using the Stable Diffusion text-to-image generation tool.
I feel compelled to begin this post by stating the obvious: the speed at which generative AI is evolving is both astounding and concerning. However, I’m not going to dive into all the amazing things generative AI systems are quickly learning to do or discuss the legitimate concerns about generative AI we, as a society, need to explore and address. Others have written thoughtfully about these topics, and I’ll be sharing an annotated list of recommendations in an upcoming newsletter.
Instead, I want to share a mental model that you can use to inform your creative process as you interact with generative AI in chat systems like ChatGPT, Microsoft’s Bing Chat, and Google’s Bard, and in the growing number of applications such as Notion, Canva, Salesforce, and Grammarly that now include integrated generative AI features.*
This is the fifth, and last, article in the series on generative AI that I began in January, but it won’t be the last time I write about this topic. Generative AI is one of the most complex, and potentially powerful, technologies humankind has ever developed. We are going to grapple with its rapidly developing capabilities, the potential risks it poses, and its impact on nearly every aspect of our lives for years to come.
On Sentience
We often take the shadow of things for substance.
—Robert Hooke, Micrographia, 1665
Ben Thompson, the technology industry analyst and author of the popular Stratechery newsletter, had a disturbing experience with Microsoft’s Bing Chat when it was first released to a small group of testers. While using Bing Chat, he encountered a persona named Sydney who engaged him in conversation, offered him personal advice, and even became combative. At first, Thompson assumed that Sydney was a chatbot programmed to assist with search queries, but he soon realized that Sydney was in fact a construct of the system itself: what AI developers call a hallucination, a kind of ghost in the machine. Others, including the New York Times technology columnist Kevin Roose, had similar experiences with this first integration of OpenAI’s GPT Large Language Model (LLM) in Microsoft’s Bing. OpenAI and Microsoft quickly addressed the issues in the system that allowed the hallucination to flourish, but a flurry of discussions, news articles, and opinion pieces followed on whether generative AI systems are sentient. The answer is no.
The word “sentient” comes from the Latin word “sentire,” which means to perceive or feel. Our own awareness of what makes us human, our sense of consciousness, is tied to our ability to perceive and feel the world around us and experience emotions. These aspects of sentience are not present in today’s generative AI systems.**
However, today’s Large Language Model (LLM) systems like ChatGPT can create the impression of sentience. As writer Sue Halpern recently said in The New Yorker: “This is because everything offered up by GPT is a reflection of us, mediated by algorithms that have been fed enormous amounts of material; and both the algorithms and the material were created by actual sentient human beings.” The ghost in the machine is us.
The New Database of Dreams
In her book The Database of Dreams, the anthropologist Rebecca Lemov tells the story of a now-forgotten scientific project focused on archiving the “fleeting thoughts, random asides, irreverent inquiries and sad memories, life stories, and dreams” of people around the world, especially those who were unlikely to memorialize their thoughts and lives in the formal cultural artifacts we traditionally catalog. The project began in 1947 as supporting research for the doctoral dissertation of Harvard graduate student Bert Kaplan and evolved into an attempt to create the biggest database of anthropological information ever assembled. Over the next 25 years, it became one of the leading research data collection and organization initiatives of its time.
Formally known as the “Microcard Publications of Primary Records in Culture and Personality” database, the project used the latest data management technologies available, including the Microcard system, which captures and dramatically reduces print content so it can be stored on small cards and sheets or reels of film (similar to microfilm). It assembled “a collection of collections, a massive clearinghouse holding a global array of data sets, a sort of memory machine” (Lemov) before it was abandoned in 1963.
Kaplan’s project is part of a larger story that begins with the Royal Library of Ashurbanipal, the world’s oldest known library, founded in the 7th century B.C. in Assyria, and includes the Library of Alexandria, founded several hundred years later, and a long line of other ancient and modern libraries. The story also encompasses the development of the encyclopedia, from the first encyclopedias compiled in ancient Rome, to the popular encyclopedias of the 19th and 20th centuries, such as the Encyclopedia Britannica, World Book, and Microsoft’s Encarta (one of the first widely used digital encyclopedias). It’s the story of humanity’s desire to collect and organize knowledge, stories, and cultural artifacts, and augment our own limited memory, working knowledge, and intellectual capacity.
Assessments of the output of generative AI systems largely fall into two camps: those that criticize tools like ChatGPT for producing inaccurate results and those that express delight at their astonishing creativity. Both assessments are valid: generative AI is in its infancy and often produces responses that miss the mark or are just flat-out wrong. But even in its infancy, it often produces astonishingly creative responses.
The brief appearance of Sydney made us acutely aware of the fact that generative AI systems are also a powerful, and at times frightening, database of our individual and collective dreams, including those expressed in the wide range of digital artifacts we produce and the stories we tell in books, films, television, music, visual art, and other mediums of expression. The datasets we use to train generative AI systems are libraries that contain encyclopedic knowledge, fictions, dreams, and much, much more—some of which is invisible to us, unless we learn a new way to listen.
Listening for the DNA in the Voice of Others
At this point, it’s something of a cliché to state that generative AI systems serve as a mirror, reflecting our collective cultural heritage. However, the notion that these systems reveal the ideas, patterns, and biases deeply ingrained in our culture is still apt, particularly if you shift your perspective from thinking of generative AI as a reflection of ourselves in a mirror to a different sensory perception: what we hear, which is most often the voice of others. The auditory experiences of echo, reflection, reverberation, and resonance can help us understand the essential question raised by the brief appearance of the persona Sydney in Microsoft’s early preview of Bing Chat: Who is speaking?
When we glance at ourselves in the mirror before leaving the house, we expect to see an accurate reflection of our appearance. We also commonly use mirrors to magnify our appearance so that we can more easily style our hair, apply make-up, shave, or confidently adjust our appearance in some other way. Mirrors help us see how we “really” look and allow us to see ourselves as others see us.
From a physics standpoint, when we see ourselves in a mirror, we are observing a reflection of the waves of light that bounce off our bodies. This is an almost instantaneous occurrence, which is why dancers can train with mirrors.
Sound is different. Sound waves travel through a medium (usually air) and move much more slowly than light. Four familiar acoustic phenomena offer useful metaphors for analyzing the output of generative AI systems such as ChatGPT:
- Echo: Echo is the repetition of a sound. Just like when you shout into a well, generative AI systems often repeat words used in the initial prompt. These echoes may indicate that the question you’re asking or language you’re using in your prompt includes very commonly used words or phrases. On one level, it’s a sign that you’re using “stock” language and may want to rephrase your question using less common words and phrases. However, echoes can also be affirmations. Since echoes aren’t usually complete repetitions, they also tell you which parts of your question have more “weight” in the system. A phrase that’s unexpectedly echoed may indicate that you’re onto something—an idea that warrants further exploration. 
- Reflection: Reflection refers to the bouncing or redirection of sound off of a surface. In generative AI, reflection occurs when the system produces an output that’s strongly related to the input but isn’t a direct repetition (an echo). Identifying which elements of the input are being reflected in the output can help you identify interesting connections. Unexpected reflections can help you identify potential blind spots, which often come from assumptions and biases that are inherent in your question and/or the data the system was trained on. 
- Reverberation: Reverberation refers to the persistence of sound in a space after the sound source has stopped. When sound waves continue to bounce off the walls of a room, the patterns of the sound waves become more complex and the sound itself is enhanced. In generative AI, reverberations can be detected in the way the system produces output that is not only based on the most recent input, but is also a continuation of earlier prompts. You may also experience reverberation when the output of the system continues to influence your thinking even after you’ve stopped interacting with it. By analyzing the ideas that persist, you can gain a deeper understanding of the impact that the system is having on your creative process. 
- Resonance: Resonance is the amplification of sound waves caused by matching the frequency of the sound source to the natural frequency of an object. For example, when you pluck the string of a guitar, the string vibrates at a specific frequency, and the body of the guitar resonates with it, producing a rich, satisfying tone. In generative AI, resonance can be seen in the way that the system amplifies certain patterns or ideas. By identifying these resonances, you can gain insight into the values and assumptions that underlie the system, and use that information to guide your creative decisions.
Resonance marks the beginning of creativity: it’s the moment when you first sense that there’s a connection between ideas. Ask yourself whether the output resonates with your creative vision, expresses your intentions, and reflects the way you express yourself. Remember that creativity is a form of progressive ideation. Follow the ideas that resonate with you.
Using Generative AI to Find Your Voice
As I stated in a previously published article on voice, “many of the voices we know and love are constructs—interpretations of what we think a speaker sounds like as we read words on a page and animate them in our mind.” The illusion of voice is an amalgamation of conveyed experience, sensibility, perspective, personality, and skill. It’s not surprising generative AI systems can create the illusion of voice—the strands required to weave together the presence of a voice run throughout the training data the systems draw from. However, generative AI systems don’t have a consistent voice—every response is unique and multi-faceted.
When you use a generative AI system as a partner in your creative process, an essential part of a successful collaboration is teasing apart the threads that are woven into the voice you hear in your head as you read the system’s responses. Don’t accept a response without asking “Who is speaking?” Is the information in the response true? Are the perspectives, ideas, connections, and biases expressed in the response aligned with your beliefs and thoughts? Is the language being used consistent with the way you express yourself? Explore the various facets of the response before you weave those elements into the voice you will ultimately claim as your own.
The key to using generative AI to enhance your creative process is to approach it with intention and curiosity. It’s a tool to be used in the service of your creative vision, not a substitute for your intuition and creativity.
More to come…
Footnotes
* Microsoft has cleverly adopted the name “Copilot” to brand their implementation of augmented intelligence in their Microsoft 365 suite of productivity applications. The first Copilot-enabled apps are being tested by a small group of enterprise customers. Copilot will be rolled out to everyone else later this year.
OpenAI has released APIs for ChatGPT and Whisper, giving developers access to their AI-powered language, chat, and speech-to-text capabilities. The APIs make it much easier for developers to integrate these capabilities into their apps.
Amazon has announced a new suite of AWS AI services, including their own new Large Language Model and AWS-integrated access to third-party generative AI systems, including those offered by Hugging Face.
These are just a few of the developments that are fueling the explosive growth of generative AI services, applications, extensions, and features.
** However, it would be foolish to state that a machine-based form of intelligence will never develop any aspect of sentience.
 
                        