David Foster Wallace: The Limits of Text Analysis and Iterative Design


David Foster Wallace was a prominent fiction writer in the United States in the late 1990s and early 2000s. He was immensely talented, and widely recognized and appreciated for his essays and other nonfiction as well as his six works of fiction. He was famous for his long, multi-clause, run-on sentences, his immense and idiosyncratic vocabulary, and the footnotes embedded in his work. His unique voice is contagious. In one of the many obituaries written for Wallace, A. O. Scott wrote in the New York Times: “Hyperarticulate, plaintive, self-mocking, diffident, overbearing, needy, ironical, almost pathologically self-aware (and nearly impossible to quote in increments smaller than a thousand words) — it was something you instantly recognized even hearing it for the first time. It was — is — the voice in your own head.”

Most notable to me, as a reader, is the side-by-side deployment of simple adverbs and prepositions alongside complicated vocabulary or complex ideas. Wallace has stated “Most of the modern writing I like the best is both sophisticated and colloquial—that is, high-level and complicated but at the same time intimate, sort of like a smart person is sitting right there talking to you—and I think I do little more than try to achieve this same high-low blend.”
His mission was to “author things that both restructure worlds and make living people feel stuff.” The result reads both as self-conscious and as a way to level with the reader: an attempt to reduce the writer’s intellectual magnitude so that reader and writer are equals.
Wallace’s style is significant in the way it echoes his own internal struggles, which were many and severe. He was clinically depressed his entire life, spent a long stint in a halfway house to overcome an alcohol and drug addiction, and ultimately took his own life in 2008. Wallace was always preoccupied with how his intellect and constant analysis separated him from those around him. His large body of fiction and nonfiction was thematically concerned with loneliness, the insufficiency of language to bridge the gaps between people, and the emptiness that follows high achievement, among other topics that deal with daily life in the United States.
In his last years of life, Wallace grew preoccupied with his writing style, feeling as if it was overdone and gimmicky. In a letter to his contemporary, Jonathan Franzen, Wallace wrote, “I sit in the garage with the AC blasting and work very poorly and haltingly and with (some days) great reluctance and ambivalence and pain. I am tired of myself, it seems: tired of my thoughts, associations, syntax, various verbal habits that have gone from discovery to technique to tic.”
For my final project, I initially thought I would analyze how Wallace’s writing had changed over time: whether, as he matured as a writer and grew tired of his repeated gimmicks, he honed or refined his voice. I was interested in whether his frustrations with his writing were evident in his later works, and I had a vague idea that I might connect his patterns of writing with larger themes involving his frustrations as a writer and his eventual suicide.
I was intrigued by the text analysis tools we were shown in class, particularly “Voyant.” When David Hoover visited, I was captivated by his experience trying to determine if the character of Sherlock Holmes had a recognizable voice. I was also taken in by the way Professor Hoover methodically approached and compared texts with particular questions in mind, and how careful he was to avoid forcing the results of text analysis to match his hypothesis. Being able to talk more concretely about a writer’s style or voice, which is sometimes intangible and uncertain, is an exciting application of the Digital Humanities.
Voice is the aspect of fiction that has always drawn me to a particular author, and Wallace’s unique voice is what piqued my interest in literature; previously, I had spent my leisure reading on mysteries and thrillers. It was a fateful day in my sophomore-year undergraduate journalism class when my professor displayed the following quotation during a lecture on creative nonfiction:

“The truth is you already know what it’s like. You already know the difference between the size and speed of everything that flashes through you and the tiny inadequate bit of it all you can ever let anyone know. As though inside you is this enormous room full of what seems like everything in the whole universe at one time or another and yet the only parts that get out have to somehow squeeze out through one of those tiny keyholes you see under the knob in older doors. As if we are all trying to see each other through these tiny keyholes.”

I was also influenced by Jane Excel’s analysis of the Nancy Drew series, and was fascinated by the idea that you could observe changes in authorship or a book’s context by observing changes in language pattern over a fixed period of time.
During the visualization portion of our class, I discovered “The Largest Vocabulary in Hip Hop,” a visualization that counts the unique words each rapper uses across a fixed number of lyrics and compares those counts with a sample of the same size from Shakespeare’s corpus.
I thought this was an interesting and creative use of a digital humanities technique, and given that my favorite author deploys vocabulary and phrases in such a distinctive way, I thought I might apply the same kind of technique across the David Foster Wallace corpus to see how his vocabulary or preferred phrases might change over time.
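The idea of comparing vocabulary over a fixed number of words can be sketched in a few lines. This is only an illustration under stated assumptions: the tokenizer is a naive regular expression, the function names are hypothetical, and the sample text is a toy stand-in for a real essay.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z'’]+", text.lower())

def unique_words_in_first(text, n_tokens):
    """Count the distinct words among the first n_tokens tokens of text."""
    return len(set(tokenize(text)[:n_tokens]))

# Toy sample, not a real corpus. Using the same n_tokens for every work
# keeps long and short texts comparable.
sample = "The truth is you already know what it is like. You already know the difference."
print(unique_words_in_first(sample, 10))
```

Fixing the window size matters because a longer text will almost always contain more distinct words than a shorter one, whatever the author's style.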
Professor David Hoover encouraged me to begin the project in this fashion, without a thesis, to see if I found anything interesting on which I might comment. A thesis, he said, might arise naturally from a few discovered patterns.
I began converting as many David Foster Wallace texts as I could find into plain text documents. Much of his work is copyright protected, and his entire body of work is massive. His magnum opus, Infinite Jest, is 1,079 pages. Right away, I knew that without such large, essential portions of his work, any conclusions I drew about Wallace’s writing would come with several large caveats.
I found around fifteen easily accessible works and organized them chronologically. I immediately encountered another difficulty: an author’s voice in fiction is usually very different from his or her voice or style in nonfiction. This was especially true for David Foster Wallace, who in his fiction conjured up various strange characters, corresponding voices, and varying styles from fiction work to fiction work. I decided I would focus exclusively on his nonfiction, narrowing the scope quite a bit, to ten works of varying lengths.

A further difficulty was realizing that ten short works of nonfiction are not a very large set to work with in the context of an author’s total output, and that the works varied in length. In addition, given that the author was active for a relatively short amount of time, I did not feel that I could draw any large, sweeping conclusions about how Wallace’s voice or stylistic choices changed; the time period in question was narrow, and Wallace worked on several pieces of writing simultaneously, meaning that any patterns found would not necessarily indicate change over time. Lastly, Wallace’s work was often submitted to journals and periodicals, such as The New Yorker and The Atlantic, where an editor’s touch would shape the pieces differently from works written and edited by the author alone.
Even with these difficulties in mind, time was running out, and I decided to proceed with my analysis of Wallace’s body of work. I uploaded the ten pieces of nonfiction into Voyant. I was interested in the prevalence of Ngrams: sequences of two, three, or four words. I assumed that Wallace’s verbal tics would form Ngrams, and that I could track these Ngrams over the course of his work. I sorted the Ngrams from most to least frequently used and found phrases such as the following: “and then,” “into the,” “a good,” “are not,” “most of the,” “or the,” “and its.” This told me nothing: Voyant’s Ngram analysis revealed no discernible patterns at all.
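For readers unfamiliar with Ngram counting, here is a minimal sketch of the general technique. It is not Voyant's implementation: the tokenizer is a naive assumption, the function names are hypothetical, phrases are allowed to run across sentence boundaries, and the sample sentence is invented for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z'’]+", text.lower())

def count_ngrams(text, sizes=(2, 3, 4)):
    """Count every run of n consecutive words, for each n in sizes."""
    tokens = tokenize(text)
    counts = Counter()
    for n in sizes:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

# Invented sample sentence, used only to show the shape of the output.
sample = "A lot of the crew and a lot of the cast left, and then a lot of us followed."
for phrase, count in count_ngrams(sample).most_common(5):
    print(count, phrase)
```

Even on this toy sentence the top of the list is dominated by function-word sequences, which is the same effect the raw Voyant list showed.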

I had inadvertently created a case study for everything we learned not to do in Digital Humanities: because my thesis was nonexistent, I ended up using a tool for its own sake instead of allowing the particular needs of my project to dictate which tool I used. I had also created the project with specific results in mind, convinced that I would see evidence of certain verbal patterns in Wallace’s work that just were not there.
Despite all this, the lack of “results” from this prototype left me with a few interesting questions and a new understanding of the Digital Humanities discipline. If there were no obvious or distinguishing Ngrams, what is it that makes me as a reader recognize a particular author’s work? Are features such as rhythm, clause length, and certain combinations of word types or parts of speech, which a reader intuitively recognizes, able to be captured by a text analysis as simple as the one I conducted?
Further: what makes a piece of art, particularly literature, resonate? Where does the “magic” come from, and do we reach the limits of Digital Humanities’ jurisdiction as we try to answer this question? Perhaps; or perhaps I would not be asking these questions without the initial analysis I conducted. In the end, I was most interested in why sentences such as the following were so interesting to me:

“The metaphysical explanation is that Roger Federer is one of those rare, preternatural athletes who appear to be exempt, at least in part, from certain physical laws. Good analogues here include Michael Jordan, who could not only jump inhumanly high but actually hang there a beat or two longer than gravity allows, and Muhammad Ali, who really could “float” across the canvas and land two or three jabs in the clock-time required for one. There are probably a half-dozen other examples since 1960.”

When I finished my presentation in class, Kimon suggested I go one step further with my analysis to see what kinds of interesting questions or conclusions I could discover. I decided to take the Ngrams that I did discover and extract the sentences in the text from which they came.
Here is where things started to get interesting. I downloaded the list of Ngrams from Voyant and filtered for all Ngrams that were used more than two times. This time, I searched for Ngrams with three or more words in sequence, and sorted from most frequently used to least frequently used. The most commonly used phrases were: “a lot of the,” “all the way,” “a kind of,” “way of,” “you could,” “it turns out that,” “on the other hand,” “and so on,” “there are,” and “a lot of.”
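The filtering step described above could look roughly like the following. This is a sketch under assumptions: the phrase counts are made-up toy numbers rather than the real Voyant export, and the function name is hypothetical.

```python
def frequent_long_phrases(phrase_counts, min_words=3, min_count=3):
    """Keep phrases of at least min_words words used at least min_count times,
    sorted from most to least frequent (ties broken alphabetically)."""
    kept = [
        (phrase, count)
        for phrase, count in phrase_counts.items()
        if len(phrase.split()) >= min_words and count >= min_count
    ]
    return sorted(kept, key=lambda item: (-item[1], item[0]))

# Toy counts for illustration only; these are not the counts from the corpus.
toy_counts = {
    "a lot of": 7,
    "and then": 9,
    "a kind of": 4,
    "on the other hand": 3,
    "all the way": 2,
}
for phrase, count in frequent_long_phrases(toy_counts):
    print(count, phrase)
```

Here "used more than two times" is expressed as `min_count=3`, and two-word phrases such as "and then" drop out because of `min_words=3`.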

I began looking for these commonly used phrases in the context of each of the original texts. Some interesting features emerged. The first is another limitation of this type of analysis: note that “a lot of the” and “a lot of” both appear in the top ten most frequent Ngrams. Searching through the entire list of common phrases revealed that this was also true for phrases such as “kind of” (“any kind of,” “the kind of,” “in a kind of,” “sort of,” and “any sort of” also appeared in the list). This pointed to certain phrase groupings, and variations of words within these groupings, that truly might be considered tics or stylistic patterns of the author. If the list of chosen texts in the corpus were expanded and standardized by length of work, these patterns would likely be even more pronounced.
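One way to make these overlapping phrases visible is to group every Ngram under a shorter "core" phrase it contains. The sketch below assumes the core phrases are chosen by hand; the function name is hypothetical, and the phrase list is taken from the examples mentioned above.

```python
def group_by_core(phrases, cores):
    """Map each core phrase to the phrases that contain it as whole words."""
    groups = {core: [] for core in cores}
    for phrase in phrases:
        padded = f" {phrase} "  # pad so that only whole-word matches count
        for core in cores:
            if f" {core} " in padded:
                groups[core].append(phrase)
    return groups

phrases = [
    "a lot of the", "a lot of", "any kind of", "the kind of",
    "in a kind of", "sort of", "any sort of", "all the way",
]
groups = group_by_core(phrases, ["a lot of", "kind of", "sort of"])
for core, members in groups.items():
    print(core, "->", members)
```

Grouping this way treats “any kind of” and “in a kind of” as variations on one habit rather than as unrelated entries in a frequency list.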
By searching for these phrases in the original texts, extracting the sentences from which they came, and comparing each phrase to one another, I was able to look more closely at what makes a David Foster Wallace sentence unique, or at least, what allows me to recognize his style.
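Pulling out the sentences that contain a given phrase can be sketched as follows. The sentence splitter is deliberately naive (it breaks after a period, question mark, or exclamation point followed by whitespace) and would stumble on abbreviations; the function names are hypothetical, and the sample joins two sentences quoted later in this post.

```python
import re

def split_sentences(text):
    """Naively split text into sentences at ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def sentences_containing(text, phrase):
    """Return the sentences in which phrase appears as whole words, ignoring case."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    return [s for s in split_sentences(text) if pattern.search(s)]

sample = (
    "Grips tend to be large, beefy blue-collar guys with walrus mustaches "
    "and baseball caps and big wrists and beer guts. "
    "A lot of the camera and sound and makeup crew are female."
)
for sentence in sentences_containing(sample, "a lot of"):
    print(sentence)
```

Reading the returned sentences side by side is the step that turns a frequency list back into something closer to close reading.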

For example, as I extracted the original sentences, I noticed a sequence of the phrase “sort of” or “kind of” followed by an adverb, followed by an adjective, as in “a sort of sloppily pretty tech-savvy young woman” or “kind of blandly good-looking.” I also noticed the way series of nouns and verbs were often grouped together using a series of “and”s instead of commas. For example, “Grips tend to be large, beefy blue-collar guys with walrus mustaches and baseball caps and big wrists and beer guts..” and “A lot of the camera and sound and makeup crew are female..” and “wearing faded jeans and old running shoes and black T-shirts..” and “that strands tend to escape and trail and have to be chuffed out of the eyes periodically..” I feel certain that these tendencies would be magnified when including additional texts in the corpus.
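Both patterns can be approximated with regular expressions. This is a rough sketch under loud assumptions: an "adverb" is guessed as any word ending in "-ly" (which will produce false matches), the word after it is only a candidate adjective, and a chain of “and”s is measured as the number of “and”s in a stretch of text with no comma in it. The function names are hypothetical.

```python
import re

# "sort of" or "kind of", then a word ending in -ly, then the following word.
HEDGE_PATTERN = re.compile(r"\b(?:sort|kind) of (\w+ly) ([\w-]+)", re.IGNORECASE)
AND_PATTERN = re.compile(r"\band\b", re.IGNORECASE)

def hedge_adverb_pairs(text):
    """Return (adverb-like word, following word) pairs after 'sort of' or 'kind of'."""
    return HEDGE_PATTERN.findall(text)

def longest_and_chain(sentence):
    """Return the largest number of 'and's found between two commas."""
    return max(len(AND_PATTERN.findall(segment)) for segment in sentence.split(","))

print(hedge_adverb_pairs("a sort of sloppily pretty tech-savvy young woman"))
print(longest_and_chain("wearing faded jeans and old running shoes and black T-shirts"))
```

A sentence scoring two or more on `longest_and_chain` is a candidate for the list-without-commas habit, though each hit still needs to be read in context.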
I was surprised to see what extracting the original sentences from the common phrases found in Voyant revealed. In class, we discussed the limitations of this type of analysis being conducted by someone from a pure technology background, or someone who is not versed in the texts. This prototype starkly reveals these limitations; my initial examination of the Voyant results revealed absolutely nothing important about Wallace’s work. It was only by going back and forth between the Voyant analysis and closer readings of the original texts that I was able to start to piece together a meaningful project.
A future iteration of this project would involve a full-blown, in-depth analysis of as many identifiable language patterns in Wallace’s work as possible. I began this work by analyzing the most common Ngrams found in the corpus using Voyant, but by examining those Ngrams in context, and comparing the Ngrams to one another, larger patterns are revealed. Separating the sentences, reading the common phrases in context, and comparing the sentences with one another would allow one to piece together the style of Wallace in rudimentary, part-of-speech terms.

Using Voyant allowed me to perform a close reading of a much smaller body of work by revealing what was hidden in the massive amount of text. I imagine I would get more interesting or obvious patterns if I tagged each word in the text by part of speech, revealing the basic skeleton that composes sentences (noun, the word ‘and,’ noun, the word ‘and,’ a final noun).
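To show what such a skeleton might look like, here is a toy sketch. It does not perform real part-of-speech tagging: the lexicon is a hypothetical, hand-written lookup covering a single phrase, and a full analysis would swap it for an actual tagger.

```python
# Hypothetical hand-made lexicon; a real tagger would replace this lookup.
TOY_LEXICON = {
    "walrus": "NOUN", "mustaches": "NOUN", "baseball": "NOUN", "caps": "NOUN",
    "big": "ADJ", "wrists": "NOUN", "beer": "NOUN", "guts": "NOUN",
}

def skeleton(phrase, lexicon=TOY_LEXICON):
    """Replace each word with its tag, keeping the word 'and' as itself."""
    parts = []
    for word in phrase.lower().split():
        if word == "and":
            parts.append("and")
        else:
            parts.append(lexicon.get(word, "?"))  # "?" marks words not in the lexicon
    return " ".join(parts)

print(skeleton("walrus mustaches and baseball caps and big wrists and beer guts"))
```

Once sentences are reduced to skeletons like this, identical skeletons can be counted in the same way Ngrams are, which is where a repeated structure would show up even when the vocabulary changes.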

We read papers written by Matthew Wilkens and Franco Moretti arguing for the necessity of distant reading to fix issues in the humanities such as reliance on a canon or a lack of reading time in conjunction with an overabundance of written work. My project has revealed the limitations of relying exclusively on distant reading, and the necessity of combining distant and close readings to reach any meaningful conclusion about a text.





By | 2018-01-04T16:07:55-05:00 May 15th, 2017|Categories: John, Project Proposal|0 Comments
