February 7 – Algorithmic Methods of Humanistic Work

  • Introduction to the History Moves textual data
    • 8pm Skype call with Jennie Brier and Matt Wizinsky


Graham, Shawn, Ian Milligan, and Scott Weingart. Exploring Big Historical Data: The Historian’s Macroscope. Hackensack, NJ: Imperial College Press, 2015. (selections) (CommentPress site)

  • Chapter One:
    • The Joys of Big Data for Historians
    • Big Data
    • Putting Big Data to Good Use: Historical Case Studies
    • The Limits of Big Data, or Big Data and the Practice of History
  • Chapter Two:
    • Building The Historian’s Toolkit
    • Automatic Retrieval of Data
    • How To Become A Programming Historian, a Gentle Introduction
    • Basic Scraping: Getting Your Data
    • Normalizing and Tokenizing Your Data
    • Chapter Two Conclusion: Bringing It All Together: What’s Ahead in the Great Unread

Hoover, David L. “Quantitative Analysis and Literary Studies.” In A Companion to Digital Literary Studies (Blackwell Companions to Literature and Culture), edited by Susan Schreibman and Ray Siemens. Oxford: Blackwell Publishing Professional, 2008.

Jockers, Matthew Lee. Macroanalysis: Digital Methods and Literary History. Topics in the Digital Humanities. Urbana: University of Illinois Press, 2013.

  • Part I. Foundation
    • Revolution
    • Evidence
    • Tradition
    • Macroanalysis

Ramsay, Stephen. Reading Machines: Toward an Algorithmic Criticism. Urbana: University of Illinois Press, 2011.

Additional Food for Thought

Arnold Weinstein, “Don’t Turn Away From the Art of Life,” The New York Times, February 23, 2016.


  1. Alfo February 6, 2017 at 11:10 pm

    1. In Macroanalysis: Digital Methods and Literary History, Matthew Jockers stresses the idea of using big data in literary studies to, broadly speaking, “dissolve authorship”. How common is this ‘distant’ approach to literature among digital humanists?

    2. A passage in Stephen Ramsay’s Reading Machines caught my attention: “It is not that such matters as redemptive worldviews and Marxist readings of texts can be arrived at algorithmically, but simply that algorithmic transformation can provide the alternative visions that give rise to such readings. The computer does this in a particularly useful way by carrying out transformations in a rigidly holistic manner.” What does he mean by relating algorithmic text analysis to some sort of rigid holism?

    3. Last week I read a quote by Émile Durkheim that kept coming to my mind while reading these articles: “We buoy ourselves up with a vain hope if we believe that the best means of preparing for the coming of a new science is first patiently to accumulate all the data it will use. For we cannot know what it will require unless we have already formed some conception of it.” I thought it’s a fun quote to bring up in this week’s discussion.

  2. Hannah February 7, 2017 at 1:13 pm

    Jockers says that literary studies should strive for a goal similar to that of science, even if it’s a matter of opinion – but I’m not sure how to understand this. It seems contradictory to the humanities – the idea of fact versus opinion doesn’t seem to be taken into account here.

    What is the value of quantifying data to bolster opinions? Does it really make opinions more valid especially if it cannot be analyzed in the same way?

    Ramsay says, “Text analysis arises to assist the critic, but only if the critic agrees to operate within the regime of scientific methodology with its ‘refutations’ of hypotheses.” This is the sort of problem I’m curious about: how can you be a humanist and adhere to the scientific method in this way, when in general that challenges the traditional methodology of the humanities?

  3. Shoshanah February 7, 2017 at 2:27 pm

    In ‘The Joys of Big Data for Historians’…
    I’m confused about the distinction being made between the micro/macro scope and micro/macro history. I found the example given—tracking word usage in tweets during a presidential debate—even more confusing. What do they mean by “A macroscope…could fit into microhistory”?

    I’m interested in the example given in ‘Historical Case Studies’ regarding Ben Zimmer charting the rhetorical shift from “The United States are” to “The United States is.” It seems to me that, in this situation, quantitative computational analysis is helping to demonstrate the qualitative and affective change in national identity through a measure of linguistics. This example actually disproves the “valid hesitancy around the use of the term ‘data’ itself as it has a faint whiff of quantifying and reducing the meaningful life experiences of the past to numbers” (mentioned in the previous section on ‘Big Data’). In the Zimmer example, the act of quantifying, itself, qualifies. Doesn’t this mean—regardless of the smell—that ‘data’ has both quantitative and qualitative implications and uses?

    I have a problem with the section ‘The Limits of Big Data.’ The suggestion is that research follows one of two contradictory trajectories: 1) here is an interesting way of thinking about this, or 2) the evidence supports this claim. I do not believe it has to be one or the other. In ‘Collections and Connections’ last semester I began both digital projects with an idea of an interesting way of thinking about something. That led me to the creation of a digital project, and to a subsequent self-reflection on whether or not the evidence supported my claim. Isn’t that the natural progression of research? Why can’t an interesting approach be supported by evidence? Why can’t evidence lead to an interesting way of thinking?

    Additionally, it seems unfair to criticize digital historians for their inability to time travel. “Yet even with terabytes upon terabytes of archival documents, we are still only seeing traces of the past today…. More traces, yes, but still traces: brief shadows of things that were.” DUH!!! By very definition, things that WERE no longer ARE. To expect anything more than traces, shadows, artifacts is completely illogical.

  4. Jane Excell February 7, 2017 at 4:38 pm

    The combination of quantitative analysis and Literary Studies is really exciting to me, but I still have trouble applying some of the numerical results of these analyses to concepts of literary criticism. For example, Hoover states, “[a]nother simple operation is dividing the frequency of an item in one text by its frequency in another, yielding the distinctiveness ratio (DR), a measure of the difference between the texts. Ratios below 0.67 or above 1.5 are normally considered worth investigating.” I know that these baselines must be established through some comparative process, but it still boggles my mind a little. How can generalized “norms” relating numerical data to critical insight be established across works and between authors?
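
    Hoover's distinctiveness ratio is simple arithmetic over relative word frequencies, which a minimal sketch can make concrete. The toy texts, function names, and tie-break behavior below are illustrative assumptions, not Hoover's own implementation:

```python
from collections import Counter

def relative_frequencies(tokens):
    """Word frequencies as proportions of the text's total token count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def distinctiveness_ratio(word, text_a, text_b):
    """Hoover's DR: relative frequency in text A divided by that in text B."""
    freq_a = relative_frequencies(text_a).get(word, 0.0)
    freq_b = relative_frequencies(text_b).get(word, 0.0)
    if freq_b == 0.0:
        # Word absent from B: infinite ratio if present in A (assumed handling).
        return float("inf") if freq_a > 0 else 1.0
    return freq_a / freq_b

# Two toy "texts" of seven tokens each.
text_a = "it was really very really good really".split()
text_b = "it was very good it was fine".split()

# "it" is 1/7 of text A but 2/7 of text B, giving DR = 0.5 —
# below Hoover's 0.67 threshold, so "worth investigating."
print(distinctiveness_ratio("it", text_a, text_b))  # 0.5
```

    The 0.67 and 1.5 thresholds Hoover cites are rules of thumb established empirically across comparisons, not values the arithmetic itself produces.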

    In Reading Machines… I felt that Ramsay contradicted himself in ways that confused me. One example was his statement that, “[l]iterary criticism operates within a hermeneutical framework in which the specifically scientific meaning of fact, metric, verification, and evidence simply do not apply” (7). I found myself doubting this as I read it (or very possibly misunderstanding it); even with my limited knowledge of DH, I feel that Moretti’s Big Data work on detective stories and Hoover’s explanation of critical work done using Ngrams are both examples of literary criticism based on quantifiable facts. I was further confused by Ramsay’s own analysis of the frequency of certain words spoken by different characters in a Woolf novel. Doesn’t this example contradict his idea that in literary criticism, scientific facts (in which I would include frequency) do not apply?

    Hoover and Ramsay both use texts by Virginia Woolf as examples of the potential insights afforded by DH. Jockers also mentions her, albeit briefly. Is there something particular about Woolf’s work that seems to lend itself to digital analysis? What does this say about Woolf’s writing? What does it say about the ways in which digital tools are most commonly being applied to literary criticism today?

  5. Whitney February 7, 2017 at 4:53 pm

    1.) Is there a unified theory or means for selectively reducing content via the macroscope for Big Data? How do researchers decide what to scan Big Data for?

    2.) Does anyone else see an issue with the idea of likening Big Data to Modernism and “the excitement around ‘scientific’ history,” as Graham, Milligan, and Weingart do in their book Exploring Big Historical Data: The Historian’s Macroscope?

    3.) If, as David Hoover suggests, “quantitative approaches are more naturally associated with questions of authorship and style, but they can also be used to investigate larger interpretive issues like plot, theme, genre, period, tone, and modality,” what are some alternative ways that you would integrate quantitative approaches?

  6. phyllis plitch February 7, 2017 at 5:08 pm

    1. The word “macro” carries a lot of weight in both Matthew Jockers’s Macroanalysis (concerning literature) and in Exploring Big Historical Data: The Historian’s Macroscope by S. Graham, I. Milligan, and S. Weingart (concerning history). Does the concept mean the same thing in both works, such that the only difference is the body of work being studied? If not, how does the concept of macro differ within the context of each category?

    2. What are the key ways that DH differs in literature vs. history?

    3. Does one or the other (macroanalysis of literature vs. macroanalysis of history) fit more squarely into the concept of Digital Humanities?

  7. Cat February 7, 2017 at 5:40 pm

    1. In Exploring Big Historical Data: The Historian’s Macroscope, the authors note, “For all the excitement around the potential offered by digital methods, it is important to keep in mind that it does not herald a transformation in the epistemological foundation of history. We are still working with traces of the past.” In addition to possibly feeling inadequate or incapable with new technology practices, is it possible that some traditional historians avoid using new data methods for fear of colleagues who are forgetting this?

    2. The authors of Exploring Big Historical Data: The Historian’s Macroscope note that “there is the issue that programming can seem antithetical to the humanistic tradition.” How did programming acquire this reputation when, as they explain, it is a creative practice?

    3. Even with impressive emerging technologies, only a sliver of the past is available to us today. Is there a fear among digital historians/humanists that in the future anything not digitized/easily available will be forgotten and ignored?

  8. Lauren February 7, 2017 at 6:13 pm

    The size of “big data” referenced here is in fact quite small compared to what computer science calls big data. Where do the actual big data sets lie within humanities fields?

    Through the example of using markup to find interesting trends within the trading routes, there is a notion of knowing which patterns you are looking for to inform how you should “mark up” the document. This process seems to imply a need to have known questions in mind, compared to the work on the Bailey papers, which was able to “search for the question” – to find the patterns you weren’t looking for. Is this the right categorization? Which techniques should be used with which fields/projects? What are the tools that enable it? Most of the ngram work seems to allow the ability to look for interesting patterns, but the API is pretty limiting in the ability to construct the questions. How does context work? I found most of the methods lacked the complexities and metadata.
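
    The ngram counting underlying tools like the Ngram Viewer is itself straightforward, which is part of why the limits lie in the API rather than the method. A minimal local sketch (toy corpus and names are illustrative, echoing the Zimmer “United States are/is” example discussed earlier):

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-token sequences in the text, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy corpus; a real study would tokenize a large, dated corpus.
corpus = "the united states are strong but the united states is one nation".split()

trigram_counts = Counter(ngrams(corpus, 3))
print(trigram_counts[("united", "states", "are")])  # 1
print(trigram_counts[("united", "states", "is")])   # 1
```

    Counting locally like this keeps the surrounding context and metadata available for inspection, which is exactly what a frequency-only web API discards.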

  9. Sarah February 7, 2017 at 8:14 pm

    Inductive v. Deductive Argumentation

    Inductive reasoning is probable, based upon the evidence given (bottom-up logic): it expands on the logic and evidence given.

    Deductive reasoning links premises with conclusions (top-down logic): a conclusion is reached reductively by applying general rules that hold over the entirety of a closed domain of discourse, narrowing the range under consideration until only the conclusion(s) is left.

  10. jpetinos February 14, 2017 at 5:13 pm

    E.L. Doctorow, in his essay “False Documents,” refers to the idea of a fact as malleable, the result of all sorts of forces and power structures. Historic texts, as we discussed in class, are influenced by the winners of history, by the ‘winning medium’ (durable materials versus documents that were destroyed or did not withstand the test of time), and by the biases of a historian or social scientist, or even biases built into a medium. Doctorow uses this idea to describe fiction, and art more generally, as a form that shows what is possible, as conveying what could very well be, given its slippery definition, fact. Our class discussions centered on the flaws and potential for error within the Digital Humanities, defined as applying social science methods to the humanities, which traditionally exist in the gray areas. For this reason, I think emphasizing the “Humanities” aspect of “Digital Humanities” allows us to view the field more as an art than a science: it allows us to see ideas, numbers, documents, etc. in a different light, or to see new possibilities that, while not necessarily ‘true’ or ‘rigid,’ might still offer something valuable, or even beautiful.

  11. Sarah February 14, 2017 at 8:04 pm

    Inductive reasoning is probable, based upon the evidence given (bottom-up logic): it expands on the logic and evidence given.

    Deductive reasoning links premises with conclusions (top-down logic): a conclusion is reached reductively by applying general rules that hold over the entirety of a closed domain of discourse, narrowing the range under consideration until only the conclusion(s) is left.

  12. jpetinos May 17, 2017 at 4:00 pm

    Hoover writes: “techniques related to artificial intelligence are also increasingly being applied, including neural networks, machine learning, and data mining (see, for example, Waugh, Adams, and Tweedie 2000). These methods, which require more computing power and expertise than many other methods, are sometimes used in authorship attribution, but more often in forensic than in literary contexts.” How common is it for one person to have both the expertise and interest in literature/deep reading and in machine learning and data mining?
