September 27 – Data/bases


Guest Lecture

Prof. Deena Engel
Clinical Professor
Associate Director of Undergraduate Studies for the Computer Science Minors programs
Department of Computer Science


Trevor Owens. “Defining Data for Humanists: Text, Artifact, Information or Evidence?” Journal of Digital Humanities, March 16, 2012.

Christof Schöch. “Big? Smart? Clean? Messy? Data in the Humanities.” Journal of Digital Humanities, November 22, 2013.

Lev Manovich. “Databases.” In The Language of New Media. Cambridge, Mass: MIT Press, 2002. 190-212.

Stephen Ramsay. “Databases.” In Companion to Digital Humanities, edited by Susan Schreibman, Ray Siemens, and John Unsworth. Oxford: Blackwell Publishing Professional, 2004.

C.M. Sperberg-McQueen. “Classification and Its Structures.” In Companion to Digital Humanities, edited by Susan Schreibman, Ray Siemens, and John Unsworth. Oxford: Blackwell Publishing Professional, 2004.

Additional Materials

To access the files used by Prof. Engel in her talk, click here.



  1. Isabelle September 24, 2016 at 8:12 pm

    Schöch spends a fair amount of time creating a distinction between “big data” (a term, I think I can safely assume, we’ve all heard of) and “small data” (a term, he admits, is lesser known and not often used). He assures us that there is, in fact, a strong difference between the two, despite the appearance of simply being on differing ends of a spectrum. Throughout the reading, I was questioning the necessity of this distinction (to the point of concocting the question to put here). But when I arrived at his conclusion, he declared that we need to transgress the opposition between these two terms, which raises the question of why he forced the issue in the first place. Are these two terms mutually exclusive, as he states throughout much of his piece? Or is it simply a question of relativity?

    The reading on Classifications (Sperberg-McQueen), while thorough and informative, left me feeling slightly unnerved. We’ve discussed previously the impossibility of creating certainty in the very uncertain realm of the humanities. And in a world that is constantly breaking down classifications, the uncertainty becomes greater. How can we marry the rigidity of classifications with the fluidity of the humanities?

    Manovich mentions that a database is essentially information stripped of narrative, in comparison to literature and film. Literature has undergone quite the transformation since its inception, and many post-modern works find themselves with hardly a narrative at all. What if databases are, in some way, just the next step for literary movements? What if this sort of information-only presentation is where literature was headed after diving (perhaps too deeply) into the human mind?

  2. Leslie September 26, 2016 at 5:54 pm

    In “Defining Data for Humanists: Text, Artifact, Information, or Evidence?,” Owens writes, “Importantly, the results of processed information are not necessarily declarative answers for humanists. If we take seriously Stephen Ramsay’s suggestions for algorithmic criticism, then data offers humanists the opportunity to manipulate or algorithmically derive or generate new artifacts, objects, and texts that we also can read and explore. For humanists, the results of information processing are open to the same kinds of hermeneutic exploration and interpretation as the original data.” This feels so circular to me. I’m confused. Isn’t the point of information to be, in fact, declarative? How can we process data into information that isn’t definite (aside from human construction, etc.)? I don’t know why I can’t quite grasp it.

    I understand the difference between big data and smart data in my head. However, like when we were trying to define digital humanities, I’m kind of at a loss for words. In “Big? Smart? Clean? Messy? Data in the Humanities,” Schöch even says that there aren’t formal definitions, just a sort of understanding of the concepts. Since this piece was written in 2013, I was wondering if there’s an updated version. Has someone come up with concrete definitions for these two data types? Or are we doomed in DH to never be able to fully define anything? (Sorry, I’m having a hard time with these readings this week.)

    While reading “Databases” by Ramsay, I became interested in the idea of redundancies. One of the specific examples was the fact that all the publishing houses in the data set are in New York; they would all be denoted with the same integer, creating redundancies. Why is this so bad? Is it simply because if you search for “New York” in the database, every entry would appear in the search results? I think it is important to include the place of publication, especially now that publishing houses exist in cities other than New York. Is there a way to include that information without making it searchable?
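    The redundancy Ramsay describes is what database normalization addresses: a repeated value is moved into its own table and referenced by a key, and the information stays fully searchable through a join. A minimal sketch using Python’s built-in sqlite3 module (the table and column names are hypothetical illustrations, not Ramsay’s own data set):

```python
import sqlite3

# In-memory database; all names here are hypothetical.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Denormalized: the city string is repeated on every row (Ramsay's redundancy).
cur.execute("CREATE TABLE books_flat (title TEXT, publisher TEXT, city TEXT)")
cur.executemany("INSERT INTO books_flat VALUES (?, ?, ?)", [
    ("Moby-Dick", "Harper & Brothers", "New York"),
    ("Leaves of Grass", "Fowler & Wells", "New York"),
])

# Normalized: the city is stored once and referenced by an integer key,
# so a correction to the city name happens in exactly one place.
cur.execute("CREATE TABLE cities (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")
cur.execute(
    "CREATE TABLE books (title TEXT, publisher TEXT, "
    "city_id INTEGER REFERENCES cities(id))"
)
cur.execute("INSERT INTO cities (name) VALUES ('New York')")
cur.executemany("INSERT INTO books VALUES (?, ?, 1)", [
    ("Moby-Dick", "Harper & Brothers"),
    ("Leaves of Grass", "Fowler & Wells"),
])

# The place of publication is still searchable via a join.
rows = cur.execute(
    "SELECT b.title FROM books b "
    "JOIN cities c ON b.city_id = c.id WHERE c.name = 'New York'"
).fetchall()
print(rows)
```

    So normalization removes the repetition without removing the searchability: the join answers the “every entry in New York” query, while the city itself exists in only one row.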

  3. Cristina September 26, 2016 at 10:34 pm

    In “Big? Smart? Clean? Messy? Data in the Humanities” Christof Schöch cites Doug Laney’s key qualities of big data: volume, velocity and variety. Schöch breezes by the velocity of big data as “a constant influx of new data” that “is being analyzed in real time and has to be very quick and responsive.” This characteristic of big data, he claims, is “probably the least relevant to data in the humanities, at least today.” What conditions of practice and scope in the humanities can you see affecting the prevalence of velocity in DH? Think of a few examples that would bring velocity to the fore. Can these speculative changes be broken down to correlations between the “first phase” (textual studies) and “second phase” (conceptual modeling) of DH most explicitly outlined in the chapters by Johanna Drucker and N. Katherine Hayles that we read on September 13th?

    Of the “examples of projects investigating database politics and possible aesthetics” from the mid-to-late 1990s given on page 196 of Lev Manovich’s chapter on databases in The Language of New Media, none are functionally interactive, if they are accessible at all. For example, George Legrady’s website presents a visual archive of works from that time in the form of static pages that summarize the previous interactive installations. While these are artworks and therefore not beholden to the access domain of the humanities broadly, each involves the interplay between personal history, memory and larger Historical narrative. Is it important that these works be made accessible? What does it mean for the advancement of a dialogue around digital arts and humanities that the recent history of the field is inaccessible?

  4. Zejun September 27, 2016 at 12:34 am

    Q1. In “The Language of New Media”, Manovich holds that “an arbitrary sequence of database records is not a narrative,” citing Mieke Bal’s definition of narrative (p. 201). How true is this statement in the context of new media (e.g. sandbox games)?

    Q2. In “Big? Smart? Clean? Messy? Data in the Humanities”, Schöch mentions the 3Vs of big data (volume, velocity and variety) and claims that “This aspect (velocity) of big data is probably the least relevant to data in the humanities, at least today.” Is there a hidden assumption that humanities study is always about the past? Will DH change this assumption?

    Q3. In “Defining Data for Humanists: Text, Artifact, Information or Evidence?”, Owens asserts the value of data, saying “scholars can uncover information, facts, figures, perspectives, meanings, and traces of thoughts and ideas through the analysis, interpretation, exploration, and engagement with data”. Similarly, Manovich addresses the importance of data by saying the main goal of all new media design is to “provide an interface to data” (192). However, can we argue that what really matters to us is not data itself but its various expressions and interpretations?

  5. Lauren September 27, 2016 at 12:41 am

    As noted in “Big? Smart? Clean? Messy? Data in the Humanities,” most database constructions are made for the natural sciences. With big data the hot topic of the last two years, and the conversation becoming heavily focused on data, how much will the humanities community be able to influence the design of the features within these tools, rather than having their work confined and forced into the natural sciences’ way of thinking? Schöch writes: “Variety of formats, complexity or lack of structure does come into play, however. In fact, the distinctive mark of big data in the humanities seems to be a methodological shift rather than a primarily technological one. And it is a huge methodological shift.” Is this methodological shift impactful to the way the work is done? I found the idea of a NoSQL database fascinating for humanities work: there is far more flexibility in how the material is stored and accessed, but likely at the cost of much of the ability to actually process and use it. Would these kinds of unstructured tools open up a way of thinking that was not possible before?
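    The flexibility-versus-processability trade-off here can be sketched without a real database. A minimal Python illustration of document-style (NoSQL-like) records, with hypothetical field names:

```python
import json

# Document-style records: each one can have its own shape, which suits
# messy, heterogeneous humanities sources.
documents = [
    {"title": "Moby-Dick", "year": 1851, "marginalia": ["whale sketch"]},
    {"title": "Leaves of Grass"},  # missing fields are simply absent, not errors
]

# The cost: every processing step must defend against missing fields --
# the lost "ability to actually process" that worries me above.
years = [doc.get("year") for doc in documents]
print(years)  # [1851, None]

# Serializing to JSON shows how such records are typically stored.
print(json.dumps(documents[1]))
```

    A relational table would reject the second record (or force NULLs into a fixed schema), while the document store accepts it happily and defers the problem to query time.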

    The structure of databases forces a representation of, and relationships between, the rows and columns of the data. This imposed structure, as noted, has the ability to take a seemingly complex work and break it down into representative pieces to be explored, yet the connections must be reduced to either a column-based or a row-based relationship; there are only two types of relationship for every pair. How is this limiting when exploring this kind of work? Sperberg-McQueen talks about the n-dimensional aspect, but this misses the fact that the meta-relationships can only carry a limited amount of meaning in most conventional stores.

  6. Ariel September 27, 2016 at 9:06 am

    In Trevor Owens’ article, he writes, “Therefore, data is an artifact or a text that can hold the same potential evidentiary value as any other kind of artifact. That is, scholars can uncover information, facts, figures, perspectives, meanings, and traces of thoughts and ideas through the analysis, interpretation, exploration, and engagement with data, which in turn can be deployed as evidence to support all manner of claims and arguments.” Which makes me think, when stated this way, what is new about this idea?

    Going along the same idea as my previous questions: in Christof Schöch’s article, he writes, “But rarely would they consider their objects of study to be ‘data.’ However, in the humanities just as in other areas of research, we are increasingly dealing with ‘data.’” Why is this? In Owens’ article, he is very convincing in how data is already a part of the humanities, but Schöch’s article seems to make the distinction more apparent.

    In Manovich’s article, he says, “The open nature of the Web as medium (Web pages are computer files which can always be edited) means that the Web sites never have to be complete; and they rarely are” (196). This made me reflect on the Web as a whole. So what does that say about our current state? Considering that the Web’s presence is undeniable in our society, how does the fact that web pages are rarely ever complete play into our lives?

  7. kcauley September 27, 2016 at 2:34 pm

    Can the definition of DH be simplified to the practice of curating cultural data into a purposeful narrative?

    Manovich questions the ability of a website to complete a purpose beyond storing data. He claims that because webpages are constantly transforming, they have ‘anti-narrative logic.’ Was this a valid argument when written in 2002, and furthermore does it hold sway today? Part of his argument relies on the distinction between CD-ROMs and webpages. How has the evolution of these mediums altered or sustained this argument?

    How much control do humanities scholars have over the application of big or smart data? Are humanists mostly curating their own databases, or using those already available to pose research inquiries? Does the modern research librarian play a valuable role in this scholarship process? Or will collection be left up to tech companies?

  8. Shoshanah September 27, 2016 at 3:51 pm

    You say data, I say capta, let’s call the whole thing off!

    An analogy based on the reading for this week:

    There are 26 letters in the English alphabet. In and of themselves, these letters have no intrinsic meaning. These letters are not evidence, they are not “fact,” they tell no narrative story. The alphabet is, like data, “a multifaceted object which can be mobilized as evidence in support of an argument” (Owens). Words are a human construction of letters. So, too, is evidence manufactured from data by humans.

    It is important to keep in mind that, just like a person chooses the words they use to tell a story, a person chooses the data that supports their narrative and discards the data which does not. As students, we all participate in this process of curating. When writing critical research papers we cite data which supports our theses and leave out the data which does not. It is easy to moralize this process as selectively inclusive. For this reason I like Drucker’s proposal of the term capta rather than data; we are building support for our narratives from information we’ve chosen to capture. “In other words, capturing data is not passively accepting what is given, but actively constructing what one is interested in” (Schoch). Is this definition of data/capta only applicable to soft sciences like Digital Humanities? Do medical researchers also curate the data they use to support their claims, discarding that which does not apply? Is there a moral responsibility to acknowledge facts which do not support a theory or hypothesis?

  9. Whitney September 27, 2016 at 4:30 pm

    After reading Owens’ article, I felt uneasy about the idea of data as an artifact. So much of current culture is stored and processed digitally, but I’m still completely unlearned when it comes to knowledge about securing and backing up databases. Are there others who are likewise insecure about the fragility of non-tangible cultural artifacts – or is this completely unfounded? If Google’s main databases suddenly implode, what would the outcome be?

    Manovich’s “Databases” touched on the need for the creation of “info-aesthetics,” essentially the ability to theoretically analyze the aesthetics of information and information processing. Is this not exactly what is considered the “cult of design” with companies like Apple? Or is Manovich reaching for info-aesthetics to be integrated at the deeper level of coding? After last week’s readings, it seems unlikely to me that coders do not already practice this as well.

    What was most interesting to me from the readings this week came from Sperberg-McQueen’s article “Classification and Its Structures.” In this article, it is noted that classification of data can be considered very theoretical because perfect classification would require perfect knowledge of the object being classified. Have there been any instances of perfect classification?

  10. Anna September 27, 2016 at 8:31 pm

    1. In “Defining Data for Humanists: Text, Artifact, Information or Evidence?”, when the author tries to explain what data is for humanists, he includes “data as text, artifact, and processable information.” But is data what is going to be digitized, or what has already been digitized, in which case wouldn’t it be just layers and layers of code?

    2. In C.M. Sperberg-McQueen’s “Classification and Its Structures”: given that every culture values different properties of an object over others, and has specific ways of classification that differ from other cultures’, how can we create a “perfect classification” even if we have perfect knowledge of the object? How do we know what kind of classification best fits the person who uses it?

    3. In Christof Schöch’s “Big? Smart? Clean? Messy? Data in the Humanities,” the subtitles always mention “in the humanities” within parentheses. Does this imply that data, smart data, and big data are seen differently, or carry other significations, in other fields? If yes, which ones, and in which fields?

  11. lbowen February 28, 2017 at 10:56 am

    1. In “Big? Smart? Clean? Messy? Data in the Humanities” Schöch references Christine Borgman’s work Scholarship in the Digital Age, which illustrates that digitized humanistic “data” is more than a technical issue: it “is as much a theoretical, methodological and social issue as it is a technical issue” (2). This claim made me wonder which Digital Humanities scholars are seriously taking up issues around ethics? The article does not go into this realm at all. Generally speaking, digitization, in my experience, is viewed as a development that will provide greater access, which is assumed to be a positive thing. But what happens when certain groups desire to purposefully restrict access to their print or material culture?
    2. In “Databases,” Manovich compares and contrasts the uses and functionality of computer databases to traditional means of organizing and collecting documents and other materials. Manovich seems to suggest that the computer database is capable of replacing traditional brick and mortar institutions, “A library, a museum, in fact, any large collection of cultural data are being substituted by a computer database” (191). Further to this point, he argues that the database as virtual, interactive 3D space is also capable of providing experiences that we previously relied upon our imagination for, “[3D space] accomplishes the same effects which before were created by literary and cinematic narrative.” Despite initial fears within the museum world that mounting collections online would lead to a reduction in museum attendance, we’ve seen that such fears were unfounded as there is something special and unique about visiting a space. Today, such “traditional” institutions are increasingly adopting Virtual Reality. I couldn’t help but wonder what Manovich would theorize about this development in which institutions such as the library and museum are adopting both forms of database as central elements in their design of exhibitions?
    3. Trevor Owens argues that humanists can interrogate data for the same kinds of questions they address through analysis of texts and artifacts. In addressing data with methodologies similar to those used to interpret text and other sources for humanistic theory, I kept wondering whether similar pitfalls occur? Having read about “queering the archive” last semester, I couldn’t help but wonder if there are similar ways to queer big data and its analysis?
