SGA Encoding: Interpretation and Collaboration

The SGA encoding venture was valuable in that it introduced me to the scope and possibilities of interpretation and collaboration within the digital humanities arena, much of which we also touched upon during class. Going into this semester, I had almost no experience with either the textual criticism or computer markup language that the project necessitated, and was honestly quite wary of both at first. But like most encounters I have had with foreign languages, it is surprisingly easy to learn the XML alphabet, pick up a couple of key phrases, and use them to communicate basic ideas to others. Constructing comprehensive sentences to convey abstract thoughts, on the other hand, often requires a whole different force of knowledge and imagination.

During our XML coding sessions here at U.Va., we mostly worked within a fairly simple markup framework that was both representational and descriptive; it attempted to show the editorial changes that Percy and Mary made to the manuscript. I encoded pages 74 (http://sga.mith.org/eng738t/ox-ms_abinger_c57-0074.jpg) and 75 (http://sga.mith.org/eng738t/ox-ms_abinger_c57-0075.jpg) from Chapter 10. In general, what I did was pretty straightforward denotation of <line>, <del rend="strikethrough">, and <add place="superlinear"> to signify, respectively, line breaks, deletions, and additions. If we were certain it was Percy’s modification (the Word document transcription would have to confirm), then we could mark it as <mod resp="#pbs">. Subjectively speaking, I found the work strangely soothing, in a monotonous, “Modern Times” sort of way. Every once in a while I would come across a hitch that I (with the aid of our excellent group leader, Eliza) would have to overcome as best I could.

For example, if one looks at the first third of page 75, on the right hand side (5th line), there are a couple of strange insertions that Percy made to the writing: something like X and i. I ended up just signifying the X and i generally as <metamark>:

<line>I <del rend="strikethrough">could not</del> <add place="superlinear"><mod resp="#pbs">I was unable to</mod></add> overcome my repugnance <add place="superlinear"><mod resp="#pbs">to the task</mod></add> <metamark>‸</metamark><add place="superlinear"><metamark>X</metamark></add><metamark>I</metamark></line>

So I marked them up in a very simple way, giving the same code to two different symbols. As a novice encoder, I suppose that as the document that I committed to GitHub moves up the ladder to be verified by more experienced encoders, such <metamark>s might acquire more specificity. Why is there an X, and is it the job of the encoder to figure that out, or the consumer/researcher using the interface? My guess is that the “I” just meant Mary or Percy’s pen was running out of ink. Maybe someone more familiar with the other pages might be able to spot a pattern and signify them accordingly. I could be looking at it too closely, and it probably belongs in the “does not matter” category, but it was something that I wondered about in the grander scheme of things.
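To make concrete what this markup does and does not record, here is a minimal Python sketch that reads the encoded line with the standard library's xml.etree.ElementTree and lists what each tag captured. It is my own illustration, not part of the SGA toolchain; the variable and function names are hypothetical, and the XML is the line as encoded above.

```python
import xml.etree.ElementTree as ET

# The line from page 75, as encoded above.
ENCODED_LINE = (
    '<line>I <del rend="strikethrough">could not</del> '
    '<add place="superlinear"><mod resp="#pbs">I was unable to</mod></add> '
    'overcome my repugnance '
    '<add place="superlinear"><mod resp="#pbs">to the task</mod></add> '
    '<metamark>‸</metamark>'
    '<add place="superlinear"><metamark>X</metamark></add>'
    '<metamark>I</metamark></line>'
)

root = ET.fromstring(ENCODED_LINE)


def text_of(element):
    """Return all text inside an element, including text in nested tags."""
    return "".join(element.itertext())


# Collect the content of each kind of editorial intervention.
deletions = [text_of(d) for d in root.iter("del")]
additions = [text_of(a) for a in root.iter("add")]
# Only modifications explicitly attributed to Percy (resp="#pbs").
percy_changes = [text_of(m) for m in root.iter("mod") if m.get("resp") == "#pbs"]
metamarks = [m.text for m in root.iter("metamark")]

print("deleted:  ", deletions)
print("added:    ", additions)
print("by Percy: ", percy_changes)
print("metamarks:", metamarks)
```

The caret, the X, and the I all come back as plain metamark values with nothing to tell them apart, which is exactly the loss of specificity I was worried about.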

Another hitch was the strange lines that Mary presumably drew in the middle of page 75; I simply marked all the words that had lines wrapped around them with <mod rend="bordered">:

<line>perfect solitude: <del rend="strikethrough">my deli</del> <del rend="strikethrough">I used to</del> <mod rend="bordered">Alone</mod></line>
<line>in a little boat <mod rend="bordered">I passed whole</mod> days on</line>
<line><mod rend="bordered">the lake watching</mod> the clouds &amp; <metamark>‸</metamark> <add place="superlinear"><mod resp="#pbs">listening to</mod></add> the ripp<add place="intralinear"></add>ling</line>

The lines are rather intricate, and I marked them up rather generally before sending them back to Maryland. I suppose quandaries such as this made me aware of some of the difficulties of trying to translate visual images into XML, and whether or not I had any authority to make certain interpretations. The outlines also seem to signify that either Mary or Percy is marking off some section, but sometimes it is unclear what is being marked, or what is more important: “the lake,” or “watching”? I think that for me, it is certainly difficult working on the production side of things while being quite unfamiliar with the writer’s habits, how markup is conveyed to the interface, or the audience.

So, during this project, in addition to acclimating myself to XML, I faced the much greater challenge of transitioning from solitary, reflective study to collaboration and interdisciplinary work within the digital humanities. Even in the last place on earth that still houses shy people (the university), I tend to fall pretty far on the “introversion” end of the spectrum. Thus the constantly evolving, makeshift nature of digital humanities and its reverberations within literary studies were both exciting and, to be honest, quite daunting for me. I think anyone would ask herself what her importance or value is in a digital project—a part in the machine, or an interpreter bringing her own voice to a literary text? And I guess one can’t help asking, is there a “right” project to join, or a necessary circle of digital humanists to be part of? Should one be brushing up on her calculus, or continue unraveling this very long George Eliot sentence? These are questions that I thought about during this semester.

Lastly, the interesting and unexpected delight of this particular encoding venture was that, in a way, it actually brought me closer to a text than I ever could have imagined. A recurring theme during our semester has been the bracing critical distance that digital tools can allow for: that perhaps one of their greatest strengths is that they would compensate where human subjectivity might falter from emotion or by the limitations of having merely one single collector of data. The flipside of this would be that computers would mercilessly extract the “human” element of a text. But sans the digital, I probably never would have been able to see Mary’s original manuscript and feel its “aura” (yes, it’s survived). And although I was skeptical of some of Percy’s revisions (what exactly is the difference between “I could not” and “I was unable to”?), I had to admit I was quite taken by the whole collaborative process of these soul mates, and couldn’t help being touched by some small changes (“day” turns into “sunshine”) that Percy made to the novel. In grad school, we’re not supposed to say things are fun, but this was a fun and enjoyable experience.


Deceived by Data or Captivated by Capta?

So I’ve been racking my brain for the past five-odd months, trying to figure out the moment when the world became ‘modern’—partially because I took a class on secularization, mostly because I need something to fill up those empty hours where a vibrant social life should be. Then came the moment of revelation. I was watching that excellent documentary The Fog of War, and in discussing the countercultural response to the Vietnam War Robert McNamara used the phrase, ‘an issue of population control.’ And the thought popped into my head, “that is modernity right there. ‘Population control.’ All the hopes, dreams, fears, lives, loves of a generation—reduced to a matter of sociological stability. Statistical reductionism. Damn.”

Well, the point here is that statistics are now and have always been a problem for those of us in the humanities—we don’t like data, it seems too self-assured, too cocky, we distrust the smug arrogance of ‘this is the FINAL ANSWER,’ as well we might. And for centuries, this has been a right and proper division: the sciences presume to know everything with their ‘hard data’ and dogmatic materialism, the Humanities chuckle snidely to themselves  at their colleagues’ dogmatic slumbers. And never the twain shall meet. 

I guess nothing lasts forever, because now we’ve got DH coming along and throwing a monkey-wrench into the whole program. DH sees itself as a peacemaker, bridging the gap between those techie folks who can’t see the forest for the trees, and us humanities people, who have our heads stuck in the clouds on a good day. (Where my head is stuck on a bad day is not suitable for disclosure in polite conversation.) Franco Moretti represents one direction—make the humanities more empirical, an attitude whose risks keep me up at night. Johanna Drucker’s “Humanities Approaches to Graphical Display” represents the other direction—make empirical data, or at least the visualizations thereof, more Humanities-friendly: make it self-critical, over-determined, conscious of its own fallibility. She prefers the term capta to data, the opposition being between something ‘taken’ and ‘given,’ the former implying mediation—and I think she is right to do so. Knowing how much our culture loves the visual over the textual, I think it best to critique some of her suggestions for DH data visualizations, and extrapolate a criticism of her argument from there.

 

Instead of a neat and tidy bar graph to illustrate the number of books published in a certain year, Drucker and her graphic designer present us with the following:

 Image

 

Two immediate reactions: 1) Wow! Quantifying trends in publication is more complicated than I thought! 2) Where’s the Motrin?

It’s like–ever have one of those moments where you see someone doing something dumb and think, “I know what you’re trying to do, and I agree in principle, but—no. Here, now, in this way: just—no.” This is one of those moments for me. I think Drucker’s preference for capta over data is admirable—my God, crack open any newspaper with the slightest political bent and you’ll find biased belief supported by biased data over and over again—data is always subjectivized, and this is way too frequently forgotten. That is what Drucker is illustrating here, and I agree, and I get it. But it’s like reading Deleuze and Guattari’s rhizomatic writing style, or referring to the feminine gender as ‘womyn’—you make your point so hard you beat your audience over the head, at which point they decide, “#$%& this intellectual stuff, what’s on my Netflix queue?”

What I mean is, this graph is confusing, but that is not the issue: it comes across as confusing for the sake of being confusing. I don’t feel like the bizarre notations and annotations are there to increase my knowledge of the data—I feel like the author is compelled to make a point about over-determination because I’m too dense to get it otherwise. I worry the typical viewer is going to find this annoying at best and patently patronizing at worst. So what would I do differently? Well, I think the polarity of clear/simple versus obscure/complex can’t be resolved through pen and paper means—we have to go digital. We need an infovis that can dynamically update itself based on user-input, but in such a way that the user is forced to exclude certain variables in order to include others. Suppose you first saw a standard bar graph, but each bar was an animated .gif showing different colors, varying heights and positions, etc. Then you would be presented with a pop-up menu by clicking on the bar and choosing your own variable, or combination of variables. The graphic could be as simple or complex as you want it to be—yet at each stage, by having to make a decision, you would be reminded (in a non-patronizing way) that there is no data, only capta. My $0.02, anyway.

 Image

The intention here is to measure the affective mood on certain days, and the methodology is to have ‘day’ as an empty metrical placeholder defined by whatever it contains. This is a wonderful idea, and an innovative approach to metrical units—my only concern is that the initial intention, the measuring of affect, seems to have been left behind at some point during the journey.

So here’s my suggestion: represent each mood as a different color. So on Sunday, let’s say eating and drinking are colored blue for relaxing, design is orange for intense work, and study is grey for boring work. An algorithm could then compute the dominant mood for the day, based on the number of times that mood appears, and how large the word in which it appears is sized in comparison with other words. Then you could have the whole day turn one color, or better yet, dilute or mix the color depending on the multiple moods represented. Maybe you could color the text, so the result might be something like Monday.
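As a rough sketch of what that algorithm might look like, here is a small Python example. Everything specific in it is an assumption of mine for illustration: the font sizes, the RGB values chosen for blue, orange, and grey, and the decision to weight each mood by the summed size of the words that carry it. Drucker's graphic supplies none of these numbers.

```python
from collections import defaultdict

# Hypothetical RGB values for the three moods suggested above.
MOOD_COLORS = {
    "relaxing": (0, 0, 255),     # blue
    "intense": (255, 165, 0),    # orange
    "boring": (128, 128, 128),   # grey
}


def mood_weights(activities):
    """Weight each mood by the summed display size of its words.

    Each appearance of a mood counts, and bigger words count for more.
    Returns a dict of mood -> share of the day (shares sum to 1).
    """
    totals = defaultdict(float)
    for _word, size, mood in activities:
        totals[mood] += size
    grand_total = sum(totals.values())
    return {mood: total / grand_total for mood, total in totals.items()}


def dominant_mood(weights):
    """Return the mood with the largest share of the day."""
    return max(weights, key=weights.get)


def mixed_color(weights):
    """Blend the mood colors in proportion to their shares."""
    return tuple(
        round(sum(share * MOOD_COLORS[mood][channel]
                  for mood, share in weights.items()))
        for channel in range(3)
    )


# Invented sizes: (word, font size in points, mood).
sunday = [
    ("eating", 30, "relaxing"),
    ("drinking", 30, "relaxing"),
    ("design", 20, "intense"),
    ("study", 20, "boring"),
]

weights = mood_weights(sunday)
print(dominant_mood(weights))
print(mixed_color(weights))
```

Turning the whole day one color corresponds to dominant_mood; diluting or mixing corresponds to mixed_color, which here gives a greyed-down blue because relaxing takes up most of the day but not all of it.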

 Image

Drucker concludes her article with the above graphic, envisioned as a response to this famous chart

 Image

where individual persons are represented as mere dots. 

Again, love the intention, don’t like the implementation. Problem: I don’t see what this adds; I feel like the people look so fake that the effect is almost worse, because it tries to add verisimilitude and fails—it reminds me of that moment in Fight Club where it is suggested that the illustrated persons in in-flight safety manuals must be high on oxygen in order to look as fake and vanilla as they do. So am I just going to complain? Well, here’s an idea, maybe it’s worse, idk. Keep the original format—have a dot representing an individual or individual family or whatever, but make the dot a link. Clicking the link opens up a pop-up, where an algorithm generates statistical data for the ‘person’ at that location: physical description, median income, eye color, telephone number, preference for David Lee Roth or Sammy Hagar. Hopefully this will make those dots seem more personable—maybe not, for if the preoccupation with statistics is the problem, then defining a human life by artificial numbers might be the last thing we need. Then again, it might reinforce the point that personhood is increasingly dominated by demographics, marketing research, arithmetic calculations and the like in our day and age.

ImageImage

There’s not much for me to say here except—they’re marvelous! The aim in the former is to read an event not as a disparate occurrence, but rather as a kind of vacuole around which the surrounding context is curved or inflected, the latter to demonstrate how travel time is derivative of the mood of the traveler. The subjective, personalized and over-determined aspects of the data are emphasized—but in such a way that I feel the emphasis is productive, rather than merely illustrative of the capta caveat.

 

Okay, so I’ve tried to give my opinion on what works and what doesn’t. So what? What is the relevance for the Humanities, specifically English Literary Humanities? Well, I think the confusion between capta and data is alive and well in our own back yard. There is a long-running trend in the English literary discipline towards a kind of fetishization of the text, an adulation of the craftsmanship of the well wrought urn, an unfailing belief in the sacredness of the script. On one level this is right and proper, and none of us would be here if we didn’t cherish and admire the beauty of literature.

But

a text is more than merely words and sentences and images; it is a network ruled by certain tacit frameworks that govern the relationality of words, or concepts, or images, or affective responses. The text isn’t one thing, self-contained and beautiful and seamless; it is a patch-work of varying resonances and cadences, all vying for supremacy.

It has been difficult to quantify these with traditional approaches, partially because a vocabulary to describe them had not come to the fore—one had a specialized toolkit with which to diagram sentences and classify rhythm, but the conceptual apparatus has remained either obscured or mired in theory-speak. I think the sort of infovis Drucker proposes offers a new avenue for textual analysis, analysis which is radically self-aware, non-linear, and interactive. The key is proper visualization: there are interactive, dynamic possibilities within the visual domain that are simply not possible within the analogue medium. This should and, I am confident, will be explored in the coming years. As DH passes from early childhood to adolescence, it will be well to bear Drucker’s suggestions and methodologies towards data in mind.

The History of the Book Is Not For Wimps

When Terry Belanger taught his undergraduate course on the history of the book, he would often begin it by passing around a 19th century book: something totally common, in poor condition, but still something new to his students—a children’s novel in a series binding, for example. He would have all the students in the class look at it, hold it, try to notice unusual things about it. When everyone had examined it and learned a little bit about it, he would take it back and tear it up. Then, holding the fragments of the book up for his shocked class to see, he would say: “The history of the book is not for wimps.”

His aim was to prove that we have a far stronger sentimental attachment to books than we generally realize, and that we truly don’t want to see them destroyed. He wanted to inspire people to save something they didn’t know they cared about. (He also honestly wanted to say that the history of the book is not for wimps: books are being destroyed, deaccessioned, ignored, forgotten.)

Perhaps this sentimental attachment means that books are worth preserving; that’s certainly the argument some scholars and librarians will make. We care about these objects, even though we don’t always know it, so we should work to preserve them. But when a library is competing for funding with something immediately worthwhile—something like increasing the amount of financial aid available to students, or funding a science lab that’s working to cure cancer—is sentimentality going to win? Probably not, and maybe, in some cases it couldn’t.

There are a large number of other reasons to preserve these books, of course; a long, important list. These books preserve our cultural heritage, and we can’t know who we are without them—but that’s an awfully difficult argument to make, sometimes. It also leads to a particular trap: not every book in the Alderman stacks has something to say about the modern person, even if it has something very clear to say about Jane Slaughter. Even when people are won over by the stories of some of the marginalia Professor Stauffer has written about, they are still only enhancing their sentimental attachment to the books. It’s difficult to develop a real sense of the scholarly importance of the physical book, and the individual copy of the physical book, as we all know.

There are excellent reasons to move to digital books, too. Many people I know read more books now because they can read on their iPad or even on their phone; they don’t have to take the trouble of selecting a book and carrying it around with them all day. It’s much more important that people read and think than that they have books on their walls, being preserved.

I truly believe that rare books are important—but, at the same time, I know I’m subject to sentimentality, perhaps even more than anyone else. I honestly can’t tell why I want to save some books. Every time I see beautiful Victorian books rotting on the shelves of a library, it makes me sad—but is that because these books are great resources and we are throwing away our heritage, or because I just like old books? Sometimes I honestly can’t tell. Are we angry at the evolution of the academic publishing world because we’re going to lose important scholarship, or because we don’t want an established system to change?

I can’t tell the difference between what is truly important and what I care about. When I phrase it that way, of course, it seems like there is no difference. But what I care about isn’t necessarily what other people care about, and I want to work to save books partly because I want to help other people. So I end up reminding myself every day to be less sentimental. I debate with myself: why is this book important? Why does it deserve space on my bookshelf, or on Rare Book School’s shelf? I force myself to make a practical argument. If I have no argument better than “because it’s a book!” then I tell myself to let it go, and to save the shelf space for a book that truly speaks about the human race. Like Terry, I attack my sentimentality, because it means I focus on the arguments that aren’t based on emotions—the arguments that, perhaps, can be won. I force myself to be practical about the future of the book: it’s ok if people want to read digital books, and it’s ok if libraries need to shift some books off-site because they don’t have space. But at some point it is no longer ok—and I can’t tell where that point lies if I’m being sentimental. After all, if people develop an extremely emotional response to nineteenth-century books, they will never risk opening them.

In the end, I think the emotional argument may be the best one to convince the average person to care about the average Victorian book; but the sentimental viewpoint is the worst one for a librarian or a scholar to take.

On My Walk with the Woodchippers

            Recently I attended a colleague’s job-talk concerning the integration of DH technology with theoretical hermeneutics, and he made a statement with which I very much agreed—paraphrased, that raw data sans some theoretical or analytical methodology for interpreting the data is meaningless. My experience with the Woodchipper project, sponsored by the MITH cohort at U. Maryland, provided me with an opportunity to test that belief. The Woodchipper tool is itself a remarkably promising conception—as the following screenshot shows, the interface attempts to isolate and graph the occurrence of certain verbal chains throughout the selected text or texts.

 

Image 

 The initial guidelines for testing were somewhat vague—choose several texts of the same or different genres, plug them in and see what comes out. I tried this at first, but found myself confounded by the seeming randomness of the data.

There seems to be a stereotype as of late of over-zealous scholars who tear the text asunder, hacking away at the text’s organic foliage for the sake of isolating one small twig, itself mercilessly subjected to overreading and underreading and misreading. One imagines the analyst in the garb of a Spanish Inquisitor, the text tied to the rack, mercilessly stretched beyond recognition for the sake of some theory or other.

 

I want to offer another viewpoint here. The literary text, as I see it, is a conglomeration of divergent elements—one doesn’t have to endorse Bakhtin here, it is readily apparent even in the hierarchical structure of letter, word, phrase, sentence, paragraph, chapter, book. If they are all unified completely and exclusively, remove one and the text falls apart—but texts don’t emerge from a vacuum, or come fully formed from the author’s imagination. Texts emerge over time, as one concept is proffered, then interpolated by its neighbors; the text is a very effective network guided by the authorial intention (even if it manifests itself by its absence). So why shouldn’t we pluck off one component, see how it works with the rest, and not so much prune as remove seed and replant?

 

            All this is by way of preface to the methodology I chose. Always fascinated by Eve Sedgwick’s Imagery of the Surface in the Gothic Novel, I re-read her essay, attempting to derive certain key concepts, from which I extracted certain word-patterns that one would expect to find in the text, should Sedgwick’s analysis be borne out. I then attempted to interpret the data along these lines and the results, included below, were far from conclusive. But I did see the possibility for theoretical interpretations of the data along the lines indicated by Sedgwick—it merely required, as with everything in the humanities, a little bit of faith and a large amount of interpretative work.

 

*

I decided to utilize Eve Sedgwick’s earth-shattering analysis of surfaces and veils in the Gothic (attached) to see if the output of woodchipper could provide a measure of verisimilitude for works of more abstract theory.

 

I. Context

 

Without going too much into detail, Sedgwick’s analysis yielded the following thematic points of emphasis:

 Opposition between surface and veils

 Relationship between writing/inscription

 Character or self as socially constructed or externally imposed rather than innate

 Emphasis on the visual properties, especially facial characteristics, when determining character

 Failure or doubt of the aforementioned identification

 Semantic ambiguity, especially as regards the criterion for visual identification of character

 Metonymic slippage of surfaces/the veil; that is, the veil seems to absorb some aspect of its wearer’s personality, and transmits this to other characters

 ‘Two dimensional,’ stock or underdeveloped characters

 Repetition of motifs of landscape, color, music; these form a fixed, stable backdrop by dint of their repetition

 The word ‘candor’ and the pallor of white are identified as terms of paramount importance

 Relationship between blood and flesh

 Betrayed desire

 The rational (conscious) versus irrational (libidinal)—Sedgwick identifies this as the dominant tenor of critical theory concerning the Gothic, which she immediately dismisses

 The erotic charge of veils and surfaces

 II. The Experiment

 

From the above, I was looking to see what verbal chains woodchipper identified within two of the three texts Sedgwick cited, viz. The Mysteries of Udolpho and The Monk.

 

III. The Results

 

 

The Monk:

 Image

A1)      felt made conduct received heart

A2)      moment made escape found length

A3)      god heaven life death man

A4)      love heart loved world happy

A5)      hand eyes face looked hands

 

Udolpho:

Image 

B1)      felt made conduct received heart

B2)      trees woods mountains tree green

B3)      mind heart tears grief seemed

B4)      door room open opened light

B5)      dear see young good man
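Before interpreting them, the two lists above can be compared mechanically. The following Python sketch is my own addition, not something Woodchipper provides: it treats each chain as a set of words, reports chains that appear verbatim in both novels, and lists the pairs that share at least one word. The function names are hypothetical; the chains are copied from the lists above.

```python
# Chains as reported above for each novel.
monk = {
    "A1": "felt made conduct received heart",
    "A2": "moment made escape found length",
    "A3": "god heaven life death man",
    "A4": "love heart loved world happy",
    "A5": "hand eyes face looked hands",
}
udolpho = {
    "B1": "felt made conduct received heart",
    "B2": "trees woods mountains tree green",
    "B3": "mind heart tears grief seemed",
    "B4": "door room open opened light",
    "B5": "dear see young good man",
}


def shared_chains(first, second):
    """Return chains that occur word for word in both texts."""
    return sorted(set(first.values()) & set(second.values()))


def partial_overlaps(first, second):
    """Return (label, label, shared words) for non-identical chains
    that have at least one word in common."""
    found = []
    for label_a, chain_a in first.items():
        for label_b, chain_b in second.items():
            if chain_a == chain_b:
                continue
            common = set(chain_a.split()) & set(chain_b.split())
            if common:
                found.append((label_a, label_b, sorted(common)))
    return found


print(shared_chains(monk, udolpho))
for label_a, label_b, words in partial_overlaps(monk, udolpho):
    print(label_a, label_b, words)
```

A comparison like this only says where the vocabulary coincides; what the coincidence means is still left to the interpreter, which is the work of the next section.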

 

 

 

 

 

 

IV. (Very Subjective) Interpretation of Results

 

A1/B1)           The line

            felt made conduct received heart

was common to both texts and rather frequent, if I understand the Woodchipper interface correctly; at any rate, it’s important. At first reading I understood ‘conduct’ to mean conductivity in a sense of heat or electrical exchange, which would lend support to Sedgwick’s point about the metonymic character of the veil. Closer examination (and common sense!) revealed, however, that ‘conduct’ referred here to one’s persona, one’s character. The word ‘conduct’ has a connotation of exteriority—it is how one comports oneself in public, rather than what one’s true self might be. This, in proximity with the words ‘made’ and ‘received,’ may be interpreted as supporting Sedgwick’s assertion that character is imposed from the outside rather than innate to the individual character.

 

 

A2)      The chain ‘moment made escape found length’ is not surprising, given the association of the gothic with captivity and discovery. However, the prevalence of ‘moment’ and ‘length’ does hint at a text which is remarkably preoccupied with temporality and its measurement. I would have to look at a ‘control group’ of texts, however, to see if discussion of time in this fashion is merely a necessary symptom of novelistic discourse.

 

A3)      The presence of ‘god heaven life death man’ in The Monk and not in Udolpho is unsurprising given the former’s supernatural elements in contrast to the latter’s ultimately natural explanation for events.

 

 

A4)      ‘love heart loved world happy’

To be honest I don’t quite know what to make of this one. The association of variations on love, heart and happy is hardly surprising; the intrusion of the word ‘world,’ though, might be a fluke. One could proffer a connection between intimate interior space (happiness and love) and the external world, but this assumes a definition of ‘world’ as umwelt in the philosophical sense, and this might appear a bit of a stretch.

 

A5)      The line ‘hand eyes face looked hands’ most supports Sedgwick’s claims, illustrating an inordinate preoccupation with ocular descriptions of character; the association of these terms with ‘looked’ would seem to tie them to a discourse of characterizing and description.

 

 

B2)      ‘trees woods mountains tree green’ would seem to confirm Sedgwick’s claim (which is by no means unique) about the omnipresence of natural language in the gothic, and does support her arguments about the presence of colour.

 

B3)      mind heart tears grief seemed

            The close association of ‘mind’ with ‘heart’ lends itself to the notion of emotional depths, which is supported by the affective terms ‘tears’ and ‘grief;’ this would at first glance support the argument Sedgwick is writing against, viz. the existence of hidden emotional depths beneath a repressive ego/super-ego exterior. However, the outcropping of the fifth term ‘seemed’ can be explained by Sedgwick’s claim that appearances are frequently misleading. If one can infer from these results to the text, one could speculate that the terms ‘mind’ and ‘heart,’ which connote interiority, can only be manifested in outward shows of emotion (‘tears,’ ‘grief’); therefore their apprehension is always somewhat dubious—‘seemed.’

 

 

B4)      ‘Door room open opened light’ confirms the Gothic’s obsession with spatiality and the sense of unfolding rooms, a motif which lends itself, in my opinion, to the opening up (ha ha) of a surface/depth discourse.

 

 

B5)      ‘dear see young good man’ does lend itself, although this is a bit of a stretch, to the argument about the limited nature of character development in the Gothic (‘young’ and ‘good’ aren’t exactly bursting with psychological complexity). This is a tenuous argument to make, so I’ll stick with the use of the word ‘see,’ and its confirmation of the importance of ocular description in the identification and semantic construction of character.

 

 

 

V. Conclusions

 

            It would be madness to claim that woodchipper has ‘proved’ Sedgwick’s psychoanalytic reading of gothic fiction, and it is quite clear that the noted correlations do require a fair bit of ‘massaging’ to fit the theory to the empirical data. That being said, I do think some aspect of her argument may be supported by a good textual woodchipping. In particular, the linkage of the visual to establishment of character appears reasonably solid. Again, these correlations are not self-evident but need to be interpreted by the experimenter—and I, personally, would not have it any other way. I would have to be better acquainted with woodchipper’s heuristic processing to feel confident in the results of this experiment but, to address Neil’s concern over the chasm between high theory and digital humanities, for me this is a promising—or at the very least provocative—start.

*

 

Now that I’ve made my point, I have a confession to make. The data have changed since I ran this experiment four weeks ago. I constructed an analysis based on the word associations generated within certain texts; I was under the impression that they were generated from the selected text, but was prepared for their emergence from a standardized ‘word bank’ culled at some point from the other texts; it would be the thematic association among gothic works, then, and this would be just as good, but more difficult for accurate interpretation. What I did not expect, however, was that these values were variable based on the texts in the machine; texts are not taken in isolation, or so I must assume. In comparison with the original data used for my analysis, the differences are not catastrophic (see screenshot below for a comparison), and I believe the key arguments of my interpretation still work. However, it is a troubling concern, and if nothing else a stirring reminder of the risks involved in a methodology that is increasingly data-driven.

[Images: screenshots comparing the original and current data]

 

 

Some Curmudgeonly Ramblings on Graphs, Maps and Trees

            I recently had the pleasure of reading Franco Moretti’s Graphs, Maps, Trees: Abstract Models for a Literary History and, although it is a gross mistake to equate any one particular work with the totality of the movement it represents, I do feel his book highlights much of the promise, and many of the risks, of the data-driven impulse within the Digital Humanities. Given the title and structure of the work, I think it best to tackle each interpretative device (graph, map, and tree) separately.

 

 

The graphs portion of Moretti’s work is the most conventional, although even here his methods are bound to produce controversy. Rather than undertaking an exclusive reading of a particular text or specific set of texts, Moretti analyses the publishing history of all works produced during particular historical intervals. His approach is egalitarian by design, seeking to understand not the Canon, if such an apparition still exists, but the whims and fancies of the book-buying public. The move is not an unwelcome one, and goes a long way towards dismantling the conception of literary scholarship as focused exclusively upon the crème of the literary crop. Moretti’s conclusion is that literary movements move in cycles coterminous with the rise and fall of discrete generational passings; as the generation shifts, the literary tastes change. Many have pointed out the problems with this approach, many of them concerned with the unreliability of the data, or with accusations that Moretti’s data are pre-determined by the results. However, my concern is broader: I worry Moretti leaves out the most essential aspect of the cultural-literary work interrelation. Yes, it is certainly true that literary works respond to their cultural climate, but one need only look to the scandal following suicides imitating The Sorrows of Young Werther to see that literary mores have a reciprocal influence on culture. Personally, I find literary works fascinating for their mediation of the cultural climate. Obviously, no one work can fully exemplify all the social variables operative at the time of its construction. Yet the cultural currents it does inscribe and the manner in which they are inscribed into plot, character, or narratological structures speak volumes. In other words, Moretti’s approach is a fascinating first step, but I feel that first step requires the additional leap of ‘drilling down’ into specific literary texts to examine the trends found at the macroscopic level.

            In the Trees section, Moretti applies a Darwinian, genealogical reading to the appearance of ‘clues’ within the rise of the mystery genre, ‘survival of the fittest’ here referring to the success of those works that correctly divine and exploit the public’s tastes. The method is intriguing inasmuch as it views texts not as individual occurrences but as interconnected phenomena within a larger cultural network; however, I see the same problems as with the Graphs segment. Moretti’s approach assumes that literature merely responds to rather than creates its cultural milieu; more troubling, the approach assumes that cultural tastes are static. In other words, Moretti’s reading of the ‘genealogy’ of the clue is teleological: the literary culture produces iteration after iteration until it finally finds the right approach for its public. But what if public tastes changed in tandem with the evolution of the genre? Personally I believe, and I think Darwin is on my side here, that evolution provides only problems, not solutions. That is, an eye exists in response to a particular problem, that of the organism acquiring and interpreting visual data. Moretti’s conclusions about the evolution of ‘clues,’ then, should be read as literature’s response not to a cultural demand for a particular type of clue, but rather to a cultural concern with criminology in general. The scope of this post forbids me from elaborating this further, but I do believe there is a connection between the detective who reconstructs events from empirical evidence and the Victorian concern with statistics, surveillance, and a society ruled by science and data.

 

Lest it be said that I am overly harsh towards GM&T, I hasten to add that its section on Maps is to my mind brilliant and endowed with limitless potential. Moretti provides various topographic ‘maps’ of the physical landscape presented by the literary text, but he does so in an attempt to understand the author’s spatial conceptions and the socio-economic impetus behind them. There have been concerns raised over the verisimilitude of Moretti’s judgments here, chief among them his charting of textual places in a circular rather than a linear arrangement. Whether or not the concerns are justified, what matters is Moretti’s reconceptualization of how literary analysis should work. Texts have been tied to socioeconomic factors before; New Historicism is practically dedicated to this endeavor. However, Moretti’s approach looks not to individual textual quotations but to the structure of the text as it might have appeared within the author’s mind. It goes without saying we are never going to break into the author’s imagination; however, we can and should try to map the author’s conceptualization of the textual events as if they were real events, and then understand how these events are rendered within the semiotics and narrative. This presents the possibility of looking to the text not for what was going on in the ‘real world’ (that is easy enough to discern by cracking open a relevant history book) but rather for the mechanisms by which these forces were inscribed into the text via the author’s own mediation.

 

On the whole, Moretti’s attempt to bridge the gap between data-driven methodologies and the Humanities is promising, but I feel it overemphasizes the data to the point of eliding the more abstract interpretations required in close-readings of texts. The act of interpretation is still present, but it seems sidelined, consciously or unconsciously, by the authoritative verisimilitude the data is supposed to purvey. This is a great concern for me: the promise of literature, in this critic’s humble opinion, is the ability to look beyond the horizon of prevailing ideologies, problematize the unquestioned status quo, and provide a space for self-reflection and cultural criticism. I must echo Adorno and Horkheimer here: the empirical, data-driven approach is the reigning ideology of our day, and although its benefits are legion, the unquestioning belief in data as the answer to all problems calls to mind Ginsberg’s Terror through the walls, and crack of doom on the hydrogen jukebox.

 

 

And collaboration would be…what, exactly?

I decided to encode for the Shelley-Godwin archive because I didn’t know XML. I had never encoded anything in any language. I thought it was time for that to change. I certainly learned how XML works. I learned to use Oxygen. I now have a GitHub account. But I also joined the project because I thought I would discover something about Frankenstein, and that I think I did not do.
Learning XML was an enlightening experience all on its own. For the first time, I understood why these are called languages. I felt myself translating from English to XML. “This goes up here” became <add place=superlinear>this bit</add>, and I felt suddenly powerful. But this learning experience prevented me from seeing what I was encoding. I have heard it said that encoding is an act of interpretation, and it absolutely is, but I thought that I would leave with a new perspective on Frankenstein, and I didn’t. I was so busy learning the code that I didn’t have much attention left to reflect on what I was interpreting. I fear I didn’t give Frankenstein the attention and focus it deserved. I was, of course, working from wonderful, pre-existing resources: an image of the manuscript, and a transcription that was praised to the skies by everyone who knew more about Shelley than I did. That gave me an excuse to focus less on the words and more on the translation: I couldn’t screw it up, as long as I listened to Charles Robinson, right? I don’t think I made any blatant mistakes, but neither did I gain any insight into what I was encoding. If I had continued in the same style, would I have reached a point where I would have made a mistake, simply because I hadn’t been paying enough attention to the content? Or would I have learned XML well enough to stop thinking about grammar and resume thinking about content? Encoding is an act of interpretation, but it doesn’t necessarily lead to a new overall interpretation, and I can imagine that becoming a problem.
However, perhaps there is a new interpretation created that the low-level encoder doesn’t get to see. What was happening as the schema for this project was built and altered? I had no input on that discussion, of course; I barely knew it was going on, and I didn’t get to listen to it. But I am sure that is the level where interpretation happens. I’m curious about it. For me, however, encoding was simply an act of word-to-word translation. Not much thought was required. It was the first time I used XML; it was a learning experience; that’s ok. But isn’t every new project a learning experience? Perhaps I should have learned more, but there’s only so much you can learn when you are only supposed to encode two pages. There was a sharp limit to this learning opportunity.
This project was also supposed to be an exercise in collaboration. Thank goodness: I might have managed to teach myself some XML, but I never would have managed to teach myself GitHub. I still don’t understand GitHub, which tells you exactly how much attention I was paying. I found myself relying heavily on other people’s expertise, which is unfortunately easy to do in a group project when one isn’t the leader, and even easier when one has no experience in this sort of project. Others on my team had used XML before; I took them all my questions, but since I was so new to this work, all my questions were elementary. We never reached a level where we could ask deeper, more important questions about our work.
I benefitted from their expertise, but that’s a nice way of saying I mooched. The proof of that is that, while my group leader knows how to push a page on GitHub, I still don’t remember what “push” means. It’s easy for me to excuse myself: I was starting from absolute zero. But I’m still trying to determine whether mooching counts as collaboration. I certainly learned from my peers. I learned everything from my peers! If I had tried to encode a page or two alone in my room, I would have given up after I had to get into Terminal (which is a surprisingly heady experience). It is also true that encoding in groups was fun! At first I didn’t see the point of sitting with a group of people, looking at computers, and not talking. But then I found I had questions, and I immediately had people to answer my questions, or laugh at my questions, or both. I have always been a loner when it comes to work, but this changed my mind for at least some projects.
So what is collaboration, in the end? Does collaboration mean working together from start to finish on a project? Were the collaborators the people who were developing the schema? (It seems to me that that would be the most interpretive role to take for any project: that is the director of the play, while a simple encoder has less control than the props manager.) Were the collaborators the team leaders, who made sure that everyone else in the project communicated and received communications, who mastered every aspect of the project so they could teach it to me and the other clueless encoders? Or were we all collaborating, even if we contributed vastly different things? After all, even though I had no impact on the course of the project, I still encoded two pages all by myself; and perhaps if I had encoded five, I would have discovered something with broader implications and I would have asked larger questions.
In the end, I haven’t made up my mind. Did I accomplish something large that will advance scholarship? Not me personally, certainly—but what if I encoded two pages of an archive? I had a hand in something that will advance scholarship, even if it’s a tiny hand. That’s not much. It’s not enough to get credit for, and it’s not enough to make me feel good about myself; but it is enough to say that I have experience with XML now. This project was the start of something new. Now I have a skill to contribute to my next project.
Perhaps collaboration is different for every person in every project. Not everyone needs to contribute the same amount of intellectual effort or time or responsibility; but everyone puts something in, and everyone takes a little something away. It does, of course, make assigning credit a hassle. But I’m glad that opportunities like this exist, because no one had to sit in a corner being the XML kid. Opportunities to collaborate even briefly on all sorts of different projects let us learn new skills while we contribute to the world of scholarship, and I think these opportunities could create a generation of extraordinarily well-rounded scholars. Now that I have worked on this archive, I can’t say “Look what I have done!” because I didn’t do very much. But I can say: “Look what I can do!”

Collaboration and Encoding for the Shelley-Godwin Archive

In the preface to the 1831 edition of Frankenstein, Mary Shelley describes the origins of her novel in a “wet, uncongenial summer” at the Villa Diodati.  There, living with Percy Shelley, Byron, Polidori, and an unacknowledged Claire Clairmont, she found inspiration in “many a walk [and] many a drive” with her husband, and in the “many and long … conversations” she overheard (and, let’s be honest, probably participated in) between Lord Byron and Shelley.  The story of Frankenstein, she suggests in this preface, is less the result of her own genius than it is the happy outcome of spending time with and working beside other talented people. 

Regardless of how true this story actually is, I couldn’t help but think fondly of it as we launched our class encoding project for the Shelley-Godwin Archive.  Not only were we translating into xml the very manuscript that Mary Shelley had been producing during that long, rainy season, but we were doing so in an environment just as predicated on support, encouragement, and—most importantly—collaboration as the one that Mary Shelley tells of finding at the Villa Diodati. 

In carrying out this project, we teamed up not only with each other, but also with a number of students from Neil Fraistat’s Technoromanticism course at the University of Maryland.  (Those students have written their own excellent responses to the encoding experience on their Team Markup blog.)  The project provided us with an irreplaceable opportunity not only to study the nuances of Mary Shelley’s manuscript (to see her doodles and cross-outs, to discover Percy Shelley’s revisions, and to witness the creation of one of the most famous creation narratives of all time) but also to invent for ourselves a successful system of teamwork and support.  As we look to continue this project in the future, I thought I’d share some of the most important lessons this particular partnership has taught me.

1. Ask for help when you need it.  When I first volunteered to coordinate the encoding project, I assumed, with cringe-inducing naiveté, that I’d figure it out as I went along.  I knew html, I reasoned, and xml didn’t seem that different.  How difficult could it be? 

I got my answer almost immediately.  It could be very difficult.  What was this “Github” and how did I set it up?  How was I supposed to renew my subscription to Oxygen and why wasn’t it opening properly on my computer?  Where were the transcription files that we were supposed to encode?  And where—and what—was the schema?  It didn’t help that I was supposed to lead a team of three other students, many of whom knew little more than I did. 

It became clear to me frighteningly quickly that I was going to need help.  Lots of help.  I began questioning everyone I knew (or didn’t know) who seemed to be comfortable with computers, from the Scholar’s Lab staff and the current NINES fellows to friends of friends and tech support.  I checked the project GoogleDoc, read and reread the Encoding Guidelines, and sent a ridiculous number of emails to Amanda, UMD’s own, amazing project coordinator.  I read internet how-to pages until I felt that I knew them by heart. 

And then something amazing happened: I actually started to understand what I was doing.  Not always completely and not always terribly well, but enough so that I could mark up a page in Oxygen and push it on GitHub without breaking a sweat.  More important, I found that I, too, could offer people help when they needed it.  I was still far from an expert, but my willingness to look like an idiot had allowed me to become a successful—and, dare I say it?—helpful part of a collaborative community. 

2.  Work together as much as you can.  It turns out that a successful markup team, much like a successful sports team, must be able to work together.  As my little league coach used to remind us, it doesn’t matter how good you all are individually if you can’t play well as a group.  In the case of my little league, this was a moot point—individually or together, our athletic skills were basically nil.  In the case of the SGA encoding project, however, teamwork proved to be key. 

At UVa, we accomplished this by doing most of our markup work together.  Because our group size and our encoding requirements were both relatively small, we were able to arrange meetings throughout the semester to do our work together.  This meant that if questions or problems arose, we had teammates present for immediate feedback and support.  If someone faced technological difficulties—as happened frequently during the early days of our project—we could work together to engineer a work-around.  At the very least, we had someone to talk to about Frankenstein’s poor decision-making and our dislike of ptr tags. 

When it came to working with our counterparts at Maryland, however, the process was more complicated.  I stayed in touch with their project coordinator frequently by email, and she was able to provide us with key information about everything from their team’s progress to unanticipated schema changes.  We also engaged with the UMD students through the project GoogleDoc, which contained a helpful list of questions and answers to some of the more frequently-encountered concerns.  Perhaps most helpful of all was the UMD team’s visit to UVa partway through the project.  Being able to meet and to speak face-to-face provided an invaluable opportunity not only to get feedback on our work, but to get a sense of the larger group behind the project, of the true scope of our team. 

The lessons from this process were simple and striking: work together as much as possible, in person if you can and with frequent correspondence if you’re at a distance.  There is no substitute for communication and interaction. 

3.  Be flexible.  Schedules fluctuate.  Schemas change.  Part of working with a large group of people is being able and willing to adjust your own schedule accordingly.

I’ll be the first to admit that this wasn’t always easy for me.  My initial reaction to alterations to the schema was frustration, and my first response to technological gaffes, annoyance.  Reworking my schedule to allow time to figure out how to install GitHub was a challenge.  Actually installing it just hours before I had to teach the process to other people was an even greater one. 

I quickly learned, however, that, inconvenient as such adjustments initially seemed, they often held a significant payoff in the end.  Revising my encoded pages after schema changes, for example, allowed me the opportunity to go over each page again, making slight adjustments to the rest of my xml and reconsidering the decisions I had made about the document as a whole.  The technical glitches that appeared in GitHub and Oxygen were time-consuming to solve, but they also taught me a great deal about set-up and troubleshooting that I would not otherwise have learned.  Such information allowed me to be far more helpful to my team; the problems they encountered were, almost invariably, the same ones I had faced only hours before, so it was easy for me to remedy them. 

I will be the first to admit that there were more than a few areas in which we could have improved.  The structure of our project, for instance, was a work in progress from the beginning, as we worked to adapt UMD’s model to fit our course requirements.  The demands of learning xml also kept us working relatively slowly, and had the project continued for a longer period of time, we would certainly have been able to encode a greater number of pages at a faster pace. 

The spirit of the Villa Diodati, however, stayed with us.  We saw it frequently on the pages of the manuscript we encoded, as when Percy Shelley revised Mary Shelley’s writing or Mary Shelley took notes on an envelope addressed by William Godwin.  We saw it in the SGA schema, too, as input among the team at MITH led to the development of new tags or more nuanced markup.  And we saw it in our team of encoders which, despite markup frustrations, technological failures, and a hundred miles of distance, managed to create a successful and fruitful collaboration.

Academia on the Internet: Online Courses and Digital Books

If you haven’t already seen it, David Brooks published an op-ed about online education in the New York Times yesterday.  In it, he writes about the growing number of major universities — among them Stanford, Princeton, Michigan, Penn State, Harvard, and MIT — now offering a significant number of online courses.  These courses, which generally include video lessons and embedded quizzes, are open to everyone around the world. They bring world-class education to those without access to these universities and allow teachers to reach beyond the classroom (and certainly beyond the usual class capacity — a recent Stanford course on artificial intelligence attracted 160,000 students) to make a major impact with their lessons.  The flip side, however, is a spate of concerns about these classes’ intellectual rigor, the inaccessibility of professors to students, and the implications such a change might have on the academic profession.  (Hint: the job market may just have gotten tougher.) 

We have not talked much about the issue of online courses in our class, but it seems to me that it has strong ties to the more frequently-discussed issues of digitization and the possible ramifications of changing media.  How will online courses affect the way we construct and teach classes, the way that students engage with materials, and the way that we, as teachers, interact with students?  What will it mean for job prospects and teaching opportunities?  Will the university, in some sort of horrible, THX 1138-like future, be streamlined to a small coterie of professors who deliver lectures to students hundreds, even thousands of miles away from the classroom?

This issue came up in a round-about way at the talk Chad Wellmon and Chris Forster gave back in February.  There, the conversation briefly turned to the issue of videotaping lectures, and some of the same questions that I have just raised came up: what will this mean for the profession?  Will we even need professors anymore?  Will students need to come to class if they can just watch the lecture in their bedrooms? 

The answer that Wellmon, Forster, and their audience quickly came to was no.  Lectures — good lectures — will not hold up indefinitely on videotape.  Professors change them over time, adding in new research or thought (and, in the best cases, taking out outdated jokes).  When delivered in person, lectures can also engage an audience in a way that a recording, however well done, generally cannot. 

I tend to agree with this assessment.  Neither videotaping classes nor a larger program of online courses seems to me to be a major threat to the traditional classroom.  As Brooks himself points out, there is no substitute for human interaction.  Students who can engage in-person with a professor and with a room full of dynamic classmates must be more involved than those at a remove from the classroom energy.  The more advanced parts of learning, what Brooks calls “reflecting” and “synthesis,” are also better suited to in-person education.  What’s more — although Brooks neglects to point it out — basic elitism will probably prevent online education from taking over entirely.  Classes accessible to anyone cannot be exclusive and, thus far at least, one cannot get a degree from Harvard simply by taking a free online course. 

Elements of this discussion also seem to me to hark back to our earlier conversations (in class and on this blog) about the digitization of books.  Some of the same features, after all, apply to both subjects: digital books, like online courses, are generally more readily-available and easily-accessible than printed books or in-person classes.  Many people, too, would argue that they offer just as good a product as the “real thing.”  In reading a digital text or taking an online class, however, one loses something experiential — gone is the romance of the medium and a sense of the context in which this book (or class) fits into a larger history.  Gone, too, is a sense of connection: in reading, a tactile one with the book’s pages; in online coursework, with the professor, the class, and the larger university community. 

If we accept this parallel, then Brooks’ vision of a “blended” classroom, one that employs both online and face-to-face educational methods, suggests that printed and digital books, too, will be able to find a happy harmony.  After all, as we have said all semester, the two are not mutually exclusive.  The trick will be to “blend” properly and to make sure that we preserve the best features of each medium.

A thing of beauty is a joy forever

I know our class is over, but since I still owe at least half a blog post I thought I could share a few observations about our final course topic: the university library. In a way this is really just an extension of the comments Christina and Eliza exchanged regarding Lingerr’s most recent post, so forgive me for indulging myself.

Flavorwire originally published this piece–a photoessay of sorts–in December 2011. I remember seeing it then and showing it to others, not just for the photographs but also for the comments left by readers (which are fantastic, by the way, and worth perusal). Apparently the post received enough attention to be worthy of republication in January as one of Flavorwire’s “most popular features of the year.” Unsurprisingly (?), the very brief paragraph that introduces the photographs covers some of the same ground we did in our in-class discussions about the function of the academic library:

The college library, whether ornate or modern, digital or dusty, is in many ways the epicenter of the college experience — at least for some students. It is at once a shining emblem of vast, acquirable knowledge, a place for deep discussions and meetings of the mind, and of course, a big building full of books, which, as far as we’re concerned, is exciting enough. 

“At least for some students.” It is the strangest phrase in the post. The author reveals, mid aesthetic revel… what? A kind of discomfort? She never returns to this question about which students do and which do not find the library to be “the epicenter of the college experience,” so why include the qualification at all? As I noted above, and as you can see if you click through, the post is all about aesthetics: thirty-seven photographs (some libraries merited more than one) but fewer than two hundred words. Of course the title of the post is “The 25 Most Beautiful College Libraries in the World,” so this is hardly surprising. But the excerpt I quote above sits uneasily with the photoessay, and not only because we are left wondering about those unfortunate students who, sadly, don’t frequent the library. Quite a number of commenters noticed the tension here too, which arises I think from the claim that the academic library can be (and is) at once an “emblem of… knowledge,” a “place for deep discussions,” and a “big building full of books.”

As some of the commenters point out, the slideshow presents pictures neither of library stacks, nor, for the most part, of building architecture. These are almost exclusively photographs of reading rooms–and largely photographs of empty reading rooms. Hardly surprising, once again, as the essay purports to show us the “most beautiful” academic libraries in the world. This brings me back to the author’s claim: are these photographs supposed to illustrate libraries in toto–those buildings which are simultaneously emblems of knowledge, places for discussion, and repositories for books? These lovely reading rooms seem to me more like anti-libraries, at least as far as the aforementioned definition goes. Perhaps it’s a naming problem, and if the blog post had been titled “The 25 Most Beautiful College Library Reading Rooms” I would feel less uneasy about what, exactly, we’re supposed to take away from the photos. That is, if I do feel uneasy. I’m not sure what I feel, to be honest. But maybe it has something to do with that parallel Professor Stauffer drew for us between libraries and churches. Maybe reading rooms are like little chapels: after all, one goes there to read and reflect and to feel inspired. Cathedral architecture has always been designed to draw the mind and heart up towards God, to awe and exhilarate. Does the aesthetically pleasing reading room (or library) perform a similar albeit secular function? And what does this have to do with the future of the academic library?

Re: Library Treasures, Rebooted

For some bizarre reason I can’t comment on Eliza’s post. I’m assuming I’m doing something dumb, but decided to just post this on its own instead:

I also really enjoyed that article, Eliza. I was pleased they did not just point out the problem, explain why it is a problem, and then end their article (like the depressing ones we’ve read earlier do).

I was also struck by a thread I found running through most of the articles on libraries we read for class today: thinking about libraries as community-oriented, and in terms of their physical space. Almost like books, libraries to me have always represented abstract information I can take, but after reading the articles I thought about what it feels like to be in one. In Alderman, I know I will get work done, but I also know that I will see friends, drink copious cups of coffee, etc. It is not just gleaning information. The same goes for research: before I really knew how to use online sites like JSTOR or even the Carleton library site, I would find where critical books were, and physically browse them. It seems sad to me that we could lose that if books are all kept elsewhere, though I do agree with embracing the physical space and communal significance of a library by removing some of its books.

To change topics entirely, I’m interested in how we could make our books an engaging, interactive experience. The BL’s site has an excellent introduction to _Alice’s Adventures under Ground_, as well as a virtual version, and even an audio version.

The text’s transcription is set underneath the images from the original manuscript. We obviously could not do all of this. So how can we make our objects as interesting on a smaller scale?

I think one of our big selling points with these books is how intensely personal some of them are. Though as English grad students we are probably more book-loving than most, the personalities that emerge when we look at the marginalia would (I hope) be interesting to everyone; I’ve told my friends about books I’ve found, and even those in science thought that the notes/drawings/letters were exciting (no offense to the scientists of course, I’m just using them as an example). I think we need to categorize the works no matter what (we just have too many), and I think a way to do this could be to prettily display different groups of what we’ve found. So you could click on drawings, and get to all of them, etc, etc. I’d love to hear you guys’ feedback on this, especially since we’ll all potentially be working on this project soon!