Machine clustering and “The Flowing Light of the Godhead”

In Chapter 11 of Text Analysis with R for Students of Literature, Matthew Jockers introduces machine clustering through an authorship attribution problem. The approach compares how often each book in an ingested corpus uses a set of high-frequency words and calculates the Euclidean distance between the resulting frequency profiles, which yields a distance between every pair of books in the corpus. Books that are closer together by this measure have similar linguistic features, and books that are further apart are dissimilar. Assuming that books written by the same author will manifest similar linguistic features, the authorship of an unknown text can be inferred from its proximity to a grouping of known texts.
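To make the distance calculation concrete, here is a minimal sketch in Python rather than the R code from Jockers's book. The function names, the three toy strings, and the hand-picked feature list are all assumptions made for illustration; they are not taken from the book or from Mechthild's text.

```python
import math
import re
from collections import Counter


def relative_frequencies(text):
    """Map each word to its share of the text's tokens, in percent."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {word: 100 * n / total for word, n in counts.items()}


def euclidean_distance(freqs_a, freqs_b, features):
    """Euclidean distance between two frequency profiles over the given words."""
    return math.sqrt(
        sum((freqs_a.get(w, 0.0) - freqs_b.get(w, 0.0)) ** 2 for w in features)
    )


# Toy strings, invented for this sketch (not quotations from the corpus).
texts = {
    "a": "du bist min und ich bin din",
    "b": "du bist min und du bist din",
    "c": "got ist die minne und die sele",
}
freqs = {name: relative_frequencies(t) for name, t in texts.items()}

# Hypothetical feature set; a real run would derive it from the corpus.
features = ["du", "bist", "min", "und", "die"]

d_ab = euclidean_distance(freqs["a"], freqs["b"], features)
d_ac = euclidean_distance(freqs["a"], freqs["c"], features)
print(f"a-b: {d_ab:.2f}  a-c: {d_ac:.2f}")
```

In this toy case texts "a" and "b" share most of their function words, so their distance comes out smaller than the distance between "a" and "c".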

Jockers admits in the next chapter that clustering is not ideally suited to authorship attribution problems and that it “is often employed in situations in which a researcher wishes to explore the data and see if there are naturally forming clusters” (119). When there is a closed set of classes or authors, he explains, supervised classification is a better approach. That being said, I was interested to see what data machine clustering would provide when applied to Book 1 of “Das fließende Licht der Gottheit.” The text, which consists of 46 chapters of varying lengths and one prologue, was originally written in a Middle Low German dialect and apparently translated in the mid-fourteenth century by Henry of Nördlingen into the Alamannic dialect of Middle High German. Assuming that a translator superimposes his or her own linguistic style on a text, an idea that I am still trying to corroborate with research, I assume that there is only one author of Book 1 of “Das fließende Licht der Gottheit” and that no other scribes were involved in the translation process. The process of machine clustering the text, therefore, becomes more an exercise in exploring potential connections between chapters of the book than in determining authorship.

Clustering the chapters in Book 1 of “Das fließende Licht der Gottheit” with a frequency threshold set to 0.25 (that is, keeping the words whose mean relative frequency across the chapters is at least a quarter of one percent) catches 60 words in the corpus. These words are: an, ane, da, das, der, die, dis, ein, es, got, han, hat, herre, herzen, ich, in, mag, mich, miner, nut, so, und, wie, dem, des, dingen, ir, ist, je, min, minne, mir, mit, sele, sere, si, sich, sint, aller, als, bist, den, denne, dich, din, dir, du, er, hast, nit, sin, von, was, diner, o, muesse, vier, lust, solt, minnen. These words can generally be grouped into prepositions (e.g., ane, mit), conjunctions (e.g., und, denne), personal pronouns (e.g., ich, mich), coordinating conjunctions (e.g., was, das), definite articles (e.g., der, die), negative particles (e.g., nit), modal verbs (e.g., solt, muesse), possessive adjectives (e.g., min, diner), reflexive pronouns (e.g., sich), and variations on the verb “to have” (e.g., han, hat). Also prevalent are words related to the themes of “love” and “desire” (e.g., herzen, minnen, lust) and theological words (e.g., got, sele). In sum, the potential connections between the chapters are determined primarily by stylistic features present in the chapters, although theological and erotic vocabulary will also influence how the chapters get clustered. This clustering produces the dendrogram:

[Figure: Cluster dendrogram of Book 1 of “The Flowing Light of the Godhead” (frequency threshold 0.25)]
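The two steps behind a dendrogram like this one, filtering the vocabulary by a frequency threshold and then merging the closest chapters step by step, can be sketched as follows. This is a plain-Python illustration, not the R code that produced the figure. It assumes that the 0.25 threshold applies to each word's mean relative frequency (in percent) across chapters, and it uses complete linkage, which is the default method of R's hclust. The chapter labels and frequency values are invented for the example and are not measurements from the text.

```python
import math


def select_features(freq_table, threshold=0.25):
    """Keep words whose mean relative frequency (percent) meets the threshold."""
    words = set().union(*freq_table.values())
    n = len(freq_table)
    return sorted(
        w for w in words
        if sum(ch.get(w, 0.0) for ch in freq_table.values()) / n >= threshold
    )


def distance(a, b, features):
    """Euclidean distance between two frequency profiles."""
    return math.sqrt(sum((a.get(w, 0.0) - b.get(w, 0.0)) ** 2 for w in features))


def complete_linkage(freq_table, features):
    """Agglomerative clustering; returns the merges as (left, right, height)."""
    clusters = [(name,) for name in freq_table]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: cluster distance is the largest
                # pairwise distance between their members.
                d = max(
                    distance(freq_table[x], freq_table[y], features)
                    for x in clusters[i]
                    for y in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Invented relative frequencies in percent (hypothetical, for illustration only).
chapters = {
    "chapter_a": {"du": 12.0, "bist": 10.0, "min": 9.0, "selten": 0.1},
    "chapter_b": {"du": 11.0, "bist": 9.5, "min": 9.5},
    "chapter_c": {"ein": 14.0, "du": 1.0},
}

features = select_features(chapters)
merges = complete_linkage(chapters, features)
for left, right, height in merges:
    print(left, "+", right, f"at height {height:.2f}")
```

Each merge corresponds to one join in a dendrogram, and the height is the distance at which the two branches meet. The rare word "selten" falls below the threshold and plays no part in the distances.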

There are seven general clusters of mostly bifolious clades in the dendrogram (from left to right), to which I have added extra detail by including the chapter titles in parentheses:

Cluster 1:
Chapter 12 (The soul praises God in five things)
Chapter 17 (The soul praises God in five things)
Chapter 11 (Four battle for God)
Chapter 8 (The most lowly praise God in ten things)
Chapter 18 (God likens the soul to five things)

Cluster 2:
Chapter 30 (The seven hours)
Chapter 16 (God likens the soul to four things)
Chapter 34 (In suffering you should be a lamb, a turtledove, and a bride)
Chapter 19 (God caresses the soul in six ways)
Chapter 20 (The soul praises God in return in six ways)

Cluster 3:
Chapter 40 (She replies: Something that is better than seven things)
Chapter 24 (How God responds to the soul)
Chapter 6 (The nine choirs: How they sing)
Chapter 41 (God asks in praise what the precious stone is called)

One simplicifolious: Chapter 7 (God’s curse in eight things)

Cluster 4:
Chapter 43 (Put your delight into the trinity)
Chapter 35 (The desert has twelve things)
Chapter 36 (Concerning malice, good works, and concerning a marvel)
Chapter 23 (You should ask that God love you passionately, often, and long; then you shall become pure, beautiful, and holy)
Chapter 32 (You should ignore honors, suffering, and possessions. Be sad after sinning)

Cluster 5:
Chapter 33 (Concerning food, consolation, and love)
Chapter 46 (The diverse adornments of the bride; how she comes to her bridegroom; and how the retinue is composed, which is ninefold)
Chapter 9 (With three things you dwell on the heights)
Chapter 29 (The beauty of the bridegroom and how the bride should follow him in twenty-three steps of the cross)
Chapter 38 (God boasts that the soul has overcome four things)
Chapter 4 (The soul’s journey to court during which God reveals himself)
Chapter 5 (The torment and the praise of the soul)
Chapter 22 (St. Mary’s message on how one virtue follows another. How the soul was made in the jubilus of the trinity, and how St. Mary nursed all the saints and nurses them still)
Chapter 2 (Concerning three persons and three gifts)
Chapter 44 (The sevenfold path of love, the three garments of the bride, and the dance)
Chapter 26 (The path upon which the soul draws the senses and is free of suffering of the heart)
Chapter 10 (Who loves God triumphs over three things)
Chapter 25 (The way to suffer pain willingly for God’s sake)
Chapter 0 (Prologue)
Chapter 1 (How Love and the Queen spoke to each other)
Chapter 27 (How you become worthy of this path and keep to it and become perfect)
Chapter 3 (The handmaids of the soul and the blow of love)
Chapter 28 (Love shall be deadly, boundless, and unceasing; This is the folly of fools)

Cluster 6:
Chapter 21 (Of knowledge and enjoyment)
Chapter 45 (The eight days in which what the prophets longed for was accomplished)

Cluster 7:
Chapter 39 (God asks the soul what she is bringing)
Chapter 15 (How God receives the soul)
Chapter 14 (How the soul receives and praises God)
Chapter 31 (You should ignore scorn)
Chapter 13 (How God comes to the soul)
Chapter 37 (The soul responds to God saying she is unworthy of these favors)
Chapter 42 (The precious stone is called heart’s delight)

It was interesting to see how surprisingly accurate the machine clustering of the corpus was. Some of this clustering made sense on account of the proximity of the chapters to each other. For example, Chapters 19 and 20 (in Cluster 2), a call and response typical of Minnesang with allusions to the Song of Songs, were grouped closely together on account of the stylistic structure that is prevalent in both chapters: “Du bist min …” (“You are my …”). The clustering also grouped other stylistically similar chapters together with these. For example, Chapter 34 also manifests the “du bist min …” structure very strongly. These stylistic similarities begin to fade, however, the further one moves away from the root bifolious. Chapter 16 retains the use of the initial “Du” in its sentence structures, but varies the verb in the second position with verbs describing the nature of the soul (e.g., smekest, ruchest, luhtest). This chapter offers varying conclusions to the sentences in the form of a comparison: “als ein” (“like a”). Chapter 30 picks up on the “ein” (“a”) that is found in these comparisons and completes them with predicate nouns. For example: “ein suesse swere” (“a sweet heaviness”). All of the sentences in this chapter omit the verb in the second position, an understood “ist.” If you want to look at these stylistic similarities yourself, you can grab a PDF of the Morel edition on Wikimedia Commons.
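One way to check a structure like “du bist min …” across chapters is to count how often it occurs per 100 words. The following is a small Python sketch of that idea, offered as an assumption about how such a check could be done, not as part of the analysis above. The sample strings use placeholder tokens (X, Y) instead of real quotations, apart from the phrases already cited in this post.

```python
import re

# The formula discussed above, matched case-insensitively.
PATTERN = re.compile(r"\bdu\s+bist\s+min\b", re.IGNORECASE)


def pattern_rate(text):
    """Occurrences of the pattern per 100 tokens of the text."""
    tokens = re.findall(r"\w+", text)
    if not tokens:
        return 0.0
    return 100 * len(PATTERN.findall(text)) / len(tokens)


# Placeholder samples; X and Y stand in for the nouns that complete the formula.
samples = {
    "formulaic": "Du bist min X. Du bist min Y.",
    "comparison": "ein suesse swere",
}

rates = {name: pattern_rate(text) for name, text in samples.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.1f} per 100 words")
```

Chapters with a high rate would be expected to sit near each other in the dendrogram, since the words du, bist, and min are all among the 60 high-frequency words used for clustering.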

So what does this all mean? On a stylistic level it is interesting to see how machine clustering groups texts together based on language similarity. Although in this instance it is probably not realistic to assume that this approach would uncover any new information regarding authorship attribution, I see it playing a role in discovering how Mechthild’s mystical language evolves over time. For example, will call and response structures in the remaining books of the work vary dramatically from those in Book 1 and, if so, what elements of the structure change and why? What do these changes imply regarding the relationship between God and the soul? In a way, then, reading a text by means of machine clustering helps to sort out the text and discover relationships that the human eye may not necessarily discover on the first or even second pass. When dealing with larger corpora, however, machine clustering may help to discover connections between texts written by unknown authors. This can be leveraged quite usefully, I think, in the field of medieval studies to analyze unknown manuscripts and determine their relationships to other texts in the corpus. The problem remains, however, that many of these texts have not been transcribed and are not yet machine readable. There are, therefore, still many fields to be plowed and much treasure to be found.
