Table of Contents
- Course Details
- Course Structure
- Getting Started
- Extra Resources
- Week 1, 2020-09-09: Course Intro.
- Week 2, 2020-09-16: Python Basics
- Week 3, 2020-09-23: Text Analysis Basics
- Week 4, 2020-09-30: Working with Words
- Week 5, 2020-10-07: Word Frequencies
- Week 6, 2020-10-14: Lexical Techniques
- Week 7, 2020-10-21: Syntactic Analysis
- Week 8, 2020-10-28: Corpus Linguistics
- Week 9, 2020-11-04: Corpora Cont’d
- Week 10, 2020-11-11: Probabilistic Approaches
- Week 11, 2020-11-18: Syntactic Analysis II
- Week 12, 2020-12-02: Graph (Network) Analysis
- Week 13, 2020-12-09: Final Project Presentations and Wrap-Up
- Finals Week, 2020-12-16
Welcome! Here you’ll find all the course information for Introduction to Computational Literary Analysis, a course taught at Columbia University in Fall 2020. Please read this syllabus completely.
- ENCL UN3612: Introduction to Computational Literary Analysis
- Department of English and Comparative Literature, Columbia University
- Instructor: Jonathan Reeve
- Discussion sections: Wednesdays, 13:00–14:00, New York City time, in the course chatroom on Zulip
- Lab time: Fridays, 14:00-15:00, in live video chat on Jitsi
- Email address: email@example.com
- Although please message me on Zulip instead
- Classroom/chatroom on Zulip
- Course website and course readings
- Course repository
This course is an introduction to computational literary analysis, and presumes no background in programming or computer science. We will cover many of the topics of an introductory course in natural language processing or computational linguistics, but center our inquiries around literary critical questions. We will attempt to answer questions such as:
- What are the characteristic speech patterns of the narrators in Wilkie Collins’s The Moonstone?
- What words are most frequently used to describe Katherine Mansfield’s female characters?
- Which novels of the nineteenth century are the most similar to each other? Which are the most different?
The course will teach techniques of text analysis using the Python programming language. Special topics to be covered include authorship detection (stylometry), topic modeling, and word embeddings. Literary works to be read and analyzed will be Wilkie Collins’s The Moonstone, Katherine Mansfield’s The Garden Party and Other Stories, and James Joyce’s Dubliners.
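As a taste of what this looks like in practice, here is a minimal sketch of word-frequency counting using only the Python standard library. The sample sentence is invented for illustration (it is not a quotation from any of our texts), and the function name `top_words` is hypothetical. The course itself builds up to richer tools, such as the NLTK and Pandas.

```python
import re
from collections import Counter

def top_words(text, n=3):
    """Return the n most frequent lowercase word tokens in text."""
    # A deliberately simple tokenizer: runs of letters and straight apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(n)

# An invented sample sentence, not a quotation from the course readings.
sample = "The moon rose. The stone shone, and the moon watched the stone."
print(top_words(sample))
```

Real literary texts need more careful tokenization than this (curly apostrophes, hyphenated words, and so on), which is part of what the first few weeks cover.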
Although this course is focused on the analysis of literature, and British literature in particular, the skills you will learn may be used to computationally analyze any text. These are skills transferable to other areas of the digital humanities, as well as computational linguistics, computational social science, and the computer science field of natural language processing. There are also potential applications across the humanistic disciplines—history, philosophy, art history, and cinema studies, to name a few. Furthermore, text- and data-analysis skills are widely desired in today’s world. Companies like Google and Facebook, for instance, need ways to teach computers to understand their users’ search queries, documents, and sometimes books. The techniques taught in this course help computers and humans to understand language, culture, and human interactions. This deepens our understanding of literature, of our fellow humans, and the world around us.
This course presumes no prior knowledge of programming, computer science, or quantitative disciplines. Those with programming experience, however, won’t find it boring: only the first few weeks cover the basics, and the material becomes more specialized after that.
Although this is usually a classroom-taught course, it is taught online-only this year, due to the global pandemic. This will require a lot of adaptation from everyone, and it won’t be easy. That said, I’ll be trying my best to make this course flexible, and doable from different timezones.
In place of in-person lectures, I’ll post lecture videos every Wednesday, or earlier. Each video is between 30 and 70 minutes long, and is required viewing. Please watch the lecture videos before coming to discussion sections, so that we can all discuss them synchronously. Links will be posted to this syllabus. Please resist the urge to watch lecture videos in advance, since they may change as I revise the course content.
In place of in-person classroom dialogue and activities, we’ll hold discussion sections online, on this Zulip server, every Wednesday during class time, from 13:00–14:00. Zulip is a text-based chat platform, with email-like threading. You can use it to join an existing discussion thread, or create a new one. Please familiarize yourself with Zulip ahead of our first meetings.
Attendance in these discussions is required. If you need to participate asynchronously one week, for whatever reason, just let me know in advance (on Zulip).
As in a traditional classroom, some days you will want to speak (i.e., write in the chatroom) more than others, and that’s fine. But please say something thoughtful at least once per class. This way there is a record of your participation.
Feel free to chime in on the course chat throughout the week, with any questions or comments you might have. I’ll usually be there once every couple of days. Please use the public channels for any course-related questions you have, unless they are of a private nature (e.g., grades), in which case please message me privately on Zulip, as I will answer faster there than through email. Discussion about specific textual passages might be better placed in annotations, in the margins of the text, using our annotation platform. See Annotations, below.
Labs are synchronous videoconferences that happen every week, on Friday, from 14:00–15:00, here on Jitsi. They are less formal than the discussion sections, and an ideal place to come and chat about the readings and/or programming assignments in real time. I recommend that you attempt the homework assignments before coming, so that you can ask any questions you have about them during the lab. You’re also welcome to join and just quietly work for the hour. I won’t take attendance, but these labs are strongly recommended.
To get set up for this course, you will need:
- Access to a computer that runs Linux, macOS, or Windows.
- An Internet connection. I’ve tried my best to make our course software work as globally as possible, but if you’re attending class remotely, from a country that has restricted Internet, you might want to look into setting up a VPN, either through the university, or through a private provider. Please get in touch as soon as possible if you run into any connectivity issues.
Now that we have that, let’s get started! First, let’s set up a couple of accounts:
- Create an account on our Zulip chatroom. Please use your real/preferred name as your username and display name, so that I can identify you.
- Complete your profile on Zulip. Please add a picture of yourself, and fill out all the profile fields.
- Introduce yourself to everyone in the chatroom.
- Sign up for a user account on hypothes.is, our annotation platform. Please use the same username you used for Zulip.
- Download and install Anaconda, a Python distribution, which contains a lot of useful data science packages.
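Once Anaconda is installed, one way to confirm that Python is working is a quick check like the sketch below. This is not an official setup step. The package list is an assumption based on the libraries named elsewhere in this syllabus, and a "missing" result is not necessarily a problem, since some packages may need to be installed separately later in the course.

```python
import importlib.util
import sys

# Import names for libraries mentioned in this syllabus (an assumed list; adjust as needed).
PACKAGES = ["nltk", "pandas", "sklearn", "spacy"]

def check_packages(names):
    """Map each import name to True if Python can find it, False otherwise."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

print("Python version:", sys.version.split()[0])
for name, found in check_packages(PACKAGES).items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

If this runs and prints a Python version, your installation is working.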
You will likely need some extra help at some point, either for the literary aspect of the course, or the technological aspect. Don’t worry. That’s totally normal.
If you want some extra help, or want to read a little more about some of the things we’re doing, there are plenty of resources out there. If you want a second opinion about a question, or have questions that we can’t answer in the chatroom, a good website for getting help with programming is Stack Overflow. Also, the Internet is full of Python learning resources. One of my favorites is Codecademy, which has a game-like interactive interface, badges, and more. There’s also the fantastic interactive textbook How to Think Like a Computer Scientist, which is the textbook for Computing in Context, the introduction to Python at Columbia’s Computer Science department.
Resources related to text analysis include, but are by no means limited to:
- The NLTK Book
- My introduction to text analysis tutorial
- My advanced text analysis tutorial with SpaCy
A colleague and I have also put together a few guides for beginning programming:
If you’re feeling like you need some help catching up with literary-critical terminology, or traditions of scholarship, here is a list of useful reference volumes, some of which are available online:
- The Broadview and Norton Critical Editions listed below.
- Abrams, A Glossary of Literary Terms
- A Companion to the Victorian Novel
- The Cambridge Companion to Modernism
Coursework falls into three categories:
- Weekly Annotations and Discussions (40% of final grade)
- Thus: 20% annotations, 20% class discussions.
- Homework Assignments (30% of final grade)
- Final Project (30% of final grade)
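To see how these weights combine, here is a small illustrative calculation. The scores are hypothetical, the function name is made up for this example, and the sketch ignores any bonus. It is not the official grading procedure, only the arithmetic implied by the percentages above.

```python
# Weights taken from the coursework categories listed above.
WEIGHTS = {
    "annotations": 0.20,
    "discussions": 0.20,
    "homework": 0.30,
    "final_project": 0.30,
}

def weighted_grade(scores):
    """Combine per-category scores (0-100) into a single weighted score."""
    return sum(WEIGHTS[category] * scores[category] for category in WEIGHTS)

# Hypothetical scores, for illustration only.
example = {"annotations": 90, "discussions": 80, "homework": 85, "final_project": 95}
print(round(weighted_grade(example), 1))
```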
And of course, there are three course readings: one novel and two short story collections. Reading these closely is crucial: this will allow you to contextualize your quantitative analyses, and will prepare you for the close reading tasks of the final paper.
All readings are provided in digital form on the course website. They are one novel and two short story collections:
- Wilkie Collins, The Moonstone
- Katherine Mansfield, The Garden Party and Other Stories
- James Joyce, Dubliners
If you prefer to read on paper, or to supplement your reading with background information and critical articles, I highly recommend the Broadview and Norton Critical Editions. They are full of interesting essays and explanatory notes.
- Wilkie Collins, The Moonstone, Broadview Edition
- Katherine Mansfield, The Garden Party and Other Stories, in Katherine Mansfield’s Selected Stories, Norton Critical Edition
- James Joyce, Dubliners, Norton Critical Edition
For each reading assignment, please write 3-4 annotations to our editions of the text, using hypothes.is. Links are provided below. You’ll have to sign up for a hypothes.is account first. As above, please use your real name as your username, so I know who you are. You may write about anything you want, but it will help your final project to think about ways in which computational analysis might help you to better understand what you observe in the text. Good annotations are:
- Concise (think: a long tweet)
- Well-written (although not too formal)
- Observant (rather than evaluative)
You may respond to another student’s annotation for one or two of your annotations, if you want. Just make your responses equally thoughtful.
Four short homework assignments, of 3-15 questions each, will be assigned, and are due the following week, before our discussion starts. Jupyter notebook templates for each will be provided. Since we’ll review the homework answers at the beginning of each week, late work cannot be accepted. Please submit homework assignments on CourseWorks. If you’re auditing the course, or not yet in the course roster, just email me your homework notebook.
Feel free to consult with others, on Zulip, for hints or directions for homework problems. Just don’t share any answers, and make sure that your work is ultimately your own.
The final project should be a literary argument, presented in the form of a short academic paper, created from the application of one or more of the text analysis techniques we have learned toward the analysis of a text or corpus of your choosing. Should you choose to work with a text or corpus other than the ones we’ve discussed in class, please clear it with me beforehand. Your paper should be a single Jupyter notebook, including prose in Markdown, code in Python, in-text citations, and a bibliography. A template will be provided. The length, not including the code, should be about 2,000 to 3,000 words (I provide a script you can use to count your words). You’re allowed a maximum of three figures, so produce plots selectively.
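For a rough sense of how the prose in a notebook can be counted, here is a minimal sketch. It is not the course's word-count script. It assumes the standard Jupyter notebook JSON layout (a top-level "cells" list whose entries have "cell_type" and "source" fields), and the function names are hypothetical.

```python
import json
import re

def count_notebook_words(notebook):
    """Count whitespace-separated words in the Markdown cells of a parsed notebook."""
    total = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # skip code and raw cells
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of strings.
        text = "".join(source) if isinstance(source, list) else source
        # Note: Markdown markup such as "#" is counted as a word here.
        total += len(re.findall(r"\S+", text))
    return total

def count_words_in_file(path):
    """Load an .ipynb file from disk and count the words in its Markdown cells."""
    with open(path, encoding="utf-8") as f:
        return count_notebook_words(json.load(f))

# A tiny in-memory notebook, for demonstration only.
demo = {"cells": [
    {"cell_type": "markdown", "source": ["# My Title\n", "Some prose here."]},
    {"cell_type": "code", "source": ["print('not counted')"]},
]}
print(count_notebook_words(demo))
```

To try it on a real file, call `count_words_in_file` with the path to your own notebook. Counts from this sketch may differ slightly from those of the provided script.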
During the final week of class, we’ll have final project presentations. Your paper isn’t required to be complete by then, but you’ll be expected to speak about your project for 4 minutes. Consider it a conference presentation.
Final papers will be evaluated according to the:
- Quality of the literary critical argument presented
- Quality of the close readings of the text or corpus
- Quality of the Python text analysis
- Literary interpretation of the results
- Integration of the computational analysis with the literary argument
As with homework, please submit these on CourseWorks, or email them to me if you don’t have access to CourseWorks. You may optionally submit your final project to the course git repository, making it public, for a 5% bonus.
For a more thorough set of recommendations and instructions for the final project, see the final-project-instructions.md file in the course repository.
Note: this schedule is subject to some change, so please check the course website for the most up-to-date version.
Week 1, 2020-09-09: Course Intro.
- Lecture 1: Introduction.
- Lecture 2: Getting Started
- Read the syllabus in full.
- Complete the steps in the section “Getting Started,” above.
Week 2, 2020-09-16: Python Basics
- Lecture 3: String Methods and For Loops
- Lecture 4: If, Lists, Dictionaries
- Reading: The Moonstone, Prologue and First Period, Through Chapter XI
Week 3, 2020-09-23: Text Analysis Basics
- Lecture 5: Working with Files
- Lecture 6: Introducing the NLTK
- Reading: First Period, Complete
- Homework 1 assigned.
Week 4, 2020-09-30: Working with Words
- Lecture 7: Stems, Lemmas, Functions
- Reading: Second Period, First and Second Narratives
- Homework 1 due
Week 5, 2020-10-07: Word Frequencies
- Lecture 8: Types, Tokens, Counting Words
- Lecture 9: Pandas for Word Frequency Analysis. Distinctive words.
- Reading: Second Period, Third Narrative
Week 6, 2020-10-14: Lexical Techniques
- Lecture 10: Narrative Time Analysis and N-Grams
- Lecture 11: WordNet and WordNet-Based Analysis
- Reading: The Moonstone, Complete
- Homework 2 assigned.
Week 7, 2020-10-21: Syntactic Analysis
- Lecture 12: POS cont’d. Corpora.
- Reading: “The Garden Party”, “The Daughters of the Late Colonel”
- Homework 2 due
Week 8, 2020-10-28: Corpus Linguistics
- Reading: “The Young Girl”
- Reading: “Marriage à la Mode”
- Lecture 13: Corpora continued. Scikit-learn.
- Lecture 14: Stylometry, Corpus-DB
- Homework 3 assigned.
Week 9, 2020-11-04: Corpora Cont’d
Week 10, 2020-11-11: Probabilistic Approaches
- Reading: “The Sisters,” “An Encounter”
- Lecture 17: SpaCy and Named Entity Recognition
- Lecture 18: Sentiment Analysis and Macro-Etymological Analysis
- Homework 4 assigned.
Week 11, 2020-11-18: Syntactic Analysis II
- Reading: “Araby”, “Eveline”
- Lecture 19: Sentence Structure Analysis Using SpaCy
- Homework 4 due
Week 12, 2020-12-02: Graph (Network) Analysis
Week 13, 2020-12-09: Final Project Presentations and Wrap-Up
- Final project presentations due. See final-project-instructions.md
Finals Week, 2020-12-16
- Final projects due, on CourseWorks. See final-project-instructions.md