Process

In this section, we'll walk through the process of putting together a project with humanities data.

Picking a topic

First things first, what do you even want to do? Big question, right? Don't worry, it's normal to feel lost at the start. You're probably at the beginning of your academic career and haven't gone through the years of training like most scholars have. You might not feel committed to one discipline or subject area yet. If that's the case, this is your chance to experiment! You can try on a topic like you try on a pair of shoes, spending a little time in them to see if they're still comfortable after walking a mile to the grocery store. For most of you, this is one assignment for one class, not something that has to follow you the rest of your life, unless you want it to. Embrace the ability to travel in your own direction.

Some of you may be "idea people" and feel like you're brimming with potential topics already. If so, that's great! There will be a section about "scope" for you later on. If you need some help generating ideas, it can be helpful to consider these categories:

  • Objects - What thing do you want to learn more about? Is there a book, a moment in history, or an archival collection that has caught your interest? Are you interested in fanfiction about anime, 19th century detective novels, the 1918 pandemic, or perhaps some beautiful illustrated maps you saw in a museum exhibit? Some people are drawn to objects and collections, and work from there. There are lots of ways to source or generate data about objects. You might want to spend your time crafting a beautiful, if small, data set about a set of objects.

  • Questions - What questions keep you up at night? What do you ponder while you run or while you're driving? What have you wondered about while reading for your philosophy class? You might be more of an abstract thinker. How can you approach these questions in a data-driven way? You might be more interested in big approaches to data - what can you learn from hundreds or thousands of texts? What patterns can you find through time and space? You should consider working with existing data sets and playing with the kinds of questions and answers you can find.

  • Skills - What do you want to get better at? Some folks are more motivated by improving their own abilities, rather than doing a deep dive into a topic. That's okay too. What skills do you want to work on? How might they transfer to another project or method? Do you want to get really good at data visualization, or maybe play with a new programming language? Or perhaps build confidence with web scraping? Go for it! You can let the software drive your decision making. We need the builders, just as much as we need the analyzers.

Research!

Once you have an inkling of what your topic might be, it's time to get researching. You want to get a sense of what is out there already about this topic. You want to find resources that might help with the context, but you also want to see what kind of work has already been done. Save yourself the frustration of getting halfway through a data collection project before you realize that someone has already done this work before. Getting a sense of existing approaches might also help steer you toward your approach. What do similar projects lack? How can you fill in the gaps?

Some of you may already be flying through the internet, opening up tab after tab, making notes. But you might also find this part of the process to be paralyzing. It's a wide open world out there. Here are some tips to help you structure this important piece:

  • Structure your research process into phases: gathering, analyzing, refining, and so on. Give yourself time to explore without deciding yet whether an article, a tutorial, or a website is useful. You're just trying to familiarize yourself with what is out there. You can use a document or a reference manager like Zotero to keep track of what you find.

  • Once you've done your gathering, spend some time actually reading and considering what you've found. Is it useful? Does it help you think about your topic in a new way? Did the title seem great but the content was lacking? Does an article give you a new path to explore? Try to sit with what you've found and think about it before moving on.

  • Once you have a solid sense of what is out there about your topic, it's time to refine and narrow your scope. This stage has its own challenges. It can be hard to make a decision at this stage. You feel pressure to make the right one. Don't worry, you can still pivot if you need to, but remember that there are benefits to making decisions and moving ahead now. You're probably on a deadline and need to get moving onto the next phase of the project. Consider creating a mental or physical "parking lot" for ideas or resources that you are intrigued by, but don't have time for right now.

  • You may notice that these tips are conflating the idea of your topic and the research about your topic. That's because at this stage, things are still rather fluid. You may discover that the object, question, or skill that was going to be your topic is just not tenable at this time. Or you may be researching the historical background for one thing and come across an amazing data set for another thing that changes your direction.

Enough pep talk! Give me some real tips:

  • Start at the library. Whether it's your library's website or the physical location, browse the resources that your institution has provided for you. Talk to a librarian; they love to help.

  • Start local. Every town has its stories. Archives, museums, historical societies, and special collections can hold original or unique materials about the people and places nearby. In this coursebook, we've worked with data about the inhabitants of the local cemetery and the school newspaper. These local stories may not have been analyzed or shared before, but they still have relevance to larger issues. Not all small institutions have the resources to make their holdings available online, so you may have to talk to someone or get creative.

  • Consider the discipline. Don't forget that scholarly work segments itself by discipline, so you may have to navigate those boundaries. Is your topic fundamentally about literature? About history? About philosophy? What other disciplines does it touch? Check out the scholarly organization for that discipline, its major journals, or research centers.

What's my project?

Hopefully, if you've read other sections of this coursebook, you have an idea of what humanities data could look like. But what will your data look like? How do you take a topic idea and turn that into data?

  1. During the research process, identify and inventory any existing data sets related to your topic. What's there? What's missing? How might you fill in a gap or take your research in a new direction?

  2. Start asking questions. What do you want to know? What can you learn from this object/corpus? You don't have to have a firm research question yet, but you should be wondering things.

  3. What kind of approach do you want to take? How might that dictate the form of your data? Do you need a list of people and their attributes for network analysis? Do you want to count things and visualize patterns? Do you need place names and coordinates?

  4. Consider your limitations. What is your timeframe? What is your skill level? What data is available already and what must you create yourself? Are you working alone or in a team? Do you have access to everything you need? Are you prepared to do a lot of data entry, or do you know that tedious work is not your thing?

  5. Set your scope. Then make your scope a little smaller. Trust me. You can always add more later, but when you're new to this kind of work, it's best to start small. If you want to map something, do you need to map the whole country? Would it be better to do a region, a state, or even a county? If you're interested in text mining 19th century literature, perhaps a single genre, author, or timespan could serve as a boundary to that work?

At this point, you should be at a place where you can write a proposal for your project. You know the types of things you want to do, the questions you want to explore, even if you don't have the answers yet.

If you're still stuck, try this flow chart from The Pudding on Writing a Data-Driven Story.

Research Questions and Data Modeling

In the data section, we covered the mechanics of data modeling. Here, let's cover it as part of the research and project process. Chances are, data modeling will go hand in hand with determining your specific research questions. You will need to iterate over this process a few times before you've arrived at something feasible.

  1. What is one potential research question that you can think of? Pay attention to how you begin your question. Are you asking how many? Who? What?
  2. How would you answer that question with some kind of data? It can be an unproven, hypothetical answer, but you should think through what kind of answers you're looking for.
  3. Now swim around in your answer a little bit. What is your answer made of? Do you need to count something to get to that answer? What do you need to count? Are you tracking something over time, and therefore need time-based data? Are you looking at relationships, so you need information about people?
  4. Make a list of all the pieces of information you might need to answer your question. This could be the beginning of your data model.
  5. Go through these steps again with a different question and potential answer. How do your data needs compare? Can you start to see a spreadsheet or a corpus forming? Do you need to do more research?
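
To make step 4 a little more concrete, here is a minimal sketch of what a list of needed information might look like once it is written down as named fields. Everything in it is hypothetical: the imagined research question (who wrote to whom, and when?), the `LetterRecord` name, and each field are assumptions made up for illustration, not a model taken from this coursebook.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class LetterRecord:
    """One row of a hypothetical data set about historical letters."""
    letter_id: int         # unique identifier for each letter
    sender: str            # who wrote it (supports "who?" questions)
    recipient: str         # who received it (supports relationship questions)
    year: Optional[int]    # when it was sent; None if the source gives no date
    place: Optional[str]   # where it was sent from; None if unknown


def column_names(model) -> list[str]:
    """Return the field names, which could double as spreadsheet column headers."""
    return [f.name for f in fields(model)]


print(column_names(LetterRecord))
```

Running through the steps again with a different question would likely add, rename, or drop fields. Comparing the two lists is one way to see where your data needs overlap.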

How did that go? Hopefully by working through those questions, you should have an idea of where you're headed. Again, it's totally normal to adjust or go through this cycle again. You may even get to the point of visualization and realize you need to regroup. For now, let's assume you're ready to start putting together your data. You have some options.

Creating data

You may determine that you need to create your data set from scratch. What does "from scratch" mean? Are eggs involved? Chances are you are not going to be pulling data magically from your brain. It's more likely that you will have a source or multiple sources that need to be transformed into a structured data set that can be analyzed by a piece of software. The draft data model you created will guide you in this process.

Many humanities data projects begin with the kind of old books that seem to be forgotten on library shelves. Scholars of past centuries did a lot of data-driven work; they just put all their data into print books, and now that information needs to be transformed. For example, The Ancient Graffiti Project relies on a 19th century collection of inscriptions gathered into a set of large, heavy volumes known as Corpus Inscriptionum Latinarum. Mapping the Scottish Reformation is a prosopography project that gathers data about Scottish ministers from a text known as the Fasti. While the author of the Fasti, Hew Scott, did tremendous work in compiling this text, he was loose with the facts and didn't cite his sources. In the end, both of these projects will provide a database and visualizations for users to ask their own research questions. But the process of getting there involves careful data entry and cross-referencing. The project teams are not necessarily transcribing texts from beginning to end; they are extracting each data point in a way that fits their data model. They might not extract every piece of information. Instead, they're letting their project goals guide them. Rest assured, both of these projects added to and refined their data model over time.

If you have generated some research questions and drafted a data model, go ahead and open up a new spreadsheet. Label some fields, start filling in data, then step back and review your work. Does your source include information you forgot to put in your data model? Do these fields make sense to another person? Do you need another spreadsheet? On the flip side, are you trying to gather too much data? Do you need every single detail?

If you're not sure how to answer these questions, why not skip ahead to visualization or analysis? Test out the data you've gathered so far, even if it's not complete, just to see how it performs. Again, this iterative approach is key to creating a workable project.

Finding data

You may determine that your project or your constraints require that you find existing data sources. That's great! You're reusing existing information in new ways! But it can be a struggle to find the exact data set you're looking for. There is no single catalog of data that you can search. You may have to combine pieces of multiple data sets. Alternatively, you may not have access or be legally allowed to use the data you're seeking. It's another reason to identify other scholars working on similar topics: you may be able to ask for advice or find guidance.

Places to look:

Analyzing your data

Okay, let's say you have put together a respectable data set that you think addresses your goals. What next? Throughout this coursebook, we have explored a range of analysis methods. Your research questions should help you determine the method or methods that make sense. But once you start using a method or visualizing your data, how do you know when you're getting answers? Here are some things to keep in mind:

  • Know your material. It's up to you and your understanding of your subject to determine what matters. If you've generated a network graph, but you don't know much about the major players, you're going to struggle. Let your visualizations inspire you to dig back into research topics.

  • Set benchmarks. Running a process once or creating one visualization isn't enough. Perform your analysis methods over and over so you have a sense of what is normal vs. extreme. You might also find errors in your data this way. If you're interested in how often the word "love" is used over time, you might want to try synonyms of that word. Do all words behave the same way? What are the trends over time? Is that actually a unique line on your graph?

  • Remember your discipline. Positioning your work within a discipline is not an arbitrary decision. It gives you standards and priorities for your work. A literary scholar is going to approach a text differently than a historian. They're going to care about different questions. If you're a historian, you might be intrigued by the use of literary symbolism in a text, but know that you have to focus on the historical elements, not the style. Read the literature of the discipline and review other projects to get a sense of how their analysis works.
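
The benchmarking tip above can be tried on a very small scale. The following is a minimal sketch, assuming a made-up three-text corpus keyed by year; the texts, the comparison words, and the helper names `relative_frequency` and `trend` are all invented for illustration. It computes how often a word appears relative to the length of each text, so that "love" can be compared against other words instead of being read in isolation.

```python
import re
from collections import Counter

# Made-up toy corpus (year -> text); replace with your own texts.
corpus = {
    1850: "Love and duty. She spoke of love and of affection.",
    1860: "Duty first. No love was lost between them.",
    1870: "Affection grew, and with it love, love, love.",
}


def relative_frequency(text: str, word: str) -> float:
    """Share of tokens in `text` that equal `word`, ignoring case."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return Counter(tokens)[word] / len(tokens)


def trend(texts_by_year: dict[int, str], word: str) -> dict[int, float]:
    """Relative frequency of `word` for each year, in chronological order."""
    return {
        year: relative_frequency(text, word)
        for year, text in sorted(texts_by_year.items())
    }


# Compare the word of interest against a few others to see what is typical.
for word in ["love", "affection", "duty"]:
    print(word, trend(corpus, word))
```

Relative frequency is used here instead of raw counts so that longer texts do not automatically look more "loving" than shorter ones. That is one possible design choice; your discipline or method may call for a different measure.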

To pull this all together, let's walk through a case study of a project from a former student, Alice. Alice is an art history major with some experience in digital methods from other classes and her work on a DH project. In one of her classes, she learns about the Spanish Gallery. The Spanish Gallery was assembled at the Louvre by King Louis Philippe in 1838 to showcase Spanish art during a time of French occupation of Spain. After his death, the gallery was sent to auction in London and dispersed into private hands. Alice wonders what happened to these paintings after the auction, as well as their origins in Spain. She imagines what the gallery might have looked like all assembled, similar to a project she put together on Boydell's Shakespeare Gallery. In reading the Wikipedia article, she notices a link to the digital copy of the gallery catalog from 1838. She recognizes this catalog as a potential source of data, especially when combined with her knowledge of searching auction records. The catalog is arranged in alphabetical order by artist, with a brief bio, then a list of the paintings and their dimensions. Alice sees that the digital book is available to download as plain text, which will make it easier to work with. She also notices that the text isn't quite perfect, so she will need to budget time for correction. Not to mention it's all in French! Finally, there appear to be annotations in pencil on the pages, so that may be something to look into.

Alice opens Google Sheets and gets to work on a spreadsheet. The catalog has assigned each painting a number, so she uses that as an ID field. The thing being described in each row is the painting, with the artist, bio, size, and location modifying that painting. After starting that work, she realizes that it's repetitive to include the artist information multiple times, so she creates a second sheet that is just a list of the artists and their information. She cleverly uses a built-in translation function in Google Sheets to translate the French to English for the whole column, rather than doing it one by one: =GOOGLETRANSLATE(B2, "fr", "en").
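
The move from one sheet to two can also be sketched in code. This is not Alice's actual workflow (she worked in Google Sheets), and the rows below are placeholders, not entries from the real catalog. The field names and the `split_tables` helper are assumptions made for illustration. The idea is simply that each artist's bio is stored once, and each painting row points to its artist by name.

```python
# Placeholder rows standing in for a "flat" sheet where the artist's bio
# is repeated on every painting row. Not taken from the actual catalog.
flat_rows = [
    {"painting_id": 1, "title": "Painting one", "artist": "Artist A", "artist_bio": "Bio of Artist A"},
    {"painting_id": 2, "title": "Painting two", "artist": "Artist A", "artist_bio": "Bio of Artist A"},
    {"painting_id": 3, "title": "Painting three", "artist": "Artist B", "artist_bio": "Bio of Artist B"},
]


def split_tables(rows):
    """Split flat rows into a paintings table and a de-duplicated artists table."""
    artists = {}    # keyed by artist name so each artist is stored only once
    paintings = []
    for row in rows:
        # Record the artist the first time the name appears; skip repeats.
        artists.setdefault(
            row["artist"],
            {"artist": row["artist"], "artist_bio": row["artist_bio"]},
        )
        # Keep only painting-level fields, plus the artist name as the link.
        paintings.append(
            {"painting_id": row["painting_id"], "title": row["title"], "artist": row["artist"]}
        )
    return paintings, list(artists.values())


paintings, artists = split_tables(flat_rows)
print(len(paintings), "paintings;", len(artists), "artists")
```

Linking on the artist's name works only if names are spelled consistently. In a source with imperfect text, a separate artist ID might be a safer link.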

Alice can't help but start researching the modern locations of these pieces of art. She finds that this work is going to be more challenging than she anticipated, so she creates another sheet to separate this work into a second phase of the project. There just aren't enough accessible records for each piece of work. Phase 1 will have to be about creating data on the gallery itself, Phase 2 will concern the auction and dispersal of the art. Similarly, finding the original locations of the art may be challenging too, since the catalog only lists where the artist was from. She knows another student who has used Google Street View and tourist websites to confirm the location of shrines in Spain. She supposes that's Phase 3!

Alice's project is still in progress, but hopefully this example gave you an idea of how a project like this can change as it progresses.

Resources
