managing data Archives | Nick Blackbourn

When you start a new writing project you should spend some time thinking about how to organize the data you will find, the notes you will take, and the drafts you will create. You’ll thank yourself later. I did.

Before I dived into my history PhD, I considered how I would set up a system to store and retrieve the information I would gather over my multi-year project. This post is about how I use Devonthink to organize the archival data, books, journal articles, and notes for my dissertation.

What’s Needed

Storage – A way to store thousands of files of varying sizes and formats.
Retrieval – A filing system that makes sense. Documents need to be easy to find, ideally in a number of different ways.
Tagging – A way to tag files and folders with different labels so they can be displayed in different contexts.
Back-ups – Data must be safe, fool-proof to back up, and easy to restore if anything goes wrong.

The Options

These are the solutions that I seriously considered:

Local file structure

No cost – no fancy software solution required
Foolproof – nothing to break or go wrong
No features – no learning curve

Evernote

Ubiquitous – apps everywhere
Cloud-based – data stored in a server farm
Search functions – powerful discovery solutions

Devonthink

Standalone – doesn’t require internet connection
Visual – uses intuitive folder structure
Intelligent – PDF text recognition, search, and ‘relevancy’ sort features

As the title of the post suggests, I went with Devonthink. I liked how the database could sit on top of my local files, which means I’m always in control of the original files. Perhaps irrationally, I wanted the files and software to be on my computer, and not in the cloud on Evernote’s servers.

Over time I’ve learned to love Devonthink’s interface: I like seeing my file structure tree, which somehow helps me remember where things are. I’m also very happy with Devonthink’s sorting features, and I’m confident that I went with the right software for my research/writing style.

Devonthink: A Quick Introduction

What’s Devonthink actually for? It’s software that allows you to create your own database containing all the digital material you might need to work on a large writing project (in my case a history PhD). It can hold the many file types associated with a large project: text files, pictures, PDFs, spreadsheets, etc.

There are different packages, but I really think the Office Pro version – the bells-and-whistles version – is worth the price tag ($150, education discount available).

Devonthink isn’t just a nice way to store your files. It’s also designed to help you make sense of them in a number of different ways. With all your files and notes in the same place, you can focus on organizing your thoughts and generating writing ideas.

Here’s the software description from the Devonthink website, which I find accurate:

Collect, store, work: Your Mac paperless office

DEVONthink saves all your documents, keeps them organized, and recalls them whenever you need them.

Now there’s no need to store Office files, PDFs, bookmarks or other information in separate apps.

How to set it up?

It’s worth spending some time getting to know Devonthink. Most of the important features are intuitive, but if you are comfortable with how the software works you can organize and sift through your data in some powerful ways.

I used a digital camera to take pictures of all the material I needed in the archives. I’ve written a post on this here.

These photos (thousands!) are saved in a file structure that carefully replicates the archive box and folder system to avoid any citation confusion in years to come. I imported these files to Devonthink and converted them into a searchable PDF format (which took quite some time).

I’m lucky that most of the archival material for my project is type-written on white paper, which means the Optical Character Recognition (OCR) of my archive pictures is reasonably accurate, though certainly not perfect. I’d say it’s roughly 65% correct for my data. That’s good enough to be a useful tool alongside the tagging and grouping I’ve done, but not accurate enough to draw reliable conclusions from on its own.

I don’t consider my project a ‘big data’ study: I’ve read every document in my library. The OCR capability is a fantastic convenience, but not a research methodology.

Importing and Organizing Your Data:

I recreate the archive’s filing system in my Devonthink database.

Importing Archival data:

When importing, I chose not to rename any of my photo files. They are numbered sequentially and I reset the counter on my camera for each archive I visited. I decided to maintain these file names to avoid any confusion. It has also turned out to be useful to have a very short citation code (e.g. [9999]) to use in notes and early drafting.

These pictures are backed up in multiple places – they are my data-babies, there’s no project without them!
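To illustrate how a short code can stand in for a full citation, here is a minimal sketch in Python. It assumes a hypothetical folder layout of archive/box/folder/photo, which is my own illustration rather than anything Devonthink provides, and the function name citation_for is made up for this example.

```python
from pathlib import PurePosixPath


def citation_for(path):
    """Build a citation from a photo path.

    Assumes a hypothetical layout: <archive>/<box>/<folder>/<photo file>.
    """
    photo = PurePosixPath(path)
    # The three directories above the photo mirror the archive's own filing system.
    archive, box, folder = photo.parts[-4], photo.parts[-3], photo.parts[-2]
    # The camera's sequential file name (without extension) is the short code.
    return f"[{photo.stem}] {archive}, {box}, {folder}"


print(citation_for("Example Archive/Box 12/Folder 3/0457.jpg"))
```

Because the camera counter is reset for each archive, a code like this is only unique within one archive, which is why the sketch keeps the archive name in the full citation.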

Importing Secondary data:

I created another Devonthink library for my PDF books and articles. These PDF files are linked to the corresponding bibliographic entry in Zotero. I use the free software Skim to read and annotate these files, and also store a .txt version of my highlights and notes separately.

(You can read more about my workflow here.)

I digitized many of my PhD books via the 1DollarScan service. These PDF scans are text searchable and stored in my database.

I import relevant web pages into my database with the Devonthink Chrome tool, so that everything I have found is tagged, indexed, and searchable in one place.

Devonthink for Note taking:

Another folder exists for my notes, which I make an effort to type up if handwritten. I generally use TextEdit for quick-and-easy notetaking.

Devonthink gives an incentive to write up thought pieces, because, as the Magic Hat image below shows, the software can then refer you to material you already have in your database that might help develop your thoughts.

What a great incentive to keep research notes: your database will automatically suggest further reading to you!

Back it up!

Always, always, always make sure that your database is backed up properly. I have nightmares about my laptop getting snapped in half, so I make sure that no information would be lost if that actually happened.

As well as backing up to an external hard drive, I’ve set my Devonthink database to save in my Dropbox folder, so it is safely stored in the cloud. I also use Backblaze as a third backup.

Using Your Devonthink Database:

After this setup, almost everything related to your writing project is held inside Devonthink.

This gives an amazing sense of control over the research you’ve collected for your project. You can search across your database for something you’ve read but ‘lost,’ have your digitally recreated archives ready to interrogate, write research notes, and find related material instantly.

As your project develops, any extra information or ideas or articles can be collected straight in Devonthink via its ‘global inbox’ interface. It’s then easy to tag this new material and organize it into your database as you choose.

Needless to say, I’ve found Devonthink an indispensable tool for organizing my research. I use it every day.

Why is Devonthink useful?

The software lets you make your own connections between files with tagging and grouping functionality. I tag individual files with key information relevant to the file. Devonthink can then display all files with the same tag, which means working on a particular theme becomes straightforward.

For example, I might tag a file by date, author, and topic. Later, when I’m writing about a certain year, I can pull up all the documents relating to that year. I might then write about that author, and can quickly switch tags to see every document she wrote or featured in.
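The idea behind tag-based retrieval can be sketched in a few lines of Python. This is only an illustration of the concept, not how Devonthink stores tags internally; the file names and tags are hypothetical.

```python
from collections import defaultdict


def build_tag_index(files):
    """Map each tag to the set of file names that carry it."""
    index = defaultdict(set)
    for name, tags in files.items():
        for tag in tags:
            index[tag].add(name)
    return index


def files_with(index, *tags):
    """Return the files that carry every one of the given tags, sorted by name."""
    if not tags:
        return []
    matches = [index.get(tag, set()) for tag in tags]
    return sorted(set.intersection(*matches))


# Hypothetical sample data: file name -> tags for date, author, and topic.
files = {
    "0001.pdf": {"1961", "author:Smith", "topic:trade"},
    "0002.pdf": {"1961", "author:Jones", "topic:trade"},
    "0003.pdf": {"1962", "author:Smith", "topic:defence"},
}
index = build_tag_index(files)

print(files_with(index, "1961"))                  # everything from one year
print(files_with(index, "author:Smith"))          # switch to one author
print(files_with(index, "1961", "author:Smith"))  # combine both tags
```

Each file can sit in one place in the folder tree and still appear in as many of these views as it has tags.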

Devonthink can show you related files based on word frequency.

One of my favorite features automates this process of making connections: Devonthink compares the text within all of your documents.

Basing a ‘score’ on similar words and their frequency, Devonthink suggests other relevant files in your database that you might have forgotten or not immediately considered relevant. I’ve made some important connections in my own dissertation by using this tool.
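I don’t know the details of Devonthink’s own scoring, so the following is only a rough sketch of the general idea, using cosine similarity over word counts as a stand-in technique. The function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count how often each word appears."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def similarity(a, b):
    """Cosine similarity between two word-count tables (0.0 means no shared words)."""
    shared = set(a) & set(b)
    dot = sum(a[word] * b[word] for word in shared)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def related(name, documents):
    """Rank every other document by its similarity to the named one, best first."""
    target = word_counts(documents[name])
    scores = [
        (other, similarity(target, word_counts(text)))
        for other, text in documents.items()
        if other != name
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical sample database: one research note and two archive documents.
documents = {
    "note.txt": "berlin blockade airlift negotiations",
    "0412.pdf": "memo on the berlin airlift and blockade",
    "0977.pdf": "budget estimates for the coming year",
}
ranking = related("note.txt", documents)
for other, score in ranking:
    print(f"{score:.2f}  {other}")
```

A scheme this simple treats common words like "the" the same as distinctive ones, so a real tool would need to weight words more carefully. The point is only that a note you write can be compared against everything else in the database.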

Links and Further Info:

The following posts are where I first found out about Devonthink and started to think about how it might work for me. Everyone should create their database in the way that works best for their writing/research style, so be sure to look at Rachael Loew’s and Chad Black’s posts for more ideas.

In addition, I’ve found this ebook to be extremely detailed and useful: Take Control of Getting Started with DEVONthink 2

Enjoy playing with your data!

Using Devonthink to Organize a Writing Project