Welcome to my Lab Notebook - Reloaded

Welcome to my lab notebook, version 3.0. My original open lab notebooks began on the wiki platform OpenWetWare, moved to a personally hosted Wordpress platform, and now run on a Jekyll-powered platform (site-config), but the basic idea remains the same. For completeness, earlier entries from both platforms have been migrated here. Quoting from my original introduction to the Wordpress notebook:

Disclaimer: Not a Blog

floatright Welcome to my open lab notebook. This is the active, permanent record of my scientific research, standing in place of the traditional paper bound lab notebook. The notebook is primarily a tool for me to do science, not communicate it. I write my entries with the hope that they are intelligible to my future self; and maybe my collaborators and experts in my field. Only the occasional entry will be written for a more general audience. […] In these pages you will find not only thoughts and ideas, but references to the literature I read, the codes or manuscripts I write, derivations I scribble and graphs I create and mistakes I make. 

Why an open notebook? Is it working?

My original introduction to the notebook from November 2010 dodged this question by suggesting the exercise was merely an experiment to see if any of the purported benefits or supposed risks were well-founded. Nearly three years in, can I draw any conclusions from this open notebook experiment?

In that time, the notebook has seen six projects go from conception to publication, and a seventh founder on a null result (see #tribolium). Several more projects continue to unfold. I have often worked on several projects simultaneously, and some projects branch off while others merge, making it difficult to capture all the posts associated with a single paper into a single tag or category. Of course not all ideas make it into the paper, but they remain captured in the notebook. I often return to my earlier posts for my own reference, and frequently pass links to particular entries to collaborators or other colleagues. On occasion I have pointed reviewers of my papers to certain entries discussing why we did y instead of x, and so forth. Both close colleagues and researchers I’ve never met have emailed me to follow up on something they had read in my notebook. This evidence suggests that the practice of open notebook science can faciliate both the performance and dissemination of research while remaining compatible and even synergistic with academic publishing.

I am both proud and nervous to know of a half dozen other researchers who have credited me for inspiring them to adopt open or partially open lab notebooks online. I am particularly grateful for the examples, interactions, and ideas from established practitioners of open notebook science in other fields. My collaborators have been largely been somewhere between favorable and agnostic towards the idea, with the occasional request for delayed or off-line notes. More often gaps arise from my own lapses in writing (or at least being intelligible), though the automated records from Github in particular, as well as Flickr (image log), Mendeley (reading log), and Twitter and the like help make up for some of the gaps.

The Integrated Notebook becomes the Knitted Notebook

In creating my wordpress lab notebook, I put forward the idea of an “Integrated Lab Notebook”, a somewhat convoluted scheme in which I would describe my ideas and analyses in Wordpress posts, embed figures from Flickr, and link them to code on Github. Knitr simplified all that. I can now write code, analysis, figures, equations, citations, etc, into a single Rmarkdown format and track it’s evolution through git version control. The knitr markdown format goes smoothly on Github, the lab notebook, and even into generating pdf or word documents for publication, never seperating the code from the results. For details, see “writing reproducibly in the open with knitr.”

You can page through the notebook chronologically just like any paper notebook using the “Next” and “Previous” buttons on the sidebar. The notebook also leverages all of the standard features of a blog:

  • the ability to search,
  • browse the archives by date,
  • browse by tag or
  • browse by category.
  • follow the RSS feed
  • add and share comments in Disqus

I use categories as the electronic equivalent of separate paper notebooks, dividing out my ecological research projects, evolutionary research topics, my teaching notebook, and a few others. As such, each entry is (usually) made into exactly one category. I use tags for more flexible topics, usually refecting particular projects or methods, and entries can have zero or multiple tags.

It can be difficult to get the big picture of a project by merely flipping through entries. The chronological flow of a notebook is a poor fit to the very nonlinear nature of research. Reproducing particular results frequently requires additional information (also data and software) that are not part of the daily entries. Github repositories have been the perfect answer to these challenges.

(The real notebook is Github)

My Github repositories offer a kind of inverted version of the lab notebook, grouped by project (tag) rather than chronology. Each of my research projects is now is given it’s own public Github repository. I work primarily in R because it is widely used by ecologists and statisicians, and has a strong emphasis on reproducible research. The “R package” structure turns out to be brilliantly designed for research projects, which specifies particular files for essential metadata (title, description, authors, software dependencies, etc), data, documentation, and source code (see my workflow for details). Rather than have each analysis described in full in my notebook, they live as seperate knitr markdown files in the inst/examples directory of the R package, where their history can be browsed on Github, complete with their commit logs. Long or frequently used blocks of code are written into functions with proper documentation in the package source-code directory /R, keeping the analysis files cleaner and consistent.

The issues tracker connected to each Github repository provides a rich TO DO list for the project. Progress on any issue often takes the form of subsequent commits of a particular analysis file, and that commit log can automatically be appended to the issue.

The social lab notebook

When scripting analyses or writing papers, pretty much everything can be captured on Github. I have recently added a short script to Jekyll which will pull the relevant commit logs into that day’s post automatically. Other activities fit less neatly into this mold (reading, math, notes from seminars and conferences), so these things get traditional notebook entries. I’m exploring automated integration for other activities, such as pulling my current reading from Mendeley or my recent discussions from Twitter into the notebook as well. For now, feed for each of these appear at the top of my notebook homepage, with links to the associated sites.