Documenting your PhD — Keeping Track of Meetings, Experiments and Decisions • David Stutz

A PhD can be a difficult endeavour. While becoming an expert in tackling specific problems, it is easy to lose track of things: Have I read this paper before? What was the paper saying? Why did we decide to change course? Why am I running these experiments? Who did I talk to at the last conference? Throughout my PhD, I learned that documenting these things made me significantly more effective with minimal overhead. In this article, I want to share what I learned about what to keep track of during a PhD and how to do it.

Introduction

At the beginning of my PhD, I had a series of conversations with PhD students and academics about how to keep track of everything during a PhD. Some of these discussions were motivated by the 2018 workshop on being a Good Citizen of CVPR. This inspired me to start thinking properly about how to organize and eventually document my PhD work. In retrospect, I think this was one of my better decisions. Only recently, at the Heidelberg Laureate Forum 2023, I found that many successful academics also document, for example, who they talk to during conferences to stay on top of things. In this article, I want to share my approach to keeping track of and organizing my PhD work.

Originally, I literally documented nearly everything in a single document; I called it the master document. It was one (very very) long LaTeX document checked into a Git repository. Nowadays, at Google DeepMind, it feels more like a log; it is still one long document with tons of links to Drive folders, docs or slides. I am sure there are better tools out there. However, it does not matter whether it is a single document, whether it is Google Docs or LaTeX. The how is less important. My key insight is to start documenting the essentials of a PhD in the first place. I am convinced that this can be achieved with minimal overhead while reaping many benefits.

In the following, I want to go through the biggest parts of my PhD — namely reading, meetings, experiments and networking — and outline what I found important to note down and how I feel it helped me be successful.

Documenting References and Readings

Every research project usually starts by reading papers and developing ideas. At this phase, I found it incredibly useful to (a) keep track of what papers I read — even if I just skimmed them — and (b) keep short summaries of key papers relevant to my planned or ongoing research.

I still keep track of all papers that I look at in a central, well-organized BibTeX file. This can be done with very limited overhead as it only involves copying a BibTeX entry from Google Scholar, Semantic Scholar or DBLP and giving it a unique name. In addition, I used to put papers into rough categories. The categorization is usually based on a very quick skim of the abstract. Occasionally, I shared parts of these categorizations here on my blog, for example, see 240+ Papers on Adversarial Examples. For papers that I ended up reading more thoroughly because they are directly related to my research, I also used to add a short summary. Usually this involves only a few sentences on the key idea of the paper. Often I also included questions I had while reading, either when I didn’t understand something or when I had follow-up ideas. Many of these are actually on ShortScience or in my READING NOTES. I tried different tools to keep track of summaries, like individual documents, OneNote, or a tree of markdown files, but eventually just added them to my main LaTeX document.
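One way such a file could be kept consistent is a small script that flags duplicate names. The following Python sketch is only an illustration under stated assumptions: the function names and the sample entries are hypothetical, and the regular expression assumes the common `@type{key,` layout rather than handling every BibTeX edge case.

```python
import re
from collections import Counter

# Matches the start of an entry such as "@article{SomeKey2020," and captures
# the entry type and the citation key. Assumes the key is followed by a comma.
ENTRY_START = re.compile(r"@(\w+)\s*\{\s*([^,\s]+)\s*,")

# Blocks that are not bibliography entries and therefore carry no citation key.
NON_ENTRY_TYPES = {"comment", "string", "preamble"}


def entry_keys(bibtex_text):
    """Return the citation keys of all entries, in file order."""
    return [
        key
        for kind, key in ENTRY_START.findall(bibtex_text)
        if kind.lower() not in NON_ENTRY_TYPES
    ]


def duplicate_keys(bibtex_text):
    """Return the keys that occur more than once, sorted alphabetically."""
    counts = Counter(entry_keys(bibtex_text))
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical placeholder entries, not real references.
SAMPLE = """
@inproceedings{AuthorVenue2019,
  title = {A hypothetical paper}
}
@article{AuthorJournal2020,
  title = {Another hypothetical paper}
}
@inproceedings{AuthorVenue2019,
  title = {An accidental second copy}
}
"""

if __name__ == "__main__":
    print(duplicate_keys(SAMPLE))
```

The same `entry_keys` helper can also answer the question of whether a paper is already in the list, provided the naming scheme for keys is applied consistently.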

Keeping track of papers in this way has several immediate benefits. For example, it simplifies paper writing immensely: I was able to write related work sections very quickly and did not need to interrupt my writing to look up and add BibTeX entries. My advisors and many reviewers repeatedly highlighted how thorough and detailed my related work sections were, even though they weren’t a lot of effort. The paper summaries were particularly useful to differentiate my work from similar papers or respond to reviews. Beyond paper writing, having a categorized list of papers helps with preparing lectures and can be useful to students working on their bachelor’s or master’s thesis. I also found it useful to know which papers I had already read. If somebody shared a paper with me and it was in my list, I knew that I either had a summary or the paper was not really relevant to my work. This is also useful for prioritizing which talks or posters to attend at conferences, as I usually read many pre-prints before they got published. Lastly, having a good overview of papers in my field also helped to speed up any reviewing duties.

Meetings and Decisions

Reading papers and developing research ideas is usually a collaborative endeavour. In my case, I had regular meetings with my advisors and collaborators where we discussed papers, ideas or experiments. While keeping meeting minutes is rather normal in industry, I found that meetings are handled more informally in academia. Often, there were no clear expectations about who records meetings and how. However, I learned that keeping track of meeting contents and decisions can be extremely useful in research. There are three main reasons for this:

First, meetings usually result in some sort of decision. For example, what research question to focus on, what experiments to prioritize, who to ask for help or collaborate with, where to submit, how to address reviews, and so on. Unfortunately, it is easy to forget about some decisions or to forget to communicate them to collaborators who were not present. More importantly, it is easy to forget the reasons for which specific decisions were made. For me, this meant that I often forgot why we decided to do specific experiments and I needed to check back with my advisors or revisit parts of the discussion. As soon as I started keeping track of meeting discussions and decisions, it became much easier to organize my daily work and communicate results and decisions to my collaborators.

Second, keeping track of discussions and who was involved helps to avoid conflicts or repeated arguments. Personally, I rarely had any conflicts or problems with my collaborators. However, I saw many PhD students having arguments about authorship or contributions. These conflicts grew over time because the parties had different recollections of meetings or decisions. I noticed that many students were successful in avoiding such conflicts by recording meetings and sharing the notes with their collaborators afterwards. This exposes misunderstandings very early, when they can be resolved much more easily.

Finally, it improves visibility. For example, whenever some collaborators were not present during a meeting, I used to send around the notes so everyone was on the same page. It also made it easier to articulate what I was working on during group meetings or standups. This led to the research group being familiar with what I was working on and which problems I was facing.

Experiments

Once a research idea is selected as promising, the actual research work starts. Theoretical work is documented in terms of working notes or hand-written scribbles that are easy to digitize and keep track of. For coding, version control systems such as Git are widely adopted, although best practices (frequent commits, descriptive commit messages, utilizing branching, etc.) are sometimes missing in academia. What remains, and is often harder to properly keep track of, are experiments. I believe that this is actually a result of machine learning being a rather young empirical science. In fact, lab notebooks are extremely common in many other disciplines. Often, there are courses on how to maintain lab notebooks properly such as this one from Columbia University. When I met PhD students from the Helmholtz Institute for Pharmaceutical Research, for example, they told me that keeping a lab notebook is a prerequisite for actually entering the lab spaces to perform experiments.

Personally, I never maintained a lab notebook in the above sense as I only learned about it towards the end of my PhD. However, in between my first two papers, I realized that keeping track of experiment hypotheses, configurations and results is crucial for publishing at the top-tier venues. This is particularly important as machine learning experiments are getting more involved from the engineering perspective. Industrial research labs such as Google DeepMind actually have ready-to-use infrastructure to keep track of experiments and make them reproducible. As a PhD student, however, I had to build this infrastructure myself and I plan to write a separate blog article about the technical infrastructure.

Here, I want to focus on the non-technical aspect of documenting experiments: the experiment hypothesis and the conclusions. This also ties into the above section because experiments are often motivated by discussions and decisions made in meetings with collaborators. Initially, I started writing down only a few sentences describing why I wanted to run an experiment. This could be as simple as “does data augmentation X improve adversarial robustness?”. Then, after the experiment, I collected the key results in the form of a plot or a table and some observations. These were then discussed and might lead to additional experiments. The key thing for me was to be consistent. This became important when writing the paper or giving presentations. With this log of experiments, it is very easy to go back and check why specific hyper-parameters were chosen, how alternative methods or baselines performed, etc.
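A log like this does not need special tooling. As a rough sketch, assuming one JSON object per line in a plain text file, the Python below records the hypothesis before a run and the conclusion after it. The class, field and function names as well as the example values are hypothetical and only meant to illustrate the idea; a section in a LaTeX or Google Docs document serves the same purpose.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from pathlib import Path


@dataclass
class ExperimentEntry:
    """One log entry: why an experiment was run and what came out of it."""

    name: str
    hypothesis: str  # written down before running the experiment
    config: dict = field(default_factory=dict)  # the settings that matter
    results: dict = field(default_factory=dict)  # key numbers or paths to plots
    conclusion: str = ""  # filled in after the experiment
    logged_on: str = field(default_factory=lambda: date.today().isoformat())


def append_entry(log_path, entry):
    """Append one entry as a single JSON line; existing lines are not touched."""
    with Path(log_path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(entry)) + "\n")


def load_entries(log_path):
    """Read all entries back, oldest first."""
    with Path(log_path).open(encoding="utf-8") as f:
        return [ExperimentEntry(**json.loads(line)) for line in f if line.strip()]


# Hypothetical example entry; the config values are placeholders.
EXAMPLE = ExperimentEntry(
    name="augmentation-x",
    hypothesis="Does data augmentation X improve adversarial robustness?",
    config={"augmentation": "X"},
)
```

Calling `append_entry("experiments.jsonl", EXAMPLE)` adds the entry to the log, and `load_entries("experiments.jsonl")` returns everything recorded so far. Appending rather than editing keeps the history of why each experiment was run, which is what makes it possible to check later why specific hyper-parameters were chosen.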

Events, Applications and Career Decisions

Convincing PhD students to document meetings or experiments is comparatively easy. The benefits of doing so are usually noticeable with the next paper submission. However, keeping track of interactions at conferences or other events, fellowship or internship applications and higher-level career decisions has fewer immediate benefits. Often, connections that I made at conferences or talks I gave at other research labs only became relevant or interesting years later. When I couldn’t remember other researchers that I had spoken to at previous conferences, I started noting down who I talked to and finding them on social media to stay in touch. I also kept better track of workshops and talks I attended and took photos of posters. All of this can be done rather easily. But it needs to be done consistently, for example, by sitting down in the evening after each conference day to note down topics and names, and by consistently asking people for their business cards or taking photos of their badges. By now, many conferences and events print QR codes on badges, which makes this much easier.

Conclusion

Overall, throughout my PhD, I learned that keeping a written log of things is incredibly useful. Surprisingly, I found that only a few PhD students actually do it systematically, especially in machine learning. I am also not sure if I have figured it out perfectly. Even if not, I noticed that it was incredibly useful throughout my PhD and beyond. In this article, I outlined what I tend to keep track of and how it benefitted my work and ultimately my research output:

I kept a central list and BibTeX file of all papers I read or cited;

I documented the discussions and decisions in most of the meetings I attended or organized;

I logged experiment hypotheses and outcomes;

I noted down connections I made at conferences as well as talks and events I attended.
