Simon Benjamin Orion Parent

sboparen@gmail.com

How Programmers Comment When They Think Nobody's Watching

(PDF) Master's Thesis

Abstract

Documentation is essential to software development. Experienced programmers know this well from having worked with poorly documented code. They wish to improve their documentation techniques and habits, but there is little consensus on which standards to follow. To choose among them, the many different standards must be compared objectively. This desire motivates my work, which aims to better understand existing documentation practices.

This work focuses exclusively on comments within the program code. Programming is a complex human activity, despite a widespread misconception among programmers that writing code is a mechanical process. This is especially true of comments, where programmers express themselves freely. My work fills a gap in research on software documentation by systematically investigating the comments in a unique database of code written by programmers under natural conditions.

The true variety of programming behaviour is surprising. But this variety does not mean that the output of programmers is completely arbitrary; there are patterns in this data, which my research aims to understand.

This work makes three contributions:

Open Data

The aggregate data studied in Chapters 3 and 4 is available for download with all textual information removed.

(gzipped tar file) Aggregate Data

The full database of code studied in this thesis is available for legitimate research purposes. It is not freely available for download because it is impossible to ensure that all personal identifiers have been removed. To protect the anonymity of the programmers, the data is therefore available only upon request. To request it, please contact the current custodian of the data at the email address given below.

wmcowan@cgl.uwaterloo.ca