Comments on Volatile and Decentralized: The CS Grad Student Lab Manual

2010-02-09T16:41:46.177-08:00

I agree with the idea that computer science has more experimental cycles, i.e., repeated experiments. Hence a manual is perhaps not advisable, at least to some extent.

2010-01-14T21:33:48.851-08:00

I would strongly support a course on rigorous experimentation, analysis, and validation techniques for every systems Ph.D. student. There are papers by Paxson, Floyd, and Willinger advocating rigorous analysis, and every systems/networking student should read them. Unfortunately, there are many papers out there (even in tier-1 conferences) with sloppy and/or buggy analysis. To make things worse:

1. It is not uncommon to see people refuse to share their analysis code or simply ignore requests for it (let alone make it public on their own).

2. It is not uncommon to see tier-1 conference papers based on massive proprietary datasets (e.g., from ISPs and datacenters). Obviously, these results cannot be validated and are taken at face value (such papers may get in on the strength of the data alone, since a reviewer may have no way to judge the accuracy of the results).

3. It is not uncommon to see a paper that does not provide enough detail to reproduce its results, and this happens in tier-1 conferences too. Problem (1) above makes such work even harder to check.

The pressure on graduate students (to get a job) and tenure-track faculty (to get tenure) to publish in tier-1 venues leads to last-minute analysis, and thus to many of these problems. I'm not claiming to have a solution, and I don't think there is one. In non-CS fields (especially medicine, where the accuracy of results really matters) this may work out better, since they do not have "yearly deadlines" and can submit to their journals when they are confident that the work is finished.

2010-01-14T19:18:33.695-08:00

Another starting point might be "How to do Research At the MIT AI Lab." It is a bit more general than what you propose but similar in spirit.

2010-01-14T18:24:16.366-08:00

As a start, read Vern Paxson's IMC 2004 paper, "Strategies for Sound Internet Measurement."

2010-01-13T01:02:57.759-08:00

I also really like the idea of providing grad students with thorough guidelines on how to run experiments!

What bothers me a lot is that few authors make available the data their papers are based on. Of course, most will provide the data if asked, but I think it should be standard practice that if you publish findings derived from experimentally gathered data, you also publish that data (e.g., by providing a link to a tarball).

Next, I think any experiment whose results are published needs to be reproducible by other researchers. That is, I would like to see every paper reference a tarball with all the scripts for experiment control and data analysis. This doesn't have to be polished, but it should give an external researcher who was not involved in the experiments a detailed picture of how they were carried out.

The practice of publishing your methods and tools also facilitates starting up similar research.

As somebody new to the field, I wonder why this is not common practice. Are people ashamed of dirty hacks? Afraid of having flaws exposed? Or afraid of others benefiting from the time they invested in setting up experiments?

2010-01-12T20:49:46.171-08:00

I really like the "lab manual" idea. Best practices would seem to be a lot easier for the larger community to get behind than any one set of procedures.

It seems like it would be especially valuable to emphasize to beginning grad students that, whether for the resubmission of a rejected paper or the camera-ready version of an accepted one, you're probably going to have to run at least one of your experiments again, so you might as well make your experiments repeatable and document how you did them. I learned that particular lesson the hard way, and I wish someone had drilled it into my head earlier.

2010-01-12T19:41:53.934-08:00

I suspect that more thorough and rigorous experimental methodology would arise naturally if computer science had a strong tradition of repeating experiments. But, for a variety of reasons (proprietary code and/or data, complexity, rapid change, etc.), we don't.

I often wonder what (if any) impact a lack of controlled, repeatable experimentation has had on progress in the field. It's tough to say quantitatively, but qualitatively it has resulted in many derisive comments from my friends in the natural sciences 8).