I am often asked to review papers in which the authors have deployed a mobile phone app that collects data about the app's users. In some cases, these apps are overtly used for data collection and the users of the app are told how this data will be collected and used. But I have read a number of papers in which data collection has been embedded into apps that have some other purpose -- such as games or photo sharing. The goal, of course, is to get a lot of people to install the app, which is great for getting lots of "real world" data for a research paper. In some cases, I have downloaded the app in question and installed it, only to discover that the app never informs the user that it is collecting sensitive data in the background.
The problem is that such practices are unethical (and possibly illegal) under federal requirements for protecting the privacy of human subjects in a research study. Even if there is some fine print in the app about the use of data for a research study, it's not clear to me that in all cases the researchers have actually gone through the federally mandated Institutional Review Board (IRB) approval process to collect this data.
Unfortunately, not many computer scientists seem to be familiar with the IRB approval requirement for studies involving human subjects. Our field is pretty lax about this, but I think it's time we started taking human subjects approval more seriously.
It is now dead simple to develop mobile apps that collect all kinds of data about their users. On the Android platform, an app can collect data such as the device's GPS location; which other apps are running and how much network traffic they use; what type of wireless network the device is using; the device manufacturer, model, and OS version; which cellular carrier the device uses; the device's battery level; and the current cell tower ID. Similar provisions exist on iOS and other mobile operating systems. With rooted devices, it's possible to collect even more information, such as a complete network packet trace and complete information on which websites and apps have been used.
Put together, this data can yield a rich picture of the usage patterns, mobility, and network performance experienced by a mobile user. It is very tempting for researchers to exploit this capability, and it's easy to get thousands of people to install your app by releasing it on Google Play or the Apple App Store. However, I have very little confidence that most researchers are adhering to legal and ethical guidelines for collecting such data -- I bet the typical scenario is that the data ends up being logged to an unsecured computer under some grad student's desk.
So, what is an IRB? In the US and many other countries, any institution that receives federal funding must ensure that research studies involving human subjects protect the rights and privacy of the participants in such studies. This is accomplished through Institutional Review Board review, which must occur prior to the study taking place. The purpose of the IRB is to ensure that the study meets certain guidelines for protecting the privacy of the study participants. The Stanford IRB Website has some good background about the purpose of IRB approval and what the process is like. The principles underpinning IRB review were set forth in the Declaration of Helsinki, which has been the basis for many countries' laws regarding protection of human subjects.
Failing to get IRB approval for a research study is serious business. In the medical and social science communities, failing to get IRB approval is tantamount to faking data or plagiarism. The Retraction Watch blog has a long list of cases in which published articles have been retracted due to lack of IRB approval. In those fields, this kind of forced retraction can destroy an academic's career.
Documenting IRB approval and informed consent for study participants is becoming standard practice in the medical and social science communities. For example, the submission guidelines to the Annals of Internal Medicine require an explicit statement from authors regarding IRB approval:
"The authors must confirm review of the study by the appropriate institutional review board or affirm that the protocol is consistent with the principles of the Declaration of Helsinki (see World Medical Association). If the authors did not obtain institutional review board approval before the start of the study, they should so state and explain the circumstances. If the study was exempt from review, the authors must state that such exemption complied with the policy of their local institutional review board. They should affirm that study participants gave their informed consent or state than an institutional review board approved conduct of the research without explicit consent from the participants. If patients are identifiable from illustrations, photographs, pedigrees, case reports, or other study data, the authors must submit the release form for each such individual (or copies of the figures with the appropriate release statement) giving permission for publication with the manuscript. Consult the Research section of the American College of Physicians Ethics Manual for further information."
Yet in computer science, we tend not to take this process very seriously. I suspect most computer scientists have never heard of, or dealt with, their institution's IRB. I was surprised to see that CHI, the top conference in the area of human-computer interaction (in which user studies are commonplace), says nothing in its call for papers about requiring IRB approval disclosure for human subjects studies -- perhaps the practice of obtaining IRB approval is already widespread in that community, though I doubt it.
Why do I think we should require authors to document IRB approval? For two reasons. First, to raise awareness of this issue and ensure that authors are aware of their obligations before they submit a paper to such venues. Second, to prevent paper reviewers from having to make a judgment call when a paper is unclear on whether and how a study protects its participants. The whole point of an IRB is to front-load the approval process before the research study even begins, well before a paper gets submitted. The nature of a research project may well change depending on the IRB's requirements for protecting user privacy.
To give an example of how this can be done properly, colleagues of mine at the University of Michigan and the University of Washington are developing a mobile app for collecting network performance data, called MobiPerf. The PIs have IRB approval for this study, and the app clearly informs users when it first starts that the data will be collected for a research study; clicking "No thanks" immediately exits the app. Furthermore, there is a fairly detailed privacy statement and EULA on the app's website, explaining exactly what data is collected. It's true that going through these steps required more effort on the part of the researchers, but it's not just a good idea -- it's the law.
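To make the consent-first pattern concrete, here is a minimal sketch in Python of an app that asks for consent before collecting anything and exits on "No thanks". This is not MobiPerf's actual code: the class `ConsentGatedCollector`, the function `run_app`, the prompt wording, and the sample field `network_type` are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ConsentGatedCollector:
    """Hypothetical collector that refuses to store data without consent."""

    consent_given: bool = False
    records: List[Dict[str, str]] = field(default_factory=list)

    def ask_consent(self, prompt: Callable[[str], str]) -> bool:
        # Show the study notice before any data is touched.
        answer = prompt(
            "This app collects network performance data for a research "
            "study. Agree / No thanks?"
        )
        # Only an explicit "agree" counts; anything else is a refusal.
        self.consent_given = answer.strip().lower() == "agree"
        return self.consent_given

    def record(self, sample: Dict[str, str]) -> bool:
        # Drop the sample unless the user has explicitly opted in.
        if not self.consent_given:
            return False
        self.records.append(dict(sample))
        return True


def run_app(prompt: Callable[[str], str]) -> ConsentGatedCollector:
    """Hypothetical app entry point: consent first, collection second."""
    collector = ConsentGatedCollector()
    if not collector.ask_consent(prompt):
        return collector  # "No thanks": exit without collecting anything
    collector.record({"network_type": "wifi"})  # placeholder measurement
    return collector
```

The point of the design is that the consent check lives inside `record` itself, so a code path that forgets to ask still cannot log anything. A real study would also need the secure storage and data handling that the IRB protocol specifies, which this sketch does not attempt.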
This is my personal blog. The views expressed here are mine alone and not those of my employer.