
Followup: OkCupid and Open Science

A recovered post on the response to the OkCupid data breach in 2016

The storm has died down a bit regarding the breach of almost 70,000 OkCupid users’ highly personal data. The response was rapid and critical; my response focused on the lack of informed consent practice, along with some other issues, while a fantastic take by Oz Keyes explored how the researchers buried academic dishonesty behind a virtue of openness.

It is this latter point that perhaps deserves some more introspection. I am a strong proponent of open source, open science, and open data, and this is highlighted in my conference talks along with my research approach. Open science and open data are worthy virtues. But they are not the only virtues.

As in many arguments over free speech, it is easy to miss the forest for the trees. While the freedom of speech is one of the most fundamental human rights we have, it is not the only such right. Likewise, while data, particularly government-funded data, should be open, openness does not absolve the analyst of the sins of its use.

In my prior post, I called for the Center for Open Science to remove the archive. I now retract that stance. The Center for Open Science negotiated with Mr. Kierkegaard to password protect the sensitive user information, at least for now. Although this has flaws, it is in my opinion the best near-term solution.

There is a salient truth in this situation: open science worked. The authors had a fervent predilection for open data; fittingly, it was this that bit them in the ass. Open science is about democratizing (or at least de-meritocracizing) peer review. In this case, the authors put their research up for review on an open platform. Users with various levels of expertise in queer issues, research ethics, statistical methodology, and so forth, including expertise that far outstrips my own, were able to freely analyze and criticize the research. The fundamental working premise of open science was operating in full gear.

I no longer think that COS should remove the data or the report. The user information is, for now, protected. COS should ensure that it remains so. But the report and the compiled results should stay. The failures of methodology and scientific method should remain on full display. These researchers should not so easily escape the ramifications of their shortcomings. I am happy to read a retraction. But science is about transparency, and removing the report would be a strike against both transparency and the peer review that discredited the research.

Importantly, the repository should stand as a strong example of the working potential of open science.

Author

EG

Emily is a data scientist and activist. The opinions shared herein are her own.