A report from one of the Cqrrelations working groups:
From the start we were interested in how a Gold Standard is established: a paradoxical situation where human input is both considered a source of truth and made invisible. Annotation here means the manual work of ‘scoring’ large amounts of data that can then be used for ‘training’ algorithms. This scored data becomes the reference against which the algorithm is trained and tested. The annotator is typically a student or a Mechanical Turk worker, or sometimes the work has already been done for another reason, as in the case of the sentiment analysis algorithm, where the Gold Standard for deciding between positive and negative language patterns is based on a large corpus of movie reviews along with the explicit ratings of the reviewed movies.
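To make this concrete, here is a minimal sketch of how explicit ratings can stand in for annotation: star ratings are converted into binary polarity labels and split into training and test sets. The data, field names and cut-off threshold are our own illustrative assumptions, not a description of any specific pipeline.

```python
# A minimal sketch: turning explicit movie ratings into a 'Gold Standard'
# for positive/negative polarity. The threshold and data layout are
# illustrative assumptions, not a specific published pipeline.
import random

reviews = [
    {"text": "A tender, luminous film.", "stars": 5},
    {"text": "Two hours I will never get back.", "stars": 1},
    {"text": "Competent but forgettable.", "stars": 3},
]

def label(review, threshold=3):
    # The annotator's judgment is replaced by a rating left for
    # another purpose; the cut-off point is a silent editorial choice.
    return "positive" if review["stars"] > threshold else "negative"

gold = [(r["text"], label(r)) for r in reviews]
random.shuffle(gold)
split = int(0.8 * len(gold))
train, test = gold[:split], gold[split:]  # the reference to train and test against
```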
In between the solution-oriented and mystifying descriptions of the several text-mining algorithms we looked at, the actual conditions, context and work of annotation felt surprisingly undervalued and under-documented. Only in a few cases, often hidden far away in the software sources, did we find descriptions of the method of annotation.
It seems that annotation always implies a contextual perspective. Scoring sources is also time-consuming and boring; it can only speed up when the annotator does not doubt her opinions. Through the development of pattern.en.paternalism we wanted to both experience and challenge this practice. Our decision to work with a contested ‘polarity’ such as paternalism was, of course, deliberate (a simplified sketch of such polarity scoring follows the list below).
We wanted to:
Understand the work of annotation
Expose the place/role of the annotator
See what place dissent could have in text-mining
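The sketch below, loosely modelled on how pattern.en averages per-word lexicon scores, shows why the choice of ‘polarity’ matters: swap the word list and the score encodes a different worldview. The lexicon entries and their scores are invented for illustration; they are not the actual pattern.en.paternalism data.

```python
# A simplified sketch of lexicon-based polarity scoring, loosely modelled
# on how pattern.en averages per-word scores; the lexicon below is an
# invented illustration, not the pattern.en.paternalism lexicon itself.
paternalism_lexicon = {
    "should": 0.6,   # hypothetical scores between -1.0 and +1.0,
    "must": 0.8,     # where +1.0 marks strongly paternalistic language
    "allow": 0.4,
    "decide": -0.5,
    "consent": -0.8,
}

def polarity(text, lexicon):
    # Average the scores of the words that appear in the lexicon;
    # everything outside the annotators' vocabulary stays invisible.
    scores = [lexicon[w] for w in text.lower().split() if w in lexicon]
    return sum(scores) / len(scores) if scores else 0.0

print(polarity("You must let us decide what to allow", paternalism_lexicon))
```

Every number in such a lexicon is a frozen annotator judgment, yet the function hands it back as if it were a neutral measurement.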