Annotation
By Daisy Abbott, University of Glasgow
1. Introduction
Annotation is the process of
adding notes on or to something. It plays a particularly important role in
scholarly work as explanations, descriptions, and interpretations can inform
any future research on the annotated data. Annotations can be applied to all
kinds of digital data and the process can be undertaken for a variety of
reasons including description, correction, classification, reviewing,
interpretation, and augmentation of the data.
Examples Of Annotation In Practice
Different communities of practice
use annotation for different purposes and in different ways. The following list
of examples is not exhaustive:
- Annotated bibliographies provide descriptions about how each
source is useful to an author in constructing a paper or argument.
- Linguistic annotation can refer to any descriptive or
analytic data applied to raw language data. Annotations are often textual
(e.g. transcripts) but can also include any other type of data (e.g.
syntactic analysis). A collection of texts with linguistic annotations is
known as a corpus.
- Programmers often add annotations (comments)
to source code to explain, in a more human-readable form, what the code
does and sometimes to record ideas for further development.
- DNA annotation has been ongoing since the 1980s in
the field of molecular biology, usually in predefined fields in sequence
databases.
- Automatic image annotation is the process by which a computer
system assigns metadata in the form of keywords to a digital image. Images
can also be annotated with technical information which speeds up
content-based retrieval, for example histograms of colour content and
segmentation information.
- Performers of all kinds often make
annotations to physical objects (e.g. scores and scripts) during the
rehearsal process. Digital records can also be annotated to increase
understanding; for example, video annotation can take the form of
time stamps, overlaid text or images, and commentaries.
- Annotations can be associated with a
web resource and can allow users to personalise a web page (editing,
adding, or removing information) without modifying the underlying
resource. Due to their connectivity, web annotations are
particularly collaborative in nature. Many Web 2.0 applications now make
extensive and imaginative use of annotations to review, rate, improve,
adapt, discuss, qualify, or critique content provided by both website
owners and other users.
- Collaborative annotation using free keyword tags to
categorise content can create a user-generated taxonomy, known as a folksonomy.
This type of social bookmarking is popular because participation is
fast and intuitive, and it can result in new ways of visualising links
between information, such as tag clouds.
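As a rough illustration of the last example, the sketch below shows how free
keyword tags from several users might be aggregated into tag weights (the
basis of a tag cloud) and a simple tag index. The users, resources, tags, and
function names are all hypothetical and are not taken from any particular
social bookmarking service.

```python
from collections import Counter

# Hypothetical tagging data: (user, resource, tag) triples.
taggings = [
    ("ann", "photo-1", "glasgow"),
    ("bob", "photo-1", "architecture"),
    ("cat", "photo-2", "glasgow"),
    ("ann", "photo-2", "river"),
    ("bob", "photo-3", "glasgow"),
]


def tag_weights(taggings):
    """Count how often each free keyword tag was applied across all users."""
    return Counter(tag for _user, _resource, tag in taggings)


def resources_by_tag(taggings):
    """Group resources under each tag, giving a simple user-generated index."""
    index = {}
    for _user, resource, tag in taggings:
        index.setdefault(tag, set()).add(resource)
    return index


weights = tag_weights(taggings)   # larger counts could be drawn in larger type
index = resources_by_tag(taggings)
```

No category scheme is defined in advance here; whatever structure emerges
comes entirely from the tags that contributors choose to apply.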
2. Short-term Benefits And Long-term Value
- Descriptive annotations, like
metadata, are crucial for information retrieval. For example, simple
keyword annotations can summarise a long text, or describe non-textual
media, making it possible to use text searching tools to discover images,
audio, or video. Many content-based retrieval systems also use non-textual
annotations of one kind or another, which distil various aspects of files
into data that can be processed more quickly. For example, music can be
matched by comparing previously extracted numerical data on volume levels,
and images can be rated for similarity based on histograms of colour data.[1]
- Annotations also augment core data by
providing layers of explanation and interpretation. In this way, each
contribution provides future users with data enriched by multiple opinions
and several areas of expertise. Annotations can even identify errors in
the core data.
- In the sciences, research is becoming
increasingly reliant on the reuse of existing datasets. As part of an
ongoing knowledge base, annotation is a crucial learning and
interpretation tool. In some fields such as molecular biology, the major
value lies in the annotations rather than the core data and this
information represents a huge investment of skill and effort.[2]
- Annotations can assist in integrating
results obtained by different research groups by providing aggregation
information for multiple databases, and identifying the provenance of each
part of the data.
- One of the primary purposes of
annotation is to disseminate information (for example in textual criticism
or cartography). However, annotations also provide supplementary
information about the way in which a resource is used, the people using
it, and the context of their contribution. In this way annotations support
new research with a wider focus than that of the core data itself, such as
research into social collaborative behaviours through the process of
annotation.[3]
- Annotation supports collaboration,
whether amongst a small team with similar expertise, or open to
contributions from anyone with a connection to the Web. It can also
promote increased engagement with the subject matter.
- Automatic annotation tools are a
research challenge in their own right and can save a vast amount of the
time and effort otherwise spent adding useful supplementary information
manually. This research area is important to both commercial and research
institutions.[4]
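The colour-histogram matching mentioned in the first point above can be
sketched as follows. This is a minimal illustration only: it assumes an image
is already available as a list of (r, g, b) tuples, and it uses histogram
intersection as the similarity measure, which is one common choice and not
necessarily the one used by any system cited here. All function names are
hypothetical.

```python
def colour_histogram(pixels, bins=4):
    """Return a normalised histogram over quantised (r, g, b) values.

    pixels: iterable of (r, g, b) tuples with each channel in 0..255.
    Each channel is reduced to `bins` levels, so similar colours share a key.
    """
    counts = {}
    total = 0
    for r, g, b in pixels:
        key = (r * bins // 256, g * bins // 256, b * bins // 256)
        counts[key] = counts.get(key, 0) + 1
        total += 1
    return {key: n / total for key, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity between two normalised histograms, from 0.0 to 1.0."""
    keys = set(h1) | set(h2)
    return sum(min(h1.get(k, 0.0), h2.get(k, 0.0)) for k in keys)


# Hypothetical four-pixel "images".
red = colour_histogram([(255, 0, 0)] * 4)
mostly_red = colour_histogram([(250, 10, 5)] * 3 + [(0, 0, 255)])
blue = colour_histogram([(0, 0, 255)] * 4)

red_vs_mostly_red = histogram_intersection(red, mostly_red)
red_vs_blue = histogram_intersection(red, blue)
```

The histogram is computed once per image and stored as an annotation, so a
later search compares these small summaries and does not need to revisit
the pixel data.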
3. HE/FE Perspective
"Active reading is the
combination of reading with critical thinking and learning, and is a
fundamental part of education and knowledge work. Active reading involves not
just reading, but also underlining, highlighting and scribbling comments,
either on the text itself or in a separate notebook. Active watching is a
similar concept applicable to the way we consume the material we watch. As text
annotation promotes active reading, so will video annotation promote active
watching."
— Correia, N.
& Chambel, T. 'Active video watching using annotation' in Proceedings
of the seventh ACM international conference on Multimedia (1999)
4. e-Science Perspective
"An essential
requirement [within e-Research Environments] is the need to authenticate the
source of the annotation and to restrict access to a particular group of
trusted colleagues, for reasons of privacy, confidentiality or intellectual
property. This is particularly important within e-Science, where the annotation
or interpretation of the raw document or data, is often more valuable than the
target of the annotation."
— Schroeter, R.
et al. (2006). "A Synchronous Multimedia Annotation System for Secure
Collaboratories" in Proceedings of the Second IEEE International
Conference on e-Science and Grid Computing
5. Issues To Be Considered
- Although annotations sit alongside
the original data, they are used to explain or interpret the data and
therefore it is important to consider both the reliability of information
and opinion in annotations and their authenticity and provenance. It is
also important to demonstrate ways in which the annotations can be
separated from the core data in order to prevent misinterpretation of the
resource.
- Annotations from different sources
can have different intrinsic 'value', for example, comments by a
well-respected expert will be of more use to future users than a request
for clarification. Should this hierarchy of value be represented and if
so, how?
- Annotations are not a static
resource; the body of annotations (and potentially also their relative
value) changes over time. This creates issues of tracking when information
is added but also provides opportunities for temporal analysis of the
growth of the supplementary information.
- Different communities use the term
'annotation' differently and have widely varying associations. This lack
of standardisation risks misunderstandings between collaborators. It is
particularly important to define and use ontologies in some collaborative
annotation activities to ensure consistency in data query and analysis.[5]
- Annotations themselves can be
annotated, leading to issues of how to structure the relationships between
data. Annotations can be logically grouped as branching threads or
hierarchies, which requires structural information. This leads to questions
about how annotations and their authors can be tracked, and how the
connections can be visualised by subsequent users.
- As annotations provide layers of additional
information, information retrieval can be applied to their content as
well as to the underlying data.
- It is an ongoing challenge for
annotation tools to handle the above issues whilst remaining
user-friendly. The visualisation of annotations can be extremely important
in terms of disseminating without overwhelming, demonstrating
functionality, separating core data from annotations or distinguishing
between 'high value' and 'low value' annotations, and allowing users to
customise what information is displayed dependent on role or interest.
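Several of the issues above (keeping annotations separate from the core data,
recording provenance and time of creation, and threading annotations of
annotations) can be made concrete with a small data structure. The sketch
below is illustrative only: the Annotation class, its fields, and the sample
records are assumptions made for this example and do not follow any
particular annotation standard or tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Annotation:
    id: str
    target: str        # id of a core data item, or of another annotation
    author: str        # provenance of the contribution
    created: datetime  # allows tracking and temporal analysis
    body: str


def thread(annotations, target_id, depth=0):
    """Yield (depth, annotation) pairs for everything attached to target_id.

    Direct annotations come oldest first, each followed by its own replies.
    Assumes the target links contain no cycles.
    """
    direct = sorted(
        (a for a in annotations if a.target == target_id),
        key=lambda a: a.created,
    )
    for a in direct:
        yield depth, a
        yield from thread(annotations, a.id, depth + 1)


def by_authors(annotations, trusted):
    """Keep only annotations written by authors in the `trusted` set."""
    return [a for a in annotations if a.author in trusted]


# Hypothetical annotations on a core data item with id "score-12".
notes = [
    Annotation("a1", "score-12", "expert", datetime(2008, 1, 5), "Bar 14 is misprinted."),
    Annotation("a2", "a1", "student", datetime(2008, 1, 6), "Which edition?"),
    Annotation("a3", "score-12", "student", datetime(2008, 1, 4), "Tempo marking unclear."),
]

ordered = [(depth, a.id) for depth, a in thread(notes, "score-12")]
expert_only = by_authors(notes, {"expert"})
```

Because the annotations refer to the core data only by identifier, the
underlying resource is never modified, and a display layer can choose which
authors or thread depths to show for a given role or interest.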
6. Additional Resources
- Buneman P. et al. (2005). "Annotation in Scientific
Data: a Scoping Report"
- Bird, S. & Liberman, M. (2001).
"A formal framework for linguistic annotation" in Speech
Communication, no. 33
- Dill, S. et al. (2003). "SemTag
and seeker: bootstrapping the semantic web via automated semantic
annotation" in Proceedings of the 12th international conference
on the World Wide Web, Budapest
- Marshall, C. "Annotation: from
paper books to the digital library"
- Ruvane, M. (2005). "Annotation
as Process: A Vital Information Seeking Activity in Historical Geographic
Research" in Proceedings of the 68th Annual Meeting of the
American Society for Information Science and Technology (ASIST) no.
42
- Schreiber, A. et al. (2001).
"Ontology-based photo annotation" in Intelligent Systems (IEEE),
vol. 16, no. 3
- STARS
(Semantic Tools for Screen Arts Research)
- Web
Annotation Tools
- [1] "Information Retrieval",
DigiCULT Technology Watch Report 3, December 2004
- [2] Buneman, P. et al. "Annotation in
Scientific Data: a Scoping Report"
- [3] Tags Network Narrative project
- [4] For example:
Matellanes, A. et al. "Creating
an application for automatic annotation of images and video",
Automatic Annotation and Information Retrieval: New Perspectives (Special
Track at the 20th International Florida Artificial Intelligence Research
Society Conference 2007); Wyman, S. et al. (2004). "Automatic annotation of
organellar genomes with DOGMA"
- [5] Cf. The National Center
for Biomedical Ontology