Edinburgh Research Archive

MilkMine: text-mining, milk proteins and hypothesis generation

dc.contributor.advisor
Sawyer, Lindsay
en
dc.contributor.advisor
Webber, Bonnie
en
dc.contributor.author
Edwards, Stephen
en
dc.contributor.sponsor
Biotechnology and Biological Sciences Research Council (BBSRC)
en
dc.date.accessioned
2010-10-06T10:26:42Z
dc.date.available
2010-10-06T10:26:42Z
dc.date.issued
2009
dc.description.abstract
The vast and increasing volume of biological data can make it a struggle for scientists to keep up to date with the latest research, and as a consequence they may miss significant biological links, particularly those that extend outwith their own area of expertise. MilkMine is an attempt to provide a single informatics resource to help milk protein scientists mine this information mountain more effectively, by integrating standard experimental data types with data generated by emerging text-mining techniques. A method was initially developed to identify milk-related terminology in peer-reviewed biological literature, and this was used to complement the Unified Medical Language System (UMLS), a large thesaurus of biological concepts, their variant names and their types. The resultant enriched ontology was then mapped to the free text of peer-reviewed biological literature using the MMTx program, producing a database of semantically enriched sentences. A co-occurrence relation extraction algorithm was written to identify relationships between milk proteins and peptides and other biological concepts, such as diseases or biological processes. From these literature relation sets, new hypotheses can be generated on the basic principle that if “A is linked to B” and “B is linked to C”, then an association between A and C can be inferred. Filtering and downstream processing of the many generated relationships bring the most significant interactions to the fore. These literature relations and hypotheses are integrated with biological data in the MilkMine database, which is built upon a generic data warehousing system, InterMine. This tool enabled the integration of traditional data types, such as protein sequence or structural data, from a variety of sources (e.g. UniProt). In addition, the author extended the standard InterMine model to include other data sources (e.g. the Protein Data Bank) and to incorporate the output of the text-mining algorithm.
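As an illustration only, the sentence-level co-occurrence step and the "A linked to B, B linked to C" inference described above might be sketched as follows. This is not the thesis's algorithm: it assumes each sentence has already been reduced to a list of concept names (roughly what an MMTx-style mapping yields), and the function names, the `min_count` threshold and the toy concept labels are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_relations(sentences, min_count=2):
    """Count how often each unordered pair of concepts shares a sentence and
    keep pairs seen at least `min_count` times (hypothetical threshold)."""
    counts = Counter()
    for concepts in sentences:
        # Deduplicate within a sentence and sort so (A, B) and (B, A) share one key.
        for a, b in combinations(sorted(set(concepts)), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


def abc_hypotheses(relations, start):
    """Infer A-C links through shared intermediates B, skipping any C already
    directly linked to A. Returns {C: set of intermediates B}."""
    neighbours = {}
    for a, b in relations:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    direct = neighbours.get(start, set())
    hypotheses = {}
    for b in direct:
        for c in neighbours.get(b, set()):
            if c != start and c not in direct:
                hypotheses.setdefault(c, set()).add(b)
    return hypotheses


# Toy, made-up concept lists standing in for annotated sentences.
sentences = [
    ["protein_A", "process_B"],
    ["protein_A", "process_B", "disease_D"],
    ["process_B", "disease_C"],
    ["process_B", "disease_C"],
    ["protein_A", "disease_D"],
]
relations = cooccurrence_relations(sentences)
print(abc_hypotheses(relations, "protein_A"))  # {'disease_C': {'process_B'}}
```

Counting the intermediates behind each inferred link gives one simple handle for the kind of filtering and ranking mentioned above; a real system would likely also weight pairs by statistical significance rather than raw counts.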
This integration of otherwise disparate information allows more complex querying across many data types. For example, protein sequences are mapped to instances of the names, synonyms or symbols of the protein in text, so a raw fragment of amino acid sequence (e.g. a particular binding region) can be used to search the MilkMine database for literature information, as well as for the interactions and hypotheses of those proteins that contain the sequence. The MilkMine resource is accessible online (www.bioinformatics.ed.ac.uk/milkmine) through a professional-level query interface offering features such as an interactive query builder, standard ready-to-run queries, bulk downloads and the ability to store user preferences and query histories. Evaluation of MilkMine showed that the text-mining algorithm, as well as the data integration, could provide the user with interesting connections for further study.
en
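The sequence-to-literature lookup described in the abstract can be pictured with a small in-memory sketch. It is not MilkMine's or InterMine's query API: it simply assumes each protein record carries its sequence plus the names, synonyms and symbols found in text, and that literature relations are indexed by those names. The `Protein` class, `search_by_fragment` and the toy identifiers and sequences are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Protein:
    accession: str                                  # hypothetical identifier
    sequence: str                                   # one-letter amino-acid codes
    names: list = field(default_factory=list)       # names, synonyms, symbols seen in text


def search_by_fragment(fragment, proteins, relations):
    """Return {accession: [relations]} for every protein whose sequence contains
    `fragment`, gathering relations recorded under any of its text names."""
    fragment = fragment.upper()
    hits = {}
    for p in proteins:
        if fragment in p.sequence.upper():
            hits[p.accession] = [r for name in p.names for r in relations.get(name, [])]
    return hits


# Toy, made-up records; sequences are synthetic, not real milk proteins.
proteins = [
    Protein("HYPO_1", "MKVLAAGIWHK", ["protein X", "PX"]),
    Protein("HYPO_2", "MSTGIWHKRLE", ["protein Y"]),
    Protein("HYPO_3", "MQQPLLEAAR", ["protein Z"]),
]
relations = {
    "PX": [("PX", "process_B")],
    "protein Y": [("protein Y", "disease_C")],
}
print(search_by_fragment("gIWhk", proteins, relations))
```

A plain substring test is the simplest possible match; a production warehouse would more plausibly use an indexed or alignment-based search over much larger sequence sets.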
dc.identifier.uri
http://hdl.handle.net/1842/3869
dc.language.iso
en
dc.publisher
The University of Edinburgh
en
dc.subject
text-mining
en
dc.subject
informatics
en
dc.subject
milk protein
en
dc.subject
milk peptide
en
dc.title
MilkMine: text-mining, milk proteins and hypothesis generation
en
dc.type
Thesis or Dissertation
en
dc.type.qualificationlevel
Masters
en
dc.type.qualificationname
MPhil Master of Philosophy
en

Files

Original bundle

Edwards2009.pdf (4.2 MB, Adobe Portable Document Format): MPhil thesis
Edwards2009.doc (5.56 MB, Microsoft Word): file not available for download
MilkMine tutorial.doc (10.87 MB, Microsoft Word): additional tutorial
