Fast, Cheap, and Out of Control: A Zero-Curation Model for Ontology Development
Benjamin Good
Wilkinson Laboratory
University of British Columbia
My Research Question
Can a mass-collaborative protocol produce useful bio-ontologies without centralized curatorial control?
Mass collaboration
Creation of a knowledge resource entirely by the community that will ultimately use it
Successful mass collaborations
Wikipedia
Open Directory Project
BioMOBY
Open source software
The World Wide Web
Requirements for building an ontology
Identify people who have the knowledge
Motivate them to share it
Provide an interface that (a) allows them to share efficiently, and
(b) captures the knowledge in a well-ordered manner.
Expert identification
Scientific conferences
Very limited amount of time
Severe restrictions on interface design
The Young Investigators Forum for Research in Circulatory and Respiratory Health - Winnipeg, Manitoba, Canada, 2005 (where it was still snowing in May)
Motivation
Competition
Altruism
Narcissism
Self-interest
iCAPTURer Protocol
1) Preprocessing: identify an appropriate upper ontology; extract terminology from text sources
2) Volunteer ontology engineering: filter and extend terminology; identify relationships between terms
3) Evaluation
http://bioinfo.icapture.ubc.ca:8090/icapturer/login.jsp
iCAPTURer2.0
iCAPTURer - terminology builder
Figure: Automatic term extraction (Text2Onto) turns abstracts into candidate terms, such as "immune response", "apoptosis", and "smooth muscle cell", along with spurious candidates such as "Taste bar", "Cell foo", "Cell biology queen", and "Glucose cell". Volunteers filter the candidate terms and extend the terminology, producing the validated terms.
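The filtering step can be pictured as a small bookkeeping structure. The sketch below is hypothetical (the class and method names are not iCAPTURer's), and it assumes, for illustration only, that a single volunteer's acceptance promotes a candidate to the validated list and that terms typed in by volunteers are validated directly, in keeping with the zero-curation idea. Conflicting judgments are simply left for the later evaluation phase.

```python
from dataclasses import dataclass, field


@dataclass
class TerminologyBuilder:
    """Hypothetical model of the terminology-building step (not iCAPTURer's code)."""

    candidates: set[str]  # terms proposed by automatic extraction
    validated: set[str] = field(default_factory=set)
    rejected: set[str] = field(default_factory=set)

    def accept(self, term: str) -> None:
        # A volunteer confirms that an extracted candidate is a real domain term.
        if term in self.candidates:
            self.validated.add(term)

    def reject(self, term: str) -> None:
        # A volunteer flags an extraction error such as "Cell foo".
        if term in self.candidates:
            self.rejected.add(term)

    def extend(self, term: str) -> None:
        # A volunteer adds a term the extractor missed; no curator reviews it.
        self.validated.add(term)

    def pending(self) -> set[str]:
        # Candidates no volunteer has judged yet.
        return self.candidates - self.validated - self.rejected


builder = TerminologyBuilder({"apoptosis", "immune response", "Cell foo", "Taste bar"})
builder.accept("apoptosis")
builder.reject("Cell foo")
builder.extend("smooth muscle cell")
print(sorted(builder.validated))  # ['apoptosis', 'smooth muscle cell']
print(sorted(builder.pending()))  # ['Taste bar', 'immune response']
```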
iCAPTURer - taxonomy builder
Figure: Volunteers assign parents to validated terms (e.g. "smooth muscle cell", "T-cell activation", "apoptosis"). The upper ontology is the UMLS Semantic Network, with the types shown as Entity, Physical_Object, Event, Process, Generic Concept, Conceptual_Entity, and Activity.
Terms now annotated into UMLS
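Parent assignment can be sketched as recording "is a kind of" links under one rule: the parent must be either an upper-ontology type or another validated term. This is a minimal sketch under that assumption; `assign_parent` and `UPPER_TYPES` are hypothetical names, and the type list is only what the slide shows, not the full UMLS Semantic Network or its internal hierarchy.

```python
# Upper-ontology types as they appear on the slide; the real UMLS Semantic
# Network has many more types and its own hierarchy, omitted here.
UPPER_TYPES = {"Entity", "Physical_Object", "Event", "Process",
               "Generic Concept", "Conceptual_Entity", "Activity"}


def assign_parent(parents: dict[str, set[str]], validated: set[str],
                  term: str, parent: str) -> bool:
    """Record a volunteer's claim that `term` is a kind of `parent` (hypothetical helper).

    The parent may be an upper-ontology type or another validated term,
    which lets volunteers build hyponym chains beneath the UMLS types.
    Returns False if the assertion is rejected.
    """
    if term not in validated or term == parent:
        return False
    if parent not in UPPER_TYPES and parent not in validated:
        return False
    parents.setdefault(term, set()).add(parent)
    return True


validated = {"smooth muscle cell", "T-cell activation", "apoptosis", "immune response"}
parents: dict[str, set[str]] = {}
assign_parent(parents, validated, "smooth muscle cell", "Physical_Object")
assign_parent(parents, validated, "T-cell activation", "immune response")
assign_parent(parents, validated, "apoptosis", "Taste bar")  # rejected: unknown parent
print(parents)
# {'smooth muscle cell': {'Physical_Object'}, 'T-cell activation': {'immune response'}}
```

A set of parents per term allows multiple inheritance; a fuller version would also need to reject assertions that create cycles.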
iCAPTURer - synonym collector
Volunteers add synonyms
Figure: e.g. is "smooth muscle cell" the same as "Non striated muscle cell"?
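If each "is the same as" answer is treated as symmetric and transitive (an assumption; the slides do not say how synonyms were merged), synonym assertions can be grouped with a union-find structure. The class name `SynonymSets` is hypothetical.

```python
class SynonymSets:
    """Union-find over terms: each 'A is the same as B' assertion merges two sets."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, term: str) -> str:
        # Return the representative of term's set, compressing the path on the way.
        self.parent.setdefault(term, term)
        root = term
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[term] != root:
            next_term = self.parent[term]
            self.parent[term] = root
            term = next_term
        return root

    def same_as(self, a: str, b: str) -> None:
        # Record one volunteer's synonym assertion by merging the two sets.
        self.parent[self.find(a)] = self.find(b)

    def synonyms(self, term: str) -> set[str]:
        # All terms known to be in the same set as `term`, including itself.
        root = self.find(term)
        return {t for t in list(self.parent) if self.find(t) == root}


syn = SynonymSets()
syn.same_as("smooth muscle cell", "non striated muscle cell")
syn.same_as("non striated muscle cell", "non-striated muscle cell")
print(sorted(syn.synonyms("smooth muscle cell")))
# ['non striated muscle cell', 'non-striated muscle cell', 'smooth muscle cell']
```

Transitive merging is convenient but also spreads a single wrong answer across a whole group, which is one reason the collected synonyms still need a separate evaluation.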
Evaluation of volunteer assertions
Is T-cell activation a kind of immune response?
Yes
Sometimes
No
I don't know
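The four-way multiple-choice question lends itself to a simple per-assertion tally. In this sketch, `Answer`, `tally`, and `supported` are hypothetical names, and the rule that an assertion is supported when "Yes" votes outnumber "No" votes (ignoring "Sometimes" and "I don't know") is an illustrative assumption, not the study's documented scoring.

```python
from collections import Counter
from enum import Enum


class Answer(Enum):
    YES = "Yes"
    SOMETIMES = "Sometimes"
    NO = "No"
    DONT_KNOW = "I don't know"


def tally(votes: list[tuple[str, Answer]]) -> dict[str, Counter]:
    """Group (assertion, answer) pairs into per-assertion answer counts."""
    counts: dict[str, Counter] = {}
    for assertion, answer in votes:
        counts.setdefault(assertion, Counter())[answer] += 1
    return counts


def supported(counts: Counter) -> bool:
    # Assumed rule: more "Yes" than "No"; "Sometimes" and "I don't know" are ignored.
    return counts[Answer.YES] > counts[Answer.NO]


q = "T-cell activation is a kind of immune response"
votes = [(q, Answer.YES), (q, Answer.YES), (q, Answer.SOMETIMES), (q, Answer.DONT_KNOW)]
counts = tally(votes)
print(counts[q][Answer.YES])  # 2
print(supported(counts[q]))   # True
```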
Results: Volunteers
68/450 for the knowledge acquisition phase at the conference
65/250 for the evaluation phase from email recipients
Figure: Participant contributions per volunteer (collection: percent of total knowledge added; evaluation: number of votes)
Comments from volunteers
"It has enzyme! Whooo I like it!"
"It's amusing me"
"Woo! Atherosclerosis is in there now!"
"Science isn't so anal you know..."
"This a helluva lot more interesting than those talks were"
Results: Knowledge
661 terms
207 hyponyms
340 synonyms
1) Collection: 2 days, 65 participants
2) Evaluation: 3 days, 68 participants, 11,545 votes
Assertions with more "true" than "false" votes: terms 93%, hyponyms 49%, synonyms 54%
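The percentages above can be computed per assertion type once each assertion's votes are reduced to "true" and "false" counts. This sketch assumes that reduction has already been done (how "Sometimes" and "I don't know" were handled is not stated here) and counts a tie as not passing; the function name and the toy rows are hypothetical, not the study's data.

```python
from collections import defaultdict


def percent_true_over_false(
    results: list[tuple[str, int, int]],
) -> dict[str, float]:
    """Per assertion type, the percent of assertions whose 'true' votes exceed 'false' votes.

    `results` holds (assertion_type, true_votes, false_votes) rows, one per assertion.
    Ties count as not passing.
    """
    total: defaultdict[str, int] = defaultdict(int)
    passing: defaultdict[str, int] = defaultdict(int)
    for kind, true_votes, false_votes in results:
        total[kind] += 1
        if true_votes > false_votes:
            passing[kind] += 1
    return {kind: 100.0 * passing[kind] / total[kind] for kind in total}


# Toy rows for illustration only.
rows = [("term", 9, 1), ("term", 4, 0), ("hyponym", 2, 5), ("hyponym", 6, 3), ("synonym", 3, 3)]
print(percent_true_over_false(rows))
# {'term': 100.0, 'hyponym': 50.0, 'synonym': 0.0}
```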
Initial acquisition versus evaluation
Knowledge capture at YI forum: forms and tree navigation; conference setting, 2 days, 65 people ("I think that T-cell activation is a kind of immune response")
Evaluation conducted via email request: multiple choice (voting); home setting, 3 days, 68 people ("I agree that T-cell activation is a kind of immune response")
Figure: number of assertions gathered in each phase (on the order of 1,000 at the YI forum versus 11,000 by email)
Next Steps
The votes about the ontology represent the same knowledge that is needed to build the ontology.
Can we build an ontology using the interface implemented for ontology evaluation?
Results: Summary
Produced an ontology describing aspects of circulatory and respiratory health in two days, without a knowledge engineer, with numerous but detectable flaws
Conclusions
Motivation is easy
Interface design is hard
A multiple-choice interface seems to produce the best results
Mass collaboration shows promise in the domain of bio-ontology engineering
Future Work
Reduce the human interface to EXCLUSIVELY multiple choice by:
More extensive pre-processing
More intelligent questioning
Acknowledgements
Ivan Berkowitz, Bruce McManus (YI Forum)
Mark Wilkinson
Tim Chklovski, Yolanda Gil
All of the volunteers
Rodney Brooks, author of "Fast, Cheap, and Out of Control: A Robot Invasion of the Solar System" (1989, Journal of the British Interplanetary Society)
http://bioinfo.icapture.ubc.ca/bgood
Hiring!
1 or more postdoctoral fellows in the domain of cardiology and/or ontology development
1 lead software developer for the BioMOBY project (http://biomoby.org)
http://bioinfo.icapture.ubc.ca
Time line
February 2005: Forced to submit abstract
May 2005: Knowledge capture at YI forum
June 2005: Evaluation conducted via email request
Figure: number of assertions gathered over time
Outline
Mass collaboration
Experiment: mass-collaborative ontology development and evaluation
Results: ontology produced, cost and quality
