
file time: 2008-02-16


ClearTK: A Framework for Statistical Biomedical Natural Language Processing 

Philip Ogren

Philipp Wetzler 
 
 

Department of Computer Science

University of Colorado at Boulder

 

Introduction 

ClearTK is a software package that:
  facilitates statistical biomedical natural language processing
  is written for UIMA in Java
  provides an extensible feature extraction library
  interfaces with popular machine learning libraries:
    Maximum Entropy (OpenNLP)
    Support Vector Machines (LIBSVM)
    Conditional Random Fields (Mallet)
    miscellaneous others, e.g. Naïve Bayes (Weka)
  is available free for academic research (contact philip@ogren.info)
 

 

UIMA 101 

Diagram: text -> collection reader -> Common Analysis Structure (CAS) -> analysis engines -> consumers

ClearTK provides a way to create analysis engines that use statistical models for classifying text. The structure of the CAS is defined by a type system determined by the development team.  

Statistical Biomedical Natural Language Processing 101 

Frame the NLP task as a classification task, e.g. for named entity recognition, classify each token as one of three labels.
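A minimal sketch of framing named entity recognition as token classification. It assumes the widely used B/I/O labeling scheme (begin, inside, outside); both that scheme and the `BioLabeler` name are assumptions for illustration and are not taken from ClearTK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of ClearTK: turns entity token spans
// into one classification label per token.
class BioLabeler {
    /**
     * @param tokenCount  number of tokens in the sentence
     * @param entitySpans each span is {firstTokenIndex, lastTokenIndexInclusive}
     * @return one label per token: "B" (begins an entity), "I" (inside), "O" (outside)
     */
    static List<String> label(int tokenCount, int[][] entitySpans) {
        String[] labels = new String[tokenCount];
        Arrays.fill(labels, "O");              // default: token is outside any entity
        for (int[] span : entitySpans) {
            labels[span[0]] = "B";             // first token of the entity
            for (int i = span[0] + 1; i <= span[1]; i++) {
                labels[i] = "I";               // remaining tokens of the entity
            }
        }
        return new ArrayList<>(Arrays.asList(labels));
    }
}
```

With the six tokens "The concentration of alpha 2-macroglobulin ," and one entity covering tokens 3 to 4, `BioLabeler.label(6, new int[][]{{3, 4}})` yields the labels O, O, O, B, I, O.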
 
 
 
 
Training
  Manually annotate a bunch of data
  Extract features from text *
  Write out training data *
  Train a model
Run time
  Extract features from unseen text *
  Classify features with trained model *
  Create annotations

* ClearTK facilitates these tasks 
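The starred steps can be illustrated with a small sketch of feature extraction and training-data output. Everything here is hypothetical: the class name, the three features, and the "label followed by space-separated features" line format are assumptions chosen for illustration, not ClearTK's actual feature extractors or file formats.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names or formats come from ClearTK.
class TrainingDataSketch {
    /** Extracts a few simple string-valued features for the token at index i. */
    static List<String> extractFeatures(List<String> tokens, int i) {
        String token = tokens.get(i);
        List<String> features = new ArrayList<>();
        features.add("word=" + token.toLowerCase());
        features.add("hasDigit=" + token.chars().anyMatch(Character::isDigit));
        // "<s>" marks the start of the sentence when there is no previous token.
        features.add("prev=" + (i > 0 ? tokens.get(i - 1).toLowerCase() : "<s>"));
        return features;
    }

    /** Formats one training instance as "label feature feature ..." (assumed format). */
    static String toTrainingLine(String label, List<String> features) {
        return label + " " + String.join(" ", features);
    }
}
```

At run time the same `extractFeatures` call would be applied to unseen text, and the resulting feature list handed to a trained model instead of being written out with a label.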

The concentration of alpha 2-macroglobulin, alpha 1-antitrypsin, plasminogen, C3-complement, fibrinogen degradation products (FDP) and fibrinolytic activity... 
 
















 

ClearTK Analysis Engine 

Diagram: UIMA CAS (input annotations) -> find foci of analysis -> extract features -> feature set -> classify -> interpret result / create annotations -> UIMA CAS (output annotations). The feature set can also be written out as training data.
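The flow through a ClearTK analysis engine can be sketched in plain Java as a loop over foci of analysis. The types below are hypothetical stand-ins for the boxes in the diagram and are not ClearTK or UIMA classes; treating every token as a focus, and treating the label "O" as "no annotation", are also assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-ins for the diagram's boxes; not ClearTK or UIMA classes.
class AnalysisEngineSketch {
    /** A minimal output annotation: a token position and its predicted label. */
    static final class Annotation {
        final int tokenIndex;
        final String label;

        Annotation(int tokenIndex, String label) {
            this.tokenIndex = tokenIndex;
            this.label = label;
        }
    }

    static List<Annotation> process(
            List<String> tokens,
            Function<String, List<String>> extractFeatures,
            Function<List<String>, String> classify) {
        List<Annotation> output = new ArrayList<>();
        // Find foci of analysis: here, simply every token.
        for (int i = 0; i < tokens.size(); i++) {
            List<String> featureSet = extractFeatures.apply(tokens.get(i)); // extract features
            String result = classify.apply(featureSet);                     // classify
            // Interpret result: "O" is assumed to mean "no annotation".
            if (!"O".equals(result)) {
                output.add(new Annotation(i, result));                      // create annotation
            }
        }
        return output;
    }
}
```

Passing the feature extractor and the classifier in as functions mirrors the idea that the engine itself does not depend on a particular machine learning library.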

 

ClearTK Summary 

ClearTK:
  provides a framework that simplifies feature extraction and interfacing with a wide variety of machine learning libraries
  is not dependent on any specific type system
  provides sophisticated feature extractors
  provides infrastructure supporting the core library (e.g. collection readers, analysis engines, consumers)
  is well documented and unit tested


 

Generated features are understood by all ClearTK classifiers.

Some classifiers may make better use of certain features than others (e.g. SVM vs. MaxEnt for numeric features), but all of them support the four basic feature types.

ClassifierAnnotator

 

We decided not to integrate training into ClearTK, to avoid complexity. Training is easily done outside of ClearTK, and the resulting model is then packaged for use by ClearTK, which is more flexible.

Classification needs just one model file (our own format, a jar) and works the same way for all classifier types.

Sequential classifiers are a special case: they classify a list of samples instead of just one.

We don't implement any classifiers of our own. Good libraries exist and are easy to interface with; we just define a standardized interface suitable for NLP purposes in a UIMA context.
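A minimal sketch of what a standardized classifier interface could look like, including the sequential special case. The interface and class names are hypothetical and ClearTK's real API may differ; the adapter simply applies a per-instance classifier to each element, whereas a true sequential model (such as a conditional random field) would use the whole sequence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interfaces illustrating the idea; not ClearTK's actual API.
interface SketchClassifier {
    /** Classifies a single instance described by its features. */
    String classify(List<String> features);
}

interface SketchSequentialClassifier {
    /** Classifies a whole sequence of instances, returning one label per instance. */
    List<String> classifySequence(List<List<String>> featureLists);
}

// Adapter: lets a per-instance classifier be used where a sequence is expected.
class IndependentSequenceAdapter implements SketchSequentialClassifier {
    private final SketchClassifier delegate;

    IndependentSequenceAdapter(SketchClassifier delegate) {
        this.delegate = delegate;
    }

    @Override
    public List<String> classifySequence(List<List<String>> featureLists) {
        List<String> labels = new ArrayList<>();
        for (List<String> features : featureLists) {
            labels.add(delegate.classify(features)); // each instance classified independently
        }
        return labels;
    }
}
```

A wrapper around an external library would implement one of these interfaces, so that annotators calling `classify` or `classifySequence` need not know which library produced the model.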
