search

 Cross-training: Learning probabilistic relations between taxonomies

0 comments

file time: 2008-02-16

filetype:ppt

Click Here To Download...

>  

Cross-training: Learning 
probabilistic relations 
between taxonomies 

Sunita Sarawagi 
Soumen Chakrabarti 
Shantanu Godbole

IIT Bombay

 

Chakrabarti  

KDD2003  

2  

Document classification 

Set of labels A Bookmark folders, Yahoo topics Training documents, each with one A label Supervised approach Use training docs to induce classifier Invoke classifier on each unlabeled document in isolation Semi-supervised approach Unlabeled documents available during training Nigam et al. show how to exploit them collectively  

Chakrabarti  

KDD2003  

3  

Cross-training from another taxonomy 

Another set of labels B, partly related but not identical to A A=Dmoz topics, B=Yahoo topics A=Personal bookmark topics, B=Yahoo topics Training docs come in two flavors now Fully labeled with A and B labels (rare) Half-labeled with either an A or a B label

Can B make classification for A more accurate (and vice versa)?

Inductive transfer, multi-task learning  

DA 

DB

 

Chakrabarti  

KDD2003  

4  

Motivation 

Symmetric taxonomy mapping Ecommerce catalogs: A=distributor, B=retailer Web directories: A = Dmoz, B = Yahoo Incomplete taxonomies, small training sets Bookmark taxonomy vs. Yahoo Cartesian label spaces  

UK 

USA 

00/font> 

Regional 

Top 

Sports 

Baseball 

Cricket 

Region00/font> 

00/font>Topic 

Label-pair- 
conditioned 
term distribution

 

Chakrabarti  

KDD2003  

5  

Obvious approach: Labels as features 

A-label known, estimate B-label Suppose we have A+B labeled training set Discrete valued 00abel column0000/font>  
 
 
Multinomial na茂ve Bayes too biased, cannot balance heterogeneous features Do not have fully-labeled data Must guess 00/font> (use soft scores instead of 0/1)  

Term feature values 

00/font> 

00/i> 

Augmented feature vector 

Target label

 

Chakrabarti  

KDD2003  

6  

SVM-CT: Cross-trained SVM 

S(A,0) 

Train 

DA00/font>DB 

Docs having only A-labels 

One-vs-rest SVM 
ensemble for A: 
returns |A| scores 
for each test doc 
(signed distance 
from separator) 

DB00/font>DA 

Docs having only B-labels 

Test 

Test 
output 

00/font> t 00/font> 

00/font>|A|00/font> 

00/i> 

Label 

Text features 

S(B,1) 

Train 

One-vs-rest SVM 
ensemble for B 
(target label set) 

Test case with 
A-label known 
(coded using a 
vector of +1  
and 00) 

Term features 

00/font>1,00/font>,00/font>1,+1,00/font>1,00/font> 

S(A,1) 
S(B,2) 
S(A,2) 
00/font>

 

Chakrabarti  

KDD2003  

7  

SVM-CT anecdotes 

Discriminant reveals relations between A and B One-to-one, many-to-one, related, antagonistic However, accuracy gains are meager  

Positive 

Negative

 

Chakrabarti  

KDD2003  

8  

EM1D: Info from unlabeled docs 

Use training docs to induce initial classifier for taxonomy B, say Repeat until classifier satisfactory Estimate Pr(00/font>|d) for unlabeled doc d, 000B Reweigh d by factor Pr(00/font>|d) and add to training set for label 00/font> Retrain classifier

EM1D: Expectation maximization with one label set B (Nigam et al.)

Ignores labels from another taxonomy A  

Chakrabarti  

KDD2003  

9  

Stratified EM1D 

Target labels = B B-labeled docs are labeled training instances Consider A-labeled docs labeled 00/font> These are unlabeled for taxonomy B Run EM1D for each row 00/font> Test instance has 00known Invoke semi-supervised model for row 00to classify EM2D minus 2D model interaction  

00/font>A topics 

B-topics00/font> 

Docs in DA00i>DB  
labeled 00/font> 

00/font> 

DB00i>DA: docs 
with B-labels 

Docs in DA00i>DB  
labeled 00/font>00/font>

 

Chakrabarti  

KDD2003  

10  

EM2D: Cartesian product EM 

Initialize with fully labeled  
docs which go to a  
specific (0000 cell Smear training doc  
across label row or column Uniform smear could be bad Use a na茂ve Bayes classifier to seed Parameters extended from EM1D 00sub>00/sub>,00/sub> prior probability for label pair (0000 00sub>00/sub>,00/sub>,t multinomial term probability for (0000  

Labels in A00/font> 

Labels in B00/font> 

A-labeled doc 

B-labeled doc

 

Chakrabarti  

KDD2003  

11  

EM2D updates 

E-step for an A-labeled document  
 
M-step  

Updated 
class-pair 
priors 

Updated 
class-pair- 
conditioned 
term stats

 

Chakrabarti  

KDD2003  

12  

Applying EM2D to a test doc 

Mapping a B-labeled test doc d to an A label (e-commerce catalogs) Given 00/font>, find argmax00/sub> Pr(0000d) Classifying a document d with no labels to an A label Aggregation For each 00 compute 00sub>00/sub> Pr(0000d), pick best 00/font> Guessing (EM2D-G) Guess the best 00/font>* using a B-classifier Find argmax00/sub> Pr(0000|d) EM pitfalls: damping factor, early stopping  

Chakrabarti  

KDD2003  

13  

Experiments 

Selected 5 Dmoz and Yahoo subtree pairs Compare EM2D against Na茂ve Bayes, best #features and smoothing EM1D: ignore labels from other taxonomy, consider as unlabeled docs Stratified EM1D Mapping test doc with A-label to B-label or vice versa Classifying zero-labeled test doc Accuracy = fraction with correct labels  

Chakrabarti  

KDD2003  

14  

Accuracy benefits in mapping 

EM1D and NB are close, because training set sizes for each taxonomy are not too small EM2D > Stratified EM1D > NB 2d transfer of model info seems important  

Improvement 
over NB:

30% best, 
10% average

 

Chakrabarti  

KDD2003  

15  

Asymmetric setting 

Few (only 300) bookmarked URLs 
(taxonomy B, target) Many Yahoo URLs, larger number of classes (taxonomy A) Need to control damping factor (= importance of labeled :: unlabeled) to tackle population skew  

Chakrabarti  

KDD2003  

16  

Zero-labeled test documents 

EM1D improves accuracy only for 12 train docs EM2D with guessing improves beyond EM1D In fact, better than aggregating scores to 1d Choice of unlabeled:labeled damping ratio L may be important to get benefits  

Chakrabarti  

KDD2003  

17  

Robustness to initialization 

Seeding choices: hard (best class), NB scores, uniform Smear a fraction uniformly, rest by NB scores EM2D is robust to wide range of smear fractions Fully uniform smearing can fail (local optima)  

Uniform 
smear 

Na茂ve 
Bayes 
smear

 

Chakrabarti  

KDD2003  

18  

Related work 

Multi-task learning, 00ife-long learning00 inductive transfer (Thrun, Caruana) Find earlier learning tasks similar to current Reuse models, features, parameters Co-training (Blum, Mitchell) Two learners over a single label set Partitioned feature set Catalog mapping (Agrawal, Srikant) Two-label docs to estimate priors Raise prior to exponent, tune by validation EM2D: generative model, slightly better accuracy  

Chakrabarti  

KDD2003  

19  

Summary and future work 

Two algorithms for cross-training EM-based semi-supervised algorithm EM2D SVM-based algorithm SVM-CT Benefits Improved accuracy Interpretable mappings between label sets General issue: how best to deal with a large number of heterogeneous attributes? Future work Brittle na茂ve Bayes scores in EM2D Small relative gains in SVM-CT: 
better kernels?  feature selection?  

Given the extra label info, can we improve upon transductive algorithms which use only a pool of unlabeled documents?

 

Because SVM has its own bias.  (Narrate self-testing experiment?)

 

Turns out that this does not do as well as our final choice00/font>

 

Priors pi reveal correlation or otherwise between a pair of labels.

 

Discuss figs 10, 14, 12

 

What is L?

NB in only one L

   download Cross-training: Learning probabilistic relations between taxonomies

Responses to Cross-training: Learning probabilistic relations between taxonomies

It's no comment...

 

Your Name:
Your Email:
Your Talk: