Cross-training: Learning probabilistic relations between taxonomies
Sunita Sarawagi
Soumen Chakrabarti
Shantanu Godbole
IIT Bombay
Chakrabarti
KDD2003
2
Document classification
Cross-training from another taxonomy
Can B make classification for A more accurate (and vice versa)?
Inductive transfer, multi-task learning
DA
DB
Motivation
UK
USA
Regional
Top
Sports
Baseball
Cricket
Region
Topic
Label-pair-conditioned term distribution
Obvious approach: Labels as features
Multinomial naïve Bayes is too biased and cannot balance heterogeneous features
Do not have fully-labeled data
Must guess the missing labels (use soft scores instead of 0/1)
Term feature values
Augmented feature vector
Target label
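The augmented feature vector can be illustrated with a minimal sketch. The function names, the toy document, and the use of a softmax to turn raw scores into soft scores are all assumptions made for illustration; the slide only says that soft scores replace 0/1 values when a label is unknown.

```python
import math

def soft_scores(raw_scores):
    """Turn raw per-label scores into soft scores that sum to 1.
    Softmax is an assumption here; any soft scoring could be used."""
    m = max(raw_scores)
    exps = [math.exp(s - m) for s in raw_scores]
    z = sum(exps)
    return [e / z for e in exps]

def augment(term_features, label_scores):
    """Append one feature per label of the other taxonomy to the term features."""
    return list(term_features) + list(label_scores)

# Hypothetical document with three term counts and three labels in the other taxonomy.
terms = [2, 0, 1]
hard = [0, 1, 0]                          # label known: coded 0/1
guessed = soft_scores([0.0, 0.0, 0.0])    # label unknown: uninformative guess
x_known = augment(terms, hard)
x_guess = augment(terms, guessed)
```

With no evidence, the guessed label features are spread evenly, so the classifier sees a fractional value in each label slot rather than a hard 0 or 1.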
SVM-CT: Cross-trained SVM
S(A,0)
Train
DA − DB
Docs having only A-labels
One-vs-rest SVM ensemble for A: returns |A| scores for each test doc (signed distance
from separator)
DB − DA
Docs having only B-labels
Test
Test
output
Label
Text features
S(B,1)
Train
One-vs-rest SVM ensemble for B (target label set)
Test case with A-label known (coded using a vector of +1 and −1)
Term features
−1, …, −1, +1, −1, …
S(A,1)
S(B,2)
S(A,2)
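The two kinds of label features in the diagram can be sketched as follows: signed distances from a one-vs-rest ensemble when the A-label is unknown, and a +1/−1 vector when it is known. The linear separators, their weights, and the function names are hypothetical; a real ensemble would be trained on the documents that have only A-labels.

```python
import math

def signed_distance(w, b, x):
    """Signed distance of point x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def ensemble_scores(separators, x):
    """One score per A-label: one-vs-rest, so |A| separators give |A| scores."""
    return [signed_distance(w, b, x) for (w, b) in separators]

def encode_known_label(num_labels, known_index):
    """Code a known A-label as +1 in its own slot and -1 everywhere else."""
    return [1.0 if i == known_index else -1.0 for i in range(num_labels)]

# Hypothetical ensemble for a taxonomy A with two labels, over two term features.
separators_a = [([3.0, 4.0], 0.0), ([0.0, -2.0], 1.0)]
x = [1.0, 1.0]

scores = ensemble_scores(separators_a, x)   # label features when the A-label is unknown
x_unknown = x + scores                      # term features followed by |A| scores
x_known = x + encode_known_label(2, 0)      # term features followed by +1/-1 coding
```

Both vectors have the same length, so the ensemble for B can be trained on one kind and tested on the other.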
SVM-CT anecdotes
Positive
Negative
EM1D: Info from unlabeled docs
EM1D: Expectation maximization with one label set B (Nigam et al.)
Ignores labels from another taxonomy A
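A minimal sketch of EM with a single label set, in the style of semi-supervised multinomial naïve Bayes, may help fix ideas. The Laplace smoothing, the function names, the iteration count, and the toy documents are assumptions for illustration and are not taken from the slides.

```python
import math
from collections import Counter

def m_step(docs, resp, labels, vocab):
    """Re-estimate class priors and term probabilities from (soft) assignments."""
    prior, theta = {}, {}
    for c in labels:
        weight = sum(r[c] for r in resp)
        prior[c] = (1.0 + weight) / (len(labels) + len(docs))
        counts = Counter()
        for doc, r in zip(docs, resp):
            for t in doc:
                counts[t] += r[c]
        total = sum(counts.values())
        theta[c] = {t: (1.0 + counts[t]) / (len(vocab) + total) for t in vocab}
    return prior, theta

def e_step(doc, prior, theta, labels):
    """Posterior over labels for one document (terms outside the vocabulary are skipped)."""
    logp = {c: math.log(prior[c])
               + sum(math.log(theta[c][t]) for t in doc if t in theta[c])
            for c in labels}
    m = max(logp.values())
    w = {c: math.exp(v - m) for c, v in logp.items()}
    z = sum(w.values())
    return {c: v / z for c, v in w.items()}

def em1d(labeled, unlabeled, labels, iters=10):
    """Seed from labeled docs, then alternate E-steps on unlabeled docs and M-steps on all."""
    lab_docs = [d for d, _ in labeled]
    fixed = [{c: 1.0 if c == y else 0.0 for c in labels} for _, y in labeled]
    vocab = sorted({t for d in lab_docs + list(unlabeled) for t in d})
    prior, theta = m_step(lab_docs, fixed, labels, vocab)
    for _ in range(iters):
        soft = [e_step(d, prior, theta, labels) for d in unlabeled]
        prior, theta = m_step(lab_docs + list(unlabeled), fixed + soft, labels, vocab)
    return prior, theta

# Hypothetical toy data: two B-labels, one labeled doc each, two unlabeled docs.
labels = ["sports", "region"]
labeled = [(["cricket", "bat"], "sports"), (["london", "tube"], "region")]
unlabeled = [["cricket", "wicket"], ["london", "bridge"]]
prior, theta = em1d(labeled, unlabeled, labels)
post = e_step(["wicket"], prior, theta, labels)
```

In this toy run the term "wicket" never appears in a labeled document, yet it ends up associated with "sports" because it co-occurs with "cricket" in an unlabeled one.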
Stratified EM1D
A-topics
B-topics
Docs in DA − DB labeled α
DB − DA: docs with B-labels
Docs in DA − DB labeled α′
EM2D: Cartesian product EM
docs which go to a specific (α, β) cell
Smear training doc across label row or column
Uniform smear could be bad
Use a naïve Bayes classifier to seed
Parameters extended from EM1D
π_{α,β}: prior probability for label pair (α, β)
θ_{α,β,t}: multinomial term probability for (α, β)
Labels in A
Labels in B
A-labeled doc
B-labeled doc
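The smearing step for an A-labeled document can be sketched as below: its unit weight is spread across the cells of its label row, either uniformly or in proportion to a seed classifier's posterior over B-labels. The function name, the label names, and the seed posterior values are hypothetical; a B-labeled document would be smeared down its column in the same way.

```python
def smear_a_labeled(a_label, b_labels, b_posterior=None):
    """Spread one A-labeled document across its row of (A-label, B-label) cells.

    With no posterior the smear is uniform; otherwise it follows the
    (renormalized) posterior from a seed classifier over B-labels."""
    if b_posterior is None:
        w = 1.0 / len(b_labels)
        return {(a_label, b): w for b in b_labels}
    z = sum(b_posterior[b] for b in b_labels)
    return {(a_label, b): b_posterior[b] / z for b in b_labels}

b_labels = ["UK", "USA"]
uniform = smear_a_labeled("Cricket", b_labels)
seeded = smear_a_labeled("Cricket", b_labels, {"UK": 0.9, "USA": 0.1})
```

The uniform smear gives the Cricket document equal weight in both regions, while the seeded smear already places most of its weight in the (Cricket, UK) cell before EM starts.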
EM2D updates
M-step
Updated class-pair priors
Updated class-pair-conditioned term stats
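The update formulas themselves did not survive in this text. One plausible form, written by analogy with the standard multinomial naïve Bayes M-step, is sketched below. The notation is assumed: w_d(α, β) is the E-step weight of document d on cell (α, β), summing to 1 over cells for each document; n(d, t) is the count of term t in d; D is the training set; T is the vocabulary. The Laplace smoothing is also an assumption.

```latex
% Assumed form of the M-step; not copied from the original slide.
\pi_{\alpha,\beta}
  = \frac{1 + \sum_{d \in D} w_d(\alpha,\beta)}
         {|A|\,|B| + |D|}
\qquad
\theta_{\alpha,\beta,t}
  = \frac{1 + \sum_{d \in D} w_d(\alpha,\beta)\, n(d,t)}
         {|T| + \sum_{t' \in T} \sum_{d \in D} w_d(\alpha,\beta)\, n(d,t')}
```

Under these assumptions the priors sum to 1 over all label pairs, and the term probabilities sum to 1 over the vocabulary within each cell.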
Applying EM2D to a test doc
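One way to score a test document with label-pair parameters is to compute the posterior over cells and sum over A-labels to get a posterior over B-labels, restricting to one row when the A-label is known. This is a sketch under that assumption; the function name, the parameter layout (dictionaries keyed by label pair), and the toy values are hypothetical.

```python
import math

def b_posterior(doc, pi, theta, a_labels, b_labels, known_a=None):
    """Posterior over B-labels, marginalizing over A-labels.

    If the test document's A-label is known, only that row of cells is used."""
    rows = [known_a] if known_a is not None else list(a_labels)
    logs = {}
    for a in rows:
        for b in b_labels:
            lp = math.log(pi[(a, b)])
            lp += sum(math.log(theta[(a, b)][t]) for t in doc)
            logs[(a, b)] = lp
    m = max(logs.values())
    cell = {k: math.exp(v - m) for k, v in logs.items()}
    z = sum(cell.values())
    return {b: sum(cell[(a, b)] for a in rows) / z for b in b_labels}

# Hypothetical parameters: regions as taxonomy A, sports as taxonomy B.
a_labels = ["UK", "USA"]
b_labels = ["Cricket", "Baseball"]
pi = {("UK", "Cricket"): 0.4, ("UK", "Baseball"): 0.1,
      ("USA", "Cricket"): 0.1, ("USA", "Baseball"): 0.4}
# Identical term distributions in every cell, so only the priors matter here.
theta = {cell: {"bat": 0.5, "ball": 0.5} for cell in pi}

p_any = b_posterior(["bat"], pi, theta, a_labels, b_labels)
p_uk = b_posterior(["bat"], pi, theta, a_labels, b_labels, known_a="UK")
```

With uninformative terms, the document is equally likely to be Cricket or Baseball until its A-label is revealed as UK, at which point the correlated priors shift the posterior toward Cricket.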
Experiments
Accuracy benefits in mapping
Improvement over NB: 30% best, 10% average
Asymmetric setting
(taxonomy B, target)
Many Yahoo URLs, larger number of classes (taxonomy A)
Need to control damping factor (= relative importance of labeled vs. unlabeled documents) to tackle population skew
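The effect of a damping factor can be sketched as a weighted count in the M-step, assuming it simply scales down the contribution of documents that lack a target label. The function name and the numbers are hypothetical; the slides do not say how the factor is applied or chosen.

```python
def damped_count(labeled_weights, unlabeled_weights, damping):
    """Effective count for one class: labeled docs count fully,
    unlabeled docs are scaled by the damping factor (between 0 and 1)."""
    return sum(labeled_weights) + damping * sum(unlabeled_weights)

# Hypothetical skew: 2 labeled docs versus 100 unlabeled docs with weight 0.5 each.
labeled_w = [1.0, 1.0]
unlabeled_w = [0.5] * 100

undamped = damped_count(labeled_w, unlabeled_w, 1.0)
damped = damped_count(labeled_w, unlabeled_w, 0.1)
```

Without damping, the large unlabeled population dominates the count; with a factor of 0.1, the two labeled documents keep a meaningful share of it.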
Zero-labeled test documents
Robustness to initialization
Uniform smear
Naïve Bayes smear
Related work
Summary and future work
better kernels? feature selection?
Given the extra label info, can we improve upon transductive algorithms which use only a pool of unlabeled documents?
Because SVM has its own bias. (Narrate self-testing experiment?)
Turns out that this does not do as well as our final choice.
Priors pi reveal correlation or otherwise between a pair of labels.
Discuss figs 10, 14, 12
What is L?
NB in only one L
