2/4/08   

OSDI '04  

1  

CoDNS: Improving DNS Performance and Reliability via Cooperative Lookups

KyoungSoo Park, Vivek Pai, Larry Peterson, Zhe Wang

Princeton University

 

Domain Name System (DNS)

- Human-friendly names → IP addresses
- Operational for over 20 years
- Essential part of the Web
- Two components
  - Server-side: name owners
  - Client-side: contacting name owners

Two Kinds of DNS Problems 

- Server-side problems [Danzig92], [Jung01]
  - Nameserver bugs
  - Misconfigurations
  - Hardening/replacing server infrastructure
- Client-side problems
  - Between local nameservers (LDNS) and clients
  - Larger memories = higher LDNS hit rate
  - LDNS cache hit rate: 80-90%
  - Result: LDNS problems magnified

Contributions 

- Measure LDNS problems and their causes
- Client-side DNS helper, CoDNS
  - Communicates with other CoDNS peers
  - Incrementally deployable
  - Works with all DNS lookups (CDN, etc.)
- Benefits
  - Latency reduction: 27-82%
  - Availability: generally adds an extra "9"

Local DNS Lookup Problems 

- Local DNS lookup failures
  - 5+ seconds of delay for cached records
  - Frequent and widely distributed*
  - Unpredictable service
- Directly affects user-perceived latency
  - Random delays in web access
  - Kills HTTP proxies, web services, and busy mail servers

Demonstrating Local Problems 

- Local name lookup every 6 seconds
  - 'yy.domain' on xxx.domain at 200 sites
  - e.g., 'planetlab-2.cs.princeton.edu' for planetlab-1.cs.princeton.edu
  - Lookup should be handled locally
  - LDNS is site-shared, NOT PlanetLab's
- Failure criteria
  - 5+ seconds of latency
  - Zero answer
  - Rolling average of the past 100 queries
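The failure criteria above can be checked with a small rolling monitor. This is a minimal Python sketch, assuming each probe reports its latency in seconds and its number of answer records; `LookupMonitor` and its methods are hypothetical names, not CoDNS code.

```python
from collections import deque


class LookupMonitor:
    """Rolling failure rate over the most recent probe lookups (hypothetical sketch)."""

    def __init__(self, window: int = 100, slow_threshold_s: float = 5.0):
        self.slow_threshold_s = slow_threshold_s
        # A deque with maxlen drops the oldest result once the window is full.
        self.results = deque(maxlen=window)

    def record(self, latency_s: float, num_answers: int) -> bool:
        # A lookup counts as failed if it takes 5+ seconds or returns zero answers.
        failed = latency_s >= self.slow_threshold_s or num_answers == 0
        self.results.append(failed)
        return failed

    def failure_rate(self) -> float:
        # Rolling average over the past `window` queries (0.0 before any probes).
        if not self.results:
            return 0.0
        return sum(self.results) / len(self.results)


# Simulated probe results (the experiment issued one probe every 6 seconds):
# 98 fast answers, then one slow reply and one empty reply.
monitor = LookupMonitor()
for latency, answers in [(0.002, 1)] * 98 + [(5.3, 1), (0.004, 0)]:
    monitor.record(latency, answers)
print(monitor.failure_rate())  # 0.02
```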

Expected DNS Behavior 

[Figure panels: University of Utah; Rice University]

DNS Failure on Various Nodes 

[Figure panels: Cornell; Texas A&M; University of Oregon]

Possible Causes 

- Packet loss
- LDNS overloading
- Cron jobs
- Maintenance problems

Packet Loss  

- UDP is inherently unreliable
  - A single loss triggers query retransmission
- Less than ~0.1% loss in LAN environments
  - Heavily dependent on local traffic
  - Losses last for ~1 min
- Cable modem/DSL may be worse
  - Our sites have ~4 LAN hops; cable ~8
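Even a tiny loss rate matters because each lost UDP query or reply costs a full retransmission timeout. The sketch below is a simple model, assuming each attempt is lost independently with probability `loss`, unlimited retries, and a fixed 5 s timeout (the default `timeout` documented in resolv.conf(5) for the glibc stub resolver); the 1 ms round-trip time is an illustrative assumption.

```python
def expected_lookup_ms(loss: float, rtt_ms: float, timeout_ms: float = 5000.0) -> float:
    """Mean lookup latency when each attempt is lost independently with probability `loss`.

    The number of lost attempts before the first success is geometric with
    mean loss / (1 - loss), and each lost attempt costs one full timeout.
    """
    if not 0.0 <= loss < 1.0:
        raise ValueError("loss must be in [0, 1)")
    return rtt_ms + timeout_ms * loss / (1.0 - loss)


# 0.1% loss on a LAN with a 1 ms round trip: the timeouts, not the RTT, dominate.
print(round(expected_lookup_ms(0.001, 1.0), 2))  # 6.01
```

Under this model, 0.1% loss multiplies the mean lookup time roughly sixfold, and every lookup that hits a loss waits at least 5 s.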

Nameserver Overloading 

[Figure panels: University of Michigan; University of Torino, Italy; Technical University Berlin, Germany (time axis marked 8 am and 6 pm)]

 

Nameserver Overloading 

- Many responses take 1 to 5 sec
  - No timeout, but simply late
- Pr(Overloading | DNS Failure) = 90% for some nodes
- Bursts cause socket buffer overflow
  - Experiment in the paper

Cron jobs/heavy processes 

[Figure panels: University of Tennessee 1; University of Tennessee 2; Moscow State University]

Not a client problem!

 

Why Do We See This? 

- Large memory → large cache
- Large cache → high hit rate
- High hit rate → CPU load drops
- Low CPU load → add more services
- More services → memory pressure
- Memory pressure → failures, delays

Maintenance Problems 

- /etc/resolv.conf
  - Configured to dead nameservers
- Blocking services
  - Outside the firewall
- Complete outage
  - Berkeley Millennium nodes, 3/17/2004
- Blackout / natural disaster
  - Duke hit by Hurricane Isabel, Fall 2003

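Solution: CoDNS

[Figure: client programs on my machine query CoDNS, which uses the LDNS on my LAN; when the LDNS fails, CoDNS sends a remote query across the wide area network (WAN) to a CoDNS peer on another LAN, which resolves it via its own LDNS and returns a remote answer]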

 

CoDNS: Cooperative DNS

- Cooperative name lookup scheme
  - If local server OK, use local server
  - When failing, ask peers to do lookup
- Insurance model
  - Share risk, share benefits
- Aggregate name lookup service
  - Aggregate cache effect
- Incrementally deployable, no server change
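The "use the local server if OK, otherwise ask a peer" policy can be sketched with asyncio. This is a minimal illustration, assuming `local_resolve` and `remote_resolve` are hypothetical coroutines standing in for the LDNS query and the query relayed to a CoDNS peer. The local lookup keeps running after the remote query is sent, and whichever answers first wins; if the local server fails outright, the peer is asked immediately.

```python
import asyncio


async def cooperative_lookup(name, local_resolve, remote_resolve, delay_s):
    """Return (answer, source); ask a peer only if the local server is slow or failing."""
    local = asyncio.ensure_future(local_resolve(name))
    try:
        # Give the local nameserver a head start; shield() keeps it running on timeout.
        answer = await asyncio.wait_for(asyncio.shield(local), timeout=delay_s)
        return answer, "local"
    except asyncio.TimeoutError:
        pass  # local server is slow: also ask a peer
    except Exception:
        pass  # local server failed outright: ask a peer right away
    remote = asyncio.ensure_future(remote_resolve(name))
    pending = {local, remote}
    while pending:
        done, pending = await asyncio.wait(pending, return_when=asyncio.FIRST_COMPLETED)
        for task in done:
            if not task.cancelled() and task.exception() is None:
                for other in pending:
                    other.cancel()  # the losing lookup is no longer needed
                return task.result(), ("local" if task is local else "remote")
    raise LookupError(f"no answer for {name}")


async def slow_ldns(name):
    await asyncio.sleep(1.0)  # simulated overloaded local nameserver
    return "192.0.2.1"


async def peer(name):
    await asyncio.sleep(0.02)  # simulated peer answering over the WAN
    return "192.0.2.1"


print(asyncio.run(cooperative_lookup("www.example.com", slow_ldns, peer, delay_s=0.2)))
# ('192.0.2.1', 'remote')
```

Keeping the local query alive is what makes this an insurance model rather than a replacement: a healthy LDNS still answers most lookups, and peers only absorb the slow or failed ones.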

Design Issues  

- Proximity / liveness
  - Select nearby peers
  - Monitor the nameserver's health as well
- Request locality
  - Pick the same peer for the same names
  - Highest Random Weight (HRW)
- Remote request timeout
  - Dynamically adjusted to the local server's health
  - Exponentially backed off for each remote query
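Highest Random Weight (rendezvous) hashing lets every node pick the same peer for the same name without coordination: score each candidate peer by hashing it together with the name, then take the highest score. A minimal sketch, assuming SHA-1 over `peer|name` as the weight function (the hash CoDNS actually uses is not stated here); the peer hostnames are illustrative.

```python
import hashlib


def hrw_weight(peer: str, name: str) -> int:
    """Pseudo-random but deterministic weight for a (peer, name) pair."""
    digest = hashlib.sha1(f"{peer}|{name}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_peers(name: str, peers: list[str], k: int = 1) -> list[str]:
    """The k peers with the highest weight for this name."""
    return sorted(peers, key=lambda p: hrw_weight(p, name), reverse=True)[:k]


peers = ["peer-a.example.org", "peer-b.example.org", "peer-c.example.org"]
first = pick_peers("www.example.com", peers)[0]
# Same name -> same peer, so that peer's cache is likely to already hold the answer.
assert pick_peers("www.example.com", peers)[0] == first
```

Removing any peer other than `first` leaves the choice for this name unchanged, so a peer going down only remaps the names it was responsible for.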

How many peers are needed?

One extra peer halves the average response time!

[Figure: Average Response Time]

 

Effect of Timeout 

[Figure: Average Number of Lookups]

- 200 ms: slope changes
- 500 ms: virtually flat

 

Deployment Status 

- CoDNS deployed on all PlanetLab nodes
  - Running 24/7 since August 2003
- CoDeeN uses CoDNS as primary DNS
  - After CoDeeN's own DNS cache
- Remote query configuration
  - One extra peer, 200 ms starting timeout
  - On total LDNS failure, send immediately
  - Monitor 10 nodes as neighbors
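The remote-query settings above can be captured in a small configuration object. A minimal sketch; the field names and `send_times_ms` are hypothetical, and the schedule assumes that the wait between successive remote queries for one name doubles, starting from the 200 ms starting timeout (one reading of "exponentially backed off"). The deployed system also adjusts the starting timeout to the local server's recent health, which this fixed value omits.

```python
from dataclasses import dataclass


@dataclass
class RemoteQueryConfig:
    """Deployment parameters from the slide; field names are hypothetical."""

    extra_peers: int = 1               # "one extra peer"
    initial_timeout_ms: float = 200.0  # starting timeout before going remote
    neighbors: int = 10                # nodes monitored as neighbors

    def send_times_ms(self, ldns_dead: bool, max_queries: int) -> list[float]:
        """When each successive remote query is sent, relative to the local query.

        On total LDNS failure the first remote query goes out immediately;
        the gap between successive remote queries then doubles (assumed).
        """
        first = 0.0 if ldns_dead else self.initial_timeout_ms
        return [first + self.initial_timeout_ms * (2 ** i - 1) for i in range(max_queries)]


cfg = RemoteQueryConfig()
print(cfg.send_times_ms(ldns_dead=False, max_queries=4))  # [200.0, 400.0, 800.0, 1600.0]
print(cfg.send_times_ms(ldns_dead=True, max_queries=4))   # [0.0, 200.0, 600.0, 1400.0]
```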

Evaluation 

Live traffic for one week for CoDeeN (20k - 30k)  

Finer-grained View 

- Live traffic for one day
- Effectively flattens the spikes

[Figure: LDNS vs. CoDNS; annotation: cache miss + WAN problem]

 

Availability 

Adds one "9": from 99% to 99.9%

[Figure: Availability (%) of LDNS and CoDNS per node, y-axis ticks from 90% to 99.99%; x-axis: nodes sorted by LDNS availability]
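Counting "nines" makes the availability claim concrete: an availability a has -log10(1 - a) nines, so 99% is two nines and 99.9% is three. The second function illustrates the insurance idea under an idealized assumption that the LDNS and the peer fail independently (the measured numbers above were not obtained this way): a backup that is itself only 90% available would cut unavailability tenfold.

```python
import math


def nines(availability: float) -> float:
    """Number of nines: 0.99 -> 2.0, 0.999 -> 3.0."""
    return -math.log10(1.0 - availability)


def combined_availability(local: float, backup: float) -> float:
    """Available unless both fail; assumes independent failures (idealized)."""
    return 1.0 - (1.0 - local) * (1.0 - backup)


print(round(nines(0.99), 6), round(nines(0.999), 6))  # 2.0 3.0
print(round(combined_availability(0.99, 0.90), 6))    # 0.999
```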

 

What About CDNs? 

CDN uses DNS to pick the "best" replica

CoDNS used only when LDNS is failing

- Pro: faster lookup time
- Con: maybe worse/farther replica
- In reality, the peer's answer is better 30% of the time

CDN Pro/Con Measurements

 

Overhead 

- Heartbeat packet: 1/sec
- Memory: 600 KB
- Remote queries: median 25% more lookups
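Once-per-second heartbeats are enough to track which neighbors are alive and nearby. A minimal sketch of that bookkeeping, assuming a peer is presumed dead after 3 seconds without a heartbeat and that "nearby" means lowest heartbeat round-trip time; the class, its methods, and the 3-second cutoff are hypothetical. The 10-neighbor default mirrors the deployment's "Monitor 10 nodes as neighbors".

```python
class NeighborMonitor:
    """Liveness and proximity of neighbor peers from heartbeats (hypothetical sketch)."""

    def __init__(self, max_neighbors: int = 10, dead_after_s: float = 3.0):
        self.max_neighbors = max_neighbors
        self.dead_after_s = dead_after_s  # assumed cutoff: ~3 missed heartbeats
        self.last_seen_s: dict[str, float] = {}
        self.rtt_ms: dict[str, float] = {}

    def on_heartbeat(self, peer: str, now_s: float, rtt_ms: float) -> None:
        self.last_seen_s[peer] = now_s
        self.rtt_ms[peer] = rtt_ms

    def live_neighbors(self, now_s: float) -> list[str]:
        """Live peers, nearest (lowest RTT) first, capped at max_neighbors."""
        alive = [p for p, t in self.last_seen_s.items() if now_s - t <= self.dead_after_s]
        alive.sort(key=lambda p: self.rtt_ms[p])
        return alive[: self.max_neighbors]


monitor = NeighborMonitor(max_neighbors=2)
monitor.on_heartbeat("peer-a", now_s=0.0, rtt_ms=30.0)
monitor.on_heartbeat("peer-b", now_s=0.0, rtt_ms=10.0)
monitor.on_heartbeat("peer-c", now_s=0.0, rtt_ms=20.0)
print(monitor.live_neighbors(now_s=1.0))  # ['peer-b', 'peer-c']
```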

CoDNS Alternatives 

In the paper:

- Private nameservers
- Secondary nameservers
- TCP queries

Conclusion 

- Local failures are relatively frequent
  - Failure time dominates latency
- CoDNS provides a low-cost "insurance" service
  - Masks local failures
  - Reduces average response time by 27-82%
  - Improves availability by an additional "9"
- Incrementally deployable, no server change

More Information 

CoDNS homepage:

  http://codeen.cs.princeton.edu/codns/

Email:

  princeton_codeen@slices.planet-lab.org 

 

TCP Queries 

- DNS supports TCP
  - Failure rate is better
  - Not used except for AXFR or when the answer is big
- Simple TCP
  - 2 packets vs. 9 packets (3 + 2 + 4 = 9)
- Persistent TCP
  - ACK overhead
  - Resource waste for idle connections
  - Vulnerable to overloading/server down
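Besides the extra connection-setup and teardown packets, DNS over TCP changes the framing: each message is preceded by a two-byte length field (RFC 1035, section 4.2.2). A minimal sketch that builds a standard A-record query by hand with `struct` and frames it for TCP; the query ID and the name are arbitrary examples.

```python
import struct


def build_query(name: str, qid: int = 0x1234, qtype: int = 1) -> bytes:
    """Build a DNS query message (QTYPE 1 = A, QCLASS 1 = IN)."""
    # Header: ID, flags (only RD set), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b""
    for label in name.rstrip(".").split("."):
        if not 0 < len(label) <= 63:
            raise ValueError(f"bad label: {label!r}")
        qname += bytes([len(label)]) + label.encode("ascii")
    qname += b"\x00"  # root label terminates the name
    return header + qname + struct.pack("!HH", qtype, 1)


def frame_for_tcp(message: bytes) -> bytes:
    """Prefix the message with its length as a 16-bit big-endian integer."""
    return struct.pack("!H", len(message)) + message


query = build_query("planetlab-1.cs.princeton.edu")
framed = frame_for_tcp(query)
print(len(query), len(framed))  # 46 48
```

Over UDP the same 46-byte message would be sent as a single datagram with no length prefix.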

S-TCP, P-TCP, UDP, CoDNS

- Replay test (10,792 names) on 107 nodes
- CoDNS first

CoDNS vs. Persistent TCP 

[Figure: Average Response Time (ms)]

 

Lookup Distribution 

- Live traffic on a node for one week (20,333 queries)
- 2,043,135 ms / 5,809,265 ms = 35.1%
  - 100 ms vs. 286 ms per query
- Great improvement on W-CDF

5.5% → 0.06%

76% → 17.8%
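The per-query figures follow from dividing each total lookup time by the 20,333 queries; the sketch below presumes the smaller total (2,043,135 ms) is CoDNS and the larger (5,809,265 ms) is LDNS. It also assumes W-CDF denotes a time-weighted CDF, i.e., the fraction of total lookup time spent in lookups no slower than a given latency; the `weighted_cdf` helper and the sample latencies are illustrative.

```python
QUERIES = 20333
CODNS_TOTAL_MS = 2043135  # presumed CoDNS total lookup time for the week
LDNS_TOTAL_MS = 5809265   # presumed LDNS total lookup time for the week


def weighted_cdf(latencies_ms: list[float], threshold_ms: float) -> float:
    """Fraction of total lookup time spent in lookups taking <= threshold_ms."""
    total = sum(latencies_ms)
    return sum(x for x in latencies_ms if x <= threshold_ms) / total


print(round(CODNS_TOTAL_MS / QUERIES), round(LDNS_TOTAL_MS / QUERIES))  # 100 286

# Illustrative sample: 99% of lookups are fast, yet one slow lookup dominates the time.
sample = [2.0] * 99 + [5000.0]
print(round(weighted_cdf(sample, 100.0), 3))  # 0.038
```

A plain CDF of this sample would show 99% of lookups under 100 ms, while the time-weighted view shows that those lookups account for under 4% of the total waiting time.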

 

Analysis of Wins

80% at first query, 95% at second query 

[Figure: Percentage]
