Home > Archive > Unix Programming > March 2007 > My own DNS resolver
My own DNS resolver
chsalvia@gmail.com 2007-03-29, 7:19 am
I've been studying how DNS works, and I'm trying to write my own DNS
resolver. I read some websites about the protocol, and looked at the
source code of a few DNS clients. So far, I've made pretty good
progress, but I'm not understanding something about the answer I get
from a DNS server.
I do a DNS lookup on "www.google.com", and store the answer in a C
struct defined like this:
struct dns_response {
    unsigned char* name;
    uint16_t rtype;
    uint16_t rclass;
    uint32_t ttl;
    uint16_t rdlength;
    unsigned char* data;
};
I point name at offset 12 in the response buffer (to skip over the
12-byte header), and when I print out that memory region I get
exactly what I expect: the host name "www.google.com" encoded as
length-prefixed labels. I read rtype and rclass from the two 16-bit
fields following the zero byte that ends the host name, and again I
get what I expect when I print out their values.
But after that, I'm a bit confused. According to the literature on
the DNS protocol, the rclass field is followed by a 4-byte TTL
(time-to-live) field, which in turn is followed by a 2-byte length
field. The length field should be 4 (the length of an IPv4 address),
but instead I find that the length is 1. In fact, when I print out a
byte stream of the DNS response, I find that Google's IP address
appears far down the line. But I can't make any sense out of the
response. The byte stream (following the host name) is as follows:
0 1 0 1 192 12 0 5 0 1 0 0 49 88 0 8 3 119 119 119 1 108 192 16 192 44
0 1 0 1 0 0 0 135 0 4 64 233 161 147
As you can see, Google's IP address is all the way at the end
(64.233.161.147), but according to the DNS protocol, it should be the
10th byte in the above stream, right after the rdlength field. So
what's going on here?
Rainer Weikusat 2007-03-29, 7:19 am
chsalvia@gmail.com writes:
> I've been studying how DNS works, and I'm trying to write my own DNS
> resolver. I read some websites about the protocol, and looked at the
> source code of a few DNS clients. So far, I've made pretty good
> progress, but I'm not understanding something about the answer I get
> from a DNS server.
>
> I do a DNS lookup on "www.google.com", and store the answer in a C
> struct defined like this:
>
> struct dns_response {
> unsigned char* name;
> uint16_t rtype;
> uint16_t rclass;
> uint32_t ttl;
> uint16_t rdlength;
> byte* data;
> };
>
> I assign name to the 12th byte in the response buffer, (to skip over
> the header field),
[...]
> 0 1 0 1 192 12 0 5 0 1 0 0 49 88 0 8 3 119 119 119 1 108 192 16 192 44
> 0 1 0 1 0 0 0 135 0 4 64 233 161 147
>
> As you can see, Google's IP address is all the way at the end
> (64.233.161.147), but according to the DNS protocol, it should be the
> 10th byte in the above stream, right after the rdlength field. So
> what's going on here?
Below is the general format of a DNS message (RFC 1035, section 4.1);
'@12' marks the offset at which the question section starts:
+---------------------+
| Header |
+---------------------+
@12 | Question | the question for the name server
+---------------------+
| Answer | RRs answering the question
+---------------------+
| Authority | RRs pointing toward an authority
+---------------------+
| Additional | RRs holding additional information
+---------------------+
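As a concrete illustration of the first box, here is a minimal sketch (not from the thread; the struct and function names are hypothetical) that decodes the fixed 12-byte header. It reads the big-endian fields byte by byte, which sidesteps both the alignment and the byte-order questions of casting the buffer to uint16_t*. The four counts say how many entries follow in the question, answer, authority and additional sections.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fixed 12-byte DNS header (RFC 1035, 4.1.1). */
struct dns_hdr {
    uint16_t id;
    uint16_t flags;    /* QR, Opcode, AA, TC, RD, RA, Z, RCODE */
    uint16_t qdcount;  /* entries in the question section */
    uint16_t ancount;  /* RRs in the answer section */
    uint16_t nscount;  /* RRs in the authority section */
    uint16_t arcount;  /* RRs in the additional section */
};

/* Read a big-endian 16-bit value without an unaligned load. */
static uint16_t get16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 0 on success, -1 if the message is shorter than a header. */
int decode_header(const unsigned char *msg, size_t len, struct dns_hdr *h)
{
    if (len < 12)
        return -1;
    h->id      = get16(msg);
    h->flags   = get16(msg + 2);
    h->qdcount = get16(msg + 4);
    h->ancount = get16(msg + 6);
    h->nscount = get16(msg + 8);
    h->arcount = get16(msg + 10);
    return 0;
}
```

After a successful call, the question section starts at msg + 12, the '@12' in the diagram.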
You are parsing the question section, which contains the name you
asked for (www.google.com), followed by the type ('A' ::= 1)
and class ('IN' ::= 1) of the query. After this comes the answer
section. The first RR starts with 0xc00c. This is a pointer, as
indicated by the fact that the two highest bits are set; the remaining
14 bits are an offset from the start of the message. This means that
the name for this RR can be found at offset 12 in the message
('www.google.com', in the question section). After this come a type
('CNAME' ::= 5) and a class ('IN' ::= 1), then a four-byte TTL,
then the length of the rdata section of the CNAME record (8).
Afterwards comes the 'canonical name', which is www.l.google.com,
again referring back to the question section for the common suffix of
both names ('192 16' ::= offset 16, where 'google.com' starts). After
this comes an A RR, starting with a back reference to the name of the
previous record ('192 44' ::= offset 44 from the start of the message,
where the canonical name begins), followed by type, class and TTL,
then the rdlength of 4 and finally (one of) the IPs associated with
www.l.google.com.
As a hexdump (the first column is the offset within the stream above,
which itself starts at offset 28 of the message):
00000000 00010001 c00c0005 00010000 31580008 03777777 ............1X...www
00000014 016cc010 c02c0001 00010000 00870004 40e9a193 .l...,..........@...
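The pointer-following described above can be sketched as a small loop. This is a hypothetical helper, not code from the thread, and the limit of 128 pointer hops is an arbitrary guard against malicious pointer loops. It returns how many bytes the name occupies at the starting offset, which is what a parser needs in order to skip it.

```c
#include <stddef.h>
#include <string.h>

/* Expand the (possibly compressed) domain name at offset 'off' of the
   message msg[0..len) into dotted text in out[0..outsz).
   Returns the number of bytes the name occupies at 'off', or -1 on
   malformed input or if 'out' is too small. */
int read_name(const unsigned char *msg, size_t len, size_t off,
              char *out, size_t outsz)
{
    size_t pos = off, o = 0;
    int consumed = -1;   /* fixed once the first pointer is followed */
    int hops = 0;        /* arbitrary guard against pointer loops */

    if (outsz == 0)
        return -1;
    for (;;) {
        if (pos >= len)
            return -1;
        unsigned char c = msg[pos];
        if ((c & 0xC0) == 0xC0) {              /* compression pointer */
            if (pos + 1 >= len || ++hops > 128)
                return -1;
            if (consumed < 0)
                consumed = (int)(pos + 2 - off);
            /* low 14 bits = offset from the start of the message */
            pos = ((size_t)(c & 0x3F) << 8) | msg[pos + 1];
            continue;
        }
        if (c & 0xC0)                          /* 0x40/0x80 prefixes unsupported */
            return -1;
        if (c == 0) {                          /* root label ends the name */
            if (consumed < 0)
                consumed = (int)(pos + 1 - off);
            out[o] = '\0';
            return consumed;
        }
        if (pos + 1 + c > len)                 /* label runs past the message */
            return -1;
        if (o + (o > 0) + c + 1 > outsz)       /* room for '.', label, NUL */
            return -1;
        if (o > 0)
            out[o++] = '.';
        memcpy(out + o, msg + pos + 1, c);
        o += c;
        pos += 1 + c;
    }
}
```

For example, with the full message rebuilt from the bytes in this thread (any 12-byte header, since the original one was not posted, followed by the 16-byte question name and the stream above), offset 32 expands to www.google.com (2 bytes consumed) and offset 44 expands to www.l.google.com (8 bytes consumed, matching the CNAME's rdlength).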
chsalvia@gmail.com 2007-03-29, 7:18 pm
I see. I was confused because I didn't realize that the question
section which I send to the DNS server is also included in the response.
chsalvia@gmail.com 2007-03-29, 7:18 pm
In fact, what is the point of returning the question in the DNS
response? The header already includes an ID number that allows
you to match responses to queries, so what's the point of sending the
question back to the client?
Rick Jones 2007-03-29, 7:18 pm
chsalvia@gmail.com wrote:
> In fact - what is the point of returning the Question in the DNS
> response? The header field already includes an ID number that
> allows you to match up queries, so what's the point of sending the
> Question back to the client?
Belt and suspenders, perhaps. I would think one or more of the DNS
RFCs would give reasons. Since the ID is only 16 bits, it wouldn't
necessarily take all that long to wrap around, and repeating the
question would be a good way to help further protect against a delayed
duplicate of a previous answer, since applications using UDP must be
prepared for their traffic to be lost, duplicated, delayed, bent,
folded, spindled and mutilated...
rick jones
--
portable adj, code that compiles under more than one compiler
these opinions are mine, all mine; HP might not want them anyway... 
feel free to post, OR email to rick.jones2 in hp.com but NOT BOTH...
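A minimal sketch of that belt-and-suspenders check: accept a reply only if it is marked as a response, carries the same 16-bit ID, and echoes the question section byte for byte. The helper name is hypothetical, and it assumes the server returns the question unchanged (as in the google.com example above); a reply that fails the check can simply be dropped, as if it had been lost.

```c
#include <stddef.h>
#include <string.h>

#define DNS_HDR_LEN 12

/* Hypothetical helper: returns 1 if 'resp' plausibly answers 'query'.
   'qlen' is the length of the whole query (header + question section). */
int response_matches(const unsigned char *query, size_t qlen,
                     const unsigned char *resp, size_t rlen)
{
    if (qlen < DNS_HDR_LEN || rlen < qlen)
        return 0;
    if ((resp[2] & 0x80) == 0)                  /* QR bit must be set */
        return 0;
    if (memcmp(query, resp, 2) != 0)            /* ID, bytes 0-1 */
        return 0;
    if (memcmp(query + 4, resp + 4, 2) != 0)    /* QDCOUNT, bytes 4-5 */
        return 0;
    /* the echoed question section must be identical */
    return memcmp(query + DNS_HDR_LEN, resp + DNS_HDR_LEN,
                  qlen - DNS_HDR_LEN) == 0;
}
```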
chsalvia@gmail.com 2007-03-30, 1:18 am
Well, here's my attempt at a minimally functional DNS response parsing
function. It seems to be working well. I just skip over the question
section in the DNS response because there's really no point in parsing
it. I suppose it might be a good idea to memcmp() it with the
original question to make sure the response wasn't garbled in
transmission.
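#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t */
#include <netinet/in.h>  /* struct in_addr, struct in6_addr */
#include <arpa/inet.h>   /* ntohs(), ntohl() */

/* Flag and type macros used below (not shown in the original post);
   these definitions follow the header layout in RFC 1035, 4.1.1 */
#define FLAG_MASK_QR(f)     (((f) >> 15) & 0x1)  /* 1 = response */
#define FLAG_MASK_OPCODE(f) (((f) >> 11) & 0xF)  /* 0 = standard query */
#define FLAG_MASK_RCODE(f)  ((f) & 0xF)          /* 0 = no error */
#define DNS_QTYPE_A     1
#define DNS_QTYPE_CNAME 5
#define DNS_QTYPE_AAAA  28

#define DNS_HEADER_LEN 12
/* a length byte with both high bits set starts a 2-byte compression pointer */
#define DNS_ISPTR(val) (((val) & 0xC0) == 0xC0)

typedef struct {
    uint16_t id;
    uint16_t flags;
    uint16_t qdcount;
    uint16_t ancount;
    uint16_t nscount;
    uint16_t arcount;
    unsigned char question[512];
} dns_header;

typedef struct {
    unsigned char* name;
    uint16_t rtype;
    uint16_t rclass;
    uint32_t ttl;
    uint16_t rdlength;
    unsigned char* data;
} dns_response;

typedef union {
    struct in6_addr v6;   // prepare for the future...
    struct in_addr v4;
} ip_addr_union;

typedef struct {
    ip_addr_union* ip_addr;   /* malloc()ed; the caller must free() it */
    uint16_t count;
} host_addr;

/* Returns 0 on success, -1 on error. request_length is the length of the
   question section that was sent (the server echoes it back). */
int parse_response(unsigned char* buffer, ssize_t size,
                   uint16_t request_length, host_addr* h_addr)
{
    /* buffer contains DNS response */
    const unsigned char* end = buffer + size;
    dns_header resp_header;
    dns_response answer;
    uint16_t words[6];
    unsigned char* p;

    h_addr->ip_addr = NULL;
    h_addr->count = 0;
    memset(&resp_header, 0, sizeof(resp_header));
    if (size < request_length + DNS_HEADER_LEN) return -1;

    /* fill response header; memcpy() avoids unaligned uint16_t reads */
    memcpy(words, buffer, sizeof(words));
    resp_header.id      = words[0];          /* kept in network order */
    resp_header.flags   = ntohs(words[1]);
    resp_header.qdcount = ntohs(words[2]);
    resp_header.ancount = ntohs(words[3]);
    resp_header.nscount = ntohs(words[4]);
    resp_header.arcount = ntohs(words[5]);

    if (!FLAG_MASK_QR(resp_header.flags) ||
        FLAG_MASK_OPCODE(resp_header.flags) != 0 ||
        FLAG_MASK_RCODE(resp_header.flags) != 0)
        return -1;
    if (resp_header.ancount == 0) return 0;

    h_addr->ip_addr = malloc(resp_header.ancount * sizeof(ip_addr_union));
    if (h_addr->ip_addr == NULL) return -1;

    /* skip the header and the echoed question section */
    p = buffer + DNS_HEADER_LEN + request_length;

    for (unsigned i = 0; i < resp_header.ancount; ++i) {
        /* skip the owner name: labels, ended by a zero byte or a pointer */
        answer.name = p;
        for (;;) {
            if (p >= end) goto fail;
            if (DNS_ISPTR(*p)) {             /* 2-byte pointer ends the name */
                if (end - p < 2) goto fail;
                p += 2;
                break;
            }
            if (*p == 0) {                   /* root label ends the name */
                ++p;
                break;
            }
            if (*p + 1 > end - p) goto fail; /* label runs past the buffer */
            p += *p + 1;
        }

        /* fixed RR fields: TYPE(2) CLASS(2) TTL(4) RDLENGTH(2) */
        if (end - p < 10) goto fail;
        memcpy(&answer.rtype, p, 2);
        memcpy(&answer.rclass, p+2, 2);
        memcpy(&answer.ttl, p+4, 4);
        memcpy(&answer.rdlength, p+8, 2);
        answer.rtype = ntohs(answer.rtype);
        answer.rclass = ntohs(answer.rclass);
        answer.ttl = ntohl(answer.ttl);
        answer.rdlength = ntohs(answer.rdlength);
        p += 10;
        if (answer.rdlength > end - p) goto fail;
        answer.data = p;

        switch (answer.rtype) {
        case DNS_QTYPE_A:
            if (answer.rdlength == 4)
                memcpy(&h_addr->ip_addr[h_addr->count++].v4, p, 4);
            break;
        case DNS_QTYPE_AAAA:
            if (answer.rdlength == 16)
                memcpy(&h_addr->ip_addr[h_addr->count++].v6, p, 16);
            break;
        case DNS_QTYPE_CNAME:   /* typically followed by the target's A/AAAA RRs */
        default:
            break;
        }
        p += answer.rdlength;
    }
    return 0;

fail:
    free(h_addr->ip_addr);
    h_addr->ip_addr = NULL;
    h_addr->count = 0;
    return -1;
}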
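To print the addresses such a parser collects, the standard inet_ntop() function converts a struct in_addr (or a struct in6_addr, with AF_INET6) to text. A small self-contained sketch; the helper name is hypothetical:

```c
#include <string.h>
#include <sys/socket.h>   /* AF_INET */
#include <netinet/in.h>   /* struct in_addr */
#include <arpa/inet.h>    /* inet_ntop(), INET_ADDRSTRLEN */

/* Hypothetical helper: format the 4-byte RDATA of an A record as text.
   'out' must hold at least INET_ADDRSTRLEN bytes. Returns out, or NULL. */
const char *a_rdata_to_str(const unsigned char rdata[4], char *out)
{
    struct in_addr a;
    memcpy(&a, rdata, 4);   /* RDATA is already in network byte order */
    return inet_ntop(AF_INET, &a, out, INET_ADDRSTRLEN);
}
```

With the A record from earlier in the thread, the rdata bytes 64 233 161 147 come out as 64.233.161.147.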
Rainer Weikusat 2007-03-30, 7:17 am
chsalvia@gmail.com writes:
> In fact - what is the point of returning the Question in the DNS
> response? The header field already includes an ID number that allows
> you to match up queries, so what's the point of sending the Question
> back to the client?
I do not know about the 'theoretical reason' for this, but an obvious
practical one would be that a server preparing a UDP reply can then
just reuse the buffer it used to receive the request, changing the
necessary fields in the header and appending the RR sets it wants
to send.
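That reuse can be illustrated with a hypothetical server-side routine (not from the thread) that turns a received query into a reply in place: set QR, clear RCODE, set ANCOUNT, and append one A record whose owner name is just a compression pointer back to the question name at offset 12. It assumes the query holds exactly one question and nothing after it (no EDNS OPT record, for example).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: turn the query in buf[0..qlen) into a reply carrying one
   A record for the question name, reusing the same buffer.
   Assumes exactly one question and nothing after it in the query.
   Returns the new message length, or 0 if the buffer is too small. */
size_t make_a_reply(unsigned char *buf, size_t qlen, size_t bufsz,
                    const unsigned char addr[4], uint32_t ttl)
{
    const size_t rr_len = 2 + 2 + 2 + 4 + 2 + 4;  /* 16 bytes */
    unsigned char *p;

    if (qlen < 12 || qlen + rr_len > bufsz)
        return 0;

    buf[2] |= 0x80;              /* QR = 1: this is a response */
    buf[3] &= 0xF0;              /* RCODE = 0 (no error) */
    buf[6] = 0;                  /* ANCOUNT = 1 */
    buf[7] = 1;

    p = buf + qlen;              /* append right after the echoed question */
    *p++ = 0xC0; *p++ = 12;      /* owner name: pointer to offset 12 */
    *p++ = 0; *p++ = 1;          /* TYPE = A */
    *p++ = 0; *p++ = 1;          /* CLASS = IN */
    *p++ = (unsigned char)(ttl >> 24);
    *p++ = (unsigned char)(ttl >> 16);
    *p++ = (unsigned char)(ttl >> 8);
    *p++ = (unsigned char)ttl;   /* TTL, big-endian */
    *p++ = 0; *p++ = 4;          /* RDLENGTH = 4 */
    memcpy(p, addr, 4);          /* RDATA: the IPv4 address */
    return qlen + rr_len;
}
```

The question section never has to be copied or re-encoded, and the answer's owner name costs only two bytes.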
chsalvia@gmail.com 2007-03-30, 7:17 am
I've noticed that a lot of DNS queries don't even return answer records
with IP addresses, but only name servers in the authority section. For
example, when I send a query for a host at my university
(www.uhmc.sunysb.edu), the DNS server sends me a lot of authority and
additional records, but no answer records.
I get:
| NAME: uhmc.sunysb.edu
| TYPE: 2 CLASS: 1 TTL: 6687 LENGTH: 20
| DATA: nocnoc.stonybrook.edu
| NAME: uhmc.sunysb.edu
| TYPE: 2 CLASS: 1 TTL: 6687 LENGTH: 13
| DATA: infoblox-2.uhmc.sunysb.edu
How can I resolve this query? It doesn't even include an IP
address, only name servers. A query for www.coke.com is even less
informative. The DNS server sends me no answer records, and various
authority records like:
| NAME: com
| TYPE: 2 CLASS: 1 TTL: 13276 LENGTH: 4
| DATA: K.GTLD-SERVERS.NET
But the C library function gethostbyname() has no difficulty resolving
these host names, so there must be more to DNS lookups than I
currently understand.
Rainer Weikusat 2007-03-30, 7:17 am
chsalvia@gmail.com writes:
> I've noticed that a lot of DNS queries don't even return answer fields
> with IP addresses, but only name servers or authorities. For example,
> when I send a query to my university's domain, (www.uhmc.sunysb.edu),
> the DNS server sends me a lot of authorities and additional fields,
> but no answer fields.
>
> I get:
>
> | NAME: uhmc.sunysb.edu
> | TYPE: 2 CLASS: 1 TTL: 6687 LENGTH: 20
> | DATA: nocnoc.stonybrook.edu
>
> | NAME: uhmc.sunysb.edu
> | TYPE: 2 CLASS: 1 TTL: 6687 LENGTH: 13
> | DATA: infoblox-2.uhmc.sunysb.edu
>
> How can I resolve this query? It doesn't even include an internet IP
> address, only name servers. A query to www.coke.com is even less
> informative. The DNS server sends me no answer fields, and various
> authority fields like:
>
> | NAME: com
> | TYPE: 2 CLASS: 1 TTL: 13276 LENGTH: 4
> | DATA: K.GTLD-SERVERS.NET
>
> But the UNIX system call gethostbyname() has no difficulty resolving
> these host names, so there must be more to DNS lookups then I
> currently understand.
There are two ways to resolve a DNS query:
a) Do a query with the 'recursion desired' (RD) bit set in the
header to a name server that is willing to act as a recursive
resolver. This resolver will do whatever is necessary to
get the final answer and send that to the client.
b) Iterative resolution: if the initial query does not return
a final answer, ask the name server(s) listed as authoritative
(this is called 'a referral'). Continue with referral
processing until the final answer is known, resolving name server
names to name server IPs as needed with the same algorithm.
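For option a), the only change to an ordinary query is the RD bit (0x01 in the third byte of the header). A minimal sketch of building such a query follows; the function name and having the caller supply the ID are assumptions, and the name encoding produces the same length-prefixed layout that appears at offset 12 of the google.com response earlier in the thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: build a standard query for 'name' with the RD bit set.
   Returns the message length, or 0 on error (empty or >63-byte label,
   name longer than 255 bytes on the wire, or buffer too small). */
size_t build_query(unsigned char *buf, size_t bufsz, uint16_t id,
                   const char *name, uint16_t qtype)
{
    size_t n = 12;

    if (bufsz < 12)
        return 0;
    memset(buf, 0, 12);
    buf[0] = (unsigned char)(id >> 8);   /* ID, big-endian */
    buf[1] = (unsigned char)id;
    buf[2] = 0x01;                       /* RD = 1; QR, Opcode, AA, TC = 0 */
    buf[5] = 1;                          /* QDCOUNT = 1 */

    /* "www.google.com" -> 3 www 6 google 3 com 0 */
    while (*name) {
        const char *dot = strchr(name, '.');
        size_t lab = dot ? (size_t)(dot - name) : strlen(name);
        if (lab == 0 || lab > 63 || n + 1 + lab > bufsz)
            return 0;
        buf[n++] = (unsigned char)lab;   /* length prefix */
        memcpy(buf + n, name, lab);
        n += lab;
        name += lab;
        if (*name == '.')
            name++;                      /* a trailing dot is accepted */
    }
    if (n - 12 + 1 > 255 || n + 5 > bufsz)
        return 0;
    buf[n++] = 0;                        /* root label ends the name */
    buf[n++] = (unsigned char)(qtype >> 8);
    buf[n++] = (unsigned char)qtype;     /* QTYPE, e.g. 1 = A */
    buf[n++] = 0;
    buf[n++] = 1;                        /* QCLASS = IN */
    return n;
}
```

The result is sent over UDP to port 53 of the recursive resolver (on most Unix systems, one of the nameserver entries in /etc/resolv.conf).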
chsalvia@gmail.com 2007-03-30, 7:19 pm
So basically, the algorithm is something like:
Query DNS Server
If ANSWER, return IP
If NO ANSWER, return list of AUTHORITIES (name servers)
Query DNS Server for IP of each NAME SERVER
Query each NAME SERVER for host IP
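One refinement to this outline: a reply without answer records is not always a referral. It can also mean that the name does not exist (RCODE 3) or that it exists but has no records of the requested type, in which case the authority section usually carries an SOA record rather than NS records. A hypothetical sketch of that first decision, taking the header flags, ANCOUNT, and whether the authority section contains NS records (a simplification; a real resolver checks more, such as the AA bit and whether the NS records are closer to the queried name):

```c
#include <stdint.h>

enum dns_outcome { DNS_ANSWER, DNS_REFERRAL, DNS_NODATA, DNS_NXDOMAIN, DNS_ERROR };

/* Hypothetical helper: 'flags' and 'ancount' come from the reply header
   (host byte order); 'authority_has_ns' is nonzero if the authority
   section contains at least one NS record. */
enum dns_outcome classify_reply(uint16_t flags, uint16_t ancount,
                                int authority_has_ns)
{
    unsigned rcode = flags & 0x000F;

    if (rcode == 3)
        return DNS_NXDOMAIN;    /* the name does not exist */
    if (rcode != 0)
        return DNS_ERROR;       /* SERVFAIL, REFUSED, ... */
    if (ancount > 0)
        return DNS_ANSWER;      /* may be only a CNAME that still has to be followed */
    if (authority_has_ns)
        return DNS_REFERRAL;    /* ask the listed name servers next */
    return DNS_NODATA;          /* name exists, no records of this type */
}
```

On a referral, the A records of the listed name servers are often already present in the additional section (so-called glue), which can save the separate lookup in the third step.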
Bjorn Reese 2007-03-31, 7:20 am
chsalvia@gmail.com wrote:
> So basically, the algorithm is something like:
I suggest that you read RFC 1034 section 4.3.1 and RFC 1536 section 2.
--
mail1dotstofanetdotdk