Unexplained lag from clients after netmask change - advice solicited
Frank J. Perricone 2005-05-25, 6:02 pm
Legacy Digital Unix 4.0d system running on an AlphaStation 1000 in a fairly
small shop, about 30 local users, a few Windows 2000 servers. Very simple
network topology with unmanaged hubs.
Until recently, most of our clients used fixed IP addresses. DNS was served
by the BIND daemon on the Unix box. We had a handful of IPs allocated to
DHCP, and DHCP services were provided by one of the Windows 2000 servers.
Our IP range was ???.???.???.0-127 so our subnet mask was 255.255.255.128.
We were given the other half of that range so we could use addresses
128-255, so one weekend we switched over the subnet masks on the servers to
255.255.255.0. Since we had to go to most of our clients to update them, we
decided this would be a good time to switch over to letting all clients use
DHCP so at least this would be the last time we had to do it this way.
In preparation for this we set up DNS entries for addresses 130-250, each
named along the lines of "dhcp130.dlc.state.vt.us", and so on.
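
The mask arithmetic described above can be checked with a minimal sketch. The 192.0.2.x prefix is a documentation placeholder standing in for the elided ???.???.??? part of the real range, and `on_link` is a hypothetical helper name. It shows why a host still using the old 255.255.255.128 mask would not treat a client at .130 as being on its local network:

```python
import ipaddress

# Placeholder prefix (192.0.2.0 is reserved for documentation);
# the real network prefix is not given in the post.
old_net = ipaddress.ip_network("192.0.2.0/255.255.255.128")  # addresses 0-127
new_net = ipaddress.ip_network("192.0.2.0/255.255.255.0")    # addresses 0-255

# A client that received one of the new high addresses from DHCP.
client = ipaddress.ip_address("192.0.2.130")


def on_link(addr, net):
    """Return True if addr falls inside net, i.e. is reachable without a router."""
    return addr in net


print(old_net.prefixlen, on_link(client, old_net))
print(new_net.prefixlen, on_link(client, new_net))
```

With the old mask the .130 client falls outside the server's subnet, which is consistent with the high-address clients being unable to reach the Unix system until the new mask actually took effect.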
After the cutover we found people who were getting the high addresses from
DHCP weren't able to get to the Unix system, and concluded the subnet mask
change (made via netsetup, with Internet services restarted afterwards)
hadn't taken, so we rebooted the Unix system and this time it took.
However, ever since then, we have had intermittent bursts of annoying lag in
certain situations with communications with the Unix system.
This is most notable in telnet sessions to the Unix system, which is where
most of the traffic happens since this system is running legacy software
that is accessed that way. But we also see it in FTP as well as the client
for the queueing system we use (QMaster) so it's not limited to telnet.
What'll happen is, you're typing something in the terminal window
(Pathworks, generally) and it'll hang on you, and then after a few
seconds, it'll all go through, or at least most of it, depending on whether
you were typing faster than the client buffer could handle. In fact, if you
type enough it could even kick you out.
Here's an odd observation. I was typing "ping pineapple" to ask the Unix
box to ping one of the servers (the one that does DHCP). It did that
freeze-up just before the final "e" at the end of the line. About five
seconds went by, and then suddenly the screen jumped up and I had the
command and the first four or five ping results already. Which means that
the Unix box *had* received the command and was processing it the whole
time, doing the ping and getting back results, but just not communicating
this back to my terminal window.
At one point a reboot seemed to clear this up but it came back... perhaps
gradually getting worse, but it's hard to gauge accurately, no easy way to
measure this. Note that so far as I know, nothing else has changed,
certainly nothing physical. All we changed is the subnet mask on the Unix
box, and the clients are connecting from new, mostly high-range, IP
addresses. There's no change in CPU load, no new network traffic, no other
network congestion problems between any other computers, nothing I can point
to or know to look at. I've searched my system for any other references to
the old subnet mask in case something else needed to be changed and there
aren't any.
I am running out of ideas for where to look to track this down or fix it.
I'd be happy for ideas of directions to check, or pointers to software that
might help me find it, or really anything. If there are any details about
this that I omitted that might be needed, just ask.

Barry Margolin 2005-05-25, 8:53 pm
In article <Z5GdnfnuGtzu4gnfRVn-uQ@telcove.net>,
"Frank J. Perricone" <frank@dlc.state.vt.us> wrote:
> I am running out of ideas for where to look to track this down or fix it.
> I'd be happy for ideas of directions to check, or pointers to software that
> might help me find it, or really anything. If there are any details about
> this that I omitted that might be needed, just ask.
One of the thoughts that came to mind was that ARP entries are timing
out.
Run tcpdump to capture the network traffic on the server when this
happens.
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***

Frank J. Perricone 2005-05-26, 6:00 pm
An update:
I restored a copy of rc.config from before the switchover and then ran diff
against it. Turns out netsetup changed four things beyond the one I told it
to change, one of which seems potentially significant. I've changed them
back and rebooted and we'll have to see if the problem begins to crop up
again or not. The four changes are:
IFCONFIG: used to have "-speed 100" on the end of the line, which netsetup
took out without a warning. We had had a problem with the network card not
getting along with the automatic 10/100 distinguishing on the hub some years
ago, and the fix involved this switch and using a port on a 100-only hub.
RWHOD: changed from yes to no.
GATED: changed from yes to no.
GATED_OLD: changed from yes to no. Maybe I misread some of netsetup's
prompts, so these are probably my fault.
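
The before/after comparison described above can be sketched as follows. This is illustrative only: the variable names are taken from the post, but the exact rc.config line format, the 192.0.2.10 address, and the `changed_lines` helper are assumptions, not the real file contents.

```python
import difflib

# Hypothetical excerpts of rc.config before and after running netsetup.
# Only the netmask change was intended; the rest were side effects.
before = [
    'IFCONFIG="192.0.2.10 netmask 255.255.255.128 -speed 100"',
    'RWHOD="yes"',
    'GATED="yes"',
    'GATED_OLD="yes"',
]
after = [
    'IFCONFIG="192.0.2.10 netmask 255.255.255.0"',
    'RWHOD="no"',
    'GATED="no"',
    'GATED_OLD="no"',
]


def changed_lines(old, new):
    """Return only the removed ('- ') and added ('+ ') lines from a line diff."""
    return [line for line in difflib.ndiff(old, new) if line[:2] in ("- ", "+ ")]


changes = changed_lines(before, after)
for line in changes:
    print(line)
```

Reading the removed and added lines side by side makes an unrequested change, such as the dropped "-speed 100", stand out next to the intended netmask edit.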