NFS file problem: maybe a stale/stuck handle?
02-23-04 03:33 AM
This has me stumped. My apologies if the subject line misses the
mark.
I've recently migrated some directories from one NFS file server
(SunOS 5.6) to another (IRIX 6.5.13m) using "cp -Rp" as root from an
NFS client (IRIX 6.5.19m).
Before the migration, the NFS client did this:
/usr/local -> /net/sun-host/mnt/usr/local
After the migration, it does this:
/usr/local -> /net/irix-host/mnt/usr/local
Everything seems fine, except for one file:
/usr/local/netscape_4.79_irix6.5/spell/pen4s324.dat
The symptom is that netscape mail's "Spelling" tool is unavailable.
I've traced the spelling problem to that particular file.
I've compared the files on the SunOS and IRIX servers with diff and
cmp, as a regular user (no special privileges), and they appear to
be identical. Neither diff nor cmp have any trouble reading either
file. Ownership, sizes, dates, permissions, everything I can think
of to compare is identical, except for their physical locations on
disk. In trying to debug this, I've found the following:
* Only netscape seems to have any trouble accessing the file.
Other apps (diff, cmp, strings, cp) have no trouble reading it.
* If I replace the file on the IRIX server with a symlink back to
the SunOS server, it works in netscape:
pen4s324.dat -> /net/sun-host/mnt/usr/local/netscape_4.79_irix6.5/spell/pen4s324.dat
* If I replace it with a symlink to another duplicate of the file
on the NFS client's internal disk, it works in netscape (only on
that host, of course):
pen4s324.dat -> /usr/local-on-nfs-client/pen4s324.dat
* If I make another duplicate of the file, or even the entire
netscape_4.79_irix6.5 directory, on the IRIX server, it doesn't
work. Everything else in netscape seems to run fine, except for
the "Spelling" tool.
* If I log in to the IRIX server and run netscape there (with
DISPLAY pointing back to the NFS client), THAT works. It avoids
NFS and accesses the file locally:
/usr/local -> /mnt/usr/local
I've rebooted both the IRIX NFS server and the NFS client, but this
behavior persists. It's not tied to netscape's pathname to the file
(since the pathname works if it's a symlink to the SunOS server).
It's not tied to the file's inode (since copying the file to a
different inode doesn't change anything). It's not tied to the
file's contents (since it works as long as the app is running on the
NFS server rather than the NFS client).
The last point seems to indicate some sort of NFS problem, but as I
said, I've rebooted both machines to no avail. I'd appreciate any
tips.
--
Ted Hall
Re: NFS file problem: maybe a stale/stuck handle?
02-23-04 09:34 AM
"Theodore W. Hall" <twhall@cuhk.edu.hk> writes:
>This has me stumped. My apologies if the subject line misses the
>mark.
>...
>Everything seems fine, except for one file:
> /usr/local/netscape_4.79_irix6.5/spell/pen4s324.dat
>...
>* If I make another duplicate of the file, or even the entire
> netscape_4.79_irix6.5 directory, on the IRIX server, it doesn't
> work. Everything else in netscape seems to run fine, except for
> the "Spelling" tool.
I wonder if the pen4s324.dat might be a "sparse file" which may not
have gotten copied correctly onto the new server. Try running a local
and nfs mounted cksum on both files and see if there is a checksum
difference. Along the same line of reasoning if the file did get
copied OK but is indeed a sparse file there may be some sort of nfs
bug that you are running into related to sparse file handling.
Good Luck!
Mark Hittinger
bugs@pu.net
Re: NFS file problem: maybe a stale/stuck handle?
02-23-04 02:34 PM
In article <eKOdnVU3rrwIqafdRVn_vA@comcast.com>,
bugs@pu.net (Mark Hittinger) wrote:
> "Theodore W. Hall" <twhall@cuhk.edu.hk> writes:
>
> I wonder if the pen4s324.dat might be a "sparse file" which may not
> have gotten copied correctly onto the new server. Try running a local
> and nfs mounted cksum on both files and see if there is a checksum
> difference. Along the same line of reasoning if the file did get
> copied OK but is indeed a sparse file there may be some sort of nfs
> bug that you are running into related to sparse file handling.
He said he already compared them using "cmp" and "diff"; I can't imagine
how checksum would detect a difference that these didn't. Sparseness is
transparent to user-level applications.
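A minimal sketch of the sparseness point, assuming Python 3 and a local scratch directory (the file names are hypothetical): a file created by seeking past a hole reads back byte for byte like one with the zeros written out, so cmp, diff, and cksum have nothing to disagree about. Whether the hole is actually stored sparsely depends on the filesystem.

```python
import hashlib
import os
import shutil
import tempfile

def make_with_hole(path, hole_size, tail):
    """Seek past hole_size bytes without writing them, then write tail."""
    with open(path, "wb") as f:
        f.seek(hole_size)
        f.write(tail)

def make_dense(path, hole_size, tail):
    """Write the same logical content with explicit zero bytes."""
    with open(path, "wb") as f:
        f.write(b"\0" * hole_size)
        f.write(tail)

def digest(path):
    """Hash the bytes an ordinary reader sees."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

workdir = tempfile.mkdtemp()
try:
    holey = os.path.join(workdir, "holey.dat")   # hypothetical names
    dense = os.path.join(workdir, "dense.dat")
    make_with_hole(holey, 1 << 20, b"end")
    make_dense(dense, 1 << 20, b"end")
    same_size = os.path.getsize(holey) == os.path.getsize(dense)
    same_digest = digest(holey) == digest(dense)
finally:
    shutil.rmtree(workdir)

print(same_size, same_digest)
```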
I suggest the OP use truss on the client to see what it's doing when
Netscape hangs.
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
Re: NFS file problem: maybe a stale/stuck handle?
02-24-04 05:34 AM
Theodore W. Hall wrote:
>
> I've recently migrated some directories from one NFS file server
> (SunOS 5.6) to another (IRIX 6.5.13m) using "cp -Rp" as root from an
> NFS client (IRIX 6.5.19m).
cp isn't a good tool to copy directory trees from one machine to
another.
> /usr/local/netscape_4.79_irix6.5/spell/pen4s324.dat
>
> The symptom is that netscape mail's "Spelling" tool is unavailable.
> I've traced the spelling problem to that particular file.
>
> I've compared the files on the SunOS and IRIX servers with diff and
> cmp, as a regular user (no special privileges), and they appear to
> be identical. Neither diff nor cmp have any trouble reading either
> file. Ownership, sizes, dates, permissions, everything I can think
> of to compare is identical, except for their physical locations on
> disk.
Next check the UID and GID numbers that give the names you see.
Check them on both machines. The ownership could still be wrong.
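One way to compare the numeric IDs rather than the names is sketched below, assuming Python 3 is available on the host doing the check. The helper names are made up for illustration; the two paths would be the copies of the file as seen from that host.

```python
import os
import stat
import tempfile

def ownership_summary(path):
    """Collect the numeric owner, group, mode string, and size of a file."""
    st = os.stat(path)
    return {
        "uid": st.st_uid,                     # numeric, not the mapped name
        "gid": st.st_gid,
        "mode": stat.filemode(st.st_mode),    # e.g. "-rw-r--r--"
        "size": st.st_size,
    }

def differing_fields(path_a, path_b):
    """Return the names of the fields that differ between two files."""
    a, b = ownership_summary(path_a), ownership_summary(path_b)
    return sorted(key for key in a if a[key] != b[key])

# Demonstration on two scratch files of different length.
with tempfile.NamedTemporaryFile() as first, tempfile.NamedTemporaryFile() as second:
    first.write(b"abc")
    first.flush()
    second.write(b"abcdef")
    second.flush()
    self_diff = differing_fields(first.name, first.name)
    pair_diff = differing_fields(first.name, second.name)

print(self_diff, pair_diff)
```

An empty list means the numeric IDs, mode, and size all match, which rules out a name-mapping difference between the machines.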
Finally there's a topic that comes up a lot in tape drives that may
apply. IRIX uses little-endian and Solaris uses big-endian (or is
it the other way around, anyways they are different). Reading a file
locally won't do htonl() and ntohl() mapping of binary files. Reading
a file over NFS should do network order mapping. I suspect it is a
binary file and the XDR layer of NFS broke its byte order.
Re: NFS file problem: maybe a stale/stuck handle?
02-24-04 06:34 AM
Barry Margolin wrote:
> I suggest the OP use truss on the client to see what it's doing
> when netscape hangs.
That was revealing. "man -k truss" on IRIX led me to /usr/sbin/par,
which reports an error of "No locks available":
open("/usr/local/netscape_4.79_irix6.5/spell/pen4s324.dat", O_RDONLY, 0777) = 28
fcntl(28, F_GETLK, 0x7ffefba0) errno = 46 (No locks available)
close(28) OK
close(28) errno = 9 (Bad file number)
Moreover, when I shuffle things to get around that (as I described
in my original post), I get a similar "No locks available" error on
a couple of other spelling related files -- netscape.dic and
${HOME}/.netscape/custom.dic
open("/usr/local/netscape_4.79_irix6.5/spell/netscape.dic", O_RDONLY, 0777) = 29
fcntl(29, F_GETLK, 0x7ffef950) errno = 46 (No locks available)
close(29)
END-close() OK
...
open("/usr/people/hall/.netscape/custom.dic", O_RDONLY, 0777) = 29
fcntl(29, F_GETLK, 0x7ffef950) errno = 46 (No locks available)
close(29) OK
Apparently, if "pen4s324.dat" fails, then netscape doesn't even try
to open the custom dictionaries. If "pen4s324.dat" succeeds, then
the Spelling tool is available even though it fails to open the
custom dictionaries. ("pen4s324.dat" seems to be the main
dictionary, in some binary format. "netscape.dic" is a text file
with only a few entries such as "Netscape", "HTML", "browser",
"Collabra", "applets", ... not essential to the tool.)
I've rebooted the NFS client again, but it made no difference. So
the next question is: Why are there no locks available for these
few files? Everything else seems to be fine.
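The failing call can be probed outside netscape with a short script. This is a sketch under assumptions, not what netscape does internally: it assumes Python 3 on a POSIX client, and it requests a shared advisory lock (F_SETLK under the hood) rather than the F_GETLK query shown in the trace. The function name is made up for illustration. On an NFS mount whose server has no working lock manager, it would be expected to report ENOLCK.

```python
import errno
import fcntl
import os
import tempfile

def probe_lock(path):
    """Try to take and release a shared advisory lock on path.

    Returns None if locking works, otherwise the symbolic errno name
    (for example "ENOLCK", which is reported as "No locks available").
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        try:
            fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
        except OSError as exc:
            return errno.errorcode.get(exc.errno, str(exc.errno))
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return None
    finally:
        os.close(fd)

# Demonstration on a local scratch file; on the NFS client the argument
# would be one of the dictionary paths from the trace.
with tempfile.NamedTemporaryFile() as scratch:
    local_result = probe_lock(scratch.name)

print(local_result)
```

Running the probe against the same file through the old and the new server would show whether the failure follows the server rather than the file.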
--
Ted Hall
Re: NFS file problem: maybe a stale/stuck handle?
02-24-04 07:34 AM
Mark Hittinger wrote:
> I wonder if the pen4s324.dat might be a "sparse file" which may not
> have gotten copied correctly onto the new server. Try running a local
> and nfs mounted cksum on both files and see if there is a checksum
> difference. Along the same line of reasoning if the file did get
> copied OK but is indeed a sparse file there may be some sort of nfs
> bug that you are running into related to sparse file handling.
>
> Good Luck!
Thanks. I tried /usr/bin/cksum on both NFS servers and the NFS
client and got identical results on all three hosts.
I've also discovered via /usr/sbin/par that a similar error is
occurring on a plain text file of just 144 bytes (netscape.dic), so
it doesn't seem to be related to "sparsity". I didn't notice this
error previously since the only "damage" is the loss of a handful
of custom dictionary entries.
--
Ted Hall
Re: NFS file problem: maybe a stale/stuck handle?
02-24-04 08:34 AM
Doug Freyburger wrote:
> Next check the UID and GID numbers that give the names you see.
> Check them on both machines. The ownership could still be wrong.
UID = 0, GID = 0, mode = -rw-r--r--
> Finally there's a topic that comes up a lot in tape drives that may
> apply. IRIX uses little-endian and Solaris uses big-endian (or is
> it the other way around, anyways they are different). Reading a
> file locally won't do htonl() and ntohl() mapping of binary files.
> Reading a file over NFS should do network order mapping. I suspect
> it is a binary file and the XDR layer of NFS broke its byte order.
Mmm, in my experience, IRIX-MIPS and SunOS-SPARC are both big-endian.
Anyway, /usr/sbin/par reveals that I'm getting a similar error
"No locks available" on a 144-byte plain text file, "netscape.dic".
Thanks for your suggestions. I've been bitten by byte order before,
but not this time.
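For reference, a small sketch of what a byte-order difference looks like, assuming Python 3. It only illustrates integer layout. NFS carries file contents as opaque bytes, so a regular file's data is not reordered in transit.

```python
import struct
import sys

value = 0x01020304
big = struct.pack(">I", value)      # big-endian layout, also "network order"
little = struct.pack("<I", value)   # little-endian layout
native = struct.pack("=I", value)   # whatever this host uses

# The host's own layout matches exactly one of the two.
native_matches = native == (big if sys.byteorder == "big" else little)

print(sys.byteorder)
print(big.hex(), little.hex())      # 01020304 04030201
print(native_matches)
```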
--
Ted Hall
Re: NFS file problem: maybe a stale/stuck handle?
02-25-04 02:39 PM
In article <403B697A.96099D1E@cuhk.edu.hk>,
"Theodore W. Hall" <twhall@cuhk.edu.hk> wrote:
> That was revealing. "man -k truss" on IRIX led me to /usr/sbin/par,
> which reports an error of "No locks available":
Sounds like there's a problem with file locking on the server you copied
the files to.
--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
Re: NFS file problem: maybe a stale/stuck handle?
02-26-04 09:34 AM
In article <403B697A.96099D1E@cuhk.edu.hk>, I wrote:
> That was revealing. "man -k truss" on IRIX led me to
> /usr/sbin/par, which reports an error of "No locks available":
Barry Margolin wrote:
> Sounds like there's a problem with file locking on the server you
> copied the files to.
Eureka! I've just discovered that the SGI NFS server isn't
running lockd -- it's a separate switch from nfsd. nfsd is on,
but lockd is off. Urggh ... live and learn ... I'm obviously
an amateur at this.
I can't reboot now, but I'm confident that starting lockd on the
next reboot will clear this up. If it doesn't, I'll be back ...
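A quick way to confirm from a client whether a server has a lock manager registered is to look for RPC program 100021 (nlockmgr) in the output of `rpcinfo -p`. The sketch below assumes Python 3 and an `rpcinfo` binary on the client. The sample listings are illustrative, not captured from the hosts in this thread, and the helper names are made up.

```python
import subprocess

NLOCKMGR_PROGRAM = 100021  # RPC program number of the NFS lock manager

def registered_programs(rpcinfo_text):
    """Extract RPC program numbers from `rpcinfo -p` style output."""
    programs = set()
    for line in rpcinfo_text.splitlines():
        fields = line.split()
        if fields and fields[0].isdigit():   # skips the header row
            programs.add(int(fields[0]))
    return programs

def lock_manager_registered(rpcinfo_text):
    """True if the listing includes the lock manager program."""
    return NLOCKMGR_PROGRAM in registered_programs(rpcinfo_text)

def query_server(host):
    """Run `rpcinfo -p host` and return its text (needs rpcinfo on PATH)."""
    result = subprocess.run(["rpcinfo", "-p", host],
                            capture_output=True, text=True, check=True)
    return result.stdout

# Illustrative listings only.
SAMPLE_WITHOUT_LOCKD = """\
   program vers proto   port  service
    100000    2   tcp    111  portmapper
    100003    3   udp   2049  nfs
    100005    3   udp   1024  mountd
"""
SAMPLE_WITH_LOCKD = SAMPLE_WITHOUT_LOCKD + "    100021    1   udp   4045  nlockmgr\n"

without_lockd = lock_manager_registered(SAMPLE_WITHOUT_LOCKD)
with_lockd = lock_manager_registered(SAMPLE_WITH_LOCKD)
print(without_lockd, with_lockd)
```

Against a live server the check would be `lock_manager_registered(query_server("irix-host"))`, using the host name from the original post.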
--
Ted Hall
All times are GMT.