Unix administration - suddenly filesystem becomes read-only?


Troy Piggins

2006-10-03, 7:29 pm

What could cause my filesystem to become read-only?

The system uptime was around 3 weeks and running fine, then this
morning slrn crashed because of a lockfile problem. I tried to
delete it but couldn't because the file was read-only. Checked the
permissions of the file and they were rw------- troy.troy.

Then had trouble starting vim, presumably because the swap files
couldn't be created. I tried to create a file in my home
directory and couldn't, got a 'read-only filesystem' error.

I restarted this morning around 7am and at least now I can do
things.

Any ideas on where to start forensics? Looked in syslog - the
last entries were around 1am and it was running a cronjob of
fetchnews (part of leafnode2).

--
Troy Piggins
,-o
o ) Ubuntu linux 6.06 http://ubuntu.com RLU#415538 http://counter.li.org
`-o uptime: 07:44:35 up 39 min,2 users,load average:0.00,0.00,0.00
Mark Hittinger

2006-10-04, 1:39 am

Troy Piggins <usenet-0610@piggo.com> writes:
>What could cause my filesystem to become read-only?


If the operating system detected a bad spot on your hard drive it might have
changed the file system to read-only in order to preserve data.

Check /var/log/messages to see if you can find hard drive errors. grep for
'error=' might turn up something.

Later

Mark Hittinger
bugs@pu.net
Troy Piggins

2006-10-04, 1:39 am

* Mark Hittinger wrote:
> Troy Piggins <usenet-0610@piggo.com> writes:
>
> If the operating system detected a bad spot on your hard drive
> it might have changed the file system to read-only in order to
> preserve data.
>
> Check /var/log/messages to see if you can find hard drive
> errors. grep for 'error=' might turn up something.


Checked /var/log/messages and grepped for 'error' and nothing
came up.

Thanks for the suggestion, still looking for solution though.

--
Troy Piggins
,-o
o ) Ubuntu linux 6.06 http://ubuntu.com RLU#415538 http://counter.li.org
`-o uptime: 14:41:33 up 7:36,2 users,load average:0.00,0.01,0.00
Michael Paoli

2006-10-04, 7:35 am

Troy Piggins wrote:
> * Mark Hittinger wrote:
>
> Checked /var/log/messages and grepped for 'error' and nothing
> came up.
>
> Thanks for the suggestion, still looking for solution though.


If /var/log/messages is on the ro filesystem, then looking in
/var/log/messages might not be useful. Check the output of dmesg,
and perhaps also any other locations syslog or similar facilities may
still be able to write. Checking for 'error' string may not suffice
- might be advisable to at least look for case insensitive matches to
strings such as error, fail, disk, block, etc. Some filesystem mounts
may have options such as errors=remount-ro or may default to such
behavior. Some may have mount options such as errors=panic or may
default to such behavior.
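
A minimal sketch of the search suggested above, assuming a POSIX shell and
grep. The function name scan_kernel_log is hypothetical, and the "remount"
keyword is an addition to the strings listed above, included on the
assumption that the kernel logs a remount notice when errors=remount-ro
triggers.

```shell
# Hypothetical helper: read log text on stdin and print the lines that
# contain any of the keywords, ignoring case.
# "remount" is an assumed extra keyword; the others come from the post.
scan_kernel_log() {
    grep -i -E 'error|fail|disk|block|remount'
}

# Usage (reading the kernel ring buffer may need root on some systems):
#   dmesg | scan_kernel_log
#   scan_kernel_log < /var/log/syslog
```

Running this before rebooting matters, since the kernel ring buffer shown by
dmesg is lost on restart and the on-disk logs may not have been writable.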

Troy Piggins

2006-10-04, 7:35 am

* Michael Paoli wrote:
> Troy Piggins wrote:
>
> If /var/log/messages is on the ro filesystem, then looking in
> /var/log/messages might not be useful. Check the output of dmesg,


Aah, of course!

> and perhaps also any other locations syslog or similar facilities may


I restarted, so I guess it's all gone and I'll need to wait til
next time it happens. At least you've hinted at why I can't find
the problem.

> still be able to write. Checking for 'error' string may not suffice
> - might be advisable to at least look for case insensitive matches to
> strings such as error, fail, disk, block, etc. Some filesystem mounts


I grepped for those in dmesg, syslog, and messages and no joy.

> may have options such as errors=remount-ro or may default to such
> behavior. Some may have mount options such as errors=panic or may


Yes! Ubuntu's /etc/fstab default seems to be:

/dev/hda1 / ext3 defaults,errors=remount-ro 0 1
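
To see which entries carry an errors= policy, something like the following
could be used. It is only a sketch: the function name fstab_error_policy is
hypothetical, and it assumes the usual whitespace-separated fstab layout
with the options in the fourth field.

```shell
# Hypothetical helper: print "<mount point> <errors= option>" for every
# fstab entry that sets an errors= policy. Comment lines are skipped.
# The file to read defaults to /etc/fstab.
fstab_error_policy() {
    awk '$1 !~ /^#/ && NF >= 4 {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^errors=/) print $2, opts[i]
    }' "${1:-/etc/fstab}"
}

# Usage:
#   fstab_error_policy
#   fstab_error_policy /mnt/other-root/etc/fstab
```

If an ext2/ext3 entry has no errors= option, the default stored in the
superblock should apply; "tune2fs -l" on the device reports it on its
"Errors behavior" line.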

--
Troy Piggins
,-o
o ) Ubuntu linux 6.06 http://ubuntu.com RLU#415538 http://counter.li.org
`-o uptime: 17:19:06 up 10:14,2 users,load average:0.00,0.00,0.00
Stefaan A Eeckels

2006-10-04, 7:35 am

On Wed, 04 Oct 2006 07:55:46 +1000
Troy Piggins <usenet-0610@piggo.com> wrote:

> What could cause my filesystem to become read-only?


Check the mount options on the file system. Some file systems (like
Sun's UFS) have a mount option that determines how they handle
errors and inconsistencies. The default action is to remount the file
system in read-only mode. This might be the same for Ubuntu (which I
have zero experience with). The "mount" command (without parameters)
will show the options with which your file systems are mounted.
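
On Linux the same information is available in /proc/mounts, which is easier
to parse than the output of "mount". A rough sketch, assuming the usual
layout (device, mount point, type, options); the function name ro_mounts is
made up for this example:

```shell
# Hypothetical helper: print "<mount point> <fs type>" for every
# filesystem currently mounted read-only.
# The file to read defaults to /proc/mounts (Linux-specific).
ro_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "ro") { print $2, $3; break }
    }' "${1:-/proc/mounts}"
}

# Usage:
#   ro_mounts
```

The comparison is against the exact option "ro", so an entry that is
mounted rw with errors=remount-ro is not reported.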

--
Stefaan A Eeckels
--
Isn't it amazing how a large number of evil morons can give the
appearance of being a single evil genius? --Mel Rimmer
Logan Shaw

2006-10-06, 1:49 am

Troy Piggins wrote:
> * Michael Paoli wrote:


> Yes! Ubuntu's /etc/fstab default seems to be:
>
> /dev/hda1 / ext3 defaults,errors=remount-ro 0 1


If this is in fact the reason the filesystem became read-only (which
it seems like it probably is), then you really need to do some
careful investigation, provided this machine is important to you.
At this point, there are only two possible causes:

(1) software problem, 99% chance it's a bug in the filesystem code
(2) hardware problem, which means either defective memory (unlikely)
or dying hard disk.

If I had to bet money, I'd say this was most likely due to a defective
hard disk. So, I would check your disk with whatever tool Linux provides
for checking for bad sectors. You could even just do something like a
"dd if=/dev/hda1 of=/dev/null bs=1024k" just to be sure you can read
all the sectors. Checking that you can read and write them both would
be a better test, though.
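
The dd read test above can be wrapped so that its exit status is easy to
check. This is only a sketch; the function name read_test is hypothetical,
and reading a raw device such as /dev/hda1 normally requires root.

```shell
# Hypothetical helper: read a device (or file) end to end and discard
# the data. The return value is dd's exit status, so non-zero means dd
# could not read everything.
read_test() {
    dd if="$1" of=/dev/null bs=1024k 2>/dev/null
}

# Usage (as root):
#   read_test /dev/hda1 && echo "read OK" || echo "read errors, check dmesg"
```

On Linux, "badblocks -sv /dev/hda1" (from e2fsprogs) runs a comparable
read-only scan by default and lists the blocks it could not read.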

- Logan
Troy Piggins

2006-10-06, 1:49 am

* Logan Shaw wrote:
> Troy Piggins wrote:
>
>
>
> If this is in fact the reason the filesystem became read-only (which
> it seems like it probably is), then you really need to do some
> careful investigation, provided this machine is important to you.


Put it this way, my girlfriend thinks I love my linux box more
than her...

> At this point, there are only two possible causes:
>
> (1) software problem, 99% chance it's a bug in the filesystem code
> (2) hardware problem, which means either defective memory (unlikely)
> or dying hard disk.


Yep, I'm thinking (2) also.

> If I had to bet money, I'd say this was most likely due to a defective
> hard disk. So, I would check your disk with whatever tool Linux provides
> for checking for bad sectors. You could even just do something like a
> "dd if=/dev/hda1 of=/dev/null bs=1024k" just to be sure you can read
> all the sectors. Checking that you can read and write them both would
> be a better test, though.


I'll do some checks on the weekend. Thanks for the pointers.

--
Troy Piggins
,-o
o ) Ubuntu linux 6.06 http://ubuntu.com RLU#415538 http://counter.li.org
`-o uptime: 13:32:08 up 2 days,6:27,2 users,load average:0.00,0.00,0.00
Michael Paoli

2006-10-07, 1:22 pm

Troy Piggins wrote:
> * Logan Shaw wrote:
> Put it this way, my girlfriend thinks I love my linux box more
> than her...

I think that's a common (natural?) occurrence between girlfriends and
LINUX ;-) ... but that's probably a topic for some other newsgroup.

> Yep, I'm thinking (2) also.

Well, I'd also lump power disruptions into (2), or in some cases (1),
depending upon their cause, and they could be completely external
to the system itself that's had the filesystem issue - nevertheless
such could introduce logical data corruption and/or hardware damage
to the disk, or otherwise have negative impacts upon the computer
system.

>
> I'll do some checks on the weekend. Thanks for the pointers.


Yes, that's one of the first things I'd also do if I suspected there
might be disk hardware problems - read the entire device end-to-end,
and see if that is successful or not. Note also that more modern
and/or intelligent hard drives (e.g. SCSI, most non-ancient IDE/ATA,
etc.) are relatively good at automagically "fixing" minor hard
drive problems. Within my experience, on SCSI this is
typically much more graceful than with IDE/ATA, but others may have
had different experiences (thus far I've not dealt with a particularly
large number of drives that automagically recovered themselves, so
I'm working from a small sample set).

With SCSI, there's a "grown defects list" (or whatever its precise
name is).
When bad/suspect sectors are found, they're added to this list. If
the drive is still able to read the data (if it's read, before there
is some attempt to write it), it will rewrite it elsewhere, and remap
so the alternate sector is used. If it's simply written to, it
likewise remaps it, and writes and henceforward uses the alternate
sector. Things only go poorly with this scheme when
either the operating system still needs to read the data, and the
SCSI drive can't successfully read it, or the "grown defects list"
table overflows, and the SCSI drive can no longer remap bad sectors
(if you or your monitoring software can monitor the "grown defects
list", watching for growth there, particularly if it's growing fast,
or the table is approaching being full, those are strong indicators of
a disk that is quite likely to fail unrecoverably in the near
future).

With IDE/ATA I've seen similar, but less graceful behavior.
With such drives, it seems (I've not verified this at all, ... just
my guesstimate on behaviors I've seen) the drives aren't as
"proactive" about remapping. It seems they only get around to
remapping after a sector has gotten to the point where it can't be
successfully read. On the other hand, SCSI seems a bit more
intelligent about this, and is often capable of detecting that
sectors are becoming "difficult" (perhaps close to tolerance limits,
or experiencing some read errors, but succeed with repeated read
retries) to read, and often successfully remap them (with no visible
sign that any problems occurred, other than the growth of the defects
list, and perhaps a trace of extra latency in reads on some
occasions).

Anyway, even with the "smart", but not *as* "smart"
IDE/ATA drives, overwriting the sector of the device that's having
the problem will often cause the problem to automagically
"disappear", as it gets remapped upon the overwrite (e.g. my personal
laptop has given this precise behavior exactly twice thus far in the
over 3 years that I've had this laptop). Note, however, that for many
filesystem types, overwriting the file that contains the bad
sector may not attempt to overwrite the bad sector - e.g. journaling
filesystems will typically write the data elsewhere upon "overwrite"
(so that an incomplete action, such as one disrupted by loss of
power or system lock-up, can be "rolled back" (or forward) to a
consistent filesystem state).

As was mentioned (or at least hinted at) earlier, if you're able to
do repeated overwrites of various patterns, that's typically best at
testing/exercising a hard drive (particularly also with lots of
random seeks included - I've had drives that read (and wrote)
perfectly fine end-to-end, but failed miserably under random seek
conditions) ... but "most of the time" (at least more often than
not), reading end-to-end (even in purely sequential manner) will
typically pick up problems a drive is having. Also, due to all the
automagic remapping stuff, overwrite tests can quickly and
effectively "hide" a problem (or make it go away, when successfully
remapped), and can make it less clear that there at least *was* a
problem ... hence I generally recommend at least doing full reads,
before trying overwrites (at least if one wants to check/confirm if
the drive is or has been having a problem).

Also, not sure about the latest protocols, standards, and tools, but
as far as I'm aware, the "grown defects list" can be inspected (e.g.
via software and SCSI protocols) on SCSI disks, but I don't think
such capability exists for ATA/IDE drives (but perhaps that's
changed?). Precise answers on that may also vary depending on OS
flavor and available software. You mentioned Ubuntu (which is Debian
based). Debian has tools for getting detailed information from SCSI
devices - including the "grown defects" list, so I'd think it
probable Ubuntu includes or makes available same, or similar tool.
I'm not as sure about ATA/IDE, but perhaps you or someone else will
provide us with more information (and any applicable corrections)
regarding such.
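
As one possible answer for ATA/IDE: drives that support SMART report a
reallocated sector count, which the smartctl tool from the smartmontools
package can display. The sketch below assumes smartmontools is installed,
that the drive supports SMART, and that "smartctl -A" prints its usual
attribute table with the raw value in the tenth column. The function name
reallocated_count is hypothetical.

```shell
# Hypothetical helper: read "smartctl -A" output on stdin and print the
# raw value of the Reallocated_Sector_Ct attribute (assumed to be the
# tenth whitespace-separated field of that row).
reallocated_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# Usage (as root):
#   smartctl -A /dev/hda | reallocated_count
```

A value that keeps growing between checks would be the ATA counterpart of a
growing defects list. For SCSI disks, "smartctl -a" should report the number
of elements in the grown defect list directly.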
