Unix administration - What was the cause of "link_down" events to HDS array from Solaris server


Author What was the cause of "link_down" events to HDS array from Solaris server
underh20.scubadiving@gmail.com

2007-09-05, 7:31 pm

Our Sun server is running Solaris 8 with Veritas Foundation Suite
3.5. Some of our Veritas file systems received I/O errors, which
put the Veritas volumes into the "DISABLE" state, because of link-down
events between our server and the storage SAN. The two Emulex LP9K HBAs
(lpfc2/lpfc6) serve as the primary and secondary paths to the LUNs on
the SAN on which the file systems are built. We were able to
resolve the I/O errors on our file systems by rebooting the server.

Is there a way to tell what caused the link_down event and which
component is responsible, e.g., the storage array, the path between
server and switch, the fabric, and/or the server end? Thanks, Bill


df: cannot statvfs /a31: I/O error
df: cannot statvfs /b32: I/O error
df: cannot statvfs /c33: I/O error

Sep 1 17:38:31 minou lpfc: [ID 296855 kern.info] NOTICE: lpfc6:031:Link Down Event received Data: 24 24 0 20
Sep 1 17:38:31 minou lpfc: [ID 934692 kern.info] NOTICE: lpfc2:031:Link Down Event received Data: 2 20 20
Sep 1 17:39:32 minou scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@18,600000/lpfc@1/sd@4b,14 (sd9336):
Sep 1 17:39:32 minou SCSI transport failed: reason 'tran_err': retrying command
Sep 1 17:39:32 mionu scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@18,600000/lpfc@1/sd@4b,17 (sd9339):
Sep 1 17:39:32 minou SCSI transport failed: reason 'tran_err': retrying command
Sep 1 17:39:32 minou scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@18,600000/lpfc@1/sd@4b,18 (sd9340):

Doug Freyburger

2007-09-06, 1:32 am

underh20.scubadiv...@gmail.com wrote:
>
> Our Sun server is running Solaris 8 with Veritas Foundation Suite
> 3.5. Some of our Veritas file systems received I/O errors, which
> put the Veritas volumes into the "DISABLE" state, because of link-down
> events between our server and the storage SAN. The two Emulex LP9K HBAs
> (lpfc2/lpfc6) serve as the primary and secondary paths to the LUNs on
> the SAN on which the file systems are built. We were able to
> resolve the I/O errors on our file systems by rebooting the server.
>
> Is there a way to tell what caused the link_down event and which
> component is responsible, e.g., the storage array, the path between
> server and switch, the fabric, and/or the server end? Thanks, Bill
> ...
> Sep 1 17:38:31 minou lpfc: [ID 296855 kern.info] NOTICE: lpfc6:031:Link Down Event received Data: 24 24 0 20
> Sep 1 17:38:31 minou lpfc: [ID 934692 kern.info] NOTICE: lpfc2:031:Link Down Event received Data: 2 20 20
> Sep 1 17:39:32 minou scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@18,600000/lpfc@1/sd@4b,14 (sd9336):


When I see complaints of login/logout on the lpfc device I suspect
that the GBIC is failing and the problem is on the host HBA.

When I see Link Down I start to wonder about the switch end of
the link.
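
As a rough aid for reading the log, here is a minimal Python sketch that groups Link Down events by timestamp and lists which HBA instances reported them. The sample lines are copied from the original post with the wrapped lines rejoined; the regular expression and the function name link_down_by_time are my own assumptions, not part of the lpfc driver or any Solaris tool. When one timestamp lists both HBAs, as it does here, it shows that the two paths dropped together. It does not by itself say which component failed.

```python
import re
from collections import defaultdict

# Sample lines copied from the original post (wrapped lines rejoined).
LOG = """\
Sep 1 17:38:31 minou lpfc: [ID 296855 kern.info] NOTICE: lpfc6:031:Link Down Event received Data: 24 24 0 20
Sep 1 17:38:31 minou lpfc: [ID 934692 kern.info] NOTICE: lpfc2:031:Link Down Event received Data: 2 20 20
Sep 1 17:39:32 minou SCSI transport failed: reason 'tran_err': retrying command
"""

# Assumed pattern, written to fit the lines above; other driver
# versions may format the message differently.
LINK_DOWN = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+lpfc:.*?"
    r"(?P<hba>lpfc\d+):\d+:Link Down Event"
)


def link_down_by_time(text):
    """Map each syslog timestamp to the set of HBAs that logged Link Down."""
    events = defaultdict(set)
    for line in text.splitlines():
        m = LINK_DOWN.search(line)
        if m:
            events[m.group("ts")].add(m.group("hba"))
    return dict(events)


for ts, hbas in link_down_by_time(LOG).items():
    print(ts, sorted(hbas))
```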

But there is one other thing to consider - transport timeouts can
be caused by some other traffic hogging the channel. How many
other endpoints are in your zones? The usual practice is for each
pair of endpoints to have its own zone. I've seen a site that put a
dozen hosts in a single zone get this error. They switched to a
better zoning standard and the problem went away.

Also, are you mixing tape and disk traffic on the same HBA and
are the errors happening during backups? Tape transfers use
very large buffers; disk transfers have very tight timing requirements.
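
One way to check the backup question is to compare the error timestamps with the backup schedule. The sketch below does that in Python. The backup window (17:00 to 19:00) is a made-up placeholder, the year 2007 is assumed from the posting date because syslog does not record it, and the function name in_window is hypothetical. Substitute the site's real schedule before drawing any conclusion.

```python
from datetime import datetime, time

# HYPOTHETICAL backup window; replace with the site's real schedule.
BACKUP_START = time(17, 0)
BACKUP_END = time(19, 0)

# Timestamps taken from the syslog excerpt in the original post.
EVENT_TIMES = ["Sep 1 17:38:31", "Sep 1 17:39:32"]


def in_window(stamp, start=BACKUP_START, end=BACKUP_END, year=2007):
    """Return True if a syslog timestamp falls inside the window."""
    # syslog omits the year; 2007 is assumed from the post date.
    t = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S").time()
    if start <= end:
        return start <= t <= end
    # Window crosses midnight, e.g. 22:00 to 02:00.
    return t >= start or t <= end


for stamp in EVENT_TIMES:
    print(stamp, "inside window" if in_window(stamp) else "outside window")
```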


Copyright 2003 - 2008 webservertalk.com