What was the cause of "link_down" events to HDS array from Solaris server

    What was the cause of "link_down" events to HDS array from Solaris server  
underh20.scubadiving@gmail.com


09-06-07 12:31 AM

Our Sun server is running Solaris 8 with Veritas Foundation Suite
3.5.  Some of our Veritas file systems received I/O errors, and the
Veritas volumes went into the "DISABLED" state because of link-down
events between our server and the storage SAN.  The two Emulex LP9K HBAs
(lpfc2/lpfc6) serve as the primary and secondary paths to the LUNs on
the SAN that the file systems are built on.  We were able to
resolve the I/O errors on the file systems by rebooting the server.

Is there a way to tell what caused the link-down events and which
component was responsible, e.g., the storage array, the path between
server and switch, the fabric, or the server end?  Thanks, Bill


df: cannot statvfs /a31: I/O error
df: cannot statvfs /b32: I/O error
df: cannot statvfs /c33: I/O error

Sep  1 17:38:31 minou lpfc: [ID 296855 kern.info] NOTICE: lpfc6:031:Link Down Event received  Data: 24 24 0 20
Sep  1 17:38:31 minou lpfc: [ID 934692 kern.info] NOTICE: lpfc2:031:Link Down Event received  Data: 2 20 20
Sep  1 17:39:32 minou scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@18,600000/lpfc@1/sd@4b,14 (sd9336):
Sep  1 17:39:32 minou  SCSI transport failed: reason 'tran_err': retrying command
Sep  1 17:39:32 minou scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@18,600000/lpfc@1/sd@4b,17 (sd9339):
Sep  1 17:39:32 minou  SCSI transport failed: reason 'tran_err': retrying command
Sep  1 17:39:32 minou scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@18,600000/lpfc@1/sd@4b,18 (sd9340):






    Re: What was the cause of "link_down" events to HDS array from Solaris server  
Doug Freyburger


09-06-07 06:32 AM

underh20.scubadiv...@gmail.com wrote:
>
> Our Sun server is running Solaris 8 with Veritas Foundation Suite
> 3.5.  Some of our Veritas file systems received I/O errors, and the
> Veritas volumes went into the "DISABLED" state because of link-down
> events between our server and the storage SAN.  The two Emulex LP9K HBAs
> (lpfc2/lpfc6) serve as the primary and secondary paths to the LUNs on
> the SAN that the file systems are built on.  We were able to
> resolve the I/O errors on the file systems by rebooting the server.
>
> Is there a way to tell what caused the link-down events and which
> component was responsible, e.g., the storage array, the path between
> server and switch, the fabric, or the server end?  Thanks, Bill
> ...
> Sep  1 17:38:31 minou lpfc: [ID 296855 kern.info] NOTICE: lpfc6:031:Link Down Event received  Data: 24 24 0 20
> Sep  1 17:38:31 minou lpfc: [ID 934692 kern.info] NOTICE: lpfc2:031:Link Down Event received  Data: 2 20 20
> Sep  1 17:39:32 minou scsi: [ID 107833 kern.warning] WARNING: /ssm@0,0/pci@18,600000/lpfc@1/sd@4b,14 (sd9336):

When I see complaints of login/logout on the lpfc device I suspect
that the GBIC is failing and the problem is on the host HBA.

When I see Link Down I start to wonder about the switch end of
the link.

But there is one other thing to consider: transport timeouts can
be caused by other traffic hogging the channel.  How many
other ends are in your zones?  The standard is for each pair of
ends to have its own zone.  I've seen a site that put a dozen
hosts in a single zone hit this error; they switched to a better
zoning standard and the problem went away.

Also, are you mixing tape and disk traffic on the same HBA and
are the errors happening during backups?  Tape transfers use
very large buffers; disk transfers have very tight timing requirements.
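
To compare the timing of the link-down events against a backup window, and to see whether both HBAs dropped in the same second, the messages file can be summarized with a short script. What follows is only a rough sketch in Python. It assumes each syslog entry sits on one unwrapped line in the format quoted above, and the function names are made up for illustration; nothing here comes from Emulex or Sun tooling.

```python
import re
from collections import Counter, defaultdict

# Pattern for the lpfc "Link Down" notices as quoted in this thread.
# Assumption: each syslog entry is a single, unwrapped line.
LINK_DOWN = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+lpfc:.*?(?P<hba>lpfc\d+):\d+:Link Down Event received"
)


def link_down_events(lines):
    """Yield (timestamp, hba) for every Link Down notice found in lines."""
    for line in lines:
        m = LINK_DOWN.search(line)
        if m:
            # Rebuild the timestamp with syslog-style day padding.
            stamp = f"{m['month']} {int(m['day']):2d} {m['time']}"
            yield stamp, m["hba"]


def group_by_timestamp(lines):
    """Map each timestamp to the sorted list of HBAs that lost link then."""
    groups = defaultdict(list)
    for stamp, hba in link_down_events(lines):
        groups[stamp].append(hba)
    return {stamp: sorted(hbas) for stamp, hbas in groups.items()}


def count_by_hour(lines):
    """Count Link Down notices per hour, keyed like 'Sep  1 17'."""
    # Dropping the trailing ':MM:SS' leaves the month, day and hour.
    return dict(Counter(stamp[:-6] for stamp, _ in link_down_events(lines)))
```

Feeding it the lines of /var/adm/messages (for example `group_by_timestamp(open("/var/adm/messages"))`) gives one entry per second in which a link dropped. If the per-hour counts cluster inside the backup window, that would be worth following up along the lines of the tape/disk question above.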





