03-10-07 06:12 PM
I work for a consulting firm and have begun troubleshooting small SANs,
mostly HP MSA1500cs based.
Often the problem the customer describes is some vague, intermittent
slowness. In cases like this, my troubleshooting goes something like this:
1. Check switch logs for marginal ports or other errors (usually
Brocade 4/24s or similar)
2. Update to the latest firmware and driver levels on HBAs, switch, MSA,
etc.
If the problem still exists, I'll call HP support, but more often than
not they can't really help from here. So the only approach that
yields results is to start unplugging stuff until I see the problem
disappear.
In one recent instance, I had a customer start shutting blades off
until he found that one of them had an HBA that was mysteriously
causing the intermittent slowness for the whole SAN. The HBA actually
seemed to work, and there were no errors in the Windows event logs,
switch logs, SANsurfer, or anywhere else.
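The blade-by-blade shutdown described above can be made a little more systematic by halving the set of candidates on each round. The following is only a sketch under strong assumptions: exactly one host is at fault, and the slowness reproduces reliably within whatever observation window is used. The `problem_seen` callback and the blade names are hypothetical stand-ins for "run only these hosts, watch the SAN, and report whether it slowed down".

```python
from typing import Callable, Sequence


def find_culprit(hosts: Sequence[str],
                 problem_seen: Callable[[Sequence[str]], bool]) -> str:
    """Return the single host whose presence triggers the problem.

    problem_seen(active) is a hypothetical check: leave only the hosts in
    `active` running, observe the SAN, and return True if slowness occurred.
    Assumes `hosts` is non-empty and contains exactly one culprit.
    """
    candidates = list(hosts)
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        # If the problem appears with only this half running, the culprit
        # is in it; otherwise it must be in the other half.
        candidates = half if problem_seen(half) else candidates[len(half):]
    return candidates[0]


# Illustration with made-up blade names and a simulated faulty blade.
blades = [f"blade{n}" for n in range(1, 9)]
faulty = "blade6"
print(find_culprit(blades, lambda active: faulty in active))
```

With eight blades this needs three observation rounds instead of up to eight, but for a truly intermittent fault each round has to be long enough to be sure the problem is really gone.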
There has got to be a better way to find this kind of thing. On an IP
network, I would run Ethereal or some other packet analyzer to try to
see what is talking on the network when the problem manifests. But
I've never really found anything like that for a Fibre Channel SAN.
As I said, I'm pretty new to SANs, so any direction would be helpful.
Thanks,
Sean
03-12-07 06:14 PM
Hi Sean,
check
http://www.finisar.com/index.php?file=product&var=product&div_id=smenu3&level=B&sub_cat_id=3&dlink=SAN%20Monitoring%20and%20Analysis
Good luck,
Piotr
03-13-07 06:16 AM
> Hi Sean,
>
> check
> http://www.finisar.com/index.php?file=product&var=product&div_id=smenu3&level=B&sub_cat_id=3&dlink=SAN%20Monitoring%20and%20Analysis
>
> Good luck,
> Piotr
Yeah I found some of that stuff. The problem with everything I've found is
that it requires Taps. I haven't found anything equivalent to a "mirroring
port" on a switch.
Does such a thing exist?
03-13-07 12:13 PM
"Sean Howard" <seanh012@gmail.com> wrote in message
news:vZ6dnQ0LoffOaWjYnZ2dnUVZ_oCmnZ2d@comcast.com...
>
> Yeah I found some of that stuff. The problem with everything I've found
> is that it requires Taps. I haven't found anything equivalent to a
> "mirroring port" on a switch.
>
> Does such a thing exist?
Yes it does, but not on every product. As far as I am aware you can find it
on Brocade 48000 directors and Brocade 5000 FC switches.
There is a good reason for using Taps in SAN monitoring and troubleshooting
(see below, as found in a Finisar document covering this problem).
1. Multiple ports mirrored to one port cause buffer overflow and dropped
packets.
2. Packets go through a buffer and are retimed, making accurate
time-sensitive measurements such as jitter, packet gap analysis, or
latency impossible.
3. Most mirror ports filter anomalies, thus making troubleshooting
impossible.
4. Turning on port mirroring puts a load on the switch's CPU/transfer logic,
thus impacting the switch's operational performance.
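Point 1 is easy to see with some rough arithmetic. This is a sketch, not a statement about any particular switch: the 4 Gb/s port speeds are assumed figures, and `directions=2` assumes that both the transmit and receive sides of each full-duplex port are copied to a mirror port that can only transmit at its own line rate.

```python
def mirror_oversubscription(port_rates_gbps, mirror_rate_gbps, directions=2):
    """Ratio of worst-case mirrored traffic to what the mirror port can carry.

    directions=2 models copying both the TX and RX stream of every port.
    """
    offered = sum(port_rates_gbps) * directions
    return offered / mirror_rate_gbps


# Assumed example: three 4 Gb/s ports mirrored to one 4 Gb/s port.
ratio = mirror_oversubscription([4, 4, 4], 4)
print(f"worst-case oversubscription: {ratio:.0f}:1")  # 6:1
```

Even a single fully loaded port mirrored in both directions comes out at 2:1 under these assumptions, which is why frames get dropped once the mirror port's buffers fill.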
Piotr
mark.stubblefield@gmail.com
03-17-07 12:13 AM
Since when do 48Ks (or *any* Brocade switch) support port mirroring?
One would think the Condors could handle it, but I've never seen it
implemented in Brocade's product line. I'm not sure about Cisco.
To the OP, the only way I know of is tapping the fabric. There are FC
protocol analyzers, but they sit in-band.
-Mark
On Mar 13, 7:50 am, "Piotr" <nos...@nospam.com> wrote:
> "Sean Howard" <seanh...@gmail.com> wrote in message
> news:vZ6dnQ0LoffOaWjYnZ2dnUVZ_oCmnZ2d@comcast.com...
>
> Yes it does, but not on every product. As far as I am aware you can find it
> on Brocade 48000 directors and Brocade 5000 FC switches.
>
> There is a good reason for using Taps in SAN monitoring and troubleshooting
> (see below, as found in a Finisar document covering this problem).
> 1. Multiple ports mirrored to one port cause buffer overflow and dropped
> packets.
> 2. Packets go through a buffer and are retimed, making accurate
> time-sensitive measurements such as jitter, packet gap analysis, or
> latency impossible.
> 3. Most mirror ports filter anomalies, thus making troubleshooting
> impossible.
> 4. Turning on port mirroring puts a load on the switch's CPU/transfer logic,
> thus impacting the switch's operational performance.
>
> Piotr
03-19-07 06:13 AM
You're correct. There is no such thing as port mirroring or a Fibre Channel
software analyzer comparable to Ethernet's Ethereal. Without an inline
Fibre Channel analyzer (Finisar is the de facto standard), your best bet in
this scenario is to use an application such as SCSI Utility For Windows
to monitor the HBA port statistics and determine what errors may be
happening.
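Whatever tool reports the HBA port statistics, the useful signal is usually which error counters move while the slowness is happening, not their absolute values. Here is a minimal sketch of that comparison. The counter names and numbers are illustrative only, and how the snapshots are collected is left open because it depends on the HBA vendor's utility.

```python
def counter_deltas(before: dict, after: dict) -> dict:
    """Return only the counters that increased between two snapshots."""
    return {name: after[name] - before.get(name, 0)
            for name in after
            if after[name] > before.get(name, 0)}


# Hypothetical snapshots of one HBA port, taken a few minutes apart.
before = {"link_failures": 2, "loss_of_sync": 5,
          "invalid_crc": 0, "invalid_tx_words": 120}
after = {"link_failures": 2, "loss_of_sync": 9,
         "invalid_crc": 3, "invalid_tx_words": 120}

print(counter_deltas(before, after))  # {'loss_of_sync': 4, 'invalid_crc': 3}
```

Taking a snapshot on every host before and during a slow period, then comparing the deltas side by side, at least narrows down which ports are seeing link-level trouble.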
The Moojit
03-25-07 12:12 AM
Sean,
I'm going to guess that this wasn't an FC problem. I'm more inclined to
believe it was a SCSI problem. Specifically, I would guess that the blade
you shut down was doing Target Resets.
If an initiator sends a target reset to a target and this target is
providing LUNs for multiple initiators, all the outstanding IOs to all the
initiators get reset. The initiators time out and retry the IO, which
succeeds. The end result is that all the initiators slow down but no errors
are displayed. Zoning won't help.
You can limit the possible suspects by seeing which initiators are slowing
down and which target they have in common. The HP box might provide some
higher debug level that exposes target resets so you can track them down.
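The narrowing-down step can be written as a set intersection. This is a rough sketch with made-up host and target names: `zoning` is a hypothetical map of which targets each initiator can reach, and the list of slow initiators comes from whatever the customer reports.

```python
def common_targets(slow_initiators, zoning):
    """Targets that every slow initiator has access to."""
    sets = [set(zoning[name]) for name in slow_initiators]
    return set.intersection(*sets) if sets else set()


def suspects(targets, zoning):
    """Every initiator that can reach at least one of the given targets."""
    return sorted(name for name, reachable in zoning.items()
                  if targets & set(reachable))


# Hypothetical layout: which targets each blade is presented LUNs from.
zoning = {
    "blade1": ["msa_a"],
    "blade2": ["msa_a", "msa_b"],
    "blade3": ["msa_b"],
    "blade4": ["msa_a"],
}
shared = common_targets(["blade1", "blade2"], zoning)
print(shared)                    # {'msa_a'}
print(suspects(shared, zoning))  # ['blade1', 'blade2', 'blade4']
```

Note that the suspect list includes blade4 even though it was not reported as slow: the initiator issuing the resets does not have to be one of the hosts that is complaining.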
From my experience, the most likely culprit is a Windows 2003 SP1 cluster
node (probably with an older storport driver). I suggest that whenever you
see this problem, you just upgrade all the Windows clusters and all the
storport drivers.
Follow http://support.microsoft.com/default.aspx?scid=kb;EN-US;923830
MSCS uses resets to decide quorum ownership, and when the nodes get in a
pickle, they do too many resets. Too many resets show up as slow storage.
Cluster nodes do log resets in the cluster log, although they don't call
them resets; look for /arbitrat/ as in arbitration or something like that.
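Searching the cluster log for that stem is simple to script. This is a minimal sketch that makes no assumption about the log's line format: it only keeps lines containing "arbitrat" in any capitalisation, and the path passed to `scan_file` is whatever location the cluster log has on the node in question.

```python
import re
from typing import Iterable, List

# Case-insensitive stem, so "arbitrate", "Arbitrating" and "arbitration" match.
ARBITRATION = re.compile(r"arbitrat", re.IGNORECASE)


def arbitration_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that mention arbitration, without line endings."""
    return [line.rstrip("\r\n") for line in lines if ARBITRATION.search(line)]


def scan_file(path: str) -> List[str]:
    """Scan a log file; undecodable bytes are replaced instead of raising."""
    with open(path, errors="replace") as handle:
        return arbitration_lines(handle)
```

If the matches cluster around the times the customer reported slowness, that would support the reset theory; a handful of hits spread evenly over days probably would not.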
There is also the Emulex TPRLO command, which is an FC issue. You can
research TPRLOs. If the offending blade had Emulex cards, see if TPRLO was
enabled. (By default it shouldn't be, and if it is you'll get the same
problems.)
04-02-07 12:14 PM
On 25 Mar, 00:50, Bob S <bsremovemetrac...@nycap.rr.com> wrote:
> Sean,
>
> I'm going to guess that this wasn't an FC problem. I'm more inclined to
> believe it was a SCSI problem. Specifically, I would guess that the blade
> you shut down was doing Target Resets.
>
> If an initiator sends a target reset to a target and this target is
> providing LUNs for multiple initiators, all the outstanding IOs to all the
> initiators get reset. The initiators time out and retry the IO, which
> succeeds. The end result is that all the initiators slow down but no errors
> are displayed. Zoning won't help.
>
> You can limit the possible suspects by seeing which initiators are slowing
> down and which target they have in common. The HP box might provide some
> higher debug level that exposes target resets so you can track them down.
>
> From my experience, the most likely culprit is a Windows 2003 SP1 cluster
> node (probably with an older storport driver). I suggest that whenever you
> see this problem, you just upgrade all the Windows clusters and all the
> storport drivers.
>
> Follow http://support.microsoft.com/default.aspx?scid=kb;EN-US;923830
>
> MSCS uses resets to decide quorum ownership, and when the nodes get in a
> pickle, they do too many resets. Too many resets show up as slow storage.
> Cluster nodes do log resets in the cluster log, although they don't call
> them resets; look for /arbitrat/ as in arbitration or something like that.
>
> There is also the Emulex TPRLO command, which is an FC issue. You can
> research TPRLOs. If the offending blade had Emulex cards, see if TPRLO was
> enabled. (By default it shouldn't be, and if it is you'll get the same
> problems.)
I work as a SAN consultant for HP and I agree that embedding taps into
environments is a very good idea. I have three Finisar analysers, and
one of the biggest problems is getting the change approved to add or
remove them; getting the customer to install taps removes this
obstacle. The Cisco platform does have the SD port (mirror...)
functionality, but you don't see the whole picture when using it. The last
time I was involved with an escalation on MDS, Cisco themselves
asked for a Finisar trace.
Kind Regards
Jason
All times are GMT.