USB mass storage error recovery

    USB mass storage error recovery  
darin@usa.net


07-25-06 12:13 AM

I'm getting stuck trying to figure out how to recover from a
USB mass storage "error".  This is for "bulk only" protocol.
The root problem is that the RTOS we use is returning errors
when they don't really exist and terminating transfers in
the middle.  I think I can fix that problem, but in the
meantime I was trying to add some error checking and
recovery to their mass storage driver.  I'd also like to be
able to recover from real errors if they ever happen.

What happens is that the CBW command bytes are sent
successfully, then the data phase is interrupted mid-stream.
When the host ignores the error and tries to read the CSW
status it hangs forever.
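
For reference, a minimal sketch of the two wrappers involved, assuming the standard Bulk-Only Transport layouts (31-byte CBW, 13-byte CSW, little-endian fields). The function names are made up for illustration and are not from any vendor's driver.

```python
import struct

CBW_SIGNATURE = 0x43425355  # reads "USBC" when stored little-endian
CSW_SIGNATURE = 0x53425355  # reads "USBS" when stored little-endian
CBW_FORMAT = "<IIIBBB16s"   # signature, tag, length, flags, LUN, CB length, CB
CSW_FORMAT = "<IIIB"        # signature, tag, residue, status


def build_cbw(tag, data_length, direction_in, lun, command_block):
    """Pack a 31-byte Command Block Wrapper (hypothetical helper)."""
    if not 1 <= len(command_block) <= 16:
        raise ValueError("command block must be 1..16 bytes")
    flags = 0x80 if direction_in else 0x00  # bit 7 set: device-to-host data
    return struct.pack(CBW_FORMAT, CBW_SIGNATURE, tag, data_length,
                       flags, lun, len(command_block),
                       command_block.ljust(16, b"\x00"))


def parse_csw(raw, expected_tag):
    """Unpack a 13-byte Command Status Wrapper and check that it is valid."""
    if len(raw) != struct.calcsize(CSW_FORMAT):
        raise ValueError("CSW must be exactly 13 bytes")
    signature, tag, residue, status = struct.unpack(CSW_FORMAT, raw)
    if signature != CSW_SIGNATURE or tag != expected_tag:
        raise ValueError("invalid CSW: bad signature or tag mismatch")
    # status: 0 = command passed, 1 = command failed, 2 = phase error
    return residue, status
```

The tag check matters for recovery: a CSW that carries the tag of an earlier, interrupted command is a sign that host and device are out of step.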

My first approach was to detect the error and return from
the transfer routine without reading the CSW.  But then the
very next I/O operation fails.  I then tried doing a
"bulk-only mass storage reset" operation, but that I/O also
hangs.  Out of desperation I then tried clearing the stalled
endpoints first and then doing the reset, but that didn't help.
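
For comparison, the Bulk-Only Transport specification gives reset recovery in a fixed order: the class-specific Bulk-Only Mass Storage Reset on the control endpoint first, then Clear Feature (ENDPOINT_HALT) on the bulk-in endpoint, then on the bulk-out endpoint. A rough sketch of the three setup packets follows; the helper names are hypothetical, and the interface and endpoint numbers in the usage note are only examples.

```python
import struct


def setup_packet(bm_request_type, b_request, w_value, w_index, w_length):
    """Pack the 8-byte SETUP stage of a control transfer."""
    return struct.pack("<BBHHH", bm_request_type, b_request,
                       w_value, w_index, w_length)


def bulk_only_reset(interface):
    # Class request, host-to-device, recipient = interface; bRequest 0xFF.
    return setup_packet(0x21, 0xFF, 0, interface, 0)


def clear_endpoint_halt(endpoint_address):
    # Standard CLEAR_FEATURE with feature selector 0 (ENDPOINT_HALT),
    # recipient = endpoint; wIndex carries the endpoint address.
    return setup_packet(0x02, 0x01, 0, endpoint_address, 0)


def reset_recovery_packets(interface, bulk_in_ep, bulk_out_ep):
    """Setup packets for reset recovery, in the order the spec lists them."""
    return [bulk_only_reset(interface),
            clear_endpoint_halt(bulk_in_ep),
            clear_endpoint_halt(bulk_out_ep)]
```

For a device with interface 0, bulk-in 0x81 and bulk-out 0x02, `reset_recovery_packets(0, 0x81, 0x02)` yields the three packets to send on endpoint 0. None of this helps if the control transfer itself never completes, so each one still needs a timeout.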

So now I'm baffled.  What I think is happening is that the
mass storage devices are still waiting to read bytes from
the data phase, and cannot leave that state even if they
see bytes on the control endpoint.

Is there anything I can do to clear this stuck state?

--
Darin Johnson







    Re: USB mass storage error recovery  
Noway2


07-25-06 12:28 PM


We (another engineer and I) have encountered this on our project
too.  The other engineer handles the software and would be the one
with suggestions for you.  I have forwarded your post to him and
asked him either to post any suggestions here or to reply to me so
that I can post them here.







    Re: USB mass storage error recovery  
Noway2


07-26-06 06:13 AM


Here is the procedure for clearing faults on a bulk storage transfer
that we are using in our project.  Hope this helps.

The following are the steps I take during a bulk transport:

1. Send the CBW. If the pipe stalls, clear the stall and go to the
        transport stage.  If the clear stall fails, or the original
        result was some other error, perform a Bulk Reset and exit
        the transport routine.

2. Send/receive data. If the pipe stalls, clear it and go to the
        read CSW stage.  If the clear stall fails, or the original
        result was some other error, perform a Bulk Reset and exit
        the transport routine. Do not try to read the CSW.

3. Read the CSW. If the pipe stalls, clear it and try to reread the
        CSW.  If the clear stall fails, some other error occurs, or
        the CSW is invalid, perform a Bulk Reset.

On all of the steps above, if the Bulk Reset fails, the HC or device is
not working properly. If other devices ARE working properly (or if it
can be verified that the HC is functioning properly), assume the device
is corrupt and ignore it. Otherwise try a hardware reset.
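
The steps above can be sketched roughly as follows. This is only an illustration of the control flow, not our actual driver code: `dev` is a hypothetical object whose methods return OK, STALL or ERROR, and limiting the CSW reread to a single retry is an assumption added here so the loop cannot spin forever.

```python
OK, STALL, ERROR = "ok", "stall", "error"


def _recovered(dev, result):
    """True only if the failure was a stall and clearing it succeeded."""
    return result == STALL and dev.clear_stall() == OK


def _reset_and_fail(dev):
    """Perform a Bulk Reset; report whether even that failed."""
    if dev.bulk_reset() != OK:
        return "reset_failed"  # HC or device not working: ignore it or hardware reset
    return "failed"


def bulk_transport(dev):
    """Run one command through the CBW, data and CSW stages.

    `dev` is hypothetical: send_cbw(), transfer_data(), clear_stall() and
    bulk_reset() return OK/STALL/ERROR; read_csw() returns (result, csw_valid).
    """
    # Step 1: send the CBW.
    result = dev.send_cbw()
    if result != OK and not _recovered(dev, result):
        return _reset_and_fail(dev)

    # Step 2: data stage. After a non-stall error, do not read the CSW.
    result = dev.transfer_data()
    if result != OK and not _recovered(dev, result):
        return _reset_and_fail(dev)

    # Step 3: read the CSW; after a cleared stall, reread it once.
    for attempt in range(2):
        result, csw_valid = dev.read_csw()
        if result == OK:
            return "ok" if csw_valid else _reset_and_fail(dev)
        if attempt == 1 or not _recovered(dev, result):
            break
    return _reset_and_fail(dev)
```

Every call in this sketch is assumed to return eventually. If the underlying transfers can block forever, each one needs its own timeout before this logic is of any use.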







    Re: USB mass storage error recovery  
Darin Johnson


07-26-06 06:13 AM

Noway2 wrote:
> 2. Send/receive data. If the pipe stalls, clear it and go to the
>         read CSW stage.  If the clear stall fails, or the original
>         result was some other error, perform a Bulk Reset and exit
>         the transport routine. Do not try to read the CSW.

OK, I get the "other error" here.  It's not a "real" error since
the RTOS vendor supplied USB software is broken, but it's probably
a good simulation of a real error.  The problem is that the Bulk
Reset hangs also (at the status stage I think).  The vendor supplied
mass storage software doesn't implement any timeouts to detect
a hang...
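
As an illustration of the missing piece, here is a minimal sketch of a bounded wait. It assumes a completion callback that sets a `threading.Event` and some way to cancel the pending transfer; both are hypothetical stand-ins, not the vendor's API.

```python
import threading


class TransferTimeout(Exception):
    """Raised when a transfer does not complete in time."""


def wait_for_completion(done, timeout_s, cancel):
    """Block until `done` is set, or cancel the transfer and raise.

    `done` is a threading.Event set by the (hypothetical) completion
    callback; `cancel` is a callable that aborts the pending transfer.
    """
    if not done.wait(timeout_s):
        cancel()  # take the transfer back from the controller before giving up
        raise TransferTimeout(f"no completion within {timeout_s} s")
```

With something like this around every stage, including the Bulk Reset itself, a hang becomes an error that the recovery code can act on.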

There were basically two bugs in the software - the host controller
driver cancelling transfers too soon, and the mass storage driver
not handling errors.  I tried to solve the latter first, but I made
more headway after fixing the host controller instead.  Though if
there ever is a real error...

--
Darin Johnson







    Re: USB mass storage error recovery  
Steve Calfee


07-29-06 12:12 AM

On 25 Jul 2006 21:03:48 -0700, "Darin Johnson" <darin@usa.net> wrote:
>OK, I get the "other error" here.  It's not a "real" error since
>the RTOS vendor supplied USB software is broken, but it's probably
>a good simulation of a real error.  The problem is that the Bulk
>Reset hangs also (at the status stage I think).  The vendor supplied
>mass storage software doesn't implement any timeouts to detect
>a hang...
>
>There were basically two bugs in the software - the host controller
>driver cancelling transfers too soon, and the mass storage driver
>not handling errors.  I tried to solve the latter first, but I made
>more headway after fixing the host controller instead.  Though if
>there ever is a real error...

Hi,

How about some more info? What OS are you using? Are you having
problems with a host or a device?

I am having similar problems with a device "function" running on
Nucleus. It sends quite a bit of data in both directions. It always
hangs when I do a "format" from Windows XP Home. It seems to hang near
the end of the format. I do not know if this is a Nucleus driver stack
problem or if I have a hardware driver issue (which is what I am
debugging).

I am not seeing any stalls on the bus. What error is the mass storage
driver not handling? Again is this a host or device issue?

Regards, Steve

There is no "x" in my email address.






    Re: USB mass storage error recovery  
Darin Johnson


07-29-06 06:13 AM

Steve Calfee wrote:
> How about some more info? What OS are you using? Are you having
> problems with a host or a device?

It's Nucleus, with a "host" driver, using EHCI.  The error
is not a real error, but it would be a transaction error
(bad PID, CRC, etc).  The OS assumes that if the transaction
error bit is set then the transfer has failed, although the HW
retries the transaction up to 3 times in this case.  From what
I can see, *any* error during the data phase, other than a
STALL, would cause problems.  For instance, if the endpoint
halted due to too many transaction or buffer errors.
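
To make the distinction concrete, here is a rough sketch of how a qTD token's status byte could be classified. The bit positions are the ones in the EHCI specification; the function name and the returned labels are made up for illustration, and the order of the checks is one reasonable choice rather than anything the spec mandates.

```python
QTD_ACTIVE = 1 << 7      # controller still owns the qTD
QTD_HALTED = 1 << 6      # queue halted: stall, babble, or retries exhausted
QTD_DBUF_ERR = 1 << 5    # data buffer over/underrun
QTD_BABBLE = 1 << 4      # babble detected
QTD_XACT_ERR = 1 << 3    # a transaction error was seen at some point
QTD_CERR_SHIFT = 10      # 2-bit error counter, counts down from its initial value
QTD_CERR_MASK = 0x3


def classify_qtd(token):
    """Classify a completed (or pending) qTD from its token word."""
    if token & QTD_ACTIVE:
        return "pending"
    if not token & QTD_HALTED:
        # XactErr can remain set after a retry that succeeded, so on its
        # own it is not treated as a failure here.
        return "ok"
    if token & QTD_BABBLE:
        return "babble"
    if (token >> QTD_CERR_SHIFT) & QTD_CERR_MASK:
        return "stall"  # halted with retries left: the device stalled
    if token & QTD_DBUF_ERR:
        return "buffer_error"
    if token & QTD_XACT_ERR:
        return "transaction_error"  # retries exhausted
    return "halted"
```

With this ordering, a transfer that hit one bad CRC and then succeeded on retry comes back as "ok", which is the case the OS currently gets wrong.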

--
Darin Johnson







    Re: USB mass storage error recovery  
Steve at fivetrees


07-29-06 06:12 PM

"Darin Johnson" <darin@usa.net> wrote in message
news:1154132773.339143.148360@m79g2000cwm.googlegroups.com...
> Steve Calfee wrote:
<snip>
>> I am having similar problems with a device "function" running on
>> Nucleus.
>
> It's Nucleus, with a "host" driver, using EHCI.

Hmmm. Nucleus sounds a bit pants, then.

I *hate* not being able to trust 3rd-party code.

Steve
http://www.fivetrees.com








    Re: USB mass storage error recovery  
Darin Johnson


07-30-06 12:12 AM

> Hmmm. Nucleus sounds a bit pants, then.

It's not all bad, it just comes with lots of parts.  Some of the parts
are very reliable and stable and do what you want well, while others
are relatively new.  An advantage is that you get all the source code,
a disadvantage is that you sometimes need the source code...

--
Darin Johnson







    Re: USB mass storage error recovery  
Steve at fivetrees


07-30-06 06:14 AM

"Darin Johnson" <darin@usa.net> wrote in message
news:1154210029.314669.286780@s13g2000cwa.googlegroups.com... 
>
> It's not all bad, it just comes with lots of parts.  Some of the parts
> are very reliable and stable and do what you want well, while others
> are relatively new.  An advantage is that you get all the source code,
> a disadvantage is that you sometimes need the source code...

Understood. But - if it were me, I'd put all sorts of compiler warnings over
the untested new bits, or over provisional code. I mean, no timeouts...
that's pretty bad. I'd hate to have to find that out the hard way.

Steve
http://www.fivetrees.com







All times are GMT.