| Mike Looijmans 2005-11-08, 5:56 pm |
| Here's one that passes all the tests, and is 2x as fast as the 'current'
and 'new' implementations on random binary data. I haven't been able to
generate data where the 'mike' version is slower:
    def read_to_boundary(self, req, boundary, file, readBlockSize=65536):
        prevline = ""
        last_bound = boundary + '--'
        carry = None
        while 1:
            line = req.readline(readBlockSize)
            if not line or line.startswith(boundary):
                if prevline.endswith('\r\n'):
                    if carry is not None:
                        file.write(carry)
                    file.write(prevline[:-2])
                    break
                elif (carry == '\r') and (prevline[-1] == '\n'):
                    file.write(prevline[:-1])
                    break
                # If we get here, it's not really a boundary!
            if carry is not None:
                file.write(carry)
                carry = None
            if prevline[-1:] == '\r':
                file.write(prevline[:-1])
                carry = '\r'
            else:
                file.write(prevline)
            prevline = line
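To try the function outside mod_python, something like the sketch below may help. It is not part of the attached harness: FakeRequest, Sink and extract are hypothetical helpers, the function is copied from above with 'self' dropped and the unused last_bound omitted, and the request object is assumed to need only readline(size). The second call uses a tiny block size so that the '\r\n' in front of the boundary is split across two reads, which is the case the 'carry' variable handles.

```python
class FakeRequest:
    """Hypothetical stand-in for the request: only readline(size) is provided."""
    def __init__(self, data):
        self.data = data
        self.pos = 0

    def readline(self, size):
        # Return up to `size` characters, stopping after the first '\n'.
        end = self.data.find('\n', self.pos, self.pos + size)
        end = self.pos + size if end == -1 else end + 1
        end = min(end, len(self.data))
        line = self.data[self.pos:end]
        self.pos = end
        return line


class Sink:
    """Hypothetical output file that collects everything written to it."""
    def __init__(self):
        self.parts = []

    def write(self, s):
        self.parts.append(s)

    def value(self):
        return "".join(self.parts)


def read_to_boundary(req, boundary, file, readBlockSize=65536):
    # Module-level copy of the posted function (no `self`, no last_bound).
    prevline = ""
    carry = None
    while 1:
        line = req.readline(readBlockSize)
        if not line or line.startswith(boundary):
            if prevline.endswith('\r\n'):
                if carry is not None:
                    file.write(carry)
                file.write(prevline[:-2])
                break
            elif (carry == '\r') and (prevline[-1] == '\n'):
                file.write(prevline[:-1])
                break
            # If we get here, it's not really a boundary!
        if carry is not None:
            file.write(carry)
            carry = None
        if prevline[-1:] == '\r':
            file.write(prevline[:-1])
            carry = '\r'
        else:
            file.write(prevline)
        prevline = line


def extract(payload, block_size, boundary="--XYZ"):
    # Append CRLF plus a closing boundary, then read the payload back out.
    req = FakeRequest(payload + "\r\n" + boundary + "--\r\n")
    out = Sink()
    read_to_boundary(req, boundary, out, block_size)
    return out.value()


# Large block size: every line arrives in a single read.
assert extract("hello\r\nworld", 65536) == "hello\r\nworld"
# Block size 5: the read ends on '\r', so the '\n' arrives in the next read.
assert extract("abcd", 5) == "abcd"
```

In both cases the '\r\n' that precedes the boundary is left out of the output, while a '\r\n' inside the payload is kept.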
I've attached a modified upload_test_harness.py that includes the 'new'
and 'current' versions, as well as the 'org' version (as in the 3.1
release) and the 'mike' version.
In addition, I added some profiling calls to show the impact of the
extra 'endswith' and slices.
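For a rough idea of what such a measurement can look like, here is a minimal sketch using the standard timeit module. It is not the profiling code from the attached harness; the function names and the 64 KB test line are assumptions chosen for illustration, and the timings will differ between machines and Python versions.

```python
import timeit

# One full-size line ending in CRLF, roughly what readline(65536) might return.
line = "x" * 65534 + "\r\n"

def check_endswith():
    # The test used in the boundary branch.
    return line.endswith("\r\n")

def check_slice():
    # The same test written as a two-character slice comparison.
    return line[-2:] == "\r\n"

def strip_crlf():
    # Slicing off the CRLF copies almost the whole line.
    return line[:-2]

for fn in (check_endswith, check_slice, strip_crlf):
    seconds = timeit.timeit(fn, number=10000)
    print("%-15s %.4f s per 10000 calls" % (fn.__name__, seconds))
```

The two checks look only at the end of the string, whereas the prevline[:-2] style slice copies the data, so it is the one whose cost grows with the line length.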
--
Mike Looijmans
Philips Natlab / Topic Automation
|