08-23-07 12:21 PM
"Florian 'flub' Bruynooghe" <floris.bruynoo...@gmail.com> wrote:
> sed -e 's/^some_string/some_other_string/'
> or:
> sed -e '/^some_/s/^some_string/some_other_string/'
> ...
In this particular example the answer is determinate: the first example
will always be faster (and more efficient) than the second. Since the
address RE of the second (a plain string) is a proper substring of its
substitution RE, every line the substitution can match also passes the
address, so the address filters nothing useful. There can never be any
reason to prefer the second version to the first.
The first example can be thought of as being transformed into the
construct:
/^some_string/s//some_other_string/
where the empty RE of the substitution reuses the last RE applied (here,
the address). If a non-empty "s" RE is specified (even if identical) it
is just an additional construct to parse and execute. The only reason
one would use the address-plus-substitution form is if one desires to
pre-filter the input buffer before applying a distinctively _different_
matching RE to it.
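To make the equivalence concrete, here is a small sketch one could run
to check it. The sample input and the variable names are my own
assumptions, not anything from the original question, and the last
command is only one illustration of a pre-filter paired with a
genuinely different RE.

```shell
# Hypothetical sample input (not from the original thread).
input='some_string here
some_thing else
other some_string
anything else'

# Plain substitution.
a=$(printf '%s\n' "$input" | sed -e 's/^some_string/some_other_string/')

# Address plus empty RE: the empty RE reuses the address RE.
b=$(printf '%s\n' "$input" | sed -e '/^some_string/s//some_other_string/')

# Redundant pre-filter followed by the full RE again.
c=$(printf '%s\n' "$input" | sed -e '/^some_/s/^some_string/some_other_string/')

# Pre-filter with a *different* RE: only lines starting with "some_"
# are candidates for the "else" -> "ELSE" substitution.
d=$(printf '%s\n' "$input" | sed -e '/^some_/s/else$/ELSE/')

[ "$a" = "$b" ] && [ "$b" = "$c" ] && echo "first three forms agree"
printf '%s\n' "$d"
```

In the last command the address does real work: "anything else" also
ends in "else" but is left alone because it fails the /^some_/ filter.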
BTW, sed(1) is (well, I'll say it) _awesomely_ fast considering the
complexity of REs -- and considering that sed(1)'s implementation of
conventional NFAs has an innate susceptibility to some problematic
corner cases. Greg Ubben wrote a faithful dc(1) emulator (dc being an
arbitrary-precision RPN calculator) in distribution sed(1) which
outperforms early releases of GNU sed(1). Amazing!
http://sed.sf.net/local/scripts/dc.sed
Like n/awk(1), the time spent in the pre-execution overhead of parsing
and constructing the NFAs can be nontrivial, while the execution of the
actual lines of sed(1) can be negligible; therefore, for relatively
small datasets (<<1MB?) the difference in total time may be trivial.
I see this behavior in past sed(1) scripts that I have written which
are many thousands of lines of code.
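If you want to measure rather than take my word for it, something like
the following rough sketch would do. The data file, its size, and the
time_sed helper are all hypothetical choices of mine; date +%s is
widely available but has only one-second resolution, so on a dataset
this small both runs may well report 0.

```shell
# Hypothetical benchmark data: 200000 lines, half matching ^some_string.
data=$(mktemp)
awk 'BEGIN {
    for (i = 0; i < 100000; i++) {
        print "some_string " i
        print "unrelated " i
    }
}' > "$data"

# Run one sed script over the data and print elapsed wall-clock seconds.
# (time_sed is an illustrative helper, not a standard tool.)
time_sed() {
    start=$(date +%s)
    sed -e "$1" "$data" > /dev/null
    end=$(date +%s)
    echo "$((end - start))s  $1"
}

time_sed 's/^some_string/some_other_string/'
time_sed '/^some_/s/^some_string/some_other_string/'

# Whichever is faster, both scripts must produce identical output.
sum1=$(sed -e 's/^some_string/some_other_string/' "$data" | cksum)
sum2=$(sed -e '/^some_/s/^some_string/some_other_string/' "$data" | cksum)
[ "$sum1" = "$sum2" ] && echo "outputs identical"

rm -f "$data"
```

For a finer-grained comparison, grow the dataset or repeat each run in
a loop until the elapsed time is comfortably above one second.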
=Brian, author of the first sed(1) debugger ever written, that _nobody_
has ever indicated having used:
http://sed.sourceforge.net/local/debug/sd.sh.txt