sed performance

    sed performance  
flub




 
08-21-07 12:22 PM

Hi

Suppose you need to do something like:

sed -e 's/^some_string/some_other_string/'

Where, say, 1 in 3 lines in the file match the pattern.  Is it worth
doing this instead?

sed -e '/^some_/s/^some_string/some_other_string/'

You can suppose that none of the lines that are not affected will
match "^some_" here.  But I just can't decide which to use, the first
seems slightly cleaner and easier to read while the other is a little
more specific.  Will there be any performance gain using the second?
Any other arguments for one of the forms?

Cheers
Floris









    Re: sed performance  
Stachu 'Dozzie' K.




 
08-21-07 12:22 PM

On 21.08.2007, flub <floris.bruynooghe@gmail.com> wrote:
> Suppose you need to do something like:
>
> sed -e 's/^some_string/some_other_string/'
>
> Where, say, 1 in 3 lines in the file match the pattern.  Is it worth
> doing this instead?
>
> sed -e '/^some_/s/^some_string/some_other_string/'
>
> You can suppose that none of the lines that are not affected will
> match "^some_" here.  But I just can't decide which to use, the first
> seems slightly cleaner and easier to read while the other is a little
> more specific.  Will there be any performance gain using the second?

How big is the file you want to operate on? If it's 3 lines long, then
you won't notice any performance gain. It's even possible that the
second case will be slower (by a few CPU cycles), since /^some_/ is an
extra regular expression that has to be compiled into a finite
automaton. I think the first is fast enough, simpler and more convenient.

> Any other arguments for one of the forms?

--
Secunia non olet.
Stanislaw Klekot








    Re: sed performance  
Floris Bruynooghe




 
08-21-07 12:22 PM

On Aug 21, 10:56 am, "Stachu 'Dozzie' K."
<doz...@dynamit.im.pwr.wroc.pl.nospam> wrote:
> How big is the file you want to operate on? If it's 3 lines long, then
> you won't notice any performance gain. It's even possible that the
> second case will be slower (by a few CPU cycles) since the /^some_/ will
> have to be compiled into a finite automaton. I think the first is fast
> enough, simpler and more convenient.

Well, I was thinking of a few thousand lines.  Seems predictable that
the second will be slower for only a few lines.  I tend to go for the
first too, from a readability point of view.

Regards
Floris









    Re: sed performance  
Stachu 'Dozzie' K.




 
08-21-07 12:22 PM

On 21.08.2007, Floris Bruynooghe <floris.bruynooghe@gmail.com> wrote:
> On Aug 21, 10:56 am, "Stachu 'Dozzie' K."
> <doz...@dynamit.im.pwr.wroc.pl.nospam> wrote:
>
> Well, I was thinking of a few thousand lines.

That is, a file of ~4 kB. No performance gain at all: any difference
will be swallowed by the cost of the fork()+exec() system calls (calling
external utilities is quite expensive).
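
One way to see how much of the run time is just process startup is to time sed once with no input and once on a small file. This is only a rough sketch, assuming bash (for the `time` keyword), awk, and mktemp; the 3000-line test file and its contents are made up for illustration.

```shell
# Hypothetical 3000-line test file; every third line starts with "some_string".
small=$(mktemp)
awk 'BEGIN {
    for (i = 1; i <= 3000; i++)
        print ((i % 3 == 0) ? "some_string " i : "unrelated line " i)
}' > "$small"

# Startup cost alone: sed is forked and exec'ed but has no input to process.
time sed -e 's/^some_string/some_other_string/' < /dev/null

# Startup cost plus the real work on 3000 lines.
time sed -e 's/^some_string/some_other_string/' "$small" > /dev/null

# Count the rewritten lines to confirm the substitution really ran.
changed=$(sed -e 's/^some_string/some_other_string/' "$small" | grep -c '^some_other_string')
echo "lines changed: $changed"
rm -f "$small"
```

If the two timings come out close to each other, the choice between the two sed forms cannot matter much at this input size.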

> Seems predictable that
> the second will be slower for only a few lines.  I tend to go for the
> first too, from a readability point of view.

--
Secunia non olet.
Stanislaw Klekot








    Re: sed performance  
Barry Margolin




 
08-22-07 06:22 AM

In article <1187689476.154544.149790@k79g2000hse.googlegroups.com>,
flub <floris.bruynooghe@gmail.com> wrote:

> Hi
>
> Suppose you need to do something like:
>
> sed -e 's/^some_string/some_other_string/'
>
> Where, say, 1 in 3 lines in the file match the pattern.  Is it worth
> doing this instead?
>
> sed -e '/^some_/s/^some_string/some_other_string/'
>
> You can suppose that none of the lines that are not affected will
> match "^some_" here.  But I just can't decide which to use, the first
> seems slightly cleaner and easier to read while the other is a little
> more specific.  Will there be any performance gain using the second?
> Any other arguments for one of the forms?

Why guess -- time the two methods and see if there's any difference.
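
A minimal sketch of such a timing run, assuming bash (for the `time` keyword), awk, mktemp, and cksum; the 300000-line test file is invented for the purpose and only mimics the "1 in 3 lines match" situation from the question.

```shell
# Hypothetical test file in which 1 line in 3 starts with "some_string".
input=$(mktemp)
awk 'BEGIN {
    for (i = 1; i <= 300000; i++)
        print ((i % 3 == 0) ? "some_string " i : "unrelated line " i)
}' > "$input"

# Form 1: plain substitution.
time sed -e 's/^some_string/some_other_string/' "$input" > /dev/null

# Form 2: address filter in front of the same substitution.
time sed -e '/^some_/s/^some_string/some_other_string/' "$input" > /dev/null

# Sanity check: both forms should produce identical output.
out1=$(sed -e 's/^some_string/some_other_string/' "$input" | cksum)
out2=$(sed -e '/^some_/s/^some_string/some_other_string/' "$input" | cksum)
[ "$out1" = "$out2" ] && echo "outputs match"
rm -f "$input"
```

Run each timing several times and compare the typical values; a single run is easily skewed by caching and other system activity.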

--
Barry Margolin, barmar@alum.mit.edu
Arlington, MA
*** PLEASE post questions in newsgroups, not directly to me ***
*** PLEASE don't copy me on replies, I'll read them in the group ***








    Re: sed performance  
bsh




 
08-23-07 12:21 PM

"Florian 'flub' Bruynooghe" <floris.bruynoo...@gmail.com> wrote:
> sed -e 's/^some_string/some_other_string/'
> or:
> sed -e '/^some_/s/^some_string/some_other_string/'
> ...

In this particular example the answer is clear-cut: the first example
will always be faster (and more efficient) than the second. Since the
address RE (/^some_/) is just a prefix of the RE in the "s" command, the
address filters out nothing that the "s" command would not reject on its
own, so there can never be any reason to prefer the second version to
the first.

The first example can be thought of as being transformed into the
construct:

/^some_string/s//some_other_string/

where the empty RE of the substitution stands for the last RE used (here
the address). If a non-empty "s" RE is specified (even if identical), it
is just an additional construct to parse and execute. The only reason to
combine an address with a non-empty "s" RE is if one wants to pre-filter
the input before applying a distinctly _different_ matching RE to it.
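
A small sketch to check that the address-plus-empty-RE construct behaves like the plain substitution, assuming a POSIX shell and sed; the three sample lines are invented for illustration.

```shell
# Three sample lines; the first and third start with "some_string".
input='some_string=1
other=2
some_string_extra=3'

# Form with the RE spelled out in the s command.
explicit=$(printf '%s\n' "$input" | sed -e 's/^some_string/some_other_string/')

# Form with an address and an empty RE in the s command; the empty RE
# stands for the last RE used, here the address /^some_string/.
reused=$(printf '%s\n' "$input" | sed -e '/^some_string/s//some_other_string/')

printf '%s\n' "$reused"
[ "$explicit" = "$reused" ] && echo "identical output"
```

Both forms rewrite the first and third lines and leave "other=2" untouched.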

BTW, sed(1) is (well, I'll say it) _awesomely_ fast considering the
complexity of REs -- and considering that sed(1)'s implementation of
conventional NFAs has an innate susceptibility to some problematic
corner cases. Greg Ubben wrote a faithful dc(1) emulator (dc being an
arbitrary-precision RPN calculator) in distribution sed(1) which
outperforms early releases of GNU sed(1). Amazing!

http://sed.sf.net/local/scripts/dc.sed

As with n/awk(1), the pre-execution overhead of parsing the script and
constructing the NFAs can be nontrivial, while the time spent actually
executing the sed(1) commands can be negligible; therefore, for
relatively small datasets (<<1 MB?) the difference in total time may be
trivial. I have seen this behavior in sed(1) scripts I have written
that are many thousands of lines of code.

=Brian, author of the first sed(1) debugger ever written, which
_nobody_ has ever reported using:
http://sed.sourceforge.net/local/debug/sd.sh.txt








