Icarus Sparry 2006-12-20, 1:39 am
| On Tue, 19 Dec 2006 16:07:56 -0800, Spiros Bousbouras wrote:
> Assume we have a regexp of the form 'E1|E2' where E1, E2 are also
> regexps and we attempt to match it against a string where both E1
> and E2 match. Does the POSIX standard (or some man page) determine
> which of E1, E2 is considered the match? This could be important
> in case the whole of E1|E2 is inside parentheses and we refer to it
> later.
>
> I'm interested in the answer in the general case where we have regexps
> E1, E2, ..., En and we form a regexp by taking their disjunction, i.e.
> E1|E2|...|En
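The usual rule is "leftmost" first, then "longest": of all the successful
matches, the one starting earliest in the string wins, and among matches
starting at that same position, the longest one wins.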
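A minimal sketch of that rule with more than two alternatives, assuming GNU sed and its -r (extended regular expression) option; the input "xabcd" and the four alternatives are made up purely for illustration. No alternative can match at "x", so the earliest starting point is "a", where abc, ab and abcd all match; abcd is the longest of those, so it is the one that gets bracketed (& in the replacement stands for the whole matched text). The lone "b" alternative would only match one character later, so it loses.

```shell
# Four alternatives, deliberately NOT listed longest-first.
# Nothing matches at "x"; at "a", abc, ab and abcd all match,
# so the longest of them, abcd, is chosen. "b" starts later and loses.
echo "xabcd" | sed -r 's/abc|ab|abcd|b/[&]/'
# prints: x[abcd]
```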
Using GNU sed, whose -r option enables extended regular expressions with this | syntax, one sees
I="I am the friend of fred and joe today"
echo "$I" | sed -r 's/joe|fred/bill/'
outputting
I am the friend of bill and joe today
So here fred matched, because it occurs leftmost (earliest) in the input, even though joe is listed first in the pattern.
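A quick check that the order of the alternatives in the pattern does not decide the match, again assuming GNU sed with -r: both commands below replace the same word, because it is the position in the input, not the position in the pattern, that counts.

```shell
I="I am the friend of fred and joe today"
# Same two alternatives, written in both orders.
echo "$I" | sed -r 's/fred|joe/bill/'
echo "$I" | sed -r 's/joe|fred/bill/'
# both print: I am the friend of bill and joe today
```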
echo "$I" | sed -r 's/of f[der]*|of fr[^t]*/peter /'
outputs
I am the friend peter today
Here both alternatives match at the same place, so the longer one wins,
matching "of fred and joe " rather than "of fred".
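Since the question also asks what a parenthesized alternation refers to later, here is a small sketch, assuming GNU sed with -r. The first command lists the two alternatives from the example above in the opposite order and still picks the longer match; the second wraps the alternation in parentheses and shows that \1 holds the text of the winning (longest) alternative.

```shell
I="I am the friend of fred and joe today"
# Alternatives in the opposite order: the longer match still wins.
echo "$I" | sed -r 's/of fr[^t]*|of f[der]*/peter /'
# prints: I am the friend peter today

# Parenthesized alternation: \1 is what the chosen alternative matched.
echo "$I" | sed -r 's/(of f[der]*|of fr[^t]*)/[\1]/'
# prints: I am the friend [of fred and joe ]today
```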
This may be spelled out in your online manual page for "regexp". If not,
the O'Reilly book "Mastering Regular Expressions" is well worth reading.