Unix Shell - POSIX Token recognition


Author POSIX Token recognition
Ellixis

2004-05-19, 5:41 pm

Hello,

I have a small problem interpreting the fourth rule of the
"Token Recognition" section of the POSIX 1003.2 standard.

I quote:
##########
If the current character is backslash, single-quote, or double-quote
( \ , ' , or " ) and it is not quoted, it shall affect quoting for
subsequent characters up to the end of the quoted text. The rules for
quoting are as described in Quoting . During token recognition no
substitutions shall be actually performed, and the result token shall
contain exactly the characters that appear in the input (except for
<newline> joining), unmodified, including any embedded or enclosing
quotes or substitution operators, between the quote mark and the end
of the quoted text. The token shall not be delimited by the end of the
quoted field.
##########

If the current character is a backslash, where (or what) is the end of
the quoted text/field?

Thanks,

--
Huang Julien

joe@invalid.address

2004-05-19, 5:41 pm

helix@phpbb-fr.com (Ellixis) writes:

> I have a small problem interpreting the fourth rule of the
> "Token Recognition" section of the POSIX 1003.2 standard.
>
> I quote:
> ##########
> If the current character is backslash, single-quote, or double-quote
> ( \ , ' , or " ) and it is not quoted, it shall affect quoting for
> subsequent characters up to the end of the quoted text. The rules for
> quoting are as described in Quoting . During token recognition no
> substitutions shall be actually performed, and the result token shall
> contain exactly the characters that appear in the input (except for
> <newline> joining), unmodified, including any embedded or enclosing
> quotes or substitution operators, between the quote mark and the end
> of the quoted text. The token shall not be delimited by the end of the
> quoted field.
> ##########
>
> If the current character is a backslash, where (or what) is the end of
> the quoted text/field?


As it says in the text you give here, it's explained in the "Quoting"
section:

[...]

2.2.1 Escape Character (Backslash)

A backslash that is not quoted shall preserve the literal value of
the following character, with the exception of a <newline>. If a
<newline> follows the backslash, the shell shall interpret this as
line continuation. The backslash and <newline>s shall be removed
before splitting the input into tokens. Since the escaped <newline>
is removed entirely from the input and is not replaced by any white
space, it cannot serve as a token separator.

[...]

http://www.opengroup.org/onlinepubs....html#tag_02_02
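
To make that concrete: for an unquoted backslash, the "quoted text" is just
the single character that follows it. The sketch below should behave the same
in any POSIX sh. Note that count_args is a hypothetical helper introduced only
for this illustration; it is not part of the standard.

```sh
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

count_args a\ b      # backslash quotes only the blank after it: prints 1
count_args a\\ b     # first backslash quotes the second; the blank is unquoted: prints 2
count_args foo\
bar                  # backslash-newline is removed before tokenizing: prints 1
```

In the second call the quoting ends right after the second backslash, so the
blank that follows is an ordinary delimiter again.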

Joe
--
"Surprise me"
- Yogi Berra when asked where he wanted to be buried.

Ellixis

2004-05-20, 5:37 pm

joe@invalid.address wrote in message news:<m3isesruui.fsf@invalid.address>...

> As it says in the text you give here, it's explained in the "Quoting"
> section:


Thanks, I had misinterpreted the section.

I have another question about the operators: during token recognition,
only rules 2 and 6 are applied. In the Shell Grammar, these are the
operators:

######################
%token AND_IF OR_IF DSEMI
/* '&&' '||' ';;' */
%token DLESS DGREAT LESSAND GREATAND LESSGREAT DLESSDASH
/* '<<' '>>' '<&' '>&' '<>' '<<-' */
%token CLOBBER
/* '>|' */
######################

But what should I do if I encounter these operators: "<", ">" and "|"?

Examples:

$ ls > foo
--> 3 tokens

$ ls>foo
--> 1 or 3 tokens? According to POSIX 1003.2, it should be only 1.
I don't think this behavior is logical... or I'm the one not being
logical, if you think it is ;-)

Thanks,

joe@invalid.address

2004-05-20, 5:37 pm

helix@phpbb-fr.com (Ellixis) writes:

> I have another question about the operators: during token
> recognition, only rules 2 and 6 are applied. In the Shell Grammar,
> these are the operators:
>
> ######################
> %token AND_IF OR_IF DSEMI
> /* '&&' '||' ';;' */
> %token DLESS DGREAT LESSAND GREATAND LESSGREAT DLESSDASH
> /* '<<' '>>' '<&' '>&' '<>' '<<-' */
> %token CLOBBER
> /* '>|' */
> ######################
>
> But what should I do if I encounter these operators: "<", ">" and "|"?
>
> Examples:
>
> $ ls > foo
> --> 3 tokens
>
> $ ls>foo
> --> 1 or 3 tokens? According to POSIX 1003.2, it should be only
> 1.


Why do you think that? I don't believe whitespace is significant in
the grammar for these tokens. Also, the grammar for the shell is
defined in 1003.1 unless I'm really missing something.

Perhaps Geoff could comment if he's watching.

Joe
--
"Surprise me"
- Yogi Berra when asked where he wanted to be buried.

Geoff Clare

2004-05-22, 10:28 pm

"joe" <joe@invalid.address> wrote, on Thu, 20 May 2004 21:47:32 +0100:

> helix@phpbb-fr.com (Ellixis) writes:
>
>
> Why do you think that? I don't believe whitespace is significant in
> the grammar for these tokens.


I think I see what Ellixis is getting at. The formal grammar has
a comment before the part that he quoted, saying "The following are
the operators mentioned above". Since '<' and '>' are not listed
there, the reader can be misled into thinking that '<' and '>' don't
count as "operators" in the token recognition rules. Of course the
reason they are not in that section of the grammar is because only the
tokens consisting of two or more characters need to be listed there,
not because they are not "operators". However, it is clear from the
description of redirection that '<' and '>' are indeed "operators".
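
A quick experiment supports this reading. The sketch below assumes a POSIX sh;
the function names and the scratch file are made up for the demonstration. If
'>' and '|' were not operators, the unspaced forms would be single words and
would not behave like the spaced ones.

```sh
# Hypothetical demo functions; "$1" is a scratch file chosen by the caller.
redirect_spaced()   { echo hello > "$1"; }
redirect_unspaced() { echo hello>"$1"; }     # '>' starts an operator and ends the word "hello"
pipe_unspaced()     { echo hello|tr h H; }   # '|' likewise delimits "hello"
quoted_word()       { echo 'hello>world'; }  # a quoted '>' is an ordinary character

scratch=${TMPDIR:-/tmp}/token_demo.$$
redirect_spaced "$scratch";   cat "$scratch"   # prints hello
redirect_unspaced "$scratch"; cat "$scratch"   # prints hello
pipe_unspaced                                  # prints Hello
quoted_word                                    # prints hello>world
rm -f "$scratch"
```

Both redirect functions write the same file contents, which is what one would
expect if "ls>foo" is recognized as three tokens.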

> Also, the grammar for the shell is
> defined in 1003.1 unless I'm really missing something.


It is now, yes. It was defined in 1003.2-1992 up until 1003.1-2001 was
published.

--
Geoff Clare <nospam@gclare.org.uk>

Ellixis

2004-05-22, 10:28 pm

joe@invalid.address wrote in message news:<m3vfiqrfks.fsf@invalid.address>...

> Why do you think that? I don't believe whitespace is significant in
> the grammar for these tokens. Also, the grammar for the shell is
> defined in 1003.1 unless I'm really missing something.


In rule 8:
"If the current character is an unquoted <blank>, any token containing
the previous character is delimited and the current character shall be
discarded."

Blank characters are treated as token delimiters. So, for the above
examples, are there 1 or 3 tokens? :-)

It would be more logical if "<", ">" and "|" were considered
operators...

PS: I was talking about 1003.1, not 1003.2 (my mistake).

Copyright 2003 - 2008 webservertalk.com