Unix Programming - Directly Modify Terminal Input Buffer?







Author Directly Modify Terminal Input Buffer?
Michael B Allen

2005-08-15, 5:54 pm

Is there a way to directly write / read the terminal input buffer to
implement GNU readline / command history CLI type behavior?

I know I can intercept using a pty but is there a more direct method?

Thanks,
Mike

Michael B Allen

2005-08-15, 5:54 pm

On Mon, 15 Aug 2005 18:00:05 -0400, Michael B Allen wrote:

> Is there a way to directly write / read the terminal input buffer to
> implement GNU readline / command history CLI type behavior?


I just remembered how to do this. Basically I just turn off canonical
mode and reimplement it plus whatever functionality I want. Am I getting
warmer?

Thanks,
Mike
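
A minimal sketch of turning off canonical mode with termios, for readers following along. The function names are hypothetical and not from the thread; VMIN = 1 and VTIME = 0 are one reasonable choice, not the only one. The flag-editing step is kept separate so it can be exercised without a real terminal.

```c
#include <termios.h>
#include <unistd.h>

/* Clear canonical mode and echo in a copy of the terminal settings.
 * This only edits the struct; nothing is applied to a device here. */
void make_noncanonical(struct termios *t)
{
    t->c_lflag &= ~(tcflag_t)(ICANON | ECHO);
    t->c_cc[VMIN] = 1;   /* read() returns once at least 1 byte is available */
    t->c_cc[VTIME] = 0;  /* no inter-byte timer */
}

/* Apply the settings to fd, saving the previous ones in *saved so the
 * caller can restore them later. Returns 0 on success, -1 on error
 * (for example when fd is not a terminal). */
int enter_noncanonical(int fd, struct termios *saved)
{
    struct termios t;

    if (tcgetattr(fd, saved) == -1)
        return -1;
    t = *saved;
    make_noncanonical(&t);
    return tcsetattr(fd, TCSAFLUSH, &t);
}

/* Restore the settings saved by enter_noncanonical(). */
int leave_noncanonical(int fd, const struct termios *saved)
{
    return tcsetattr(fd, TCSAFLUSH, saved);
}
```

A caller would typically invoke enter_noncanonical(STDIN_FILENO, &saved) at startup and leave_noncanonical(STDIN_FILENO, &saved) before exiting, including on error paths, so the shell gets its line editing back.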

Pascal Bourguignon

2005-08-15, 8:48 pm

Michael B Allen <mba2000@ioplex.com> writes:

> On Mon, 15 Aug 2005 18:00:05 -0400, Michael B Allen wrote:
>
>
> I just remembered how to do this. Basically I just turn off canonical
> mode and reimplement it plus whatever functionality I want. Am I getting
> warmer?


Yes.
Raw mode may even be better than just clearing ICANON.

man 1 stty
man 3 termios

--
__Pascal Bourguignon__ http://www.informatimago.com/
Until real software engineering is developed, the next best practice
is to develop with a dynamic system that has extreme late binding in
all aspects. The first system to really do this in an important way
is Lisp. -- Alan Kay

Michael B Allen

2005-08-16, 2:50 am

On Tue, 16 Aug 2005 02:02:02 +0200, Pascal Bourguignon wrote:
>
> Yes.
> Raw mode may even be better than just clearing ICANON.
>
> man 1 stty
> man 3 termios


It looks like I have two options though.

A) Do a state machine that interprets each input byte and outputs 0
or more output bytes (e.g. read up arrow \033[A -> output \007prompt>
cmd). A full-blown state machine is necessary for this because even
ignoring a sequence still requires knowing its length.

B) Use the number of bytes returned by read as a hint to assist with
interpreting the input. For example if read returns 3 and the sequence
is of no interest then I can just ignore and read the next seq.

The second approach is easier but is it considered sound? Can I be
guaranteed that one and only one sequence will be returned with each
call to read?

Otherwise is there a method to deterministically find the length of an
arbitrary terminal input sequence?

Thanks,
Mike

Pascal Bourguignon

2005-08-16, 5:59 pm

Michael B Allen <mba2000@ioplex.com> writes:
>
> It looks like I have two options though.
>
> A) Do a state machine that interprets each input byte and outputs 0
> or more output bytes (e.g. read up arrow \033[A -> output \007prompt>
> cmd). A full-blown state machine is necessary for this because even
> ignoring a sequence still requires knowing its length.
>
> B) Use the number of bytes returned by read as a hint to assist with
> interpreting the input. For example if read returns 3 and the sequence
> is of no interest then I can just ignore and read the next seq.
>
> The second approach is easier but is it considered sound? Can I be
> guaranteed that one and only one sequence will be returned with each
> call to read?


No, it isn't sound. In buffered mode, read returns at most a line or
the size requested, whichever is smaller. In unbuffered mode, read
returns whatever bytes are available or the size requested, whichever
is smaller. If you call read when two bytes have been received, before
the third arrives, you'll get only two bytes.

Think about it a little please! If the bytes come from a keyboard, or
from a serial line, you get speeds on the order of 1000 bytes/second; a
3-byte sequence takes 0.003 second = 3000000 ns to arrive. Your 1GHz
computer can execute 3 million instructions during that time!
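
The short-read behavior is easy to reproduce. The following is only a sketch: it uses a pipe as a stand-in for the terminal (an assumption made so it runs anywhere), and the function name is hypothetical. It sends the first bytes of a sequence and then asks read for more than has arrived.

```c
#include <unistd.h>

/* Write the first `sent` bytes of seq into a pipe, then ask read() for
 * `want` bytes. Returns what read() reported, or -1 on error.
 * `sent` must be at least 1, otherwise read() would block forever. */
ssize_t partial_read_demo(const char *seq, size_t sent, size_t want)
{
    int p[2];
    char buf[16];
    ssize_t n;

    if (sent == 0 || want > sizeof buf || pipe(p) == -1)
        return -1;
    if (write(p[1], seq, sent) != (ssize_t)sent) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    /* Only `sent` bytes are in the pipe, so read() cannot return more. */
    n = read(p[0], buf, want);
    close(p[0]);
    close(p[1]);
    return n;
}
```

Calling partial_read_demo("\033[A", 2, 3) reports 2 even though 3 bytes were requested, which is exactly why the byte count from read cannot be used to delimit a sequence.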


> Otherwise is there a method to deterministically find the length of an
> arbitrary terminal input sequence?


It depends on the terminal class (check the TERM environment variable).
See for example: http://aspell.net/charsets/iso6429.html
There is a syntax for the code sequences, so indeed a state machine is
the way to go.


--
"Indentation! -- I will show you how to indent when I indent your skull!"

Gordon Burditt

2005-08-16, 5:59 pm

>It looks like I have two options though.
>
> A) Do a state machine that interprets each input byte and outputs 0
> or more output bytes (e.g. read up arrow \033[A -> output \007prompt>
> cmd). A full-blown state machine is necessary for this because even
> ignoring a sequence still requires knowing its length.
>
> B) Use the number of bytes returned by read as a hint to assist with
> interpreting the input. For example if read returns 3 and the sequence
> is of no interest then I can just ignore and read the next seq.
>
>The second approach is easier but is it considered sound? Can I be
>guaranteed that one and only one sequence will be returned with each
>call to read?


Absolutely not. Especially if this is going over a serial cable.

>Otherwise is there a method to deterministically find the length of an
>arbitrary terminal input sequence?


Only if you know all the sequences that can be generated. And sometimes
not even then. Take, for example, the vi editor.
Arrow keys on vt100-like terminals generate sequences like:
esc [ A
esc [ B
esc [ C
esc [ D
*BUT* the user can also type an esc by itself. Now, how do you determine
what was typed? You have to use timing, but over a network, timing is
not guaranteed and eventually it will mess up. If there's an esc and
a long pause, chances are it's a single esc. If there's an esc and it's
not followed by [, chances are it's a single esc. If there's an esc and
a [ and a letter, all close together, chances are it's an arrow or function
key. But if you like to hold down one of the arrow keys using auto-repeat,
sooner or later it WILL mess up.

ANSI standard escape sequences provide for some general pattern,
such as:
an escape sequence consists of an esc character, followed by
the '[' character, followed by 0 or more characters in the
0x20 - 0x3f range, followed by 1 character in the 0x40 - 0x7e range,
which is the end of the sequence.

It's more complicated than that, especially with 8-bit character sets
as opposed to 7-bit ASCII. And a user manually using the ESC key
messes the whole thing up. And this only applies to serial terminals
with ANSI standard escape sequences.
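
As a rough sketch of that general pattern (not a complete ECMA-48 parser), the function below measures an "esc [" sequence in a buffer. The name csi_length is hypothetical, the final byte is taken to be 0x40 - 0x7e, and 8-bit controls, "esc O" sequences and the bare-ESC ambiguity are deliberately ignored.

```c
#include <stddef.h>

/* Returns the length in bytes of the "ESC [" sequence at the start of
 * buf, 0 if the buffer ends before the final byte (more input needed),
 * or -1 if buf does not start with a well-formed "ESC [" sequence. */
int csi_length(const unsigned char *buf, size_t len)
{
    size_t i;

    if (len < 1)
        return 0;
    if (buf[0] != 0x1b)
        return -1;
    if (len < 2)
        return 0;
    if (buf[1] != '[')
        return -1;
    for (i = 2; i < len; i++) {
        if (buf[i] >= 0x20 && buf[i] <= 0x3f)
            continue;               /* parameter or intermediate byte */
        if (buf[i] >= 0x40 && buf[i] <= 0x7e)
            return (int)(i + 1);    /* final byte ends the sequence */
        return -1;                  /* byte outside the pattern */
    }
    return 0;                       /* ran out of input before the final byte */
}
```

A return value of 0 is the case where a caller would have to wait for more bytes, and deciding how long to wait is the timing problem described above.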

Gordon L. Burditt

Floyd L. Davidson

2005-08-16, 5:59 pm

Michael B Allen <mba2000@ioplex.com> wrote:
>On Tue, 16 Aug 2005 02:02:02 +0200, Pascal Bourguignon wrote:
>
>It looks like I have two options though.


Rather than re-inventing the wheel, why not download the sources
for ncurses and see how others have managed the same problem.

The tricky part is timing the receipt of characters in order to
distinguish between keyboard function key sequences and the
identical sequence typed by hand.

That in itself is complex, but the real trouble is fine tuning it
well enough that it works for not just a console keyboard, but
also for a keyboard on a terminal connected via a 1200 bps
serial port! (That is one you might give up on, and settle for
function keys that work only at a minimum bit rate of 9600 bps
or 19.2Kbps.) It is a tradeoff: a fast typist will have
problems with timing that suits a hunt-and-peck typist fine.
Ncurses includes a way to modify the timing as needed (with an
environment variable if I remember correctly).

--
Floyd L. Davidson <http://www.apaflo.com/floyd_davidson>
Ukpeagvik (Barrow, Alaska) floyd@apaflo.com

Thomas Dickey

2005-08-16, 5:59 pm

Floyd L. Davidson <floyd@apaflo.com> wrote:

> Ncurses includes a way to modify the timing as needed (with an
> environment variable if I remember correctly).


ESCDELAY (which is a global variable, and is initialized from the
environment variable with the same name).
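
For a program that does not link against ncurses, a similar knob can be imitated. This is only a sketch of the idea, not ncurses' own code: the function name and the 30000 ms sanity cap are assumptions, and the value is interpreted as milliseconds, as ncurses does.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse an ESCDELAY-style value given in milliseconds.
 * Returns fallback_ms when the value is missing, malformed,
 * negative, or above an arbitrary 30000 ms sanity cap. */
int parse_escdelay(const char *value, int fallback_ms)
{
    char *end;
    long ms;

    if (value == NULL || *value == '\0')
        return fallback_ms;
    errno = 0;
    ms = strtol(value, &end, 10);
    if (errno != 0 || *end != '\0' || ms < 0 || ms > 30000)
        return fallback_ms;
    return (int)ms;
}
```

Typical use would be parse_escdelay(getenv("ESCDELAY"), 1000), where 1000 ms matches the default that ncurses documents.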

--
Thomas E. Dickey
http://invisible-island.net
ftp://invisible-island.net

Michael B Allen

2005-08-16, 8:48 pm

On Tue, 16 Aug 2005 18:00:54 +0000, Gordon Burditt wrote:

>
> Only if you know all the sequences that can be generated. And sometimes
> not even then. Take, for example, the vi editor.
> Arrow keys on vt100-like terminals generate sequences like:
> esc [ A
> esc [ B
> esc [ C
> esc [ D
> *BUT* the user can also type an esc by itself. Now, how do you determine
> what was typed? You have to use timing, but over a network, timing is
> not guaranteed and eventually it will mess up. If there's an esc and
> a long pause, chances are it's a single esc. If there's an esc and it's
> not followed by [, chances are it's a single esc. If there's an esc and
> a [ and a letter, all close together, chances are it's an arrow or function
> key. But if you like to hold down one of the arrow keys using auto-repeat,
> sooner or later it WILL mess up.


Very helpful Gordon. Thanks.

I went with a fairly pure but minimalistic state machine. With only a
handful of states I can read (ignore rather) all keys on my keyboard
cleanly. Then between script and hexdump I was able to reimplement
canonical mode without too much trouble. It was actually quite fun. It
would make a nice student project actually.

Anyway, I've done a few things in anticipation of the aforementioned
issues. Basically, if there's an ESC in state 0, I set read to time out
after 20ms and try to get another char. If it times out, I treat it as a
literal ESC. If it does not time out and the next char is '[' or 'O' ('O'
appears to be used for some function keys), I move to the next state.
Otherwise, I "unget" the char and return a literal ESC. If I get an
unknown sequence in one of the later states, I just reset to state 0 and
try again (and should probably clear inbuf too).
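
The post does not show inbuf_getc, so the following is only a guess at what such a helper could look like: a single-byte read with an optional timeout built on select(). The name getc_timeout and the choice of 1 for success are assumptions; the -1, -2 and 0 return codes mirror the ones handled in the posted function.

```c
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Read one byte from fd. Returns 1 and stores the byte in *ch,
 * 0 on end of file, -1 on error, or -2 if `timeout` expires first.
 * A NULL timeout blocks until input arrives. */
int getc_timeout(int fd, unsigned char *ch, const struct timeval *timeout)
{
    fd_set rfds;
    struct timeval tv;
    ssize_t n;
    int r;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (timeout != NULL)
        tv = *timeout;   /* select() may modify its timeout argument */
    r = select(fd + 1, &rfds, NULL, NULL, timeout != NULL ? &tv : NULL);
    if (r == -1)
        return -1;
    if (r == 0)
        return -2;       /* nothing arrived in time */
    n = read(fd, ch, 1);
    if (n == -1)
        return -1;
    return n == 0 ? 0 : 1;
}
```

Copying the timeout before each call matters because some systems, Linux among them, update the struct passed to select() with the time remaining.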

The state machine is listed at the end of this post. If anyone sees any
obvious improvements I can make please jump in.

Of course this will probably choke in some cases but hopefully I can make
it robust enough. I have a few other machines I can try it on. Also, I
can try piping large chunks of input with UTF-8 characters and such. It
would be interesting to see whether it works over telnet. Anyone know how
to sabotage Xterm (or a Linux pty perhaps) so that it sends input erratically?

Thanks,
Mike

static int
inbuf_readtok(struct termbuf *tb)
{
    unsigned char ch;
    int state;
    struct timeval t20ms = { 0, 20000 }, *timeout;

again:
    state = 0;
    si = 0;             /* si is not declared in this excerpt */
    timeout = NULL;

    while (1) {
        switch (inbuf_getc(tb, &ch, timeout)) {
            case -1:
                return -1;
            case -2: /* timeout: only armed after a lone ESC */
                return TOK_ESCAPE;
            case 0:
                return 0;
        }
        timeout = NULL;

        switch (state) {
            case 0: /* start of a token */
                if (isprint(ch) || ch > 0x7F) {
                    return ch & 0xFF;
                }
                switch (ch) {
                    case '\n':
                        return TOK_ENTER;
                    case '\b':
                        return TOK_BACKSPACE;
                    case '\t':
                        return TOK_TAB;
                    case 033: /* ESC */
                        timeout = &t20ms;
                        state = 1;
                        break;
                    default:
                        /* unsupported sequence (aka ignore) */
                        goto again;
                }
                break;
            case 1: /* saw ESC */
                if (ch == '[' || ch == 'O') {
                    state = 2;
                    break;
                }
                /* maybe not escape sequence after all */
                inbuf_ungetc(tb, ch);
                return TOK_ESCAPE;
            case 2: /* saw ESC [ or ESC O */
                switch (ch) {
                    case '1':
                    case '2':
                        state = 3;
                        break;
                    case '3':
                    case '5':
                    case '6':
                        state = 4;
                        break;
                    case 'A':
                        return TOK_UP;
                    case 'B':
                        return TOK_DOWN;
                    case 'C':
                        return TOK_RIGHT;
                    case 'D':
                        return TOK_LEFT;
                    default:
                        /* unsupported sequence (aka ignore) */
                        goto again;
                }
                break;
            case 3: /* saw a leading '1' or '2'; a second digit may follow */
                if (ch >= '0' && ch <= '9') {
                    state = 4;
                    break;
                }
                /* unsupported sequence (aka ignore) */
                goto again;
            case 4: /* swallow the terminating byte */
                /* unsupported sequence (aka ignore) */
                goto again;
        }
    }

    return -1;
}

Pascal Bourguignon

2005-08-17, 3:06 am

Michael B Allen <mba2000@ioplex.com> writes:
> Of course this will probably choke in some cases but hopefully I can make
> it robust enough. I have a few other machines I can try it on. Also, I
> can try piping large chunks of input with UTF-8 characters and such. It
> would be interesting to see whether it works over telnet. Anyone know how
> to sabotage Xterm (or a Linux pty perhaps) so that it sends input erratically?


The problem is not so much erratic input as the fact that you don't
check the actual terminal model (using the TERM environment
variable) and use a different state machine for each kind of
terminal.

--
__Pascal Bourguignon__ http://www.informatimago.com/
Until real software engineering is developed, the next best practice
is to develop with a dynamic system that has extreme late binding in
all aspects. The first system to really do this in an important way
is Lisp. -- Alan Kay

Michael B Allen

2005-08-17, 5:59 pm

On Wed, 17 Aug 2005 07:47:27 +0200, Pascal Bourguignon wrote:
>
> The problem is not so much erratic input as the fact that you don't
> check the actual terminal model (using the TERM environment
> variable) and use a different state machine for each kind of
> terminal.


Good point.

But I think that FSM should work with all ANSI terminals. So for all other
terminals I suppose my "readline" function can simply not turn off the
built-in canonical mode and basically revert to standard line editing. All
I need is a list of TERM environment variables that are considered ANSI.
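
One way such a check could be sketched is a prefix match on TERM. The function name is hypothetical and the prefix list is a small, incomplete illustration of commonly seen ANSI-style terminal names, not an authoritative list; a terminfo lookup would be the more thorough route.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the TERM value starts with a name from a short,
 * illustrative list of ANSI-style terminals, otherwise 0. */
int term_looks_ansi(const char *term)
{
    static const char *const prefixes[] = {
        "ansi", "linux", "rxvt", "screen", "vt100", "vt220", "xterm"
    };
    size_t i;

    if (term == NULL)
        return 0;
    for (i = 0; i < sizeof prefixes / sizeof prefixes[0]; i++) {
        if (strncmp(term, prefixes[i], strlen(prefixes[i])) == 0)
            return 1;
    }
    return 0;
}
```

A caller could use term_looks_ansi(getenv("TERM")) to decide whether to leave canonical mode on and fall back to the kernel's line editing.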

Mike
