Unix Shell - Safe reading of list of files in bash


Volodymyr M. Lisivka

2007-02-20, 1:17 pm

I need a way to safely and correctly read a sorted list of files in bash 3.0
(Fedora/RHEL/CentOS 4 and later).

I created a sample list of files for testing:

$ ls -Q1 *
"*"
"a\\"
"a\" ad"
"a 'asd"
"a b"
"a\\ s"
"asd$$"
"asd\\\\"
"asd ` sdasd"
"a\\td"
"c\n dd"

"a ":
"as "

A simple "for" loop works fine in a situation like this:

$ declare -a A=(); for FILE in *; do A[${#A[@]}]="$FILE"; done ; for I
in "${A[@]}"; do [ -e "$I" ] || echo "\"$I\" is not found."; done

However, I need to sort the list of files by name (this is important). The
files are created by users who are outside my control, so they can create
files with any name, even an invalid one. (They have already created a few
files with spaces, quotes and Unicode characters.)

I have two versions of the script. First:

$ declare -a A=(); ls -d --escape * >/tmp/tmpfile; while read FILE; do
A[${#A[@]}]="$FILE"; done </tmp/tmpfile; for I in "${A[@]}"; do [ -e
"$I" ] || echo "\"$I\" is not found."; done; rm /tmp/tmpfile

"cn dd" is not found.


Second:

$ declare -a A=(); ls -dQ * >/tmp/tmpfile; while read -r FILE; do
FILE="${FILE%\"}"; FILE="${FILE/\"}"; A[${#A[@]}]="`printf "%b"
"$FILE"`"; done </tmp/tmpfile; for I in "${A[@]}"; do [ -e "$I" ] ||
echo "\"$I\" is not found."; done; rm /tmp/tmpfile

"a\" ad" is not found.


Both scripts fail to read all file names correctly.

The first version seems better to me.

PS.
I use a temporary file instead of <( ... ) because I run the script in a
VServer environment.

Thomas J.

2007-02-20, 1:17 pm

On 20 Feb., 17:53, "Volodymyr M. Lisivka" <vlisi...@gmail.com> wrote:
>
> $ ls -Q1 *
> "*"
> "a\\"
> "a\" ad"
> "a 'asd"
> "a b"
> "a\\ s"
> "asd$$"
> "asd\\\\"
> "asd ` sdasd"
> "a\\td"
> "c\n dd"
>
> "a ":
> "as "
>


Why not:

for i in *; do echo "$i"; done

Thomas

Chris F.A. Johnson

2007-02-20, 7:16 pm

On 2007-02-20, Volodymyr M. Lisivka wrote:
> I need a way to safely and correctly read a sorted list of files in bash 3.0
> (Fedora/RHEL/CentOS 4 and later).
>
> I created a sample list of files for testing:
>
> $ ls -Q1 *
> "*"
> "a\\"
> "a\" ad"
> "a 'asd"
> "a b"
> "a\\ s"
> "asd$$"
> "asd\\\\"
> "asd ` sdasd"
> "a\\td"
> "c\n dd"
>
> "a ":
> "as "
>
> A simple "for" loop works fine in a situation like this:
>
> $ declare -a A=(); for FILE in *; do A[${#A[@]}]="$FILE"; done ; for I
> in "${A[@]}"; do [ -e "$I" ] || echo "\"$I\" is not found."; done


When posting scripts, please format them so that they are
legible and do not wrap inappropriately:

> declare -a A=()
> for FILE in *
> do
>   A[${#A[@]}]="$FILE"
> done


That can be done without a loop:

A=( * )

> for I in "${A[@]}"
> do
>   [ -e "$I" ] || echo "\"$I\" is not found."
> done


> However, I need to sort the list of files by name (this is important).


The files will be sorted alphabetically by the shell when it
expands the wildcard.
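
A minimal sketch of that approach, assuming bash. The names `collect_files` and `FILES` are hypothetical, and `nullglob` is switched on only so that an empty directory gives an empty array instead of a literal `*`:

```shell
#!/bin/bash
# collect_files DIR: fill the global array FILES with the entries of DIR,
# using only filename expansion (hypothetical helper, not from the thread).
collect_files() {
    local restore
    restore=$(shopt -p nullglob || true)  # remember the current setting
    shopt -s nullglob                     # no match -> empty list, not a literal '*'
    FILES=( "$1"/* )                      # sorted by the shell, per the current locale
    eval "$restore"                       # restore nullglob as it was
}

# Usage: every element is one intact name, even with spaces, quotes or newlines.
collect_files .
for f in "${FILES[@]}"; do
    [ -e "$f" ] || echo "\"$f\" is not found."
done
```

Because nothing is printed and parsed again, there is no record separator for a hostile name to break. Note that `*` does not match names starting with a dot.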

> The files are created by users who are outside my control, so they can
> create files with any name, even an invalid one. (They have already created
> a few files with spaces, quotes and Unicode characters.)
>
> I have two versions of the script. First:
>
> declare -a A=()
> ls -d --escape * >/tmp/tmpfile
> while read FILE; do
> A[${#A[@]}]="$FILE"
> done </tmp/tmpfile


Why are you using a file instead of a pipe?

ls -d * | while IFS= read -r FILE

> for I in "${A[@]}"; do
> [ -e "$I" ] || echo "\"$I\" is not found."
> done
> rm /tmp/tmpfile
>
> "cn dd" is not found.
>
>
> Second:
>
> declare -a A=()
> ls -dQ * >/tmp/tmpfile
> while read -r FILE; do
> FILE="${FILE%\"}"
> FILE="${FILE/\"}"
> A[${#A[@]}]="`printf "%b" "$FILE"`"
> done </tmp/tmpfile
> for I in "${A[@]}"; do
> [ -e "$I" ] || echo "\"$I\" is not found."
> done
> rm /tmp/tmpfile
>
> "a\" ad" is not found.
>
>
> Both scripts fail to read all file names correctly.


You are making it far more complicated than it need be. Just use
filename expansion:

for FILE in *
do
....
done

> The first version seems better to me.
>
> PS.
> I use a temporary file instead of <( ... ) because I run the script in a
> VServer environment.


Why do either? Your problem is ls; it is not necessary.

--
Chris F.A. Johnson, author <http://cfaj.freeshell.org/shell>
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
===== My code in this post, if any, assumes the POSIX locale
===== and is released under the GNU General Public Licence

Volodymyr M. Lisivka

2007-02-21, 7:26 am

Chris F.A. Johnson wrote:

>
> When posting scripts, please format them so that they are
> legible and do not wrap inappropriately:


OK. But I prefer to use one-liners as examples because they can be
pasted and executed directly from the command line.

>
> The files will be sorted alphabetically by the shell when it
> expands the wildcard.


I missed that. :-(

However, it will not help me when I need to sort files by date.

At present, the file names include the date:
"prefix-YYYY-MM-DD.xml.gz", so I rely on alphabetical sorting. However,
this may change in the future, so the script will need to sort files by
custom criteria. I will not be able to come back and rewrite the script,
so it needs to be configurable.
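
One possible shape for such a configurable sort, offered as a sketch only: prefix each name with a sort key, sort on the key, then strip it. The example below uses the modification time as the key. It assumes GNU find (`-printf`) and GNU sort (`-z`); `read_files_by_mtime` and `FILES` are hypothetical names.

```shell
#!/bin/bash
# read_files_by_mtime DIR: fill the global array FILES with the names in DIR,
# oldest modification time first. Hypothetical helper; assumes GNU find
# (-printf) and GNU sort (-z).
read_files_by_mtime() {
    local dir=$1 tmp rec
    tmp=$(mktemp) || return 1
    # Each record is "<mtime in seconds><TAB><name>" terminated by NUL, so a
    # newline inside a name cannot split a record. Dot files are skipped.
    LC_ALL=C find "$dir" -mindepth 1 -maxdepth 1 ! -name '.*' \
        -printf '%T@\t%f\0' | LC_ALL=C sort -z -n >"$tmp"
    FILES=()
    # Redirect from a file, not a pipe, so FILES is set in the current shell.
    while IFS= read -r -d '' rec; do
        FILES[${#FILES[@]}]=${rec#*$'\t'}   # drop the sort key, keep the name
    done <"$tmp"
    rm -f "$tmp"
}
```

Changing the criterion then means changing only the key, for example `%s` (size) in place of `%T@`.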

BTW: the shell uses the current locale for collation:

$ echo *
A a B b C c d

$ LANG=C
$ echo *
A B C a b c d

$ LANG=en_US
$ echo *
a A b B c C d


>
> Why are you using a file instead of a pipe?
>
> ls -d * | while IFS= read -r FILE


I need to store the result in variables. A pipeline is executed
in a subshell:

$ VAR=""; echo FOO | while read ; do echo "1: $REPLY"; VAR="$REPLY";
done; echo "2: $VAR";
1: FOO
2:

$ VAR=""; echo FOO >/tmp/tmpfile; while read ; do echo "1: $REPLY";
VAR="$REPLY"; done </tmp/tmpfile; echo "2: $VAR";
1: FOO
2: FOO

so I cannot pass the result to the rest of the script.


> You are making it far more complicated than it need be. Just use
> filename expansion:
>
> for FILE in *
> do
> ....
> done


I cannot.

Chris F.A. Johnson

2007-02-21, 1:18 pm

On 2007-02-21, Volodymyr M. Lisivka wrote:
> Chris F.A. Johnson wrote:
>
>
> OK. But I prefer to use one-liners as examples because they can be
> pasted and executed directly from the command line.


You can paste and execute multi-line scripts just as easily.

>
> I missed that. :-(
>
> However it will not help me when I need to sort files by date.


It really helps if you supply all the necessary information when
you post the question.

> At present, the file names include the date:
> "prefix-YYYY-MM-DD.xml.gz", so I rely on alphabetical sorting. However,
> this may change in the future, so the script will need to sort files by
> custom criteria. I will not be able to come back and rewrite the script,
> so it needs to be configurable.
>
> BTW: the shell uses the current locale for collation:
>
> $ echo *
> A a B b C c d
>
> $ LANG=C
> $ echo *
> A B C a b c d
>
> $ LANG=en_US
> $ echo *
> a A b B c C d


Set the locale you want in the script.
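
As a sketch of what that could look like, assuming bash: setting `LC_ALL=C` before the wildcard is expanded gives plain byte order regardless of the caller's `LANG`. The function name `list_c_order` is hypothetical; its body runs in a subshell so the locale change stays contained.

```shell
#!/bin/bash
# list_c_order DIR: print the names in DIR, one per line, in byte order,
# whatever LANG the caller uses. The parentheses make the body a subshell,
# so the locale change does not leak. Hypothetical helper name.
list_c_order() (
    LC_ALL=C                         # overrides LANG and LC_COLLATE
    for f in "$1"/*; do
        # skip the literal pattern when nothing matches
        [ -e "$f" ] || [ -L "$f" ] || continue
        printf '%s\n' "${f##*/}"
    done
)
```

With files named B, C, a and d, this lists the uppercase names first, matching the `LANG=C` output quoted above.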

>
> I need to store the result in variables. A pipeline is executed
> in a subshell:
>
> $ VAR=""; echo FOO | while read ; do echo "1: $REPLY"; VAR="$REPLY";
> done; echo "2: $VAR";
> 1: FOO
> 2:
>
> $ VAR=""; echo FOO >/tmp/tmpfile; while read ; do echo "1: $REPLY";
> VAR="$REPLY"; done </tmp/tmpfile; echo "2: $VAR";
> 1: FOO
> 2: FOO
>
> so I cannot pass the result to the rest of the script.


You can if you use braces:

command arg | {
  while IFS= read -r line
  do
    x=$line
  done

  ## code that needs variables read in the loop goes here
  printf "%s\n" "$x"
}

>
> I cannot.



--
Chris F.A. Johnson, author <http://cfaj.freeshell.org/shell>
Shell Scripting Recipes: A Problem-Solution Approach (2005, Apress)
===== My code in this post, if any, assumes the POSIX locale
===== and is released under the GNU General Public Licence

Volodymyr M. Lisivka

2007-02-21, 1:18 pm

Chris F.A. Johnson wrote:

>
> It really helps if you supply all the necessary information when
> you post the question.
>


I need a way to safely and correctly sort a list of files by custom criteria
in bash 3.0 (Fedora/RHEL/CentOS 4 and later).

Users upload data files to our SFTP server, and I need to import these
files into the database. The order of import is important: files must be
imported in the same order as they were uploaded.

At present, the file names include the date:
"prefix-YYYY-MM-DD.xml.gz", so I rely on alphabetical sorting. However,
this may change in the future, so the script really needs to sort files
by custom criteria.

I will not be able to come back and rewrite the script, so it needs to be
configurable.

On the other hand, I worry about security. I prefer to write scripts like
this in Perl with the "-T" (taint) switch, but in this case I have no
choice. I worry about an attacker who can upload a file with special
characters in its name and break my script. For example, he can embed a
newline character in the name; if the script uses "\n" as the record
separator, it will then see two or more files.

Example:

$ ls -Q1
"a\n b"
^-- file with embedded new line character in the name

$ ls * | while IFS= read -r ; do echo "\"$REPLY\""; done
"a"
" b"
^^^^- two file names

Or he can add a backslash to the end of a file name to merge two file
names into one. Example:

$ ls -Q1
"a\\" # File has backslash at the end of name
"b"

$ ls * | while IFS= read ; do echo "\"$REPLY\""; done
"ab"
^^^^- single file name

Using both techniques he can bypass my restrictions.
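
The backslash case comes from `read` itself: without `-r`, a backslash followed by a newline is treated as a line continuation. A small sketch of the difference, with hypothetical function names `show_plain` and `show_raw`:

```shell
#!/bin/bash
# Feed the same two lines, 'a\' and 'b', to read with and without -r.
# show_plain and show_raw are hypothetical names.
show_plain() { while IFS= read    line; do printf '"%s"\n' "$line"; done; }
show_raw()   { while IFS= read -r line; do printf '"%s"\n' "$line"; done; }

printf 'a\\\nb\n' | show_plain   # backslash-newline is a continuation: one record
printf 'a\\\nb\n' | show_raw     # -r keeps the backslash literal: two records
```

`-r` does nothing for the embedded newline case, though; that one needs a record separator other than "\n".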


So my question is:

How can I pass a list of files with special characters in their names
(like \n, ', ", \, $, `, " ", etc.) to a sorting program and read them back
safely and correctly?
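
One way this is commonly handled, sketched here under assumptions rather than taken from the thread: use the NUL byte as the record separator, since it cannot appear in a file name. The sketch assumes GNU find (`-printf`) and GNU sort (`-z`), plus bash's `read -d ''`. It goes through a temporary file so the loop runs in the current shell. `read_sorted_files` and `FILES` are hypothetical names.

```shell
#!/bin/bash
# read_sorted_files DIR: fill the global array FILES with the names in DIR,
# sorted by name, using NUL as the record separator. Hypothetical helper;
# assumes GNU find (-printf) and GNU sort (-z).
read_sorted_files() {
    local dir=$1 tmp name
    tmp=$(mktemp) || return 1
    # '%f\0' prints each bare name followed by NUL, a byte that cannot
    # occur in a file name. Dot files are skipped, as with '*'.
    find "$dir" -mindepth 1 -maxdepth 1 ! -name '.*' -printf '%f\0' |
        LC_ALL=C sort -z >"$tmp"
    FILES=()
    # Read from the temporary file, not a pipe, so FILES survives the loop.
    while IFS= read -r -d '' name; do
        FILES[${#FILES[@]}]=$name
    done <"$tmp"
    rm -f "$tmp"
}

# Usage: check that every name read back refers to an existing file.
read_sorted_files .
for f in "${FILES[@]}"; do
    [ -e "./$f" ] || echo "\"$f\" is not found."
done
```

Any sorting program that accepts and emits NUL-terminated records could take the place of `sort -z` in the pipeline.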