Removing lines from file
| Jens P 2005-03-11, 5:59 pm |
| I have file A with x lines and file B with y lines, which are always also
found in A.
I.e. file B is a "subset" of A, and hence x is always larger than y.
I want to produce a new file where the lines in B have been "subtracted"
from A.
Each line in both files contains only 1 string...
I tried a recursive approach like this in Bourne shell:
for line in `cat B` ; do
cat A | sed '/$line/d' > A
done
But the file A is truncated after the first run! Why?
Is there a better (working) and more elegant approach using sed/awk?
Thanks for your help!
/Jens
| |
| Icarus Sparry 2005-03-11, 8:55 pm |
| On Wed, 09 Mar 2005 17:27:55 +0800, Jens P wrote:
> I have file A with x lines and file B with y lines, which are always also
> found in A.
> I.e. file B is a "subset" of A, and hence x is always larger than y.
> I want to produce a new file where the lines in B have been "subtracted"
> from A.
> Each line in both files contains only 1 string...
>
> I tried a recursive approach like this in Bourne shell:
>
> for line in `cat B` ; do
> cat A | sed '/$line/d' > A
> done
>
> But the file A is truncated after the first run! Why?
The output redirection truncates the file, and this usually happens before
the commands in the pipeline have read anything, so 'cat A' finds no data.
Newer versions of GNU sed have a '-i' flag to edit in place; the traditional
approach is to use a temporary file. The next problem is that you need to
surround your sed command with " and not ', because you want the value of
$line in it, and not the literal characters dollar l i n e.
for line in `cat B` ; do
    sed "/$line/d" A > A.tmp
    mv A.tmp A
done
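To make the two points above concrete, here is a small sketch. The scratch directory and the sample contents of A are made up for the demonstration; nothing here comes from the original poster's data.

```shell
#!/bin/sh
# Hypothetical sample data in a throwaway directory.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' apple banana cherry > A

line=banana

# Single quotes: sed sees the literal text $line, so nothing is deleted.
sed '/$line/d' A > single.out
# Double quotes: the shell expands $line, so sed sees /banana/d.
sed "/$line/d" A > double.out

# Redirecting onto the input file: the shell truncates A.copy
# before sed gets a chance to read it, so the result is empty.
cp A A.copy
sed "/$line/d" A.copy > A.copy

echo "single quotes: $(wc -l < single.out) lines kept"
echo "double quotes: $(wc -l < double.out) lines kept"
echo "same-file redirect: $(wc -c < A.copy) bytes left"
```

Note that the double-quoted form still treats the contents of $line as a regular expression and matches it anywhere in the line, so it can delete more than intended.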
> Is there a better (working) and more elegant approach using sed/awk?
Yes. If you have 100 lines in B, then your approach runs 'sed' 100 times.
You can reduce this to 2 runs, regardless of the number of lines. First
turn the B file into a list of sed commands, then run that script against
A, e.g.
# Escape regex metacharacters, then turn each line into /^line$/d
sed 's:[][\/.*^$]:\\&:g
s:.*:/^&$/d:' B > B.cmd
sed -f B.cmd A > A.tmp
mv A.tmp A
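As a worked illustration of the two-pass idea, the sketch below builds the command file from a made-up B and applies it to a made-up A. It uses a variant that escapes only the basic-regex metacharacters (escaping every character would turn letters such as n or digits into special sequences in some sed implementations). The sample data and the scratch directory are assumptions for the demo.

```shell
#!/bin/sh
# Hypothetical sample data in a throwaway directory.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' 'a.b' 'axb' 'x/y' 'foo' > A
printf '%s\n' 'a.b' 'x/y' > B

# Pass 1: escape ] [ \ / . * ^ $ in each line of B, then wrap the
# line as an anchored delete command: /^line$/d
sed -e 's:[][\/.*^$]:\\&:g' -e 's:.*:/^&$/d:' B > B.cmd
echo "generated sed script:"
cat B.cmd

# Pass 2: run the generated script against A via a temporary file.
sed -f B.cmd A > A.tmp && mv A.tmp A
echo "remaining lines in A:"
cat A
```

Because each generated pattern is anchored with ^ and $ and the dot is escaped, the line axb survives even though it would match the unescaped pattern a.b.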
The normal approach with awk is to read the B file, and use each line as
the index into an associative array, then read the A file, and only print
the line if it is not an index into the array.
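The awk approach described above could look roughly like the following sketch. The array name skip, the sample data and the scratch directory are made up for illustration. One caveat of the NR == FNR idiom: if B is empty, awk cannot tell the two files apart and prints nothing.

```shell
#!/bin/sh
# Hypothetical sample data in a throwaway directory.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' apple banana cherry banana > A
printf '%s\n' banana > B

# While reading the first file (B), NR equals FNR: remember each line
# as an array index. For the second file (A), print only the lines
# that are not an index in the array.
awk 'NR == FNR { skip[$0]; next } !($0 in skip)' B A > A.tmp &&
    mv A.tmp A

cat A
```

This compares whole lines as plain strings, so no regex escaping is needed, and it keeps the original order of A.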
| |
| Chris F.A. Johnson 2005-03-12, 2:48 am |
| On Wed, 09 Mar 2005 at 09:27 GMT, Jens P wrote:
> I have file A with x lines and file B with y lines, which are always also
> found in A.
> I.e. file B is a "subset" of A, and hence x is always larger than y.
> I want to produce a new file where the lines in B have been "subtracted"
> from A.
> Each line in both files contains only 1 string...
By "1 string", do you mean "1 word"? That is, there are no spaces
in the lines?
> I tried a recursive approach like this in Bourne shell:
>
> for line in `cat B` ; do
This is not a good method of reading a file line by line; it reads
the file word by word. Always use:
while read line   ## If your shell has it, use: read -r line
do
    : whatever....
done < B
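The difference is easy to see with a file whose lines contain spaces. The following sketch counts how many items each loop sees; the sample contents of B and the counter names are invented for the demo, and it assumes a shell with read -r and $(( )) arithmetic.

```shell
#!/bin/sh
# Hypothetical sample data in a throwaway directory.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' 'red apple' 'green pear' > B

# for ... in `cat B` splits on whitespace: one iteration per word.
for_count=0
for line in `cat B`; do
    for_count=$((for_count + 1))
done

# while read takes one line per iteration; IFS= and -r keep leading
# blanks and backslashes intact.
read_count=0
while IFS= read -r line; do
    read_count=$((read_count + 1))
done < B

echo "for loop saw $for_count items"          # 4 words
echo "while read saw $read_count lines"       # 2 lines
```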
> cat A | sed '/$line/d' > A
> done
>
> But the file A is truncated after the first run! Why?
Because redirecting to A truncates it. The shell performs the
redirection before the command is executed; there is no longer
anything there to cat.
Always send the output to a temporary file and move or copy it
back to the original.
> Is there a better (working) and more elegant approach using sed/awk?
grep -Fxvf B A > tempfile   ## -F: fixed strings, -x: whole lines only
mv tempfile A
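One thing to watch with grep -f: without further options, each line of B is treated as a regular expression and may match anywhere in a line of A. The sketch below, using made-up data in a scratch directory, compares that with the standard -F (fixed strings) and -x (whole line) options.

```shell
#!/bin/sh
# Hypothetical sample data in a throwaway directory.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf '%s\n' 'a.b' 'axb' 'a.b.c' > A
printf '%s\n' 'a.b' > B

# Regex, substring match: the pattern a.b matches all three lines,
# so nothing is left. grep exits 1 when it prints nothing.
grep -vf B A > regex.out || true

# Fixed string, whole line: only the exact line a.b is removed.
grep -Fxvf B A > exact.out

echo "regex match kept $(wc -l < regex.out) lines"
echo "exact match kept $(wc -l < exact.out) lines"
```

Since grep returns a non-zero status when every line has been removed, chaining the mv with && after it would skip the move in that case.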
--
Chris F.A. Johnson http://cfaj.freeshell.org/shell
===================================================================
My code (if any) in this post is copyright 2005, Chris F.A. Johnson
and may be copied under the terms of the GNU General Public License
| |
| Bill Marcum 2005-03-12, 2:48 am |
| On Wed, 9 Mar 2005 17:27:55 +0800, Jens P
<jens.pedersen@NOSPAMericsson.com> wrote:
> I have file A with x lines and file B with y lines, which are always also
> found in A.
> I.e. file B is a "subset" of A, and hence x is always larger than y.
> I want to produce a new file where the lines in B have been "subtracted"
> from A.
man comm
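For readers who have not used comm: it compares two sorted files column by column, and -23 suppresses the lines unique to the second file and the lines common to both, leaving only what is unique to the first. A minimal sketch, with made-up data and file names, assuming it is acceptable for the result to come out sorted rather than in the original order of A:

```shell
#!/bin/sh
# Hypothetical sample data in a throwaway directory.
dir=$(mktemp -d) && cd "$dir" || exit 1
LC_ALL=C; export LC_ALL     # sort and comm must agree on collation
printf '%s\n' cherry apple banana > A
printf '%s\n' banana > B

# comm requires both inputs to be sorted.
sort A > A.sorted
sort B > B.sorted

# Keep only the lines that appear in A but not in B.
comm -23 A.sorted B.sorted > A.new

cat A.new
```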