Unix administration - [procmail] [sed] [awk] cleaning up mail headers

Troy Piggins

2004-06-06, 7:50 am

Not quite sure where to post this ...

I am trying to write a command line for procmail that takes the From:
header field from an email and cleans it up into just the email address.
Reason being that depending on the sender, the From: field could be in
any of the following formats :

First Last <user@domain.com>
"First Last" <user@domain.com>
user@domain.com
first.last@domain.com.au

and there could be others.
I want to use just the email address (eg user@domain.com or possibly
first.last@domain.com.au) to grep my ~/.aliases file to check if I am
getting mail from someone I trust (whitelist).
My .aliases file is in mutt's format :

alias nickname1 First1 Last1 <user1@domain1.com>
alias nickname2 First2 Last2 <user2@domain2.com>
....

At present I use the following rule, with list.white being just a
manually edited list of emails only :

:0:
* ? formail -x "From:" -x "From" -x "Sender:" \
| egrep -i -is -f $PROCMAILDIR/list.white
white

But (I guess) I want something like this :

:0:
* ? formail -x "From:" -x "From" -x "Sender:" | $CLEAN_FROM_SCRIPT \
| egrep -i -is -f ~/.aliases
white

and I envisaged $CLEAN_FROM_SCRIPT being a sed or awk command line, but
it could be a bash script. I need help with that part of it.
Any suggestions?
Thanks.
--
T R O Y P I G G I N S
e : troy@piggo.com
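Since the question leaves $CLEAN_FROM_SCRIPT open and mentions awk, here is one minimal awk sketch of such a filter. It assumes one header value per input line, prefers an address inside angle brackets, and otherwise prints the first field containing an @. The name clean_from_awk is hypothetical, and display names that themselves contain < or @ are not handled.

```shell
#!/bin/sh
# clean_from_awk: hypothetical filter for the $CLEAN_FROM_SCRIPT slot.
# Reads header values (one per line) on stdin and prints just the address.
clean_from_awk() {
    awk '{
        # "Name <addr>" or the quoted-name form: print what is inside <...>
        if (match($0, /<[^>]*>/)) {
            print substr($0, RSTART + 1, RLENGTH - 2)
            next
        }
        # Bare address: print the first field that contains "@"
        for (i = 1; i <= NF; i++)
            if (index($i, "@")) { print $i; next }
    }'
}

# Example: the four formats from the question, one address per output line.
printf '%s\n' 'First Last <user@domain.com>' '"First Last" <user@domain.com>' \
    'user@domain.com' 'first.last@domain.com.au' | clean_from_awk
```

Awk's default field splitting also discards any leading space left in front of the value, so a bare address with a leading blank comes out clean.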
William Park

2004-06-06, 4:50 pm

In <comp.mail.misc> Troy Piggins <troy@piggo.com> wrote:
> Not quite sure where to post this ...
>
> I am trying to write a command line for procmail that takes the From:
> header field from an email and cleans it up into just the email address.
> Reason being that depending on the sender, the From: field could be in
> any of the following formats :
>
> First Last <user@domain.com>
> "First Last" <user@domain.com>
> user@domain.com
> first.last@domain.com.au
>
> and there could be others.
> I want to use just the email address (eg user@domain.com or possibly
> first.last@domain.com.au) to grep my ~/.aliases file to check if I am
> getting mail from someone I trust (whitelist).
> My .aliases file is in mutt's format :
>
> alias nickname1 First1 Last1 <user1@domain1.com>
> alias nickname2 First2 Last2 <user2@domain2.com>
> ...
>
> At present I use the following rule, with list.white being just a
> manually edited list of emails only :
>
> :0:
> * ? formail -x "From:" -x "From" -x "Sender:" \
> | egrep -i -is -f $PROCMAILDIR/list.white
> white
>
> But (I guess) I want something like this :
>
> :0:
> * ? formail -x "From:" -x "From" -x "Sender:" | $CLEAN_FROM_SCRIPT \
> | egrep -i -is -f ~/.aliases
> white
>
> and I envisaged $CLEAN_FROM_SCRIPT being a sed or awk commandline, but
> could be bash script. I need help with that part of it.
> Any suggestions?
> Thanks.


1. How about
[a-z0-9_.-]+@[a-z0-9_.-]+
as email address pattern?

2. You should search ~/.aliases for a particular email address, not the
other way around. That is, do
egrep 'user@domain.com' ~/.aliases
and not
echo 'user@domain.com' | egrep -f ~/.aliases

3. As a last resort, you can parse ~/.aliases at runtime into
ALIASES = '(user1@domain1.com|user2@domain2.com|...)'
to be used as a condition of the form
* ^From:.*(...|...|...)
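William's points 2 and 3 can be sketched in plain shell. This assumes every alias line looks like the mutt examples quoted above, with the address in angle brackets at the end; the function names is_whitelisted and alias_regex are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, assuming mutt-style alias lines such as:
#   alias nickname First Last <user@domain.com>

# Point 2: the sender's address is the pattern; the alias file is searched.
# -F: match literally (a "." in the address is not a wildcard)
# -i: ignore case   -q: report through the exit status only
is_whitelisted() {
    [ -n "$1" ] || return 1      # an empty pattern would match every line
    grep -Fiq -- "<$1>" "$2"     # the <> stop "bob@x" matching "jimbob@x"
}

# Point 3: build one alternation like (user1@domain1\.com|user2@domain2\.com)
alias_regex() {
    sed -n 's/^alias .*<\([^>]*\)>.*$/\1/p' "$1" |
        sed 's/\./\\./g' |
        paste -sd '|' - |
        sed 's/^/(/; s/$/)/'
}

# Example (file name as in the thread):
#   is_whitelisted user1@domain1.com "$HOME/.aliases" && echo trusted
```

Searching for "<address>" rather than the bare address is a deliberate choice in this sketch: it keeps a short address from matching inside a longer one, at the cost of missing alias lines written without angle brackets.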

--
William Park, Open Geometry Consulting, <opengeometry@yahoo.ca>
No, I will not fix your computer! I'll reformat your harddisk, though.
Allodoxaphobia

2004-06-06, 11:50 pm

On Sun, 06 Jun 2004 21:26:21 +1000, Troy Piggins hath writ:
> Not quite sure where to post this ...
>
> I am trying to write a command line for procmail that takes the From:
> header field from an email and cleans it up into just the email address.
> Reason being that depending on the sender, the From: field could be in
> any of the following formats :
>
> First Last <user@domain.com>
> "First Last" <user@domain.com>
> user@domain.com
> first.last@domain.com.au
>
> and there could be others.
> I want to use just the email address (eg user@domain.com or possibly
> first.last@domain.com.au) to grep my ~/.aliases file to check if I am
> getting mail from someone I trust (whitelist).


I've just now been hacking in this area...

Try:
formail -rtzxTo:

as found in:

http://www.xray.mpe.mpg.de/mailing-...5/msg00315.html

HTH,
Jonesy
--
| Marvin L Jones | jonz | W3DHJ | OS/2
| Gunnison, Colorado | @ | Jonesy | linux __
| 7,703' -- 2,345m | frontier.net | DM68mn SK
Alan Connor

2004-06-06, 11:50 pm

On Sun, 06 Jun 2004 21:26:21 +1000, Troy Piggins <troy@piggo.com> wrote:
>
>
> Not quite sure where to post this ...
>
> I am trying to write a command line for procmail that takes the From:
> header field from an email and cleans it up into just the email address.
> Reason being that depending on the sender, the From: field could be in
> any of the following formats :
>
> First Last <user@domain.com>
> "First Last" <user@domain.com>
> user@domain.com
> first.last@domain.com.au
>
> and there could be others.
> I want to use just the email address (eg user@domain.com or possibly
> first.last@domain.com.au) to grep my ~/.aliases file to check if I am
> getting mail from someone I trust (whitelist).
> My .aliases file is in mutt's format :
>
> alias nickname1 First1 Last1 <user1@domain1.com>
> alias nickname2 First2 Last2 <user2@domain2.com>
> ...
>
> At present I use the following rule, with list.white being just a
> manually edited list of emails only :
>
>:0:
> * ? formail -x "From:" -x "From" -x "Sender:" \
> | egrep -i -is -f $PROCMAILDIR/list.white
> white
>
> But (I guess) I want something like this :
>
>:0:
> * ? formail -x "From:" -x "From" -x "Sender:" | $CLEAN_FROM_SCRIPT \
> | egrep -i -is -f ~/.aliases
> white
>
> and I envisaged $CLEAN_FROM_SCRIPT being a sed or awk commandline, but
> could be bash script. I need help with that part of it.
> Any suggestions?
> Thanks.


Hi Troy,

The nutters are out in force today, aren't they :-)

(I use passlist/blocklist instead of whitelist/blacklist, because the
latter sound racist to me.)

Couple of thoughts:

An effective passlist contains a lot more than just one's trusted
friends. It should include *anyone* you've sent a mail to, and
you aren't going to create an alias for most of those.

Passlist entries can also be more complex than a simple return address.
Sometimes you will not know which address a reply will come from. Mail
to any large organization can be like that: it may be handed from
department to department and end up being answered from the home of
an employee there...

In that case, it is useful to passlist the Subject: line, and have
something like this at the top of the mail:

|The Subject of this mail is a password and needs to be included in
|any reply, unchanged except for Re: (one or more) and whitespace(s).
|Thank you.

For mailing lists, it is often the Return-Path: you passlist, or a
special header from the listserver.

That being said....

(I'm assuming here that you call fetchmail which then calls
procmail...)

It's a lot easier to parse your_alias_file than the incoming headers,
so do that before fetchmail does its thing, with a script:

#!/bin/bash
# /usr/local/bin/alparse

sed -e 's/^\(.*<\)\(.*\)\(>.*\)/\2|\\/' \
-e '$s/|\\//' -e 's/\./\\./g' \
/home/you/your_alias_file > /home/you/newalias
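To see what alparse actually writes to newalias, the same three sed expressions can be run on the two sample alias lines from Troy's post. The wrapper alparse_demo and the temporary sample file are hypothetical; the sed program itself is unchanged. Each address comes out on its own line with its dots escaped and a |\ continuation appended, except on the last line, where the second expression strips it again.

```shell
#!/bin/sh
# alparse_demo: the alparse sed program, reading the alias file named in $1.
alparse_demo() {
    sed -e 's/^\(.*<\)\(.*\)\(>.*\)/\2|\\/' \
        -e '$s/|\\//' -e 's/\./\\./g' \
        "$1"
}

sample=$(mktemp)
printf '%s\n' 'alias nickname1 First1 Last1 <user1@domain1.com>' \
    'alias nickname2 First2 Last2 <user2@domain2.com>' > "$sample"
alparse_demo "$sample"
# user1@domain1\.com|\
# user2@domain2\.com
rm -f "$sample"
```

Lines without a < (blank lines, comments) do not match the first expression and pass through unchanged, so a real alias file may need filtering before this step; a trailing blank line, for instance, would leave the previous address with its |\ still attached.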

To make it all automatic, alias (shell alias) fetchmail like so:

alias fetchmail='alparse && fetchmail'

So whenever you call fetchmail, the first thing that happens is
that your_alias_file is converted to something procmail can deal
with, and you can add to and subtract from it without worrying about
having to do anything else. The old newalias will be overwritten
each time you retrieve your mail.

This goes at the top of your .procmailrc:

ALIAS=`cat /home/you/newalias`

then

:0:
* $ ^(From|From:|Sender:|Reply-To:|Return-Path:).*(${ALIAS})
pass

AC

--
Pass-List -----> Block-List ----> Challenge-Response
The key to taking control of your mailbox. Design Parameters:
http://tinyurl.com/2t5kp || http://tinyurl.com/3c3ag
Challenge-Response links -- http://tinyurl.com/yrfjb


Troy Piggins

2004-06-07, 2:50 am

Allodoxaphobia wrote:

> I've just now been hacking in this area...
>
> Try:
> formail -rtzxTo:
>
> as found in:
>
> http://www.xray.mpe.mpg.de/mailing-...5/msg00315.html
>
> HTH,
> Jonesy


Thanks, Jonesy. That formail line did not do what I wanted, but I
followed that link you gave. The archived post from Nancy McGough *did*
contain a sed script along the lines of what I want. She was using it
to create a whitelist from her addressbook :

cat $HOME/Msgs/AddressBook* \
|fgrep "@" \
|sed -e "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" \
-e "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" \
|sort -fu \
> $HOME/Procmail/whitelist.tmp
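The two sed expressions can be tested on their own before being embedded in a recipe. Below they are wrapped in a hypothetical clean_from function and fed the four From: formats from the original question, each with a leading space, since formail -x keeps the whitespace after the colon unless -z is given.

```shell
#!/bin/sh
# clean_from: the two sed expressions from Nancy McGough's script.
# 1st: drop everything before the local part (the run of address
#      characters that ends in "@").
# 2nd: drop everything after the domain (the run of address characters
#      that follows "@").
clean_from() {
    sed -e 's/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/' \
        -e 's/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/'
}

printf '%s\n' ' First Last <user@domain.com>' ' "First Last" <user@domain.com>' \
    ' user@domain.com' ' first.last@domain.com.au' | clean_from
# user@domain.com
# user@domain.com
# user@domain.com
# first.last@domain.com.au
```

Both expressions only ever delete text, so a line that already holds a bare address passes through unchanged.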


I thought I could add it into where I had mentioned $CLEAN_FROM_SCRIPT
like so :

:0:
* ? formail -x "From:" -x "From" -x "Sender:" \
|sed -e "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" \
-e "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" \
| egrep -i -is -f $HOME/.aliases
white.test

Unfortunately this is not working (the mail falls through to one of my
other recipes). The log gives this error :

procmail: Executing " formail -x "From:" -x "From" -x "Sender:" | sed -e
"s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
"s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" | egrep -i -is -f
$HOME/.aliases"
grep: Trailing backslash
procmail: Non-zero exitcode (2) from " formail -x "From:" -x "From" -x
"Sender:" | sed -e "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
"s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" | egrep -i -is -f
$HOME/.aliases"
procmail: No match on " formail -x "From:" -x "From" -x "Sender:" | sed
-e "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
"s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" | egrep -i -is -f
$HOME/.aliases"

When I run this on the commandline I get the clean address I want :

[troy@linus:~]$ echo "\"Troy Piggins\" <troy@piggo.com>" |sed -e
"s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
"s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/"

the output is :

troy@piggo.com

which is exactly the result I want.
I reckon the grep trailing backslash is playing up, but all the sed
scripts are confusing me with all the //\\/\/\/ etc.
I am missing something, but can't see it.
--
T R O Y P I G G I N S
e : troy@piggo.com
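For what it's worth, the "grep: Trailing backslash" line in the log points at the pattern file rather than the sed part: with -f, grep compiles every line of ~/.aliases as a regular expression, so a line ending in a backslash is rejected before any matching happens, grep exits with status 2, and procmail treats the condition as failed. A small reproduction, assuming GNU grep (the exact error wording varies between versions) and a throwaway pattern file with hypothetical contents:

```shell
#!/bin/sh
# One alias-style line ending in a backslash, used as a grep -f pattern file.
pat=$(mktemp)
printf '%s\n' 'alias someone Some One <someone@example.com> \' > "$pat"

# Same flags as the recipe's egrep; the pattern file fails to compile.
status=0
echo 'someone@example.com' | grep -E -i -s -f "$pat" || status=$?
echo "grep exit status: $status"   # 2 means an error, 1 would mean "no match"

rm -f "$pat"
```

Searching the alias file with the address as the pattern, as William suggests in his point 2, avoids compiling the alias file as regexes at all.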
Troy Piggins

2004-06-07, 2:50 am

Alan Connor wrote:

> Couple of thoughts:
>
> An effective passlist contains a lot more than just one's trusted
> friends. It should include *anyone* you've sent a mail to, and
> you aren't going to create an alias for most of those.


True - I am just using this as a starting point, then once I get that
working I can build from there.

> (I'm assuming here that you call fetchmail which then calls
> procmail...)


yep.

> It's a lot easier to parse your_alias_file than the incoming headers,
> so do that before fetchmail does its thing, with a script:
>
> #!/bin/bash
> # /usr/local/bin/alparse
>
> sed -e 's/^\(.*<\)\(.*\)\(>.*\)/\2|\\/' \
> -e '$s/|\\//' -e 's/\./\\./g' \
> /home/you/your_alias_file > /home/you/newalias
>
> To make it all automatic, alias (shell alias) fetchmail like so:
>
> alias fetchmail='alparse && fetchmail'
>
> So whenever you call fetchmail, the first thing that happens is
> that your_alias_file is converted to something procmail can deal
> with, and you can add and subract from it without worrying about
> having to do anything else. The old newalias will be overwritten
> each time you retrieve your mail.


Fair enough. I was trying to avoid creating a new temp file every time
I check mail; I would have thought that reading, parsing and writing
files every time fetchmail is called adds unnecessary load.
That is why I was trying to use sed inline in the recipe.
See my response to Allodox's post for my attempt at a solution.

> This goes at the top of your .procmailrc:
>
> ALIAS=`cat /home/you/newalias`
>
> then
>
> :0:
> * $ ^(From|From:|Sender:|Reply-To:|Return-Path:).*(${ALIAS})
> pass
>
> AC


--
T R O Y P I G G I N S
e : troy@piggo.com
Alan Connor

2004-06-07, 11:53 pm

On Mon, 07 Jun 2004 06:05:58 GMT, Troy Piggins <troy@piggo.com> wrote:
>
>
> Allodoxaphobia wrote:
>
>
> Thanks, Jonesy. That formail line did not do what I wanted, but I
> followed that link you gave. The archived post from Nancy McGough *did*
> contain a sed script along the lines of what I want. She was using it
> to create a whitelist from her addressbook :
>
> cat $HOME/Msgs/AddressBook* \
> |fgrep "@" \
> |sed -e "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" \
> -e "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" \
> |sort -fu \
>
> I thought I could add it into where I had mentioned $CLEAN_FROM_SCRIPT
> like so :
>
>:0:
> * ? formail -x "From:" -x "From" -x "Sender:" \
> |sed -e "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" \
> -e "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" \
> | egrep -i -is -f $HOME/.aliases
> white.test
>
> Unfortunately this is not working (the mail falls through to one of my
> other recipes). The log gives this error :
>
> procmail: Executing " formail -x "From:" -x "From" -x "Sender:" | sed -e
> "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
> "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" | egrep -i -is -f
> $HOME/.aliases"
> grep: Trailing backslash
> procmail: Non-zero exitcode (2) from " formail -x "From:" -x "From" -x
> "Sender:" | sed -e "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
> "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" | egrep -i -is -f
> $HOME/.aliases"
> procmail: No match on " formail -x "From:" -x "From" -x "Sender:" | sed
> -e "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
> "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" | egrep -i -is -f
> $HOME/.aliases"
>
> When I run this on the commandline I get the clean address I want :
>
> [troy@linus:~]$ echo "\"Troy Piggins\" <troy@piggo.com>" |sed -e
> "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
> "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/"
>
> the output is :
>
> troy@piggo.com
>
> which is exactly the result I want.
> I reckon the grep trailing backslash is playing up, but all the sed
> scripts are confusing me with all the //\\/\/\/ etc.
> I am missing something, but can't see it.


I have run into the same sort of problem, many times, which is why I
use the solution I presented in my post. Procmail can be very finicky
and has trouble with complex scripts in the rc file itself.

It's usually better to put the script elsewhere and call it or pipe
through it from the procmailrc.

A log entry that reads: "program failure of script3" is a lot easier
to interpret than that mess above, and you know exactly where the
problem is.

On another level, you have to allow for a lot of variation in the
contents of those headers and write sed scripts to cover all of them;
it is much simpler to let procmail egrep them for a string.

Keep us posted. :-)


AC


--
Pass-List -----> Block-List ----> Challenge-Response
The key to taking control of your mailbox. Design Parameters:
http://tinyurl.com/2t5kp || http://tinyurl.com/3c3ag
Challenge-Response links -- http://tinyurl.com/yrfjb
Troy Piggins

2004-06-07, 11:53 pm

Alan Connor wrote:

> Keep us posted. :-)


Well, I think I have done it - this seems to work for a couple of test
emails :

WHITELIST=$HOME/.aliases
CLEAN_FROM=`formail -x "From:" -x "From" -x "Sender:" | sed -e
"s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
"s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/"`

:0:
* ? cat $WHITELIST | fgrep -is $CLEAN_FROM
white

I am happy.
--
T R O Y P I G G I N S
e : troy@piggo.com
Alan Connor

2004-06-07, 11:53 pm

On Mon, 07 Jun 2004 09:20:31 GMT, Troy Piggins <troy@piggo.com> wrote:
>
>
> Alan Connor wrote:
>
>
> Well, I think I have done it - this seems to work for a couple of test
> emails :
>
> WHITELIST=$HOME/.aliases
> CLEAN_FROM=`formail -x "From:" -x "From" -x "Sender:" | sed -e
> "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
> "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/"`
>
>:0:
> * ? cat $WHITELIST | fgrep -is $CLEAN_FROM
> white
>
> I am happy.



Great. I'll tuck this in my procmail docs.

AC

William Park

2004-06-07, 11:53 pm

In <comp.editors> Troy Piggins <troy@piggo.com> wrote:
> Alan Connor wrote:
>
>
> Well, I think I have done it - this seems to work for a couple of test
> emails :
>
> WHITELIST=$HOME/.aliases
> CLEAN_FROM=`formail -x "From:" -x "From" -x "Sender:" | sed -e
> "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
> "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/"`
>
> :0:
> * ? cat $WHITELIST | fgrep -is $CLEAN_FROM
> white
>
> I am happy.


Try this one...

:0
* ^From:.*\/[A-Za-z0-9_.+-]+@[A-Za-z0-9_.+-]+
* ? grep -i "$MATCH" ~/.aliases
white
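In procmail, whatever the regex to the right of \/ matches is stored in $MATCH, and the second condition then greps the alias file for it. To try the address pattern against sample headers outside procmail, a rough shell stand-in for the extraction step might look like this. It assumes a grep that supports -o (GNU and BSD grep both do); from_match is a hypothetical name, and procmail's own matcher is not identical, so treat it only as a testing aid.

```shell
#!/bin/sh
# from_match: hypothetical stand-in for the first condition. Reads a
# header block on stdin and prints the first address-like string on the
# first From: line, roughly what procmail would leave in $MATCH.
# grep -i mirrors procmail's default case-insensitive matching.
from_match() {
    grep -i '^From:' |
        head -n 1 |
        grep -oE '[A-Za-z0-9_.+-]+@[A-Za-z0-9_.+-]+' |
        head -n 1
}

printf '%s\n' 'Subject: test' 'From: "Troy Piggins" <troy@piggo.com>' | from_match
# troy@piggo.com
```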

--
William Park, Open Geometry Consulting, <opengeometry@yahoo.ca>
No, I will not fix your computer! I'll reformat your harddisk, though.
Alan Connor

2004-06-07, 11:53 pm

On 7 Jun 2004 15:00:52 GMT, William Park <opengeometry@yahoo.ca> wrote:
>
>
> In <comp.editors> Troy Piggins <troy@piggo.com> wrote:
>
> Try this one...
>
> :0
> * ^From:.*\/[A-Za-z0-9_.+-]+@[A-Za-z0-9_.+-]+
> * ? grep -i "$MATCH" ~/.aliases
> white
>


Nice.

AC


Troy Piggins

2004-06-07, 11:53 pm

William Park wrote:

> Try this one...
>
> :0
> * ^From:.*\/[A-Za-z0-9_.+-]+@[A-Za-z0-9_.+-]+
> * ? grep -i "$MATCH" ~/.aliases
> white


Aww, why do I have to always look for the most complicated solution :-(
Methinks I like yours much better.
I thank you.
--
T R O Y P I G G I N S
e : troy@piggo.com
Alan Connor

2004-06-08, 3:24 am

On 7 Jun 2004 15:00:52 GMT, William Park <opengeometry@yahoo.ca> wrote:
>
>
> In <comp.editors> Troy Piggins <troy@piggo.com> wrote:
>


Been playing with this slick recipe:

> Try this one...
>
> :0
> * ^From:.*\/[A-Za-z0-9_.+-]+@[A-Za-z0-9_.+-]+
> * ? grep -i "$MATCH" ~/.aliases




For some of those addresses, you are going to want to
cross-reference some other header - say a spammer was
using one of them, or a troll was trying to harass you
by forging one of them.


:0
* ^(From:|From|Sender:|Reply-to:|Return-Path:).*\/[A-Za-z0-9_.+-]+@[A-Za-z0-9_.+-]+
* ? grep -i "$MATCH" ~/.aliases
{

:0
* ^(From:|From|Sender:|Reply-to:|Return-Path:).*bill@him.net
* ! ^Received:.*orin.mit.edu
/dev/null

:0:
pass

}

In the above case any mail that seems to come from bill is double-
checked to see if it came from the university where he has his
account. Could be a password on the Subject line or anything
else...
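The idea behind the nested block (trust an alias match only if a Received: header also names the expected relay) can be expressed as a small shell check for testing against saved headers. The address bill@him.net and host orin.mit.edu come from the example above; the function name looks_forged and the sample headers are hypothetical, and real Received: headers can be folded over several lines, which this sketch ignores.

```shell
#!/bin/sh
# looks_forged: hypothetical check mirroring the nested recipe. Reads a
# header block on stdin; returns 0 (true) when an address header mentions
# bill@him.net but no Received: line mentions orin.mit.edu.
looks_forged() {
    headers=$(cat)
    printf '%s\n' "$headers" |
        grep -Eqi '^(From |From:|Sender:|Reply-To:|Return-Path:).*bill@him\.net' ||
        return 1
    ! printf '%s\n' "$headers" | grep -qi '^Received:.*orin\.mit\.edu'
}

printf '%s\n' 'Received: from evil.example by mx.example' 'From: Bill <bill@him.net>' |
    looks_forged && echo "claims to be bill, but not via orin.mit.edu"
```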


Now, for a *temporary* address pass list, use the same recipe and put
the addresses in a new list.

:0:
* ^(From:|From|Sender:|Reply-to:|Return-Path:).*\/[A-Za-z0-9_.+-]+@[A-Za-z0-9_.+-]+
* ? grep -i "$MATCH" ~/.temppasslist
temppass

For Subject header passwords (which can be any Subject line)
yet another invocation. These can go in the same list with
the temporarily passlisted addresses.

:0:
* ^Subject: *(Re:|Re: )* *\/.*
* ? grep -i "$MATCH" ~/.temppasslist
temppass
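The Subject-password check depends on stripping reply prefixes the same way every time, as the notice quoted earlier promises ("unchanged except for Re: (one or more) and whitespace(s)"). A shell sketch of that normalization followed by an exact, whole-line, case-insensitive lookup is below; strip_re and subject_passlisted are hypothetical names. Unlike the recipe it refuses an empty subject: if $MATCH ends up empty (say, a Subject of just "Re:"), grep -i "" matches every line of a non-empty list.

```shell
#!/bin/sh
# strip_re: remove any run of leading "Re:" prefixes (any case) and the
# surrounding whitespace from a subject line read on stdin.
strip_re() {
    sed -e 's/^[[:space:]]*\([Rr][Ee]:[[:space:]]*\)*//' -e 's/[[:space:]]*$//'
}

# subject_passlisted: true when the normalized subject ($1) appears as a
# whole line, ignoring case, in the list file ($2).
subject_passlisted() {
    subj=$(printf '%s\n' "$1" | strip_re)
    [ -n "$subj" ] || return 1   # never pass an empty pattern to grep
    grep -Fixq -- "$subj" "$2"
}
```

Using -F and -x treats the stored subject as a literal whole line, so punctuation in a subject is not read as regex syntax and a short subject cannot match part of a longer one.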

Next, a fourth application of the same recipe can be used for
an address blocklist:

:0
* ^(From:|From|Sender:|Reply-to:|Return-Path:).*\/[A-Za-z0-9_.+-]+@[A-Za-z0-9_.+-]+
* ? grep -i "$MATCH" ~/.blocklist
/dev/null

(Subject blocklists are a fool's game.)

AC

--
Pass-List -----> Block-List ----> Challenge-Response
The key to taking control of your mailbox. Design Parameters:
http://tinyurl.com/2t5kp || http://tinyurl.com/3c3ag
Challenge-Response links -- http://tinyurl.com/yrfjb


Copyright 2003 - 2008 webservertalk.com