[procmail] [sed] [awk] cleaning up mail headers
| Troy Piggins 2004-06-06, 7:50 am |
| Not quite sure where to post this ...
I am trying to write a command line for procmail that takes the From:
header field from an email and cleans it up into just the email address.
Reason being that depending on the sender, the From: field could be in
any of the following formats :
First Last <user@domain.com>
"First Last" <user@domain.com>
user@domain.com
first.last@domain.com.au
and there could be others.
I want to use just the email address (eg user@domain.com or possibly
first.last@domain.com.au) to grep my ~/.aliases file to check if I am
getting mail from someone I trust (whitelist).
My .aliases file is in mutt's format :
alias nickname1 First1 Last1 <user1@domain1.com>
alias nickname2 First2 Last2 <user2@domain2.com>
....
At present I use the following rule, with list.white being just a
manually edited list of emails only :
:0:
* ? formail -x "From:" -x "From" -x "Sender:" \
| egrep -i -is -f $PROCMAILDIR/list.white
white
But (I guess) I want something like this :
:0:
* ? formail -x "From:" -x "From" -x "Sender:" | $CLEAN_FROM_SCRIPT \
| egrep -i -is -f ~/.aliases
white
and I envisaged $CLEAN_FROM_SCRIPT being a sed or awk command line, but
it could be a bash script. I need help with that part of it.
Any suggestions?
Thanks.
--
T R O Y P I G G I N S
e : troy@piggo.com
| |
| William Park 2004-06-06, 4:50 pm |
| In <comp.mail.misc> Troy Piggins <troy@piggo.com> wrote:
> Not quite sure where to post this ...
>
> I am trying to write a command line for procmail that takes the From:
> header field from an email and cleans it up into just the email address.
> Reason being that depending on the sender, the From: field could be in
> any of the following formats :
>
> First Last <user@domain.com>
> "First Last" <user@domain.com>
> user@domain.com
> first.last@domain.com.au
>
> and there could be others.
> I want to use just the email address (eg user@domain.com or possibly
> first.last@domain.com.au) to grep my ~/.aliases file to check if I am
> getting mail from someone I trust (whitelist).
> My .aliases file is in mutt's format :
>
> alias nickname1 First1 Last1 <user1@domain1.com>
> alias nickname2 First2 Last2 <user2@domain2.com>
> ...
>
> At present I use the following rule, with list.white being just a
> manually edited list of emails only :
>
> :0:
> * ? formail -x "From:" -x "From" -x "Sender:" \
> | egrep -i -is -f $PROCMAILDIR/list.white
> white
>
> But (I guess) I want something like this :
>
> :0:
> * ? formail -x "From:" -x "From" -x "Sender:" | $CLEAN_FROM_SCRIPT \
> | egrep -i -is -f ~/.aliases
> white
>
> and I envisaged $CLEAN_FROM_SCRIPT being a sed or awk commandline, but
> could be bash script. I need help with that part of it.
> Any suggestions?
> Thanks.
1. How about
[a-z0-9_.-]+@[a-z0-9_.-]+
as email address pattern?
2. You should search ~/.aliases for a particular email address, not the
other way around. That is, do
egrep 'user@domain.com' ~/.aliases
and not
echo 'user@domain.com' | egrep -f ~/.aliases
3. As a last resort, you can parse ~/.aliases at runtime into
ALIASES = '(user1@domain1.com|user2@domain2.com|...)'
to be used as a condition of the form
* ^From:.*(...|...|...)
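A minimal shell sketch of points 1 and 2, not something posted in the thread. The helper names extract_addr and in_aliases and the sample data are made up for illustration, and grep's -o option is assumed to be available (it is a GNU/BSD extension, not POSIX):

```shell
#!/bin/sh
# Hypothetical helper: print the first address-like token found on stdin,
# using the pattern from point 1 (-o is a GNU/BSD grep extension).
extract_addr() {
    grep -E -i -o '[a-z0-9_.-]+@[a-z0-9_.-]+' | head -n 1
}

# Hypothetical helper: succeed if address $1 occurs anywhere in file $2.
# -F treats the address as a fixed string, so its dots are not wildcards.
in_aliases() {
    grep -F -i -q -e "$1" "$2"
}

# Made-up sample data in the mutt alias format from the question.
sample_aliases=$(mktemp)
printf '%s\n' 'alias nickname1 First1 Last1 <user1@domain1.com>' > "$sample_aliases"

# Point 2: search the alias file for the address, not the other way around.
addr=$(printf '%s\n' '"First1 Last1" <User1@domain1.com>' | extract_addr)
if in_aliases "$addr" "$sample_aliases"; then
    echo "$addr is whitelisted"
fi
rm -f "$sample_aliases"
```

Searching with -F avoids the problem that a dot in an address, used as a regular expression, matches any character.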
--
William Park, Open Geometry Consulting, <opengeometry@yahoo.ca>
No, I will not fix your computer! I'll reformat your harddisk, though.
| |
| Allodoxaphobia 2004-06-06, 11:50 pm |
| On Sun, 06 Jun 2004 21:26:21 +1000, Troy Piggins hath writ:
> Not quite sure where to post this ...
>
> I am trying to write a command line for procmail that takes the From:
> header field from an email and cleans it up into just the email address.
> Reason being that depending on the sender, the From: field could be in
> any of the following formats :
>
> First Last <user@domain.com>
> "First Last" <user@domain.com>
> user@domain.com
> first.last@domain.com.au
>
> and there could be others.
> I want to use just the email address (eg user@domain.com or possibly
> first.last@domain.com.au) to grep my ~/.aliases file to check if I am
> getting mail from someone I trust (whitelist).
I've just now been hacking in this area...
Try:
formail -rtzxTo:
as found in:
http://www.xray.mpe.mpg.de/mailing-...5/msg00315.html
HTH,
Jonesy
--
| Marvin L Jones | jonz | W3DHJ | OS/2
| Gunnison, Colorado | @ | Jonesy | linux __
| 7,703' -- 2,345m | frontier.net | DM68mn SK
| |
| Alan Connor 2004-06-06, 11:50 pm |
| On Sun, 06 Jun 2004 21:26:21 +1000, Troy Piggins <troy@piggo.com> wrote:
>
>
> Not quite sure where to post this ...
>
> I am trying to write a command line for procmail that takes the From:
> header field from an email and cleans it up into just the email address.
> Reason being that depending on the sender, the From: field could be in
> any of the following formats :
>
> First Last <user@domain.com>
> "First Last" <user@domain.com>
> user@domain.com
> first.last@domain.com.au
>
> and there could be others.
> I want to use just the email address (eg user@domain.com or possibly
> first.last@domain.com.au) to grep my ~/.aliases file to check if I am
> getting mail from someone I trust (whitelist).
> My .aliases file is in mutt's format :
>
> alias nickname1 First1 Last1 <user1@domain1.com>
> alias nickname2 First2 Last2 <user2@domain2.com>
> ...
>
> At present I use the following rule, with list.white being just a
> manually edited list of emails only :
>
>:0:
> * ? formail -x "From:" -x "From" -x "Sender:" \
> | egrep -i -is -f $PROCMAILDIR/list.white
> white
>
> But (I guess) I want something like this :
>
>:0:
> * ? formail -x "From:" -x "From" -x "Sender:" | $CLEAN_FROM_SCRIPT \
> | egrep -i -is -f ~/.aliases
> white
>
> and I envisaged $CLEAN_FROM_SCRIPT being a sed or awk commandline, but
> could be bash script. I need help with that part of it.
> Any suggestions?
> Thanks.
Hi Troy,
The nutters are out in force today, aren't they :-)
(I use passlist/blocklist instead of whitelist/blacklist, because the
latter sound racist to me.)
Couple of thoughts:
An effective passlist contains a lot more than just one's trusted
friends. It should include *anyone* you've sent a mail to, and
you aren't going to create an alias for most of those.
They can also be more complex than simply a return address. Sometimes,
you will not know what address the mail will be returned from. Mail
to any large organization can be like that, where it will be handed
from department to department and may end up being returned from the
home of an employee there...
In that case, it is useful to passlist the Subject: line, and have
something like this at the top of the mail:
|The Subject of this mail is a password and needs to be included in
|any reply, unchanged except for Re: (one or more) and whitespace(s).
|Thank you.
For mailing lists, it is often the Return-Path: you passlist, or a
special header from the listserver.
That being said....
(I'm assuming here that you call fetchmail which then calls
procmail...)
It's a lot easier to parse your_alias_file than the incoming headers,
so do that before fetchmail does its thing, with a script:
#!/bin/bash
# /usr/local/bin/alparse
sed -e 's/^\(.*<\)\(.*\)\(>.*\)/\2|\\/' \
-e '$s/|\\//' -e 's/\./\\./g' \
/home/you/your_alias_file > /home/you/newalias
To make it all automatic, alias (shell alias) fetchmail like so:
alias fetchmail='alparse && fetchmail'
So whenever you call fetchmail, the first thing that happens is
that your_alias_file is converted to something procmail can deal
with, and you can add and subtract from it without worrying about
having to do anything else. The old newalias will be overwritten
each time you retrieve your mail.
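To see what the conversion produces, here is a small sketch that runs the same three sed expressions on stdin rather than on a fixed file. The function name alias_to_regex and the two sample alias lines are assumptions made for the demonstration:

```shell
#!/bin/sh
# The three sed expressions from the alparse script above, reading stdin
# instead of a fixed file. The function name is made up for this sketch.
alias_to_regex() {
    sed -e 's/^\(.*<\)\(.*\)\(>.*\)/\2|\\/' \
        -e '$s/|\\//' -e 's/\./\\./g'
}

printf '%s\n' \
    'alias nickname1 First1 Last1 <user1@domain1.com>' \
    'alias nickname2 First2 Last2 <user2@domain2.com>' | alias_to_regex
# prints:
#   user1@domain1\.com|\
#   user2@domain2\.com
```

Every line except the last ends in "|\", and every dot is escaped. A line without a <...> part is not matched by the first expression, so it passes through with only its dots escaped; the sketch therefore assumes the alias file holds nothing but alias lines of the shape shown.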
This goes at the top of your .procmailrc:
ALIAS=`cat /home/you/newalias`
then
:0:
* $ ^(From|From:|Sender:|Reply-To:|Return-Path:).*${ALIAS}
pass
AC
--
Pass-List -----> Block-List ----> Challenge-Response
The key to taking control of your mailbox. Design Parameters:
http://tinyurl.com/2t5kp || http://tinyurl.com/3c3ag
Challenge-Response links -- http://tinyurl.com/yrfjb
| |
| Troy Piggins 2004-06-07, 2:50 am |
| Allodoxaphobia wrote:
> I've just now been hacking in this area...
>
> Try:
> formail -rtzxTo:
>
> as found in:
>
> http://www.xray.mpe.mpg.de/mailing-...5/msg00315.html
>
> HTH,
> Jonesy
Thanks, Jonesy. That formail line did not do what I wanted, but I
followed that link you gave. The archived post from Nancy McGough *did*
contain a sed script along the lines of what I want. She was using it
to create a whitelist from her addressbook :
cat $HOME/Msgs/AddressBook* \
|fgrep "@" \
|sed -e "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" \
-e "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" \
|sort -fu \
> $HOME/Procmail/whitelist.tmp
I thought I could add it into where I had mentioned $CLEAN_FROM_SCRIPT
like so :
:0:
* ? formail -x "From:" -x "From" -x "Sender:" \
|sed -e "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" \
-e "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" \
| egrep -i -is -f $HOME/.aliases
white.test
Unfortunately this is not working (the mail falls through to one of my
other recipes). The log gives this error :
procmail: Executing " formail -x "From:" -x "From" -x "Sender:" | sed -e
"s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
"s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" | egrep -i -is -f
$HOME/.aliases"
grep: Trailing backslash
procmail: Non-zero exitcode (2) from " formail -x "From:" -x "From" -x
"Sender:" | sed -e "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
"s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" | egrep -i -is -f
$HOME/.aliases"
procmail: No match on " formail -x "From:" -x "From" -x "Sender:" | sed
-e "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
"s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" | egrep -i -is -f
$HOME/.aliases"
When I run this on the commandline I get the clean address I want :
[troy@linus:~]$ echo "\"Troy Piggins\" <troy@piggo.com>" |sed -e
"s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
"s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/"
the output is :
troy@piggo.com
which is exactly the result I want.
I reckon the grep trailing backslash is playing up, but all the sed
scripts are confusing me with all the //\\/\/\/ etc.
I am missing something, but can't see it.
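One possible reading of that log, sketched under assumptions. With -f, egrep treats every line of ~/.aliases as a pattern, so a whole alias line would have to occur inside the cleaned address, which cannot happen. GNU grep also reports "Trailing backslash" when a pattern ends in a backslash; a continuation line in the alias file would be one way to get such a pattern, but the real file is not shown in the thread, so that part is a guess. The sample entries below are made up:

```shell
#!/bin/sh
# Sample alias file in the mutt format quoted in the thread (made-up entries).
aliases=$(mktemp)
printf '%s\n' \
    'alias nickname1 First1 Last1 <user1@domain1.com>' \
    'alias nickname2 First2 Last2 <user2@domain2.com>' > "$aliases"

addr='user1@domain1.com'

# Direction used in the failing recipe: each alias line is a pattern and the
# short address is the text being searched. No alias line fits inside it.
if printf '%s\n' "$addr" | grep -E -i -q -f "$aliases"; then
    lines_as_patterns=match
else
    lines_as_patterns=nomatch
fi

# Reverse direction: the address is the (fixed-string) pattern and the
# alias file is the text being searched.
if grep -F -i -q -e "$addr" "$aliases"; then
    addr_as_pattern=match
else
    addr_as_pattern=nomatch
fi

echo "alias lines as patterns: $lines_as_patterns"
echo "address as pattern:      $addr_as_pattern"
rm -f "$aliases"
```

Only the second direction can succeed for a file in this format, whatever the sed clean-up does.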
--
T R O Y P I G G I N S
e : troy@piggo.com
| |
| Troy Piggins 2004-06-07, 2:50 am |
| Alan Connor wrote:
> Couple of thoughts:
>
> An effective passlist contains a lot more than just one's trusted
> friends. It should include *anyone* you've sent a mail to, and
> you aren't going to create an alias for most of those.
True - I am just using this as a starting point, then once I get that
working I can build from there.
> (I'm assuming here that you call fetchmail which then calls
> procmail...)
yep.
> It's a lot easier to parse your_alias_file than the incoming headers,
> so do that before fetchmail does its thing, with a script:
>
> #!/bin/bash
> # /usr/local/bin/alparse
>
> sed -e 's/^\(.*<\)\(.*\)\(>.*\)/\2|\\/' \
> -e '$s/|\\//' -e 's/\./\\./g' \
> /home/you/your_alias_file > /home/you/newalias
>
> To make it all automatic, alias (shell alias) fetchmail like so:
>
> alias fetchmail='alparse && fetchmail'
>
> So whenever you call fetchmail, the first thing that happens is
> that your_alias_file is converted to something procmail can deal
> with, and you can add and subtract from it without worrying about
> having to do anything else. The old newalias will be overwritten
> each time you retrieve your mail.
Fair enough. I was trying to avoid creating a new temp file every time
I check mail. I would have thought this is unnecessary load, with reading,
parsing and writing files every time fetchmail is called.
That is why I was trying to use sed inline in the recipe.
See my response to Allodox's post for my attempt at a solution.
> This goes at the top of your .procmailrc:
>
> ALIAS=`cat /home/you/newalias`
>
> then
>
> :0:
> * $ ^(From|From:|Sender:|Reply-To:|Return-Path:).*${ALIAS}
> pass
>
> AC
--
T R O Y P I G G I N S
e : troy@piggo.com
| |
| Alan Connor 2004-06-07, 11:53 pm |
| On Mon, 07 Jun 2004 06:05:58 GMT, Troy Piggins <troy@piggo.com> wrote:
>
>
> Allodoxaphobia wrote:
>
>
> Thanks, Jonesy. That formail line did not do what I wanted, but I
> followed that link you gave. The archived post from Nancy McGough *did*
> contain a sed script along the lines of what I want. She was using it
> to create a whitelist from her addressbook :
>
> cat $HOME/Msgs/AddressBook* \
> |fgrep "@" \
> |sed -e "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" \
> -e "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" \
> |sort -fu \
>
> I thought I could add it into where I had mentioned $CLEAN_FROM_SCRIPT
> like so :
>
>:0:
> * ? formail -x "From:" -x "From" -x "Sender:" \
> |sed -e "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" \
> -e "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" \
> | egrep -i -is -f $HOME/.aliases
> white.test
>
> Unfortunately this is not working (the mail falls through to one of my
> other recipes). The log gives this error :
>
> procmail: Executing " formail -x "From:" -x "From" -x "Sender:" | sed -e
> "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
> "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" | egrep -i -is -f
> $HOME/.aliases"
> grep: Trailing backslash
> procmail: Non-zero exitcode (2) from " formail -x "From:" -x "From" -x
> "Sender:" | sed -e "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
> "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" | egrep -i -is -f
> $HOME/.aliases"
> procmail: No match on " formail -x "From:" -x "From" -x "Sender:" | sed
> -e "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
> "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/" | egrep -i -is -f
> $HOME/.aliases"
>
> When I run this on the commandline I get the clean address I want :
>
> [troy@linus:~]$ echo "\"Troy Piggins\" <troy@piggo.com>" |sed -e
> "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
> "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/"
>
> the output is :
>
> troy@piggo.com
>
> which is exactly the result I want.
> I reckon the grep trailing backslash is playing up, but all the sed
> scripts are confusing me with all the //\\/\/\/ etc.
> I am missing something, but can't see it.
I have run into the same sort of problem, many times, which is why I
use the solution I presented in my post. Procmail can be very finicky
and has trouble with complex scripts in the rc file itself.
It's usually better to put the script elsewhere and call it or pipe
through it from the procmailrc.
A log entry that reads: "program failure of script3" is a lot easier
to interpret than that mess above, and you know exactly where the
problem is.
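As a rough illustration of moving the logic out of the rc file, the two sed expressions quoted above could live in a small filter of their own. The name clean_from is hypothetical, and the sample inputs are the From: formats listed in the original question (one with the leading space that a header value extracted from a message may carry):

```shell
#!/bin/sh
# Hypothetical standalone filter. The two sed expressions are the ones quoted
# in the thread, single-quoted here so the shell leaves them untouched.
clean_from() {
    sed -e 's/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/' \
        -e 's/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/'
}

printf '%s\n' \
    'First Last <user@domain.com>' \
    '"First Last" <user@domain.com>' \
    ' user@domain.com' \
    'first.last@domain.com.au' | clean_from
# prints:
#   user@domain.com
#   user@domain.com
#   user@domain.com
#   first.last@domain.com.au
```

A recipe would then only need to pipe the header through that one command, which keeps the quoting and backslashes out of the procmailrc.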
On another level, you have to allow for a lot of variation in the
contents of those headers, sedscripting for all of them, and it is
much simpler to let procmail egrep them for a string.
Keep us posted. :-)
AC
--
Pass-List -----> Block-List ----> Challenge-Response
The key to taking control of your mailbox. Design Parameters:
http://tinyurl.com/2t5kp || http://tinyurl.com/3c3ag
Challenge-Response links -- http://tinyurl.com/yrfjb
| |
| Troy Piggins 2004-06-07, 11:53 pm |
| Alan Connor wrote:
> Keep us posted. :-)
Well, I think I have done it - this seems to work for a couple of test
emails :
WHITELIST=$HOME/.aliases
CLEAN_FROM=`formail -x "From:" -x "From" -x "Sender:" | sed -e
"s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
"s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/"`
:0:
* ? cat $WHITELIST | fgrep -is $CLEAN_FROM
white
I am happy.
--
T R O Y P I G G I N S
e : troy@piggo.com
| |
| Alan Connor 2004-06-07, 11:53 pm |
| On Mon, 07 Jun 2004 09:20:31 GMT, Troy Piggins <troy@piggo.com> wrote:
>
>
> Alan Connor wrote:
>
>
> Well, I think I have done it - this seems to work for a couple of test
> emails :
>
> WHITELIST=$HOME/.aliases
> CLEAN_FROM=`formail -x "From:" -x "From" -x "Sender:" | sed -e
> "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
> "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/"`
>
>:0:
> * ? cat $WHITELIST | fgrep -is $CLEAN_FROM
> white
>
> I am happy.
Great. I'll tuck this in my procmail docs.
AC
| |
| William Park 2004-06-07, 11:53 pm |
| In <comp.editors> Troy Piggins <troy@piggo.com> wrote:
> Alan Connor wrote:
>
>
> Well, I think I have done it - this seems to work for a couple of test
> emails :
>
> WHITELIST=$HOME/.aliases
> CLEAN_FROM=`formail -x "From:" -x "From" -x "Sender:" | sed -e
> "s/^.*[^A-Za-z0-9_.+-]\([A-Za-z0-9_.+-]*@\)/\1/" -e
> "s/\(@[A-Za-z0-9_.+-]*\)[^A-Za-z0-9_.+-].*$/\1/"`
>
> :0:
> * ? cat $WHITELIST | fgrep -is $CLEAN_FROM
> white
>
> I am happy.
Try this one...
:0
* ^From:.*\/[A-Za-z0-9_.+-]+@[A-Za-z0-9_.+-]+
* ? grep -i "$MATCH" ~/.aliases
white
--
William Park, Open Geometry Consulting, <opengeometry@yahoo.ca>
No, I will not fix your computer! I'll reformat your harddisk, though.
| |
| Alan Connor 2004-06-07, 11:53 pm |
| On 7 Jun 2004 15:00:52 GMT, William Park <opengeometry@yahoo.ca> wrote:
>
>
> In <comp.editors> Troy Piggins <troy@piggo.com> wrote:
>
> Try this one...
>
> :0
> * ^From:.*\/[A-Za-z0-9_.+-]+@[A-Za-z0-9_.+-]+
> * ? grep -i "$MATCH" ~/.aliases
> white
>
Nice.
AC
| |
| Troy Piggins 2004-06-07, 11:53 pm |
| William Park wrote:
> Try this one...
>
> :0
> * ^From:.*\/[A-Za-z0-9_.+-]+@[A-Za-z0-9_.+-]+
> * ? grep -i "$MATCH" ~/.aliases
> white
Aww, why do I have to always look for the most complicated solution :-(
Methinks I like yours much better.
I thank you.
--
T R O Y P I G G I N S
e : troy@piggo.com
| |