Pine does not decode encoded Subjects that are longer than 75 bytes

Pine refuses to /decode/ encoded Subjects that are longer than 75 bytes.
This problem has been addressed earlier, for example:
http://groups.google.com/groups?th=b8604ae4b9eb5c5b

Allegedly, RFCs prohibit the decoding of long encoded words.
But I do not understand why. Could someone please explain it to me?
Pine seems to be the *only* mail or news program that refuses
to decode.

All others are more liberal in what they accept.
Should not Pine also be liberal in what it accepts?

Perhaps the (anglophone) developers don’t see a need to show
Subjects in a user-friendly way. But in French- or German-speaking
newsgroups, for example, one often finds long encoded words.
It is really annoying that Pine does not show them in a readable
way to the user.

Pine is user-hostile in this respect!

_______________________________________________________________________
> Pine refuses to /decode/ encoded Subjects that are longer than 75 bytes.
> Allegedly, RFCs prohibit the decoding of long encoded words.

Not allegedly. Explicitly. From RFC 2047 (the specification for
encoded-words), page 4:

An ‘encoded-word’ may not be more than 75 characters long, including
‘charset’, ‘encoding’, ‘encoded-text’, and delimiters. If it is
desirable to encode more text than will fit in an ‘encoded-word’ of
75 characters, multiple ‘encoded-word’s (separated by CRLF SPACE) may
be used.

While there is no limit to the length of a multiple-line header
field, each line of a header field that contains one or more
‘encoded-word’s is limited to 76 characters.

> But I do not understand why. Could someone please explain it to me?

Some MTAs and mailing list processors insert line breaks in lines that are
longer than that, which renders the encoded word undecodable.
Consequently, instead of using a single encoded-word for a long subject,
software is supposed to use multiple words.
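
To make the limit concrete, here is a minimal Python sketch that flags tokens which have the shape of an encoded-word but exceed 75 characters. It is only an illustration, not anything Pine does internally: the function name and the simplified pattern are my own assumptions, and the RFC's token grammar is stricter than this regular expression.

```python
import re

# Simplified shape of an RFC 2047 encoded-word: =?charset?encoding?encoded-text?=
# (assumption: the real grammar restricts charset and encoded-text further)
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[BbQq]\?[^?\s]*\?=")
MAX_LEN = 75  # limit from RFC 2047, delimiters included


def overlong_encoded_words(field_body):
    """Return the whitespace-separated tokens that look like encoded-words
    but are longer than 75 characters."""
    return [tok for tok in field_body.split()
            if ENCODED_WORD.fullmatch(tok) and len(tok) > MAX_LEN]
```

A sender could run such a check on a Subject before posting; an empty list means every encoded-word is within the limit.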

> Pine seems to be the *only* mail or news program that refuses
> to decode. All others are more liberal in what they accept.
> Should not Pine also be liberal in what it accepts?

This statement is based upon a terrible misunderstanding of Postel’s
robustness principle. I knew Jon Postel. He was quite unhappy with
how his robustness principle was abused to cover up non-compliant
behavior, and to criticize compliant software.

Jon’s principle could perhaps be more accurately stated as “in general,
only a subset of a protocol is actually used in real life. So, you should
be conservative and only generate that subset. However, you should also
be liberal and accept everything that the protocol permits, even if it
appears that nobody will ever use it.”

> Perhaps the (anglophone) developers don’t see a need to show
> Subjects in a user-friendly way. But in French- or German-speaking
> newsgroups, for example, one often finds long encoded words.

The problem with pulling the “differences” card (be it nation, race,
gender, sexual habits, religion, whatever) to win a concession is that it
implicitly states one’s own inferiority: “we’re not as good as you,
therefore you must give us such-and-such.”

I reject such arguments utterly; anyone capable of implementing newsgroup
software is capable of following the specifications properly.

In no way does this say “you can’t have long subjects in European
languages.” All it does is require that long subjects be in the correct
form for long subjects, which all MIME-capable software throughout the
world will recognize and handle properly.

Here is the proper thing to do. Suppose you are discussing Goethe, and
the subject of your message is a quote. Instead of generating:
=?ISO-8859-1?q?Das_sch=E4dlichste_Vorurteil_ist=2C_da=DF_irgend_eine_Art_Naturuntersuchung_mit_dem_Bann_belegt_werden_k=F6nne?=
your software should generate this:
=?ISO-8859-1?q?Das_sch=E4dlichste_Vorurteil_ist=2C_da=DF_irgend_eine_?=
 =?ISO-8859-1?q?Art_Naturuntersuchung_mit_dem_Bann_belegt_werden_k=F6nne?=

[I think that I got this example right. I generated it manually, and the
quote is from memory. So please don’t flame me if there’s a mistake;
it’s the principle in the example that counts.]
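
For anyone who wants to see the splitting done mechanically, here is a rough Python sketch of a generator that packs a subject into Q-encoded words of at most 75 characters each, joined by CRLF SPACE. The function names are hypothetical, and the choice to leave only ASCII letters and digits unencoded is a conservative assumption of mine (the RFC allows a few more characters in this context). It breaks wherever the budget runs out rather than at word boundaries, so its output will not match the hand-made example exactly.

```python
MAX_LEN = 75  # RFC 2047 limit for one encoded-word, delimiters included


def q_encode_char(ch, charset):
    """Q-encode one character: ASCII letters and digits stay as they are,
    a space becomes '_', everything else becomes =XX per encoded byte."""
    if ch == " ":
        return "_"
    if ch.isascii() and ch.isalnum():
        return ch
    return "".join("=%02X" % b for b in ch.encode(charset))


def encode_subject(text, charset="ISO-8859-1"):
    """Split text into as many encoded-words as needed, each <= 75 chars."""
    prefix = "=?%s?Q?" % charset
    suffix = "?="
    budget = MAX_LEN - len(prefix) - len(suffix)
    words, current = [], ""
    for ch in text:
        piece = q_encode_char(ch, charset)
        # Start a new encoded-word rather than exceed the budget; a
        # character's =XX bytes are never split across two words.
        if len(current) + len(piece) > budget:
            words.append(prefix + current + suffix)
            current = ""
        current += piece
    if current:
        words.append(prefix + current + suffix)
    return "\r\n ".join(words)
```

Applied to the Goethe quote, this yields two encoded-words, each within the limit.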

Software which fails to enforce the specifications is also
vulnerable to all types of security attacks.

It seems that there *is* concern about this in Europe.

Not long ago, a European government released a test suite which exploited
various security holes caused by mishandling of specification violations.
We were asked by that government to put Pine and imapd through that test
suite. Pine and imapd passed with flying colors.

Sadly, nothing else did. This included some rather expensive commercial
products. They all passed and/or installed the hidden virus (fortunately
killed) in the suite. Most caught some of the attempts, but Pine was the
only one that caught all.

It may not be “user-friendly” to show a phish scam as a bunch of
meaningless HTML instead of rendering it (because it didn’t follow
specifications for how HTML mail is sent), but I prefer that behavior.
That way, I immediately know that it is a phish (almost all phishes
violate the specifications) and don’t have to waste my time looking at it
more closely to see if it is really something from my bank.

_______________________________________________________________________

> Not allegedly. Explicitly. From RFC 2047 (the specification for
> encoded-words), page 4:
> [……]

Sorry, no! This is all about *en*coding. But I’m asking about *de*coding,
that is, showing the Subject to the reader. Where does RFC 2047 say
that the Subject of this thread _may not_ be shown as

Pine does not decode encoded Subjects that are longer than 75 bytes

to the reader? Please look at this thread in other newsreaders. I bet
they all show the Subject in this way. Are the other newsreaders wrong?

> Some MTAs and mailing list processors insert line breaks in lines that are
> longer than that, which renders the encoded word undecodable.
> Consequently, instead of using a single encoded-word for a long subject,
> software is supposed to use multiple words.

I agree completely with you. But this was not my point.

> Here is the proper thing to do. Suppose you are discussing Goethe, and
> the subject of your message is a quote. Instead of generating:
> =?ISO-8859-1?q?Das_sch=E4dlichste_Vorurteil_ist=2C_da=DF_irgend_eine_Art_Naturuntersuchung_mit_dem_Bann_belegt_werden_k=F6nne?=
> your software should generate this:
> =?ISO-8859-1?q?Das_sch=E4dlichste_Vorurteil_ist=2C_da=DF_irgend_eine_?=
>  =?ISO-8859-1?q?Art_Naturuntersuchung_mit_dem_Bann_belegt_werden_k=F6nne?=

This is all fine and there is no point to argue because I agree with you
completely. However, I did not write about *generating* or sending any
Subject but only about *displaying* an *incoming* long Subject.

(Actually, _you_ argue against Pine’s behaviour. By your argument, Pine
should break the Subject of this thread in the reply. But it does not
do this currently!)

I ask again: Which RFC prohibits *showing* the Subject of
this thread as

Pine does not decode encoded Subjects that are longer than 75 bytes

to the reader (and why)? I am not addressing the question of what should
happen to a long encoded word when *replying*. Currently, Pine leaves the
Subject as it is – contrary to your arguments. But this is not my point.

_______________________________________________________________________
> Sorry, no! This is all about *en*coding. But I’m asking about *de*coding,

See RFC2047 section 6.1 item (1):

‘encoded-word’s are to be recognized as follows:

(1) Any message or body part header field defined as ‘*text’, or any
user-defined header field, should be parsed as follows: Beginning
at the start of the field-body and immediately following each
occurrence of ‘linear-white-space’, each sequence of up to 75
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^
printable characters (not containing any ‘linear-white-space’)
should be examined to see if it is an ‘encoded-word’ according to
the syntax rules in section 2. Any other sequence of printable
                               ^^^^^^^^^^^^^^^^^^
characters should be treated as ordinary ASCII text.
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Seems to me that PINE is doing just what it says on the tin (no
pun intended).
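
As a rough illustration of that recognition rule, here is a Python sketch of a strict decoder: it splits the field body on whitespace, decodes a token only if it is at most 75 characters long and parses as an encoded-word, and passes everything else through unchanged. This is my own simplified reading, not Pine's code. The names are hypothetical, runs of whitespace are collapsed to a single space, and the pattern is looser than the RFC's grammar.

```python
import base64
import binascii
import re

# Simplified encoded-word shape (assumption: looser than the RFC grammar)
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([BbQq])\?([^?\s]*)\?=")
MAX_LEN = 75


def decode_token(token):
    """Return the decoded text if token is a valid encoded-word of at most
    75 characters, otherwise None."""
    if len(token) > MAX_LEN:
        return None
    match = ENCODED_WORD.fullmatch(token)
    if match is None:
        return None
    charset, encoding, payload = match.groups()
    try:
        if encoding in "Bb":
            raw = base64.b64decode(payload, validate=True)
        else:
            # header=True makes '_' decode to a space, as Q encoding requires
            raw = binascii.a2b_qp(payload, header=True)
        return raw.decode(charset)
    except (ValueError, LookupError):
        # bad base64, undecodable bytes, or an unknown charset
        return None


def decode_field(field_body):
    """Decode a header field body, leaving over-long or malformed tokens as
    ordinary text. Whitespace between adjacent encoded-words is dropped."""
    out = []
    prev_encoded = False
    for token in field_body.split():
        decoded = decode_token(token)
        if decoded is None:
            if out:
                out.append(" ")
            out.append(token)
            prev_encoded = False
        else:
            if out and not prev_encoded:
                out.append(" ")
            out.append(decoded)
            prev_encoded = True
    return "".join(out)
```

Under this sketch, an 87-character token of encoded-word shape comes back verbatim, which is the behaviour being complained about in this thread.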

> Please look at this thread in other newsreaders. I bet
> they all show the Subject in this way. Are the other newsreaders wrong?

They would presumably need to explain themselves as to why they are
violating this SHOULD condition of the RFC.

_______________________________________________________________________
> See RFC2047 section 6.1 item (1):

Thanks for pointing that out. It’s amazing that we need explicit text for
the interpretation procedure; interpretation is (after all) nothing more
than the inverse of the generation procedure.

But I guess that the current argument, based upon an abuse of Postel’s
robustness principle, is what makes it necessary.

Then we end up with “nobody reads RFCs because they are too bloated.”

Can’t win. Can’t break even. Can’t quit the game. 🙂

_______________________________________________________________________

> Thanks for pointing that out. It’s amazing that we need explicit
> text for the interpretation procedure; interpretation is (after all)
> nothing more than the inverse of the generation procedure.

With respect: “not always”. There are sometimes permissive areas
between generation and acceptance. But in this case it seems to be
stated rather clearly.

For what it’s worth, our departmental mailer runs the (excellent) exim
MTA software, which has quite a range of optional facilities to
validate various features of the SMTP protocol exchange; we have
chosen generally to reject mails whose headers fail syntax validation.

Almost all of our logged rejections can be rated as deliberate abuse;
I’m sorry to say that those few whose intentions are otherwise bona
fide, seem to have a habit of hassling us to accept their defective
mail (some by a proper route to our postmaster/abuse address, others
by libelling us on a public forum), and do not seem inclined to
understand that the fault lies more on their side than on ours.

The usual refrain is “*nobody* else refuses our mail, so there can’t
be anything wrong with it”. Sigh. One time I did helpfully cc: my
reply to a colleague at another site which I knew would reject their
subsequent reply for the same reason, and then waited for the bang.

The complainant was not well-pleased that he’d now found a second site
that rejected his illegal header, thus giving the lie to the claim
that it was only us. But he was still not inclined to correct his own
software. Apparently he felt that the fact that he was paying a
considerable sum of money for his mail software implied some kind of
compulsion on us to accept anything that it cared to extrude.

> Can’t win. Can’t break even. Can’t quit the game. 🙂

_______________________________________________________________________
2005-09-22, 9:03 pm
On Thu, 22 Sep 2005, Alan J. Flavell wrote:
> I’m sorry to say that those few whose intentions are otherwise bona
> fide, seem to have a habit of hassling us to accept their defective
> mail (some by a proper route to our postmaster/abuse address, others
> by libelling us on a public forum), and do not seem inclined to
> understand that the fault lies more on their side than on ours.

If it makes you feel any better, the same thing has been going on in the
ARPAnet/Internet world for at least three decades; perhaps longer, but I
don’t have first-hand knowledge.

> The usual refrain is “*nobody* else refuses our mail, so there can’t
> be anything wrong with it”.

I would go beyond the word “usual” by saying “invariable”. Apparently,
there is a flaw in the mental wiring of our human species that allows peer
pressure to make us do stupid things. “Everybody else is doing it, so you
should too.”