Email attachments are encoded into text format before sending to ensure
that control characters aren't sent over the Internet.
The text of your email is stored as a series of readable alphanumeric characters.
However, graphics, spreadsheets, video files, and even word processing documents
can contain characters that may be stored in any of the 256 different combinations
of 1's and 0's that make up an 8-bit byte. For example, three different byte
combinations are shown below:
01000010
01101101
01011011
Some byte combinations are readable, such as the ones above, which represent
the characters "B", "m", and "[" respectively. Fewer than 100 of
the 256 possible byte combinations represent the standard alphanumeric
characters, including capital letters, lowercase letters, numbers, punctuation
marks, and other characters found on most computer keyboards -- a list of
codes for the standard ANSI characters can be found here.
Extending beyond these older codes, the Unicode Consortium maintains
standards that assign codes consistently to a wide range of characters
across platforms, programs, and languages.
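As a rough illustration of the byte examples above, the following Python
sketch converts the three bit patterns into characters and counts how many
of the 256 possible byte values fall in the printable range. The range
32 to 126 is an assumption here: it is the usual printable ASCII range,
with the space character included.

```python
# The three example bytes, written as strings of 1's and 0's.
samples = ["01000010", "01101101", "01011011"]

# Interpret each bit string as a number, then as a character code.
chars = [chr(int(bits, 2)) for bits in samples]
print(chars)

# Count the byte values that map to printable ASCII characters
# (assumed here to be codes 32 through 126, space included).
printable_count = sum(1 for value in range(256) if 32 <= value <= 126)
print(printable_count, "of 256 byte values are printable")
```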
However, many of the byte combinations that don't represent readable characters
are also used as instructions. If bytes containing these instruction codes
were transmitted over the Internet, at a minimum the message would be broken
into separate pieces, and at worst the instruction bytes would unintentionally
tell the routers, switches, and other components that handled your email
to take all sorts of unpredictable actions. In practice, the software systems
-- email client, operating system, networking software -- between your email
program and the Internet might catch any strange byte combinations before
they hit the wider network, but since this is not a standard case they can
produce unpredictable results. For example, this writer once sent an attachment
with an experimental email system that didn't encode attachments properly,
resulting in the attachment being received as a zero-length file, stripped
of all content by an intervening software layer that presumably stopped reading
as soon as it saw an unconverted control character.
To protect against this problem, email programs routinely encode attached
files before mailing them, using a program that converts any non-readable
bytes into readable characters in a predictably reversible way. When the
recipient's email program receives the attachment and it is downloaded onto
their machine, the program decodes the attachment according to a standard
procedure to reconstruct the original file.
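The round trip described above can be sketched in a few lines of Python.
This is only an illustration, assuming the Base64 method (the encoding
MIME uses, described further on) and the standard library's base64 module;
the sample bytes are made up and include several control characters.

```python
import base64

# Made-up sample data: a mix of control bytes and readable bytes.
original = bytes([0x00, 0x04, 0x1B, 0x42, 0x6D, 0x5B, 0xFF])

# Sender side: convert the raw bytes into readable text characters.
encoded = base64.b64encode(original)

# Every byte of the encoded form is a printable character.
all_printable = all(32 <= value <= 126 for value in encoded)

# Recipient side: reverse the encoding to rebuild the original bytes.
decoded = base64.b64decode(encoded)

print(encoded.decode("ascii"))
print(all_printable, decoded == original)
```

The encoded form is longer than the original, since every three bytes of
input become four text characters.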
Each encoded file includes an instruction that tells the email recipient
what type of encoding program was used. There are a number of more or less
standard encoding methods, including MIME, uuencode, BinHex, and AppleDouble
(Mac version of MIME). Most email programs can decode most of the common
standards. Two of the most common standards are described below:
- MIME. The encoding used by modern MIME, commonly called Base64, was
first defined in paragraph 4.3 of RFC 989, updated by paragraph 4.3.2.4
of RFC 1421, and has become the most common standard used for email encoding.
MIME encodes a file into the following 64 text characters:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz0123456789+/
- Uuencode. Uuencode was one of the earliest encoding standards,
first developed on the BSD Unix operating system. For many years there
was no standards document defining uuencode, which led to incompatible
implementations until later versions were generally built to comply with
the POSIX standard P1003.2b/D11, later IEEE Std 1003.1-2001. These later
versions incorporated the MIME standard as an option. Most of the earlier
versions encoded a file into the following 64 text characters:
`!"#$%&'()*+,-./0123456789:;<=>?@
ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
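To show how the two character sets above are used, here is a minimal
Python sketch that splits three bytes into four 6-bit values and looks each
value up in either alphabet. The helper names six_bit_groups and
encode_group are illustrative only, and the sketch handles exactly one
3-byte group; real encoders also deal with padding, line lengths, and (for
uuencode) a length character at the start of each line.

```python
import base64
import binascii

# The 64 MIME (Base64) characters, in index order 0..63.
BASE64_ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz0123456789+/"
)

# The 64 uuencode characters: value v maps to chr(32 + v), except that
# a backquote stands in for the space character at value 0.
UU_ALPHABET = "`" + "".join(chr(32 + v) for v in range(1, 64))


def six_bit_groups(three_bytes):
    """Split exactly three bytes (24 bits) into four 6-bit values."""
    number = int.from_bytes(three_bytes, "big")
    return [(number >> shift) & 0x3F for shift in (18, 12, 6, 0)]


def encode_group(three_bytes, alphabet):
    """Encode one 3-byte group using the given 64-character alphabet."""
    return "".join(alphabet[v] for v in six_bit_groups(three_bytes))


data = b"Bm["  # the three example bytes from earlier in this article
as_base64 = encode_group(data, BASE64_ALPHABET)
as_uu = encode_group(data, UU_ALPHABET)
print(as_base64, as_uu)

# Cross-check against the standard library encoders. b2a_uu prefixes a
# length character and appends a newline, so compare the middle part.
print(as_base64 == base64.b64encode(data).decode("ascii"))
print(as_uu == binascii.b2a_uu(data)[1:5].decode("ascii"))
```

Both methods use the same 6-bit grouping; only the lookup table differs,
which is why later uuencode versions could offer the MIME character set
as an option.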
You are sometimes given a choice in your email application settings for
which encoding method your program should use. The best choice is usually
MIME, which almost all email programs support. However, if somebody can't
read an attachment that you send them, try setting the encoding method to
AppleDouble (same as MIME on a Mac), BinHex, or Uuencode in that order. Remember
to change your settings back to MIME for sending to everyone else.
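For readers curious how the encoding method is recorded in a message, the
following Python sketch builds a message with a binary attachment and reads
back the header that names the encoding. It assumes Python's standard
email package, which applies Base64 to binary attachments by default; the
addresses, subject, and file name are placeholders.

```python
from email.message import EmailMessage

# Build a simple message. All addresses and names are placeholders.
msg = EmailMessage()
msg["From"] = "sender@example.com"
msg["To"] = "recipient@example.com"
msg["Subject"] = "File attached"
msg.set_content("The file is attached.")

# Attach every possible byte value, control characters included.
payload = bytes(range(256))
msg.add_attachment(
    payload,
    maintype="application",
    subtype="octet-stream",
    filename="data.bin",
)

# Recipient side: find the attachment and read the header that tells
# the receiving program which encoding was used.
attachment = next(msg.iter_attachments())
encoding_name = str(attachment["Content-Transfer-Encoding"])
print(encoding_name)

# Decode the attachment according to that header.
restored = attachment.get_payload(decode=True)
print(restored == payload)
```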
Resources. The following RFC provides a good description of some
current encoding standards, including Base64, the encoding that MIME uses.
- RFC 3548; S. Josefsson, Ed.; The Base16, Base32, and Base64 Data
Encodings; July 2003.