-=:[ The IBM PC Character Set Confusion Clarified ]:=-

          By Michael Walden - Created: 2025-04-26 - Updated: 2025-04-26

Introduction
------------

Since the early eighties, I have been using computer equipment from the PC
compatible market: IBM computers, IBM clones, MS-DOS compatibles, video
display adapter cards, printers, modems, serial terminals, and more.

From my first experience, I was fascinated by the diverse set of characters
included in character tables in the hardware technical reference manuals,
software applications, and books that I saw.  I always wondered what the
names and purposes of the unusual characters were.

Over time it became apparent to me that there were subtle differences in the
character set from one implementation to another.  It should be noted that
all of the character sets were intended to be IBM PC compatible (i.e.
identical), yet they would go astray.

Eventually an IBM PC Technical Reference Manual made its way to me.  I had
the expectation that I would finally see a definitive list of the IBM PC
character set specifications.  Unfortunately, to my dismay, no such
information was provided.  The only substantial info relating to the character
set was a two page table that depicted the glyphs.  Some of the glyphs in the
table did not correctly reflect those actually produced on screen.  Even IBM
did not get the character set correctly depicted in their own manual.  I guess
that the typesetter was struggling to locate matching type for all of those
strange characters.  Apparently IBM was in such a rush to get the PC to
market that they neglected to include and/or define the detailed description
of the PC's character set.

It should come as no surprise that there are inconsistencies in other
implementations of the IBM PC's character set, when you consider that there
was no official description in the IBM PC Technical Reference manual.  This
allows for confusion to occur when implementing the IBM PC character set in
a device and when using the characters in the character set.

The Current Problem
-------------------

Why, you might ask, am I describing this 40+ year old matter?  It may come
as a surprise to find out that this problem is still being perpetuated in
hardware, such as video display adapters, and software, such as MS-Windows.

The Unicode Consortium has produced a standard for a universal character set
that is intended to be a superset of prior character sets.  They also provide
mapping tables which define the relationship of the characters in a character
set, such as the one in the IBM PC (Code Page 437), to the *same* characters
within the Unicode universal character set.  Here again, it is my opinion that
the IBM PC's character set is incorrectly defined by the present Unicode
mapping tables. [1][2][3][4][5]  (Note: As you will see, Unicode universal
character set code points are used here since they provide a good backbone to
work from and the Unicode universal character set appears to be the character
set of choice for the foreseeable future.)
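
For a concrete look at what one widely deployed mapping produces, the sketch
below decodes a few of the disputed bytes with Python's built-in cp437 codec.
It is an assumption here that this codec tracks the published CP437 mapping
table; other software may ship a different table, so treat the output as one
data point rather than a survey.

```python
import unicodedata

# A few of the byte values discussed later in this document.
DISPUTED = [0xE1, 0xE6, 0xFA, 0xFE]

def describe(byte: int) -> str:
    """Decode one CP437 byte with Python's stock codec and name the result."""
    ch = bytes([byte]).decode("cp437")
    name = unicodedata.name(ch, "<unnamed>")
    return f"{byte:02X} -> U+{ord(ch):04X} {name}"

for b in DISPUTED:
    print(describe(b))
```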

The primary places where the IBM PC character set is seen in computers today
are the text displayed when booting a PC, legacy DOS applications (in DOS,
FreeDOS, and in a DOS box under MS-Windows (COMMAND.COM)), the command prompt
within MS-Windows (CMD.EXE), x86 Linux console applications, Point Of Sale
(POS) computer applications, terminal emulation software, telecommunication
software, IBM PC emulators, IRC chat, text files inside .ZIP archives (*.nfo /
*.diz / *.txt / read.me / read.1st), and the ANSI art scene.

While the areas affected may be limited, it does not make sense to knowingly
perpetuate a mistake when you can just as easily correct it.

The Solution
------------

It is my intention to produce the most accurate definition possible of the
IBM PC character set.  I will do this, primarily, by looking at the only
sound evidence, the character set bitmaps from the IBM PC MDA video card.

Additionally, I have been attempting to contact others that were involved
with the creation of the PC at IBM.  I did manage to contact Dr. David J.
Bradley and asked him about the characters in the IBM PC character set. [6]
He was mainly able to provide me with character range purposes but not all of
the exact names of the characters in the character set.  I have attempted to
track down others involved in the making of the IBM PC without success.

I will also point out that others have stated that there is no way possible
to state what some of the characters are since there are multiple uses for
them.  While it may be true that multiple uses are possible, it is my
opinion that there are specific unique characteristics of the characters
within the IBM PC that indicate which one of the multiple uses was the prime
one intended.  Those prime intentions are the ones that should be statically
defined as the characters that are used in the IBM PC character set.  As an
example, the GREEK SMALL LETTER BETA and the LATIN SMALL LETTER SHARP S may
appear *substantially* identical, but the actual character used when
creating the IBM PC character set was one or the other, not both.  If it
were both, there would have been some hybrid character designed that
exhibited attributes of both characters equally (which is actually impossible
to do, in my opinion).

Additionally, some characters may have ambiguous interpretations when viewed
independently, but when viewed in the context of the character set range in
which they occur, I find the characters to be unambiguous.

Below is a list of the 20 characters which are, in my opinion, defined
incorrectly in the Unicode mappings.  For each incorrectly mapped IBM PC
character, the first line is my correct Unicode mapping and the following
line(s) is/are the incorrect mapping(s).  Then I give a description of the
reason for my disagreement with the incorrect mapping.  This will provide
clarification of the correct characters in the IBM PC character set (Code
page 437).  Note that all numbers in the following section are hexadecimal.

Note: On the right of my assessments below, the "WIDELY SUPPORTED" statements
relate to the character's availability in fonts at the time I originally did
my work.  Currently these statements are mostly obsolete since all of my
CORRECT characters (except 1D161 - 𝅘𝅥𝅯 MUSICAL SYMBOL SIXTEENTH NOTE) are
currently supported in software and most fonts.  One lingering issue is that
some of the NOT WIDELY SUPPORTED characters are rendered in a different size
from most of the other characters in some fonts.

=============================================================================

Clarification -- Format:

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mapping
	00

        Unicode
IBM PC  Hex Code   Unicode
Hex     Point    - Char     Unicode Character Name      My Assessment
------- -------- - ------------------------------------ CORRECT
------- -------- - ------------------------------------ INCORRECT

Description
-----------------------------------------------------------------------------
...

=============================================================================

Clarification 00

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mapping
	00

00      2007 - FIGURE SPACE                             CORRECT
00      0000 - <control> = NULL                         INCORRECT

Note: The two characters pictured above 2007 and 0000 are shown with dashed
line square surrounds indicating that there is no glyph for each character.

There is an issue with IBM PC character 00.  The dilemma is what I should map
it to.  The IBM PC has three blank characters 00 Null, 20 Space, and FF No-
Break Space.  IBM PC 20 maps to Unicode 0020 SPACE, and IBM PC FF maps to
Unicode 00A0 NO-BREAK SPACE.  Mapping IBM PC 00 to Unicode 0000 NULL is not
correct to do since Unicode 0000 NULL is a control code and has no glyph
assigned to it.  An example of this is that you can't display a Unicode 0000
NULL character on an HTML web page since it has no glyph.  When attempting to
print a Unicode NULL control character on a web page (by using &#0;) it is
replaced by FFFD � REPLACEMENT CHARACTER, by the web browser, which is not a
blank character.  Yet, the IBM PC does have a glyph assigned to 00 Null that
just happens to be a blank space.  This can be demonstrated by placing a 00
Null value into IBM PC video RAM: a blank glyph appears in the character cell
that holds the Null.  To resolve this issue I need to choose another blank
Unicode character to map to the IBM PC 00 Null.

You might say to just map one of the other two blank characters to 00 Null,
such as 00A0 NO-BREAK SPACE to produce a blank for 00 Null.  While that would
result in a visually correct conversion of an IBM PC text mode screen capture,
it would create a different issue.  A round-trip conversion from IBM PC to
Unicode and back to IBM PC would lose the distinction between IBM PC
characters 00 and FF, since both would map to the same Unicode 00A0 NO-BREAK
SPACE.

So the best solution is to map IBM PC 00 Null to another Unicode blank
character.
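
The round-trip argument above can be sketched in a few lines.  This is a
minimal illustration, not a full converter: the two small tables cover only
the three blank characters, and the names round_trip, LOSSY and LOSSLESS are
hypothetical.

```python
def round_trip(data: bytes, to_unicode: dict[int, str]) -> bytes:
    """Convert IBM PC bytes to Unicode and back using one mapping table."""
    # Inverting the table: if two bytes share a character, the later one wins.
    from_unicode = {ch: b for b, ch in to_unicode.items()}
    text = "".join(to_unicode[b] for b in data)
    return bytes(from_unicode[ch] for ch in text)

# 00 and FF both mapped to NO-BREAK SPACE: the inverse is ambiguous.
LOSSY = {0x00: "\u00A0", 0x20: "\u0020", 0xFF: "\u00A0"}
# 00 mapped to FIGURE SPACE instead: every byte keeps its own character.
LOSSLESS = {0x00: "\u2007", 0x20: "\u0020", 0xFF: "\u00A0"}

sample = bytes([0x00, 0x20, 0xFF])
print(round_trip(sample, LOSSY) == sample)     # False: 00 comes back as FF
print(round_trip(sample, LOSSLESS) == sample)  # True
```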

After searching the Unicode character database I have come up with two possible
candidates for blank characters:

  2007 ( ) FIGURE SPACE [&numsp;]
   • space equal to tabular width of a font
   • this is equivalent to the digit width of fonts with fixed-width digits

  202F ( ) NARROW NO-BREAK SPACE (NNBSP) [&#8239;] or [&#x202F;]
   • a narrow form of a no-break space, typically the 
     width of a thin space or a mid space

Of those two I chose to use 2007 FIGURE SPACE since it should be the correct
character width to use when working with fixed-width characters from an IBM PC
text mode screen capture.  I am not knowledgeable in the use of the Unicode
2007 FIGURE SPACE character and whether or not there might be an issue using it
as a blank space when converting IBM PC information into Unicode.  As far as I
can tell there should be no problem.  If a problem is found I will revise the
mapping of IBM PC character 00 at that time.
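
As a rough way to check the properties discussed above, the following sketch
asks Python's unicodedata module about the three characters involved.  One
point that may matter in practice: under compatibility normalization (NFKC),
both FIGURE SPACE and NO-BREAK SPACE fold to 0020 SPACE, so a pipeline that
normalizes text that way would not preserve the distinction.  Whether that
affects a given conversion depends on the tools used, which is an assumption
to verify rather than a finding.

```python
import unicodedata

NUL, FIGURE_SPACE, NBSP = "\u0000", "\u2007", "\u00A0"

def summary(ch: str) -> tuple[str, str, str]:
    """Return (general category, name, NFKC-normalized form as U+XXXX)."""
    name = unicodedata.name(ch, "<control>")  # control codes have no name
    nfkc = unicodedata.normalize("NFKC", ch)
    return (unicodedata.category(ch), name, f"U+{ord(nfkc):04X}")

for ch in (NUL, FIGURE_SPACE, NBSP):
    print(f"U+{ord(ch):04X}", summary(ch))
```

Category Cc marks a control code with no glyph, while Zs marks a space
separator, which matches the distinction drawn in this clarification.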

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification 0D

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mapping
	0D

0D      1D161 - 𝅘𝅥𝅯 MUSICAL SYMBOL SIXTEENTH NOTE         CORRECT - BUT TOO NEW FOR SUPPORT IN APPLICATIONS AND FONTS
0D      266A - ♪ EIGHTH NOTE                            INCORRECT - BUT WIDELY SUPPORTED

IBM PC Glyph 0D clearly has two horizontal protrusions extending to the right
of the vertical. This is indicative of a MUSICAL SYMBOL SIXTEENTH NOTE, not
an EIGHTH NOTE, which has only one horizontal protrusion. Technically, the
two protrusions should be flags, rather than beams, when there is only one
note. IBM PC Glyph 0D resembles 266C - ♬ BEAMED SIXTEENTH NOTES with the
lower right portion removed. I believe that the depiction of IBM PC Glyph
0D, the MUSICAL SYMBOL SIXTEENTH NOTE, was altered slightly in response to
the limited resolution available in a character cell. Remember that all of
the characters were depicted in 9x14 (MDA) and 8x8 (CGA) character cells.
The limited resolution of the 8x8 (CGA) implementation probably influenced
the 9x14 (MDA) implementation since the designers probably wanted the
character to appear substantially the same in both MDA and CGA displays. It
would be difficult to depict the curved flag shapes, so the beams were used
instead. Unfortunately, the CORRECT mapping MUSICAL SYMBOL SIXTEENTH NOTE
is not supported in current applications and fonts since the character is too
new. It is not currently possible to correct this INCORRECT mapping. For
the present time the use of the EIGHTH NOTE, which is a widely supported
character, is reasonable but still INCORRECT.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification 0E

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mapping
	0E

0E      266C - ♬ BEAMED SIXTEENTH NOTES                 CORRECT
0E      266B - ♫ BEAMED EIGHTH NOTES                    INCORRECT

IBM PC Glyph 0E clearly has two horizontal beams extending between the two
vertical lines.  This is indicative of BEAMED SIXTEENTH NOTES, not BEAMED
EIGHTH NOTES, which has one horizontal beam extending between the two
vertical lines.  So, BEAMED SIXTEENTH NOTES is the correct mapping.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification 10

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mapping
	10

10      25B6 - ▶ BLACK RIGHT-POINTING TRIANGLE          CORRECT - BUT NOT WIDELY SUPPORTED
10      25BA - ► BLACK RIGHT-POINTING POINTER           INCORRECT - BUT WIDELY SUPPORTED

IBM PC Glyphs 10 and 11 (see below) are closer to equilateral triangles, which
have three congruent sides (see BLACK RIGHT-POINTING TRIANGLE and BLACK LEFT-
POINTING TRIANGLE), than isosceles triangles which have two congruent sides
(see BLACK RIGHT-POINTING POINTER and BLACK LEFT-POINTING POINTER).  Notice how
the BLACK RIGHT-POINTING POINTER and BLACK LEFT-POINTING POINTER are
squashed in the vertical direction.  Additionally, I believe that 10 and 11 are
related to 1E and 1F which are ▲ BLACK UP-POINTING TRIANGLE and ▼ BLACK DOWN-
POINTING TRIANGLE respectively.  Changing 10 and 11 to my correct mapping
would create a complete set of the four directional triangles, like the IBM
PC appears to have.  Alternatively, you could say that 1E and 1F are the
incorrectly mapped characters, not 10 and 11, but there are no up or down
pointers that could be mapped to 1E and 1F.  So, BLACK RIGHT-POINTING TRIANGLE
and BLACK LEFT-POINTING TRIANGLE are the CORRECT mappings for 10 and 11.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification 11

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mapping
	11

11      25C0 - ◀ BLACK LEFT-POINTING TRIANGLE           CORRECT - BUT NOT WIDELY SUPPORTED
11      25C4 - ◄ BLACK LEFT-POINTING POINTER            INCORRECT - BUT WIDELY SUPPORTED

See Clarification 10 (IBM PC Glyph 10) above.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification 1C

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mapping
	1C

1C      2319 - ⌙ TURNED NOT SIGN (line marker)          CORRECT
1C      221F - ∟ RIGHT ANGLE                            INCORRECT
    
IBM PC Glyph 1C has a vertical line that is shorter than the horizontal line.
RIGHT ANGLE has both horizontal and vertical lines of equal length.
Therefore, 1C is not a RIGHT ANGLE symbol.  The RIGHT ANGLE symbol is a math
symbol having no other alias.  All of the characters in the 01-1F range are
unrelated to math, so it makes sense that the character must be something
other than RIGHT ANGLE.  1C resembles the TURNED NOT SIGN's alias
"line marker" which is a non-math symbol.  So, TURNED NOT SIGN is CORRECT.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification 27

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mapping
	27

27      2019 - ’ RIGHT SINGLE QUOTATION MARK            CORRECT
27      0027 - ' APOSTROPHE                             INCORRECT

IBM PC Glyph 27 has a high comma shape.  So, the RIGHT SINGLE QUOTATION MARK
which also appears as a high comma is the CORRECT mapping with respect to the
visual appearance of the glyph.  The use of the RIGHT SINGLE QUOTATION MARK
character in software is not the same as APOSTROPHE which is an ASCII
character.  If your intention is to replicate the exact appearance then use
RIGHT SINGLE QUOTATION MARK.  If you want to replicate the ASCII use of the
character in software, then use APOSTROPHE.  In this file I am looking at
replicating the appearance of the IBM PC, so I am saying RIGHT SINGLE
QUOTATION MARK is CORRECT.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification 7C

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mapping
	7C

7C      00A6 - ¦ BROKEN BAR                             CORRECT
7C      007C - | VERTICAL LINE                          INCORRECT

IBM PC Glyph 7C has a notch taken out of the center of a vertical line.  So,
the BROKEN BAR is the CORRECT mapping with respect to the visual appearance of
the glyph.  The use of the BROKEN BAR character in software is not the same as
VERTICAL LINE which is an ASCII character.  If your intention is to replicate
the exact appearance then use BROKEN BAR.  If you want to replicate the ASCII
use of the character in software, then use VERTICAL LINE.  In this file I am
looking at replicating the appearance of the IBM PC, so I am saying BROKEN BAR
is CORRECT.
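
The choice described in Clarifications 27 and 7C, between matching the
on-screen glyph and keeping the ASCII meaning, can be made explicit in a
converter.  The sketch below is only an illustration for the 20-7E range; the
function name and the match_appearance flag are hypothetical.

```python
# Appearance-based choices from Clarifications 27 and 7C.
APPEARANCE = {
    0x27: "\u2019",  # RIGHT SINGLE QUOTATION MARK
    0x7C: "\u00A6",  # BROKEN BAR
}

def decode_printable(data: bytes, match_appearance: bool) -> str:
    """Decode bytes in the 20-7E range, optionally matching the glyphs."""
    out = []
    for b in data:
        if not 0x20 <= b <= 0x7E:
            raise ValueError(f"byte {b:02X} is outside the 20-7E range")
        if match_appearance and b in APPEARANCE:
            out.append(APPEARANCE[b])
        else:
            out.append(chr(b))  # plain ASCII meaning
    return "".join(out)

print(decode_printable(b"DIR | MORE", match_appearance=True))
print(decode_printable(b"DIR | MORE", match_appearance=False))
```

A screen-capture converter would likely pass True, while anything that feeds
the text back into software (a batch file, for instance) would pass False.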

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification 7F

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mapping
	7F

7F      2302 - ⌂ HOUSE                                  CORRECT
7F      0394 - Δ GREEK CAPITAL LETTER DELTA             INCORRECT

IBM PC Glyph 7F appears like a squat house character.  So, HOUSE is the closest
Unicode character and the CORRECT mapping to the IBM PC Glyph.  Technically,
that is not the correct mapping relative to the intended meaning, which is
GREEK CAPITAL LETTER DELTA.  Look at the following two links to see where the
ROM BIOS assembly code remark shows DELTA (short for GREEK CAPITAL LETTER
DELTA).

IBM-PC-BIOS/PCBIOS.ASM at main · philspil66/IBM-PC-BIOS · GitHub [7]

IBM PC 5150 Technical Reference 6361453 (### See address FE66 ###) [8]

In this file I am looking at replicating the appearance of the IBM PC, so I am
saying HOUSE is CORRECT.
 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification E0

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mapping
	E0

E0      03B1 - α GREEK SMALL LETTER ALPHA               CORRECT
E0      221D - ∝ PROPORTIONAL TO                        INCORRECT
    
IBM PC Glyph E0 falls in the character range E0-EB which is the Greek Math
characters range.  The PROPORTIONAL TO Unicode character falls in the
Mathematical Operators Range: 2200–22FF at code point 221D.  The correct
mapping is the GREEK SMALL LETTER ALPHA based on the range where it appears
in the "IBM PC Code Page 437 character ranges by D.J.B and M.W." table.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification E1

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mapping
	E1

E1      03B2 - β GREEK SMALL LETTER BETA                CORRECT
E1      00DF - ß LATIN SMALL LETTER SHARP S             INCORRECT
    
IBM PC Glyph E1 clearly has its left-hand side descending below the baseline,
just like in the GREEK SMALL LETTER BETA.  E1's bottom horizontal connects to
the vertical, just like in the GREEK SMALL LETTER BETA.  Also note that all of
the characters in the range E0-EB are Greek Math characters.  With that in mind,
LATIN SMALL LETTER SHARP S would not be correct since it is not a Greek Math
character.  So, GREEK SMALL LETTER BETA is the correct mapping.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification E6

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mapping
	E6

E6      03BC - μ GREEK SMALL LETTER MU                  CORRECT
E6      00B5 - µ MICRO SIGN                             INCORRECT
    
IBM PC Glyph E6 looks like either GREEK SMALL LETTER MU or MICRO SIGN.  The
concern here is strictly relative to the meaning of the character not the
shape.  The characters in the range E0-EB are Greek Math characters.  With that
in mind, MICRO SIGN would not be correct since it is not a Greek math
character.  So, GREEK SMALL LETTER MU is the correct mapping.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification E7

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mapping
	E7

E7      03B3 - γ GREEK SMALL LETTER GAMMA               CORRECT
E7      03C4 - τ GREEK SMALL LETTER TAU                 INCORRECT
    
IBM PC Glyph E7 has a non-straight horizontal piece, unlike the GREEK SMALL
LETTER TAU which is a mostly straight line.  Under close inspection, the left
portion has a hook and the right side leans up to the right, just like the
GREEK SMALL LETTER GAMMA.  Admittedly, E7 is difficult to discern due to its
poor rendering.  I believe that this is due to the limited character cell
resolution.  So, GREEK SMALL LETTER GAMMA is the correct mapping.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification E9

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mapping
	E9

E9      03B8 - θ GREEK SMALL LETTER THETA               CORRECT
E9      0398 - Θ GREEK CAPITAL LETTER THETA             INCORRECT
    
IBM PC Glyph E9 clearly has connections between the left and right ends of the
horizontal line and the sides of the oval.  This is like the GREEK SMALL LETTER
THETA, not GREEK CAPITAL LETTER THETA which has breaks between the ends of the
horizontal line and the sides of the oval. So, GREEK SMALL LETTER THETA is the
correct mapping.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification EA

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mapping
	EA

EA      03A9 - Ω GREEK CAPITAL LETTER OMEGA             CORRECT
EA      2126 - Ω OHM SIGN                               INCORRECT

IBM PC Glyph EA looks like either GREEK CAPITAL LETTER OMEGA or OHM SIGN.  The
concern here is strictly relative to the meaning of the character not the
shape.  The characters in the range E0-EB are Greek Math characters.  With that
in mind, OHM SIGN would not be correct since it is not a Greek math character.
So, GREEK CAPITAL LETTER OMEGA is the correct mapping.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification ED

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mappings
	ED

ED      2205 - ∅ EMPTY SET (null set)                   CORRECT - BUT NOT WIDELY SUPPORTED
ED      03C6 - φ GREEK SMALL LETTER PHI                 INCORRECT - BUT WIDELY SUPPORTED
ED      00F8 - ø LATIN SMALL LETTER O WITH STROKE       INCORRECT - BUT WIDELY SUPPORTED

IBM PC Glyph ED looks like EMPTY SET, GREEK SMALL LETTER PHI, and LATIN SMALL
LETTER O WITH STROKE.  The characters in the range EC-FE are Math symbols.
With that in mind, EMPTY SET would be correct since it is a math symbol.  The
other two characters are not math symbols, they are non-English language
characters.  So, EMPTY SET is the correct mapping.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification EE

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mappings
	EE

EE      2208 - ∈ ELEMENT OF                             CORRECT - BUT NOT WIDELY SUPPORTED
EE      03B5 - ε GREEK SMALL LETTER EPSILON             INCORRECT - BUT WIDELY SUPPORTED
EE      0404 - Є CYRILLIC CAPITAL LETTER UKRAINIAN IE   INCORRECT - BUT WIDELY SUPPORTED

IBM PC Glyph EE looks like ELEMENT OF, GREEK SMALL LETTER EPSILON, and CYRILLIC
CAPITAL LETTER UKRAINIAN IE.  The characters in the range EC-FE are Math
symbols.  With that in mind, ELEMENT OF would be correct since it is a math
symbol.  The other two characters are not math symbols, they are non-English
language characters.  So, ELEMENT OF is the correct mapping.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification F9

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mapping
	F9

F9      2219 - ∙ BULLET OPERATOR                        CORRECT
F9      25AA - ▪ BLACK SMALL SQUARE                     INCORRECT - BUT LOOKS BETTER IN SOME FONTS

IBM PC Glyph F9 looks like a small dot. The characters in the range EC-FE are
Math symbols. With that in mind, BULLET OPERATOR is CORRECT since it is a math
symbol.

The IBM PC character set has three similar characters at 07, F9 and FA. IBM PC
Glyph FA is the smallest dot. F9 is a little larger than FA. And finally 07
is the largest dot.

The *actual* size and shape of IBM PC Glyph 07 is a 4Wx4H pixel circle, F9 is a
2Wx2H pixel square, and FA is a 2Wx1H pixel rectangle. I believe that the non
circle shape of F9 and FA is a result of insufficient pixel resolution to make
round dots smaller than IBM PC Glyph 07. I believe that the intention is for
circular dots in F9 and FA. This situation might lead someone to say that F9
is actually a BLACK SMALL SQUARE, and FA is a BLACK SMALL RECTANGLE. A BLACK
SMALL SQUARE does exist in Unicode (mentioned above as an INCORRECT mapping for
F9) but no BLACK SMALL RECTANGLE exists. I say it is INCORRECT because if it
is used for F9, there is no smaller character to use for FA such as a
hypothetical BLACK SMALL RECTANGLE. The smallest existing midway elevated
Unicode character is DOT OPERATOR and I will use that in Clarification FA
below. It is a small circle. So, again, I will say that BULLET OPERATOR is
CORRECT since it is a slightly larger version of the DOT OPERATOR.

Now for the flip side... The BLACK SMALL SQUARE character is an INCORRECT
mapping but it looks better in some fonts relative to the sizing of all three
similar characters at 07, F9 and FA. I found that BLACK SMALL SQUARE appeared
larger than the character mapped to FA (DOT OPERATOR) but smaller than the
character mapped to 07 (BULLET) in *some* fonts. This satisfies the
requirement for the character mapped to F9 to be the mid-sized dot in *some*
fonts. You make your own choice relative to the font being used.

Unicode character DOT OPERATOR is the smallest, BULLET OPERATOR is the mid
sized character and BULLET is the largest character. So, ultimately BULLET
OPERATOR is the correct mapping to F9, just the fonts sometimes fail to
have the correct size relationships between characters.

This Clarification F9 is not an error in existing Unicode mappings, but rather
a suggestion for using BLACK SMALL SQUARE (and the recognition that the actual
IBM PC Glyph F9 is a small square) when using some fonts where BLACK SMALL
SQUARE looks better than BULLET OPERATOR (even though it is an INCORRECT
mapping).
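
Pixel measurements like the ones quoted above can be taken mechanically from
a character ROM dump.  The sketch below measures the inked area of a glyph
stored as one integer per scan line.  The two sample glyphs are NOT real ROM
data: they are hypothetical 8x14 cells built only from the sizes stated above
(2Wx2H and 2Wx1H), and their position inside the cell is a guess.

```python
def ink_bounds(rows: list[int], width: int = 8) -> tuple[int, int]:
    """Return (width, height) in pixels of the inked area of a glyph."""
    used_rows = [i for i, r in enumerate(rows) if r]
    if not used_rows:
        return (0, 0)  # blank glyph
    combined = 0
    for r in rows:
        combined |= r  # OR all scan lines to find every used column
    cols = [c for c in range(width) if combined & (1 << (width - 1 - c))]
    return (cols[-1] - cols[0] + 1, used_rows[-1] - used_rows[0] + 1)

# Hypothetical stand-ins for F9 and FA (bit 7 is the leftmost pixel).
f9_like = [0] * 6 + [0b00011000, 0b00011000] + [0] * 6
fa_like = [0] * 7 + [0b00011000] + [0] * 6

print(ink_bounds(f9_like))  # (2, 2)
print(ink_bounds(fa_like))  # (2, 1)
```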

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification FA

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mapping
	FA

FA      22C5 - ⋅ DOT OPERATOR                           CORRECT
FA      00B7 - · MIDDLE DOT                             INCORRECT

IBM PC Glyph FA looks like a small dot.  The characters in the range EC-FE are
Math symbols.  With that in mind, DOT OPERATOR is CORRECT since it is a math
symbol.  MIDDLE DOT is not a math symbol.  Also, DOT OPERATOR satisfies the
correct size relationship between F9 (mapped as BULLET OPERATOR) and FA where
FA (mapped as DOT OPERATOR) is smaller.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification FE

My Correct Unicode Mapping	IBM PC Glyph	Incorrect Unicode Mapping
	FE

FE      220E - ∎ END OF PROOF (Q.E.D.)                  CORRECT
FE      25A0 - ■ BLACK SQUARE                           INCORRECT

IBM PC Glyph FE looks like a small vertical rectangle.  The characters in the
range EC-FE are Math symbols.  With that in mind, END OF PROOF (Q.E.D.) is
CORRECT since it is a math symbol.  BLACK SQUARE is in the Geometric Shapes
block and is not a math symbol.  Additionally, FE is not a square; it is
slightly taller than it is wide (5 wide x 6 high), like the math symbol END OF
PROOF (Q.E.D.).

=============================================================================

Notes: Some of the INCORRECT character mappings are possible alternate uses
for the associated IBM PC character.  For example, you might use EE (ELEMENT
OF) in a place which calls for a GREEK SMALL LETTER EPSILON.

All of the Unicode character images were derived from the Unicode Version 16.0
.PDF character charts on the Unicode.org site. [9]

All of the IBM PC glyphs were derived from the IBM Monochrome Display
Adapter's (MDA's) character ROM 8x14 (9x14 on MDA screen) font.


The Completely Correct IBM PC Character Set Definition
------------------------------------------------------

Now that I have gotten the clarifications out of the way, I will provide you
with as accurate a definition of the IBM PC character set as possible:

IBM PC Code Page 437 to Unicode Mapping Table [10]
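
The twenty clarifications above can also be collected into a small override
table.  The sketch below is not the full mapping table referenced here: it
holds only the twenty CORRECT code points from the clarifications and, as an
assumption made for brevity, falls back to Python's built-in cp437 codec for
every other byte.  That fallback returns control codes rather than glyphs for
the remaining 01-1F bytes, so a complete converter would need the full table.
The names CLARIFIED and decode_clarified are hypothetical.

```python
# IBM PC byte -> Unicode code point, taken from the clarifications above.
CLARIFIED = {
    0x00: 0x2007,  0x0D: 0x1D161, 0x0E: 0x266C,  0x10: 0x25B6,
    0x11: 0x25C0,  0x1C: 0x2319,  0x27: 0x2019,  0x7C: 0x00A6,
    0x7F: 0x2302,  0xE0: 0x03B1,  0xE1: 0x03B2,  0xE6: 0x03BC,
    0xE7: 0x03B3,  0xE9: 0x03B8,  0xEA: 0x03A9,  0xED: 0x2205,
    0xEE: 0x2208,  0xF9: 0x2219,  0xFA: 0x22C5,  0xFE: 0x220E,
}

def decode_clarified(data: bytes) -> str:
    """Decode IBM PC bytes, preferring the clarified mappings."""
    return "".join(
        chr(CLARIFIED[b]) if b in CLARIFIED else bytes([b]).decode("cp437")
        for b in data
    )

print(decode_clarified(bytes([0xE0, 0xE1, 0xE7, 0x7C, 0x41, 0xFA, 0xFE])))
```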


IBM PC Code Page 437 character ranges by D.J.B and M.W.
=======================================================

Here is a cleaned up version (with hex codes) of D.J.B.'s range chart [11].

Hex / Decimal Range   Description
--------------------- ----------------------------------------------
00 / 0                Blank
01-1F, 7F / 1-31, 127 Non-printing control characters
  01-0F / 1-15          Game playing characters
  10-1F / 16-31         Text editing / Word processing markers
20-7E / 32-126        ASCII graphic characters
7F / 127              HOUSE character
80-A5 / 128-165       International characters
  9B-9F / 155-159       Currency symbols
A6-AF / 166-175       Miscellaneous typewriter keyboard characters
B0-DF / 176-223       "Box Drawing" and "Block" graphics characters
E0-FE / 224-254       Math characters
  E0-EB / 224-235       Greek Math characters
  EC-FE / 236-254       Math symbols
FF / 255              Blank
--------------------- ----------------------------------------------
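
Several clarifications argue from the range in which a character falls, so a
lookup over the chart above can be handy.  This is a minimal sketch: it uses
the most specific row for each byte (for example, 7F is reported as the HOUSE
character and 9B-9F as Currency symbols), which is an interpretation of the
chart rather than part of it.

```python
# (first, last, description); narrower sub-ranges are listed before the
# broader range that contains them so that they match first.
RANGES = [
    (0x00, 0x00, "Blank"),
    (0x01, 0x0F, "Game playing characters"),
    (0x10, 0x1F, "Text editing / Word processing markers"),
    (0x20, 0x7E, "ASCII graphic characters"),
    (0x7F, 0x7F, "HOUSE character"),
    (0x9B, 0x9F, "Currency symbols"),
    (0x80, 0xA5, "International characters"),
    (0xA6, 0xAF, "Miscellaneous typewriter keyboard characters"),
    (0xB0, 0xDF, '"Box Drawing" and "Block" graphics characters'),
    (0xE0, 0xEB, "Greek Math characters"),
    (0xEC, 0xFE, "Math symbols"),
    (0xFF, 0xFF, "Blank"),
]

def classify(byte: int) -> str:
    """Return the range description for one IBM PC character code."""
    for first, last, description in RANGES:
        if first <= byte <= last:
            return description
    raise ValueError(f"not a byte value: {byte!r}")

for b in (0x1C, 0xE1, 0xED, 0xFE):
    print(f"{b:02X}: {classify(b)}")
```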


Other findings
==============

Inconsistent Naming of the IBM PC Character Set
-----------------------------------------------

In my research and past observations in this area I have seen great variation
in the names associated with the IBM PC's character set.  A list of the most
popular names appears at the end of this section.  It is good to keep in mind
that the only correct names for the IBM PC's character set are derived from
the following IBM document from 1984:

CP00437.txt IBM Personal Computer GCGID and GCGID Name table [12]

From the header in that file:

"Code Page (CPGID)    : 00437" shortens to "Code Page 437", abbreviated as "CP437"

And the associated .PDF file

CP00437.pdf IBM Corporate Specification C-H 3-3220-050 [13]

From the header in that .PDF file:

"Code Page CPGID 00437" shortens to "Code Page 437", abbreviated as "CP437"

It would be nice if there was a way to make everyone standardize on those
names since they are the only real names from IBM themselves.

Actually, if you want to nitpick, "Code Page CPGID 00437" from that .PDF file
is the only exact true name from IBM that we can cite.

Lastly, it is unclear what IBM called the IBM PC's character set from 1981
through to 1984.  In 1984 they created Code Page 437.

The names that I have seen used:

ANSI Character Set
ECS
IBM PC Extended Character Set (ECS)
IBM PC-8
IBM PC 8-bit US
IBM PC 8-bit U.S.
IBM PC 8-bit graphics characters
PC-8
PC-8 Code Page 437
CP437
CP-437
IBM437
csPC8
csPC8CodePage437
CodePage437
Codepage 437
Code Page 437
PC CP 437 (Original PC Code Page)
IBM CP437
IBM CP 437
IBM PC CP 437
IBM PC code page 437
US8PC437
OEM font
OEM-US
OEM437
OEM 437
OEM Codepage 437
OEM United States
DOS font
VGA font
DOS console font
VGA console font
MS-DOS VGA FONT
MS-DOS United States
MS-DOS 8-bit extended ASCII
MS-DOS Codepage 437
MS-DOS code page CP437 (DOSLatinUS)
MS-DOS Latin US
DOSLatinUS
DOSLatinUS (cp437)
Microsoft DOS OEM Codepage 437 (US)
Microsoft Windows OEM Codepage 437 (US)
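
Some of the names in this list are labels that software accepts as aliases
for one and the same encoding.  As a small illustration, the sketch below
asks Python's codec registry to resolve a handful of them; which labels a
given program accepts varies, and only these four are checked here.

```python
import codecs

LABELS = ["CP437", "IBM437", "437", "csPC8CodePage437"]

def canonical(label: str) -> str:
    """Return the codec name that Python resolves this label to."""
    return codecs.lookup(label).name

for label in LABELS:
    print(f"{label:18} -> {canonical(label)}")
```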


The ANSI Character Set misnomer - Part 1: MS/PC-DOS
---------------------------------------------------

The IBM PC included a file on the PC-DOS boot disk called "ANSI.SYS".
That file, when referenced in the CONFIG.SYS file, added support for
a subset of the ANSI X3.64-1979 standard escape sequences in data
printed to the screen.  Primarily, this allowed the cursor to be
positioned at specific x-y locations and the color of text to be changed
to any of those available.  These ANSI X3.64-1979 escape sequences also
worked over a modem when connecting to BBS systems.  Unfortunately, the people
involved with those BBS systems began to refer to files that
contained ANSI X3.64-1979 escape sequences and IBM PC character set
characters as ANSI Graphics or ANSI Art.  They also came to refer to the
IBM PC character set as the "ANSI Character Set", since it was used in the
ANSI Art files.  This lumping together of things created another
incorrect name for the IBM PC character set.  The misnomer is still used
today by people in the ANSI Art scene.  Hopefully readers who are involved
in that scene will spread the word about this error and refer to the
character set correctly as Code Page 437 or CP437.
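
To make the distinction concrete, here is a minimal Python sketch of what
such a file contains: ANSI X3.64 escape sequences (cursor position and color)
wrapped around bytes from the IBM PC character set.  The function names are
invented for this example, and the encoding step relies on Python's built-in
"cp437" codec rather than on any mapping from this article.

```python
ESC = "\x1b"  # the ASCII escape character (hex 1B)


def cursor_to(row, col):
    """Cursor position sequence: ESC [ row ; col H (1-based)."""
    return f"{ESC}[{row};{col}H"


def sgr(*params):
    """Select graphic rendition sequence: ESC [ p1 ; p2 ; ... m."""
    return f"{ESC}[{';'.join(str(p) for p in params)}m"


def boxed_fragment():
    """Build a tiny "ANSI Art" fragment as raw bytes."""
    text = (
        cursor_to(5, 10)          # move to row 5, column 10
        + sgr(1, 33, 44)          # bold, yellow foreground, blue background
        + "\u2554\u2550\u2557"    # three box drawing characters
        + sgr(0)                  # reset attributes
    )
    # The escape sequences are plain ASCII; only the box drawing
    # characters depend on the character set (here CP437).
    return text.encode("cp437")


if __name__ == "__main__":
    print(boxed_fragment())
```

The escape sequences and the character set are separate layers: the same
sequences could be wrapped around text in any other character set.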

The ANSI Character Set misnomer - Part 2: Windows
-------------------------------------------------

To add further confusion, Microsoft chose to refer to the character sets
in Windows (Windows Latin 1 / Windows-1252 / CP1252) as the ANSI code pages
(character sets).  The Windows character sets are unrelated to the character
set in MS/PC-DOS.

Here is Microsoft's explanation [14]:

 The term "ANSI" as used to signify Windows code pages is a historical
 reference, but is nowadays a misnomer that continues to persist in the Windows
 community.  The source of this comes from the fact that the Windows code page
 1252 was originally based on an ANSI draft, which became ISO Standard 8859-1.
 However, in adding code points to the range reserved for control codes in the
 ISO standard, the Windows code page 1252 and subsequent Windows code pages
 originally based on the ISO 8859-x series deviated from ISO.  To this day, it
 is not uncommon to have the development community, both within and outside of
 Microsoft, confuse the 8859-1 code page with Windows 1252, as well as see
 "ANSI" or "A" used to signify Windows code page support.

As can be seen, there is the possibility for confusion between the two
misnomers, one in MS/PC-DOS and the other in Windows!
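
A short Python sketch can show how far apart these character sets are.  It
decodes the same byte as CP437, Windows-1252, and ISO 8859-1 using Python's
built-in codecs.  The function name is made up for this example, and Python's
"cp437" codec may not agree with the mapping table in [10] for every byte, so
only two uncontroversial bytes are used.

```python
def decode_three_ways(raw):
    """Decode the same bytes as CP437, Windows-1252, and ISO 8859-1."""
    return {enc: raw.decode(enc) for enc in ("cp437", "cp1252", "latin-1")}


if __name__ == "__main__":
    # Hex 80 lies in the range that ISO 8859-1 reserves for control codes.
    # Hex C9 is a box drawing character in CP437 but a letter in the others.
    for byte in (0x80, 0xC9):
        for enc, text in decode_three_ways(bytes([byte])).items():
            print(f"byte {byte:02X} as {enc:8}: U+{ord(text):04X}")
```

Byte 80 illustrates the deviation described in the quote above: Windows-1252
assigns a printable character where ISO 8859-1 has a control code.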

You can also see this misnomer mentioned in less detail here: [15]


Closing
=======

I hope you appreciate the work that I put into this article.  This is the
culmination of many years of thought on the matter.  I also hope that you
found my IBM PC Code Page 437 to Unicode Mapping Table [10] to be useful.

If you have any input on this work, please feel free to email me with your
thoughts and I will try to reply.

Thanks,
Michael Walden


Links and References
====================

[1] ^ Code page 437 - Wikipedia
 <https://en.Wikipedia.org/wiki/Code_page_437>
 Currently (as of 2025-04-20) Wikipedia's Code page 437 has 16 INCORRECT default mappings:
  00 0000, 0D 266A, 0E 266B, 10 25BA, 11 25C4, 1C 221F, 27 0027, 7C 007C, E1 00DF, E6 00B5, E7 03C4, E9 0398, ED 03C6, EE 03B5, FA 00B7, FE 25A0

[2] ^ Unicode.org Mapping: IBM PC memory-mapped video graphics to Unicode
 <https://Unicode.org/Public/MAPPINGS/VENDORS/MISC/IBMGRAPH.TXT>
 IBMGRAPH.TXT has only characters 01...1F, 7F and 6 INCORRECT mappings:
  00 ----, 0D 266A, 0E 266B, 10 25BA, 11 25C4, 1C 221F

Use the above mapping in conjunction with one of the following two mappings to make a whole Code page 437 character set mapping.

[3] ^ Unicode.org Mapping: CP437_DOSLatinUS to Unicode table (by Microsoft)
 <https://Unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT>
 CP437.TXT has 12 INCORRECT mappings:
  00 0000, 27 0027, 7c 007c, 7f 007f, e1 00df, e6 00b5, e7 03c4, e9 0398, ed 03c6, ee 03b5, fa 00b7, fe 25a0
 The following mapping is identical to the above one.  Both are from Microsoft.

[4] ^ Microsoft: OEM 437
 <https://Web.Archive.org/web/20090116210205/http://www.microsoft.com/globaldev/reference/oem/437.mspx>
 OEM 437.mspx has 12 INCORRECT mappings:
  00 0000, 27 0027, 7C 007C, 7F 007F, E1 00DF, E6 00B5, E7 03C4, E9 0398, ED 03C6, EE 03B5, FA 00B7, FE 25A0

[5] ^ CP437 (DOSLatinUS) by Roman Czyborra
 <http://Czyborra.com/charsets/codepages.html>
  <http://Czyborra.com/charsets/cp437.txt.gz>
  12 INCORRECT mappings:
   00 ----, 09 25E6, 0D 266A, 0E 266B, 1C 2310, 27 0027, 7C 007C, 7F 007F, E7 03C4, E9 0398, FA 00B7, FE 25A0

The above five Code page 437 Unicode mappings are all I have now.  I did have others, but unfortunately I lost track of them.

[6] ^ Dr. David J. Bradley on IBM PC's Character Set and More
 <https://MW.Rat.bz/djb/>

[7] ^ IBM-PC-BIOS/PCBIOS.ASM at main · philspil66/IBM-PC-BIOS · GitHub
 <https://GitHub.com/philspil66/IBM-PC-BIOS/blob/main/PCBIOS.ASM#L5640>

[8] ^ IBM PC 5150 Technical Reference 6361453 (### See address FE66 ###) 
 <https://Archive.org/details/IBMPCIBM5150TechnicalReference6322507APR84/page/n202/mode/1up>

[9] ^ Unicode XX.X Character Code Charts
 <https://Unicode.org/charts/>

[10] ^ ^ IBM PC Code Page 437 to Unicode Mapping Table
 <https://MW.Rat.bz/cp437map>

[11] ^ Note #1 - IBM PC Code Page 437 character ranges
 <https://MW.Rat.bz/djb/#N1>

[12] ^ CP00437.txt IBM Personal Computer GCGID and GCGID Name table
 <https://Public.DHE.IBM.com/software/globalization/gcoc/attachments/CP00437.txt>

[13] ^ CP00437.pdf IBM Corporate Specification C-H 3-3220-050
 <https://Public.DHE.IBM.com/software/globalization/gcoc/attachments/CP00437.pdf>

[14] ^ Unicode and Windows XP - Cathy Wissink - Program Manager, Windows Globalization - Microsoft Corporation - Dublin, Ireland, May 2002
 <https://Web.Archive.org/web/20060223061715/http://download.microsoft.com/download/5/6/8/56803da0-e4a0-4796-a62c-ca920b73bb17/21-Unicode_WinXP.pdf>
 (See the string "The term "ANSI" as..." at the bottom of page 1)
 21st. International Unicode Conference - Posted: May 2002
  <https://Web.Archive.org/web/20050119065046/https://www.microsoft.com/globaldev/reference/presentations/21st_Unicode_Conf.mspx>
  (Has a link to "Unicode and Windows XP", which is the link in [14] above)
 The Old New Thing - Why is the default 8-bit codepage called "ANSI"?
  <https://Web.Archive.org/web/20060211031914/http://blogs.msdn.com/oldnewthing/archive/2004/05/31/144893.aspx>
  (This web page also talks about the use of "ANSI" codepages)
  (Now The Old New Thing by Raymond Chen is at <https://devblogs.microsoft.com/oldnewthing/>)

[15] ^ ANSI character set - Wikipedia
 <https://en.Wikipedia.org/wiki/ANSI_character_set>

-------------------------------------------------------------------------------


   (This document was originally published here: https://MW.Rat.bz/confusion )
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-==-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
                                     [EOF]