-=> The IBM PC Character Set Confusion Clarified <=-

By Michael Walden - Created: 2025-04-26 - Updated: 2025-04-26

Introduction
------------

Since the early eighties, I have been using computer equipment from the PC
compatible market. I have been exposed to a vast array of the following
items: IBM computers, IBM clones, MS-DOS compatibles, video display adapter
cards, printers, modems, serial terminals, and more. From my first
experience, I was fascinated by the diverse set of characters included in
character tables in the hardware technical reference manuals, software
applications, and books that I saw. I always wondered what the names and
purposes of the unusual characters were.

Over time it became apparent to me that there were subtle differences in the
character set from one implementation to another. It should be noted that
all of the character sets were intended to be IBM PC compatible (i.e.
identical), yet they would go astray.

Eventually an IBM PC Technical Reference Manual made its way to me. I
expected that I would finally see a definitive list of the IBM PC character
set specifications. To my dismay, no such information was provided. The only
substantial info relating to the character set was a two page table that
depicted the glyphs. Some of the glyphs in the table did not correctly
reflect those actually produced on screen. Even IBM did not get the
character set correctly depicted in their own manual. I guess that the
typesetter was struggling to locate matching type for all of those strange
characters. Apparently IBM was in such a rush to get the PC to market that
they neglected to include a detailed description of the PC's character set.
It should come as no surprise that there are inconsistencies in other
implementations of the IBM PC's character set, when you consider that there
was no official description in the IBM PC Technical Reference manual. This
allows for confusion to occur when implementing the IBM PC character set in
a device and when using the characters in the character set.

The Current Problem
-------------------

Why, you might ask, am I describing this 40+ year old matter? It may come as
a surprise to find out that this problem is still being perpetuated in
hardware, such as video display adapters, and software, such as MS-Windows.

The Unicode Consortium has produced a standard for a universal character set
that is intended to be a superset of prior character sets. They also provide
mapping tables which define the relationship of the characters in a
character set, such as the one in the IBM PC (Code Page 437), to the *same*
characters within the Unicode universal character set. Here again, it is my
opinion that the IBM PC's character set is incorrectly defined by the
present Unicode mapping tables. [1][2][3][4][5]

(Note: As you will see, Unicode universal character set code points are used
here since they provide a good backbone to work from and the Unicode
universal character set appears to be the character set of choice for the
foreseeable future.)

The primary places where the IBM PC character set is seen in computers today
are the text displayed when booting a PC, legacy DOS applications (in DOS,
FreeDOS, and in a DOS box under MS-Windows (COMMAND.COM)), the command
prompt within MS-Windows (CMD.EXE), x86 Linux console applications, Point Of
Sale (POS) computer applications, terminal emulation software,
telecommunication software, IBM PC emulators, IRC chat, inside .ZIP archive
files (*.nfo / *.diz / *.txt / read.me / read.1st) and in the ANSI art
scene.
While the areas affected may be limited, it does not make sense to knowingly perpetuate a mistake when you can just as easily correct it. The Solution ------------ It is my intention to produce the most accurate definition possible of the IBM PC character set. I will do this, primarily, by looking at the only sound evidence, the character set bitmaps from the IBM PC MDA video card. Additionally, I have been attempting to contact others that were involved with the creation of the PC at IBM. I did manage to contact Dr. David J. Bradley and asked him about the characters in the IBM PC character set. [6] He was mainly able to provide me with character range purposes but not all of the exact names of the characters in the character set. I have attempted to track down others involved in the making of the IBM PC without success. I will also point out that others have stated that there is no way possible to state what some of the characters are since there are multiple uses for them. While it may be true that multiple uses are possible, it is my opinion that there are specific unique characteristics of the characters within the IBM PC that indicate which one of the multiple uses was the prime one intended. Those prime intentions are the ones that should be statically defined as the characters that are used in the IBM PC character set. As an example, the GREEK SMALL LETTER BETA and the LATIN SMALL LETTER SHARP S may appear *substantially* identical, but the actual character used when creating the IBM PC character set was one or the other, not both. If it were both, there would have been some hybrid character designed that exhibited attributes of both characters equally (which is actually impossible to do, in my opinion). Additionally, some characters may have ambiguous interpretations when viewed independently, but when viewed in the context of the character set range in which they occur, I find the characters to be unambiguous. 
Below is a list of the 20 characters which are, in my opinion, defined incorrectly in the Unicode mappings. For each incorrectly mapped IBM PC character, the first line is my correct Unicode mapping and the following line(s) is/are the incorrect mapping(s). Then I give a description of the reason for my disagreement with the incorrect mapping. This will provide clarification of the correct characters in the IBM PC character set (Code page 437). Note that all numbers in the following section are hexadecimal. Note: On the right of my assessments below, the "WIDELY SUPPORTED" statements relate to the character's availability in fonts at the time I originally did my work. Currently these statements are mostly obsolete since all of my CORRECT characters (except 1D161 - 𝅘𝅥𝅯 MUSICAL SYMBOL SIXTEENTH NOTE) are currently supported in software and most fonts. One lingering issue is that some of the NOT WIDELY SUPPORTED characters are rendered in a different size from most of the other characters in some fonts. ============================================================================= Clarification -- Format:
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mapping |
![]() | ![]() 00 | ![]() |
Unicode IBM PC Hex Code Unicode Hex Point - Char Unicode Character Name My Assessment ------- -------- - ------------------------------------ CORRECT ------- -------- - ------------------------------------ INCORRECT Description ----------------------------------------------------------------------------- ... ============================================================================= Clarification 00
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mapping |
![]() | ![]() 00 | ![]() |
00   2007 -   FIGURE SPACE        CORRECT
00   0000 -   <control> = NULL    INCORRECT

Note: The two characters pictured above, 2007 and 0000, are shown with dashed
line square surrounds indicating that there is no glyph for each character.

There is an issue with IBM PC character 00. The dilemma is what I should map
it to. The IBM PC has three blank characters: 00 Null, 20 Space, and FF
No-Break Space. IBM PC 20 maps to Unicode 0020 SPACE, and IBM PC FF maps to
Unicode 00A0 NO-BREAK SPACE. Mapping IBM PC 00 to Unicode 0000 NULL is not
correct since Unicode 0000 NULL is a control code and has no glyph assigned
to it. An example of this is that you can't display a Unicode 0000 NULL
character on an HTML web page since it has no glyph. When attempting to
print a Unicode NULL control character on a web page (by using a numeric
character reference such as &#x0000;), the web browser replaces it with
FFFD � REPLACEMENT CHARACTER, which is not a blank character. Yet, the IBM
PC does have a glyph assigned to 00 Null that just happens to be a blank
space. That can be demonstrated by placing a 00 Null value into IBM PC video
RAM: a blank glyph will appear in the character cell filled with the Null.

To resolve this issue I need to choose another blank Unicode character to
map to the IBM PC 00 Null. You might say to just map one of the other two
blank characters to 00 Null, such as 00A0 NO-BREAK SPACE, to produce a blank
for 00 Null. While that would result in a visually correct conversion of an
IBM PC text mode screen capture, it would result in a different issue. A
round-trip format conversion from IBM PC to Unicode and back to IBM PC would
lose either the IBM PC character 00 or FF, since they would both map to the
same Unicode 00A0 NO-BREAK SPACE. So the best solution is to map IBM PC 00
Null to another Unicode blank character.
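The round-trip argument above can be checked with a small sketch. The two tables below are hypothetical, partial mappings that cover only the three blank CP437 codes; the names COLLIDING, PROPOSED, is_round_trip_safe and round_trip are made up for this illustration and are not part of any existing mapping file or library.

```python
# Hypothetical partial tables: only the three blank CP437 codes are listed.
# COLLIDING reuses NO-BREAK SPACE for 00; PROPOSED uses FIGURE SPACE for 00.
COLLIDING = {0x00: "\u00A0", 0x20: "\u0020", 0xFF: "\u00A0"}
PROPOSED = {0x00: "\u2007", 0x20: "\u0020", 0xFF: "\u00A0"}


def is_round_trip_safe(table):
    """Return True when no two CP437 codes share one Unicode character."""
    return len(set(table.values())) == len(table)


def round_trip(data, table):
    """Convert CP437 bytes to Unicode and back using the given table."""
    # When two codes share a character, the later code overwrites the
    # earlier one here, which is exactly where information is lost.
    reverse = {char: code for code, char in table.items()}
    text = "".join(table[byte] for byte in data)
    return bytes(reverse[char] for char in text)


blanks = bytes([0x00, 0x20, 0xFF])
survives_proposed = round_trip(blanks, PROPOSED) == blanks
survives_colliding = round_trip(blanks, COLLIDING) == blanks
```

With the colliding table the 00 byte comes back as FF, so survives_colliding is False, while the table that gives 00 its own blank character returns the original three bytes unchanged.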
After searching the Unicode character database I have come up with two possible candidates for blank characters: 2007 ( ) FIGURE SPACE [ ] • space equal to tabular width of a font • this is equivalent to the digit width of fonts with fixed-width digits 202F ( ) NARROW NO-BREAK SPACE (NNBSP) [ ] or [&#x202F;] • a narrow form of a no-break space, typically the width of a thin space or a mid space Of those two I chose to use 2007 FIGURE SPACE since it should be the correct character width to use when working with fixed-width characters from an IBM PC text mode screen capture. I am not knowledgeable in the use of the Unicode 2007 FIGURE SPACE character and whether or not there might be an issue using it as a blank space when converting IBM PC information into Unicode. As far as I can tell there should be no problem. If a problem is found I will revise the mapping of IBM PC character 00 at that time. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Clarification 0D
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mapping |
![]() | ![]() 0D | ![]() |
0D 1D161 - 𝅘𝅥𝅯 MUSICAL SYMBOL SIXTEENTH NOTE CORRECT - BUT TOO NEW FOR SUPPORT IN APPLICATIONS AND FONTS 0D 266A - ♪ EIGHTH NOTE INCORRECT - BUT WIDELY SUPPORTED IBM PC Glyph 0D clearly has two horizontal protrusions extending to the right of the vertical. This is indicative of a MUSICAL SYMBOL SIXTEENTH NOTE, not an EIGHTH NOTE, which has only one horizontal protrusion. Technically, the two protrusions should be flags, rather than beams, when there is only one note. IBM PC Glyph 0D resembles 266C - ♬ BEAMED SIXTEENTH NOTES with the lower right portion removed. I believe that the depiction of IBM PC Glyph 0D, the MUSICAL SYMBOL SIXTEENTH NOTE, was altered slightly in response to the limited resolution available in a character cell. Remember that all of the characters were depicted in 9x14 (MDA) and 8x8 (CGA) character cells. The limited resolution of the 8x8 (CGA) implementation probably influenced the 9x14 (MDA) implementation since the designers probably wanted the character to appear substantially the same in both MDA and CGA displays. It would be difficult to depict the curved flag shapes, so the beams were used instead. Unfortunately, the CORRECT mapping MUSICAL SYMBOL SIXTEENTH NOTE is not supported in current applications and fonts since the character is too new. It is not currently possible to correct this INCORRECT mapping. For the present time the use of the EIGHTH NOTE, which is a widely supported character, is reasonable but still INCORRECT. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Clarification 0E
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mapping |
![]() | ![]() 0E | ![]() |
0E 266C - ♬ BEAMED SIXTEENTH NOTES CORRECT 0E 266B - ♫ BEAMED EIGHTH NOTES INCORRECT IBM PC Glyph 0E clearly has two horizontal beams extending between the two vertical lines. This is indicative of BEAMED SIXTEENTH NOTES, not BEAMED EIGHTH NOTES, which has one horizontal beam extending between the two vertical lines. So, BEAMED SIXTEENTH NOTES is the correct mapping. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Clarification 10
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mapping |
![]() | ![]() 10 | ![]() |
10   25B6 - ▶ BLACK RIGHT-POINTING TRIANGLE   CORRECT - BUT NOT WIDELY SUPPORTED
10   25BA - ► BLACK RIGHT-POINTING POINTER    INCORRECT - BUT WIDELY SUPPORTED

IBM PC Glyphs 10 and 11 (see below) are closer to equilateral triangles,
which have three congruent sides (see BLACK RIGHT-POINTING TRIANGLE and BLACK
LEFT-POINTING TRIANGLE), than isosceles triangles which have two congruent
sides (see BLACK RIGHT-POINTING POINTER and BLACK LEFT-POINTING POINTER).
Notice how the BLACK RIGHT-POINTING POINTER and BLACK LEFT-POINTING POINTER
are squashed in the vertical direction. Additionally, I believe that 10 and
11 are related to 1E and 1F which are ▲ BLACK UP-POINTING TRIANGLE and
▼ BLACK DOWN-POINTING TRIANGLE respectively. Changing 10 and 11 to my
correct mapping would create a complete set of the four directional
triangles, as the IBM PC appears to have. Alternatively, you could say that
1E and 1F are the incorrectly mapped characters, not 10 and 11, but there
are no up or down pointers that could be mapped to 1E and 1F. So, BLACK
RIGHT-POINTING TRIANGLE and BLACK LEFT-POINTING TRIANGLE are the CORRECT
mappings for 10 and 11.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification 11
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mapping |
![]() | ![]() 11 | ![]() |
11 25C0 - ◀ BLACK LEFT-POINTING TRIANGLE CORRECT - BUT NOT WIDELY SUPPORTED 11 25C4 - ◄ BLACK LEFT-POINTING POINTER INCORRECT - BUT WIDELY SUPPORTED See Clarification 10 (IBM PC Glyph 10) above. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Clarification 1C
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mapping |
![]() | ![]() 1C | ![]() |
1C 2319 - ⌙ TURNED NOT SIGN (line marker) CORRECT 1C 221F - ∟ RIGHT ANGLE INCORRECT IBM PC Glyph 1C has a vertical line that is shorter than the horizontal line. RIGHT ANGLE has both horizontal and vertical lines of equal length. Therefore, 1C is not a RIGHT ANGLE symbol. The RIGHT ANGLE symbol is a math symbol having no other alias. All of the characters in the 01-1F range are unrelated to math, so it makes sense that the character must be something other than RIGHT ANGLE. 1C resembles the TURNED NOT SIGN's alias "line marker" which is a non-math symbol. So, TURNED NOT SIGN is CORRECT. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Clarification 27
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mapping |
![]() | ![]() 27 | ![]() |
27 2019 - ’ RIGHT SINGLE QUOTATION MARK CORRECT 27 0027 - ' APOSTROPHE INCORRECT IBM PC Glyph 27 has a high comma shape. So, the RIGHT SINGLE QUOTATION MARK which also appears as a high comma is the CORRECT mapping with respect to the visual appearance of the glyph. The use of the RIGHT SINGLE QUOTATION MARK character in software is not the same as APOSTROPHE which is an ASCII character. If your intention is to replicate the exact appearance then use RIGHT SINGLE QUOTATION MARK. If you want to replicate the ASCII use of the character in software, then use APOSTROPHE. In this file I am looking at replicating the appearance of the IBM PC, so I am saying RIGHT SINGLE QUOTATION MARK is CORRECT. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Clarification 7C
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mapping |
![]() | ![]() 7C | ![]() |
7C 00A6 - ¦ BROKEN BAR CORRECT 7C 007C - | VERTICAL LINE INCORRECT IBM PC Glyph 7C has a notch taken out of the center of a vertical line. So, the BROKEN BAR is the CORRECT mapping with respect to the visual appearance of the glyph. The use of the BROKEN BAR character in software is not the same as VERTICAL LINE which is an ASCII character. If your intention is to replicate the exact appearance then use BROKEN BAR. If you want to replicate the ASCII use of the character in software, then use VERTICAL LINE. In this file I am looking at replicating the appearance of the IBM PC, so I am saying BROKEN BAR is CORRECT. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Clarification 7F
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mapping |
![]() | ![]() 7F | ![]() |
7F 2302 - ⌂ HOUSE CORRECT 7F 0394 - Δ GREEK CAPITAL LETTER DELTA INCORRECT IBM PC Glyph 7F appears like a squat house character. So, HOUSE is the closest Unicode character and the CORRECT mapping to the IBM PC Glyph. Technically, that is not the correct mapping relative to the intended meaning, which is GREEK CAPITAL LETTER DELTA. Look at the following two links to see where the ROM BIOS assembly code remark shows DELTA (short for GREEK CAPITAL LETTER DELTA). IBM-PC-BIOS/PCBIOS.ASM at main · philspil66/IBM-PC-BIOS · GitHub [7] IBM PC 5150 Technical Reference 6361453 (### See address FE66 ###) [8] In this file I am looking at replicating the appearance of the IBM PC, so I am saying HOUSE is CORRECT. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Clarification E0
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mapping |
![]() | ![]() E0 | ![]() |
E0 03B1 - α GREEK SMALL LETTER ALPHA CORRECT E0 221D - ∝ PROPORTIONAL TO INCORRECT IBM PC Glyph E0 falls in the character range E0-EB which is the Greek Math characters range. The PROPORTIONAL TO Unicode character falls in the Mathematical Operators Range: 2200–22FF at code point 221D. The correct mapping is the GREEK SMALL LETTER ALPHA based on the range where it appears in the "IBM PC Code Page 437 character ranges by D.J.B and M.W." table. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Clarification E1
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mapping |
![]() | ![]() E1 | ![]() |
E1   03B2 - β GREEK SMALL LETTER BETA        CORRECT
E1   00DF - ß LATIN SMALL LETTER SHARP S     INCORRECT

IBM PC Glyph E1 clearly has its left hand side descending below the
baseline, just like in the GREEK SMALL LETTER BETA. E1's bottom horizontal
connects to the vertical, just like in the GREEK SMALL LETTER BETA. Also
note that all of the characters in the range E0-EB are Greek Math
characters. With that in mind, LATIN SMALL LETTER SHARP S would not be
correct since it is not a Greek Math character. So, GREEK SMALL LETTER BETA
is the correct mapping.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification E6
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mapping |
![]() | ![]() E6 | ![]() |
E6 03BC - μ GREEK SMALL LETTER MU CORRECT E6 00B5 - µ MICRO SIGN INCORRECT IBM PC Glyph E6 looks like either GREEK SMALL LETTER MU or MICRO SIGN. The concern here is strictly relative to the meaning of the character not the shape. The characters in the range E0-EB are Greek Math characters. With that in mind, MICRO SIGN would not be correct since it is not a Greek math character. So, GREEK SMALL LETTER MU is the correct mapping. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Clarification E7
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mapping |
![]() | ![]() E7 | ![]() |
E7   03B3 - γ GREEK SMALL LETTER GAMMA   CORRECT
E7   03C4 - τ GREEK SMALL LETTER TAU     INCORRECT

IBM PC Glyph E7 has a non-straight horizontal piece, unlike the GREEK SMALL
LETTER TAU, whose horizontal is a mostly straight line. Under close
inspection, the left portion has a hook and the right side leans up to the
right, just like the GREEK SMALL LETTER GAMMA. Admittedly, E7 is difficult
to discern due to its poor rendering. I believe that this is due to the
limited character cell resolution. So, GREEK SMALL LETTER GAMMA is the
correct mapping.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification E9
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mapping |
![]() | ![]() E9 | ![]() |
E9   03B8 - θ GREEK SMALL LETTER THETA     CORRECT
E9   0398 - Θ GREEK CAPITAL LETTER THETA   INCORRECT

IBM PC Glyph E9 clearly has connections between the left and right ends of
the horizontal line and the sides of the oval. This is like the GREEK SMALL
LETTER THETA, not GREEK CAPITAL LETTER THETA which has breaks between the
ends of the horizontal line and the sides of the oval. So, GREEK SMALL
LETTER THETA is the correct mapping.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification EA
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mapping |
![]() | ![]() EA | ![]() |
EA   03A9 - Ω GREEK CAPITAL LETTER OMEGA   CORRECT
EA   2126 - Ω OHM SIGN                     INCORRECT

IBM PC Glyph EA looks like either GREEK CAPITAL LETTER OMEGA or OHM SIGN.
The concern here is strictly relative to the meaning of the character, not
the shape. The characters in the range E0-EB are Greek Math characters. With
that in mind, OHM SIGN would not be correct since it is not a Greek Math
character. So, GREEK CAPITAL LETTER OMEGA is the correct mapping.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Clarification ED
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mappings |
![]() | ![]() ED | ![]() ![]() |
ED 2205 - ∅ EMPTY SET (null set) CORRECT - BUT NOT WIDELY SUPPORTED ED 03C6 - φ GREEK SMALL LETTER PHI INCORRECT - BUT WIDELY SUPPORTED ED 00F8 - ø LATIN SMALL LETTER O WITH STROKE INCORRECT - BUT WIDELY SUPPORTED IBM PC Glyph ED looks like EMPTY SET, GREEK SMALL LETTER PHI, and LATIN SMALL LETTER O WITH STROKE. The characters in the range EC-FE are Math symbols. With that in mind, EMPTY SET would be correct since it is a math symbol. The other two characters are not math symbols, they are non-English language characters. So, EMPTY SET is the correct mapping. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Clarification EE
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mappings |
![]() | ![]() EE | ![]() ![]() |
EE 2208 - ∈ ELEMENT OF CORRECT - BUT NOT WIDELY SUPPORTED EE 03B5 - ε GREEK SMALL LETTER EPSILON INCORRECT - BUT WIDELY SUPPORTED EE 0404 - Є CYRILLIC CAPITAL LETTER UKRAINIAN IE INCORRECT - BUT WIDELY SUPPORTED IBM PC Glyph EE looks like ELEMENT OF, GREEK SMALL LETTER EPSILON, and CYRILLIC CAPITAL LETTER UKRAINIAN IE. The characters in the range EC-FE are Math symbols. With that in mind, ELEMENT OF would be correct since it is a math symbol. The other two characters are not math symbols, they are non-English language characters. So, ELEMENT OF is the correct mapping. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Clarification F9
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mapping |
![]() | ![]() F9 | ![]() |
F9 2219 - ∙ BULLET OPERATOR CORRECT F9 25AA - ▪ BLACK SMALL SQUARE INCORRECT - BUT LOOKS BETTER IN SOME FONTS IBM PC Glyph F9 looks like a small dot. The characters in the range EC-FE are Math symbols. With that in mind, BULLET OPERATOR is CORRECT since it is a math symbol. The IBM PC character set has three similar characters at 07, F9 and FA. IBM PC Glyph FA is the smallest dot. F9 is a little larger than FA. And finally 07 is the largest dot. The *actual* size and shape of IBM PC Glyph 07 is a 4Wx4H pixel circle, F9 is a 2Wx2H pixel square, and FA is a 2Wx1H pixel rectangle. I believe that the non circle shape of F9 and FA is a result of insufficient pixel resolution to make round dots smaller than IBM PC Glyph 07. I believe that the intention is for circular dots in F9 and FA. This situation might lead someone to say that F9 is actually a BLACK SMALL SQUARE, and FA is a BLACK SMALL RECTANGLE. A BLACK SMALL SQUARE does exist in Unicode (mentioned above as an INCORRECT mapping for F9) but no BLACK SMALL RECTANGLE exists. I say it is INCORRECT because if it is used for F9, there is no smaller character to use for FA such as a hypothetical BLACK SMALL RECTANGLE. The smallest existing midway elevated Unicode character is DOT OPERATOR and I will use that in Clarification FA below. It is a small circle. So, again, I will say that BULLET OPERATOR is CORRECT since it is a slightly larger version of the DOT OPERATOR. Now for the flip side... The BLACK SMALL SQUARE character is an INCORRECT mapping but it looks better in some fonts relative to the sizing of all three similar characters at 07, F9 and FA. I found that BLACK SMALL SQUARE appeared larger than the character mapped to FA (DOT OPERATOR) but smaller than the character mapped to 07 (BULLET) in *some* fonts. This satisfies the requirement for the character mapped to F9 to be the mid-sized dot in *some* fonts. You make your own choice relative to the font being used. 
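The per-font choice described above could be expressed as a small switch. This is only a sketch: the names DOT_MAPPING and dot_mapping, and the prefer_small_square flag, are hypothetical and chosen for this illustration.

```python
# Default dot mappings from these clarifications:
# 07 -> BULLET, F9 -> BULLET OPERATOR, FA -> DOT OPERATOR.
DOT_MAPPING = {0x07: "\u2022", 0xF9: "\u2219", 0xFA: "\u22C5"}


def dot_mapping(prefer_small_square=False):
    """Return the three dot mappings, optionally swapping F9 per font."""
    mapping = dict(DOT_MAPPING)  # copy so the defaults stay untouched
    if prefer_small_square:
        # BLACK SMALL SQUARE: the INCORRECT mapping that may size better
        # between the 07 and FA characters in some fonts.
        mapping[0xF9] = "\u25AA"
    return mapping
```

Calling dot_mapping() keeps BULLET OPERATOR for F9, while dot_mapping(prefer_small_square=True) substitutes BLACK SMALL SQUARE and leaves 07 and FA alone.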
Unicode character DOT OPERATOR is the smallest, BULLET OPERATOR is the mid sized character and BULLET is the largest character. So, ultimately BULLET OPERATOR is the correct mapping to F9, just the fonts sometimes fail to have the correct size relationships between characters. This Clarification F9 is not an error in existing Unicode mappings, but rather a suggestion for using BLACK SMALL SQUARE (and the recognition that the actual IBM PC Glyph F9 is a small square) when using some fonts where BLACK SMALL SQUARE looks better than BULLET OPERATOR (even though it is an INCORRECT mapping). - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Clarification FA
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mapping |
![]() | ![]() FA | ![]() |
FA 22C5 - ⋅ DOT OPERATOR CORRECT FA 00B7 - · MIDDLE DOT INCORRECT IBM PC Glyph FA looks like a small dot. The characters in the range EC-FE are Math symbols. With that in mind, DOT OPERATOR is CORRECT since it is a math symbol. MIDDLE DOT is not a math symbol. Also, DOT OPERATOR satisfies the correct size relationship between F9 (mapped as BULLET OPERATOR) and FA where FA (mapped as DOT OPERATOR) is smaller. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Clarification FE
My Correct Unicode Mapping | IBM PC Glyph | Incorrect Unicode Mapping |
![]() | ![]() FE | ![]() |
FE   220E - ∎ END OF PROOF (Q.E.D.)   CORRECT
FE   25A0 - ■ BLACK SQUARE            INCORRECT

IBM PC Glyph FE looks like a small vertical rectangle. The characters in the
range EC-FE are Math symbols. With that in mind, END OF PROOF (Q.E.D.) is
CORRECT since it is a math symbol. BLACK SQUARE belongs to the Geometric
Shapes block and is not a math symbol. Additionally, FE is not a square; it
is slightly taller than it is wide (5 wide x 6 high), like the math symbol
END OF PROOF (Q.E.D.), which is also taller than it is wide.

=============================================================================

Notes:

Some of the INCORRECT character mappings are possible alternate uses for the
associated IBM PC character. For example, you might use EE (ELEMENT OF) in a
place which calls for a GREEK SMALL LETTER EPSILON.

All of the Unicode character images were derived from the Unicode Version
16.0 .PDF character charts on the Unicode.org site. [9]

All of the IBM PC glyphs were derived from the IBM Monochrome Display
Adapter's (MDA's) character ROM 8x14 (9x14 on MDA screen) font.

The Completely Correct IBM PC Character Set Definition
------------------------------------------------------

Now that I have gotten the clarifications out of the way, I will provide you
with a definition of the IBM PC character set that is as accurate as
possible:

IBM PC Code Page 437 to Unicode Mapping Table [10]

IBM PC Code Page 437 character ranges by D.J.B and M.W.
=======================================================

Here is a cleaned up version (with hex codes) of D.J.B.'s range chart [11].
Hex / Decimal Range     Description
---------------------   ----------------------------------------------
00 / 0                  Blank
01-1F, 7F / 1-31, 127   Non-printing control characters
01-0F / 1-15            Game playing characters
10-1F / 16-31           Text editing / Word processing markers
20-7E / 32-126          ASCII graphic characters
7F / 127                HOUSE character
80-A5 / 128-165         International characters
9B-9F / 155-159         Currency symbols
A6-AF / 166-175         Miscellaneous typewriter keyboard characters
B0-DF / 176-223         "Box Drawing" and "Block" graphics characters
E0-FE / 224-254         Math characters
E0-EB / 224-235         Greek Math characters
EC-FE / 236-254         Math symbols
FF / 255                Blank
---------------------   ----------------------------------------------

Other findings
==============

Inconsistent Naming of the IBM PC Character Set
-----------------------------------------------

In my research and past observations in this area I have seen great
variation in the names associated with the IBM PC's character set. Below is
a list of the most popular names. It is good to keep in mind that the only
correct names for the IBM PC's character set are derived from the following
IBM document from 1984:

CP00437.txt IBM Personal Computer GCGID and GCGID Name table [12]

From the header in that file:
   "Code Page (CPGID) : 00437"
to:
   "Code Page 437"
or abbreviated as:
   "CP437"

And the associated .PDF file:

CP00437.pdf IBM Corporate Specification C-H 3-3220-050 [13]

From the header in that .PDF file:
   "Code Page CPGID 00437"
to:
   "Code Page 437"
or abbreviated as:
   "CP437"

It would be nice if there were a way to make everyone standardize on those
names since they are the only real names from IBM themselves. Actually, if
you want to nitpick, "Code Page CPGID 00437" from that .PDF file is the only
exact true name from IBM that we can cite.

Lastly, it is unclear what IBM called the IBM PC's character set from 1981
through to 1984. In 1984 they created Code Page 437.
ANSI Character Set ECS IBM PC Extended Character Set (ECS) IBM PC-8 IBM PC 8-bit US IBM PC 8-bit U.S. IBM PC 8-bit graphics characters PC-8 PC-8 Code Page 437 CP437 CP-437 IBM437 csPC8 csPC8CodePage437 CodePage437 Codepage 437 Code Page 437 PC CP 437 (Original PC Code Page) IBM CP437 IBM CP 437 IBM PC CP 437 IBM PC code page 437 US8PC437 OEM font OEM-US OEM437 OEM 437 OEM Codepage 437 OEM United States DOS font VGA font DOS console font VGA console font MS-DOS VGA FONT MS-DOS United States MS-DOS 8-bit extended ASCII MS-DOS Codepage 437 MS-DOS code page CP437 (DOSLatinUS) MS-DOS Latin US DOSLatinUS DOSLatinUS (cp437) Microsoft DOS OEM Codepage 437 (US) Microsoft Windows OEM Codepage 437 (US) The ANSI Character Set misnomer - Part 1: MS/PC-DOS --------------------------------------------------- The IBM PC included a file on the PC-DOS boot disk called "ANSI.SYS." That file, when used in the CONFIG.SYS file, allowed for the support of a subset of the ANSI X3.64-1979 standard escape sequences when data was printed to the screen. This primarily would allow the cursor to be positioned at specific x-y locations and the color of text to be changed to any of those available. These ANSI X3.64-1979 escape sequences would work over a modem when connecting to BBS systems. Unfortunately, the people involved with those BBS systems began to refer to the files, which contained ANSI X3.64-1979 escape sequences and IBM PC character set characters, as ANSI Graphics or ANSI Art. They also have referred to the IBM PC character set as the "ANSI Character Set", since it was used in the ANSI Art files. This lumping together of things has created another incorrect name for the IBM PC character set. This misnomer is being used today by people in the ANSI art scene. Hopefully those reading this, that are involved in the ANSI Art scene, will spread the word describing this error and now correctly refer to the character set as Code Page 437 or CP437. 
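The two separate ingredients of an "ANSI Art" file described above can be sketched as follows. The escape sequence forms used here (ESC [ row ; column H for cursor position and ESC [ n m for text attributes) are the standard ones; the helper names cursor_position, set_graphics and ansi_art_bytes are hypothetical and exist only for this illustration.

```python
ESC = "\x1b"


def cursor_position(row, col):
    """Cursor position sequence: move to a 1-based row and column."""
    return f"{ESC}[{row};{col}H"


def set_graphics(*codes):
    """Graphics rendition sequence, e.g. 31 = red foreground, 0 = reset."""
    return f"{ESC}[{';'.join(str(code) for code in codes)}m"


def ansi_art_bytes(row, col, color, cp437_codes):
    """Combine escape sequences (control) with raw CP437 codes (characters)."""
    prefix = (cursor_position(row, col) + set_graphics(color)).encode("ascii")
    reset = set_graphics(0).encode("ascii")
    # The escape sequences are plain ASCII; the character codes in between
    # are Code Page 437 values and are passed through untouched.
    return prefix + bytes(cp437_codes) + reset


# Two block graphics codes (DB and B2) drawn in red at row 5, column 10.
sample = ansi_art_bytes(5, 10, 31, [0xDB, 0xB2])
```

The escape sequences only say where and in which color to draw; which glyph appears for DB or B2 is decided by the character set, which is why "ANSI" names the control sequences and not the characters.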

The ANSI Character Set misnomer - Part 2: Windows
-------------------------------------------------

To add further confusion, Microsoft chose to refer to the character sets in
Windows (Windows Latin 1 / Windows-1252 / CP1252) as the ANSI code pages
(character sets). The Windows character sets are unrelated to the character
set in MS/PC-DOS. Here is Microsoft's explanation [14]:

    The term "ANSI" as used to signify Windows code pages is a historical
    reference, but is nowadays a misnomer that continues to persist in the
    Windows community. The source of this comes from the fact that the
    Windows code page 1252 was originally based on an ANSI draft, which
    became ISO Standard 8859-1. However, in adding code points to the range
    reserved for control codes in the ISO standard, the Windows code page
    1252 and subsequent Windows code pages originally based on the ISO
    8859-x series deviated from ISO. To this day, it is not uncommon to have
    the development community, both within and outside of Microsoft, confuse
    the 8859-1 code page with Windows 1252, as well as see "ANSI" or "A"
    used to signify Windows code page support.

As can be seen, there is the possibility for confusion between the two
misnomers, one in MS/PC-DOS and the other in Windows! You can also see this
misnomer mentioned in less detail here: [15]

Closing
=======

I hope you appreciate the work that I put into this article. This is the
culmination of many years of thought on the matter. I also hope that you found
my IBM PC Code Page 437 to Unicode Mapping Table [10] to be useful. If you
have any input on this work, please feel free to email me with your thoughts
and I will try to reply.

Thanks,
Michael Walden

Links and References
====================

[1] Code page 437 - Wikipedia
    <https://en.Wikipedia.org/wiki/Code_page_437>
    Currently (as of 2025-04-20) Wikipedia's Code page 437 has 16 INCORRECT
    default mappings:
    00 0000, 0D 266A, 0E 266B, 10 25BA, 11 25C4, 1C 221F, 27 0027, 7C 007C,
    E1 00DF, E6 00B5, E7 03C4, E9 0398, ED 03C6, EE 03B5, FA 00B7, FE 25A0

[2] Unicode.org Mapping: IBM PC memory-mapped video graphics to Unicode
    <https://Unicode.org/Public/MAPPINGS/VENDORS/MISC/IBMGRAPH.TXT>
    IBMGRAPH.TXT has only characters 01...1F, 7F and 6 INCORRECT mappings:
    00 ----, 0D 266A, 0E 266B, 10 25BA, 11 25C4, 1C 221F
    Use the above mapping in conjunction with one of the following two
    mappings to make a whole Code page 437 character set mapping.

[3] Unicode.org Mapping: CP437_DOSLatinUS to Unicode table (by Microsoft)
    <https://Unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT>
    CP437.TXT has 12 INCORRECT mappings:
    00 0000, 27 0027, 7c 007c, 7f 007f, e1 00df, e6 00b5, e7 03c4, e9 0398,
    ed 03c6, ee 03b5, fa 00b7, fe 25a0
    The following mapping is identical to the above one. Both are from
    Microsoft.

[4] Microsoft: OEM 437
    <https://Web.Archive.org/web/20090116210205/http://www.microsoft.com/globaldev/reference/oem/437.mspx>
    OEM 437.mspx has 12 INCORRECT mappings:
    00 0000, 27 0027, 7C 007C, 7F 007F, E1 00DF, E6 00B5, E7 03C4, E9 0398,
    ED 03C6, EE 03B5, FA 00B7, FE 25A0

[5] CP437 (DOSLatinUS) by Roman Czyborra
    <http://Czyborra.com/charsets/codepages.html>
    <http://Czyborra.com/charsets/cp437.txt.gz>
    12 INCORRECT mappings:
    00 ----, 09 25E6, 0D 266A, 0E 266B, 1C 2310, 27 0027, 7C 007C, 7F 007F,
    E7 03C4, E9 0398, FA 00B7, FE 25A0

    The above five Code page 437 Unicode mappings are all I have now. I did
    have others, but unfortunately lost track of them.

[6] Dr. David J. Bradley on IBM PC's Character Set and More
    <https://MW.Rat.bz/djb/>

[7] IBM-PC-BIOS/PCBIOS.ASM at main · philspil66/IBM-PC-BIOS · GitHub
    <https://GitHub.com/philspil66/IBM-PC-BIOS/blob/main/PCBIOS.ASM#L5640>

[8] IBM PC 5150 Technical Reference 6361453 (### See address FE66 ###)
    <https://Archive.org/details/IBMPCIBM5150TechnicalReference6322507APR84/page/n202/mode/1up>

[9] Unicode XX.X Character Code Charts
    <https://Unicode.org/charts/>

[10] IBM PC Code Page 437 to Unicode Mapping Table
     <https://MW.Rat.bz/cp437map>

[11] Note #1 - IBM PC Code Page 437 character ranges
     <https://MW.Rat.bz/djb/#N1>

[12] CP00437.txt IBM Personal Computer GCGID and GCGID Name table
     <https://Public.DHE.IBM.com/software/globalization/gcoc/attachments/CP00437.txt>

[13] CP00437.pdf IBM Corporate Specification C-H 3-3220-050
     <https://Public.DHE.IBM.com/software/globalization/gcoc/attachments/CP00437.pdf>

[14] Unicode and Windows XP - Cathy Wissink - Program Manager, Windows
     Globalization - Microsoft Corporation - Dublin, Ireland, May 2002
     <https://Web.Archive.org/web/20060223061715/http://download.microsoft.com/download/5/6/8/56803da0-e4a0-4796-a62c-ca920b73bb17/21-Unicode_WinXP.pdf>
     (See the string "The term "ANSI" as..." at the bottom of page 1)

     21st International Unicode Conference - Posted: May 2002
     <https://Web.Archive.org/web/20050119065046/https://www.microsoft.com/globaldev/reference/presentations/21st_Unicode_Conf.mspx>
     (Has a link to "Unicode and Windows XP", which is the first link in [14]
     above)

     The Old New Thing - Why is the default 8-bit codepage called "ANSI"?
     <https://Web.Archive.org/web/20060211031914/http://blogs.msdn.com/oldnewthing/archive/2004/05/31/144893.aspx>
     (This web page also talks about the use of "ANSI" codepages)
     (Now The Old New Thing by Raymond Chen is at
     <https://devblogs.microsoft.com/oldnewthing/>)

[15] ANSI character set - Wikipedia
     <https://en.Wikipedia.org/wiki/ANSI_character_set>

-------------------------------------------------------------------------------
(This document was originally published here: https://MW.Rat.bz/confusion )