Extended ASCII Encoding in .NET

Internally, the .NET Framework stores strings as 16 bit Unicode (UTF16) characters, but when text is transferred to or from a file an encoding conversion takes place. Files typically hold text as UTF16 or as UTF8, which uses a single byte for the plain ASCII characters and a sequence of two or more bytes for everything else. Older files hold a simple 7 bit ASCII character set. The framework provides encodings for all of these as standard.
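As a rough illustration of that conversion, the sketch below encodes the same string with three of the framework's built-in encodings and counts the bytes each one produces. The class and method names are made up for this example.

```csharp
using System;
using System.Text;

static class EncodingSizes
{
    // Returns the number of bytes the given encoding produces for the text.
    public static int ByteCount(string text, Encoding encoding)
    {
        return encoding.GetBytes(text).Length;
    }

    static void Main()
    {
        string text = "Espa\u00F1ol"; // "Español", 7 characters

        // UTF16: two bytes for each of these characters.
        Console.WriteLine(ByteCount(text, Encoding.Unicode));
        // UTF8: ñ takes two bytes, the other characters one byte each.
        Console.WriteLine(ByteCount(text, Encoding.UTF8));
        // ASCII: ñ cannot be represented and is replaced with '?'.
        Console.WriteLine(ByteCount(text, Encoding.ASCII));
    }
}
```

The ASCII result is the one to watch: the byte count looks fine, but the ñ has silently been lost.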

Several times I have come across files and other device streams that use ASCII but utilise all 8 bits, with codes from 128 to 255. The standard ASCII encoder only handles characters from 0 to 127, and UTF8 does not help either: it stores every character above 127 as a multi-byte sequence, so a lone byte in the range 128 to 255 is not valid UTF8. Googling for assistance here is not particularly useful, with most comments stating that ASCII is only defined for 7 bit codes. The solution I eventually discovered was that this 8 bit ASCII is probably ISO-8859-1 (also called Latin-1 or Western European ISO), which maps each byte value to the Unicode character with the same code. Windows-1252 is a closely related code page that differs only in the range 128 to 159, where it places printable characters instead of control codes. ISO-8859-1 is available in .NET as code page 28591, as in this C# code snippet:

// Requires: using System.IO; using System.Text;
Encoding enc = Encoding.GetEncoding(28591);  // ISO-8859-1 (Latin-1)
StreamWriter sw = new StreamWriter("Test.txt", false, enc);

With this you can handle characters such as ñ at code 241; I needed this for the Spanish word Español.
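To check the claim about code 241, here is a minimal round-trip sketch. It assumes the 8 bit data really is ISO-8859-1; the class and method names are hypothetical and only there for illustration.

```csharp
using System;
using System.Text;

static class Latin1RoundTrip
{
    // Code page 28591 is ISO-8859-1, which maps each byte 0..255
    // to the Unicode code point with the same number.
    static readonly Encoding Latin1 = Encoding.GetEncoding(28591);

    public static byte[] Encode(string text)
    {
        return Latin1.GetBytes(text);
    }

    public static string Decode(byte[] bytes)
    {
        return Latin1.GetString(bytes);
    }

    static void Main()
    {
        byte[] bytes = Encode("Espa\u00F1ol"); // "Español"

        // The fifth byte is the one stored for ñ.
        Console.WriteLine(bytes[4]);

        // Decoding with the same encoding gives back the original text.
        Console.WriteLine(Decode(bytes) == "Espa\u00F1ol");

        // For comparison, the standard ASCII encoder substitutes '?' for ñ.
        Console.WriteLine((char)Encoding.ASCII.GetBytes("\u00F1")[0]);
    }
}
```

Reading works the same way: passing the same Encoding object to a StreamReader should decode bytes 128 to 255 rather than dropping or replacing them.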
