The java.io Package - Readers and Writers
InputStreamReader
OutputStreamWriter
Character Encoding
- A character encoding specifies how sequences of 8-bit bytes are translated to and from 16-bit Unicode characters
- encodings are identified by Strings which follow the naming standards set by the IANA Charset Registry
- every implementation of Java is required to support the following character sets:
    US-ASCII     Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set
    ISO-8859-1   ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1
    UTF-8        Eight-bit Unicode Transformation Format
    UTF-16BE     Sixteen-bit Unicode Transformation Format, big-endian byte order
    UTF-16LE     Sixteen-bit Unicode Transformation Format, little-endian byte order
    UTF-16       Sixteen-bit Unicode Transformation Format, byte order specified by a mandatory initial byte-order mark (either order accepted on input, big-endian used on output)
- implementations for specific platforms, such as those used in Japan, China, or the Middle East, may include additional encodings
- InputStreamReader and OutputStreamWriter are used to read and write data encoded in a character set that differs from the default system encoding
- For example (JPL p. 238), to read bytes encoded under ISO 8859-6 for Arabic characters:
public Reader readArabic(String file) throws IOException {
    // wrap the raw byte stream in a Reader that decodes ISO 8859-6
    InputStream fileIn = new FileInputStream(file);
    return new InputStreamReader(fileIn, "iso-8859-6");
}
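
The notes above describe both directions of translation, but the example only reads. The following is a minimal sketch, not taken from JPL, that writes a string through an OutputStreamWriter and reads it back through an InputStreamReader for each of the six required encodings. The class name EncodingDemo, the helper methods encode and decode, and the sample string are assumptions made for illustration. In-memory byte array streams stand in for files so the sketch runs without touching the disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;

// Hypothetical demo class; the names here are illustrative only.
public class EncodingDemo {

    // Encode text into bytes using the named character encoding.
    public static byte[] encode(String text, String encoding) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the writer flushes any buffered characters into the byte stream.
        try (Writer out = new OutputStreamWriter(bytes, encoding)) {
            out.write(text);
        }
        return bytes.toByteArray();
    }

    // Decode bytes back into text using the named character encoding.
    public static String decode(byte[] data, String encoding) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), encoding)) {
            int c;
            while ((c = in.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        String[] required = {
            "US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16"
        };
        String sample = "caf\u00e9"; // four characters, the last one outside ASCII
        for (String name : required) {
            byte[] data = encode(sample, name);
            System.out.println(name
                    + ": supported=" + Charset.isSupported(name)
                    + ", bytes=" + data.length
                    + ", round trip ok=" + sample.equals(decode(data, name)));
        }
    }
}
```

The same four characters occupy a different number of bytes under each encoding, and UTF-16 output is two bytes longer than UTF-16BE because of the byte-order mark. US-ASCII cannot represent the accented character, so the writer substitutes a replacement byte and the round trip does not reproduce the original string.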