Streams

Input and Output Classes

A complete PDF version of the text book is now available. The PDF version is an almost complete subset of the HTML version (where only a few, long program listings have been removed).

37. Streams

We are now about to start the first chapter in the lecture about Input and Output (IO). Traditionally, IO deals with transfer of data to/from secondary storage, most notably disks. IO also covers the transmission of data to/from networks.

In this and the following chapters we will study the classes that are related to input and output. This includes file and directory classes. At the abstract level, the Stream class is the most important class in the IO landscape. Therefore we choose to start the IO story with an exploration of streams, and an understanding of the Stream class in C#. This includes several Stream subclasses and several client classes of Stream. The clients we have in mind are the so-called reader and writer classes.

37.1 The Stream Concept
37.2 The abstract class Stream in C#
37.3 Subclasses of class Stream
37.4 Example: Filestreams
37.5 The using control structure
37.6 More FileStream Examples
37.7 The class Encoding
37.8 Sample use of class Encoding
37.9 Readers and Writers in C#
37.10 The class TextWriter
37.11 The class TextReader
37.12 The class BinaryWriter
37.13 The class BinaryReader
37.14 The classes StringReader and StringWriter
37.15 The Console class

37.1. The Stream Concept

A stream is an abstract concept. A stream is a connection between a program and a storage/network. Essentially, we can read data from the stream into a program, or we can write data from a program to the stream. This understanding of a stream is illustrated in Figure 37.1.

Figure 37.1 Reading from and writing to a stream

A stream is a flow of data from a program to a backing store, or from a backing store to a program

The program can either write to a stream, or read from a stream.

Streams and stream processing include the following:

Reading from or writing to files in secondary memory (disk)
Reading from or writing to primary memory (RAM)
Connection to the Internet
Socket connection between two programs

The second item (reading and writing to/from primary memory) seems to be special compared to the others. Sometimes it may be attractive to have files in primary memory, and therefore it is natural that we should be able to use stream operations to access such files as well. In other situations, we wish to use internal data structures as sources or destinations of streams. It is, for instance, typical that we wish to read and write data from/to strings. We will see how this can be done in Section 37.14.

37.2. The abstract class Stream in C#

The Stream class in C# is an abstract class (see Section 30.1). It belongs to the System.IO namespace, together with a lot of other IO-related types. The abstract Stream class provides a generic view on different kinds of sources and destinations, and it isolates client classes from the operating system details of these.

The Stream class supports both synchronous and asynchronous IO operations. Client classes that invoke a synchronous operation wait until the operation is completed before they can initiate other operations or actions. Use of a synchronous operation is not a problem if the operation is fast. Many IO operations on secondary storage are, however, very slow relative to the speed of operations on primary storage. Therefore it may in some circumstances be attractive to initiate an IO operation, do something else, and consult the result of the IO operation at a later point in time. In order to provide for this, the Stream class supports the asynchronous IO operations BeginRead and BeginWrite. In the current version of the material we do not cover the asynchronous operations.

Let us now look at the most important operations on streams. The operations marked abstract below are abstract in class Stream. They are implemented in the non-abstract subclasses of Stream.

abstract int Read (byte[] buf, int pos, int len)
int ReadByte()
abstract void Write (byte[] buf, int pos, int len)
void WriteByte(byte b)
abstract bool CanRead
abstract bool CanWrite
abstract bool CanSeek
abstract long Length
abstract long Seek (long offset, SeekOrigin org)
abstract void Flush ()
void Close()

In order to use Read you should allocate a byte array and pass (a reference to) this array as the first parameter of Read. The call Read(buf, p, lgt) reads at most lgt bytes, and stores them in buf[p] ... buf[p+lgt-1]. Read returns the actual number of bytes read, which can be less than lgt. A return value of zero means that the end of the stream has been reached.

Write works in a similar way. We assume that a number of bytes are stored in an existing byte array called buf. The call Write(buf, p, lgt) writes lgt bytes, buf[p] ... buf[p+lgt-1], to the stream.
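
The chunk-wise use of Read and Write described above can be sketched as follows. This is only an illustration under stated assumptions: the names ReadLoopSketch and ReadAll are hypothetical, and a MemoryStream (see Section 37.3) is assumed as backing store in order to keep the example independent of files.

```csharp
using System;
using System.IO;

public static class ReadLoopSketch {

  // Reads all remaining bytes of s in chunks of at most chunkSize bytes.
  // (Hypothetical helper, not part of the .NET libraries.)
  public static byte[] ReadAll(Stream s, int chunkSize) {
    byte[] buf = new byte[chunkSize];
    MemoryStream result = new MemoryStream();
    int n;
    // Read returns the number of bytes actually read; 0 signals end of stream.
    while ((n = s.Read(buf, 0, buf.Length)) > 0) {
      result.Write(buf, 0, n);    // write only the n bytes that were read
    }
    return result.ToArray();
  }

  public static void Main() {
    Stream s = new MemoryStream(new byte[] { 79, 79, 80, 33, 33 });
    byte[] copy = ReadAll(s, 2);
    s.Close();
    Console.WriteLine(copy.Length);   // 5
  }
}
```

Notice that the loop never assumes that a call of Read fills the entire buffer. Only the first n bytes of buf are passed on to Write.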

ReadByte and WriteByte are non-abstract methods. ReadByte returns the integer value of the byte being read, or -1 in case the end of the stream has been encountered. The two operations ReadByte and WriteByte rely on Read and Write. Internally, ReadByte calls Read on a one-byte array, it accesses this byte, and it returns this byte. WriteByte works in a similar way. Based on this information, it is not surprising that it is recommended to redefine ReadByte and WriteByte in specialized Stream classes. The default implementations of ReadByte and WriteByte are simply too inefficient. The redefinitions should be able to profit from internal buffering.

The explanations of Read in relation to ReadByte (and Write in relation to WriteByte) may seem a little surprising. Why not have ReadByte as an abstract method, and Read as a non-abstract method, which once and for all is implemented in class Stream by multiple calls of ReadByte? Such a design seems to be ideal: The task of implementing ReadByte in subclasses is easy, and no subclass should ever need to implement Read. The reason behind the actual design of the abstract Stream class is - of course - efficiency. The basic read and write primitives of streams should provide for efficient reading and writing. It is typically inefficient to read a single byte from a stream. On many types of hardware (such as hard disks) we always read a number of bytes at a time. The design of the read and write operations takes advantage of this observation.

It is not possible to read, write, and seek in all streams. Therefore it is possible to query a stream for its actual capabilities. The boolean operations (properties) CanRead, CanWrite, CanSeek are used for such querying.

The static field Null represents a stream without a backing store.

Null is a public static field of type Stream in the abstract class Stream. If you, for some reason, wish to discard the data that you write, you can write it to Stream.Null. You can also read from Stream.Null. A call of Read will always give zero as the result, however, meaning that no bytes are read.
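
As a minimal sketch of the capability queries and of Stream.Null, consider the following program. The class CapabilitySketch and the method Describe are hypothetical names introduced for illustration. The MemoryStream is constructed as non-writable by passing false as the second constructor argument.

```csharp
using System;
using System.IO;

public static class CapabilitySketch {

  // Summarizes the capabilities that a stream reports about itself.
  // (Hypothetical helper for illustration.)
  public static string Describe(Stream s) {
    return string.Format("CanRead={0} CanWrite={1} CanSeek={2}",
                         s.CanRead, s.CanWrite, s.CanSeek);
  }

  public static void Main() {
    // A memory stream on an existing array, constructed as non-writable.
    Stream readOnly = new MemoryStream(new byte[] { 1, 2, 3 }, false);
    Console.WriteLine(Describe(readOnly));
    readOnly.Close();

    // Writing to Stream.Null discards the data; reading yields no bytes.
    Stream.Null.WriteByte(79);
    byte[] buf = new byte[4];
    Console.WriteLine(Stream.Null.Read(buf, 0, buf.Length));   // 0
  }
}
```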

37.3. Subclasses of class Stream

The abstract class Stream is the superclass of a number of non-abstract classes. Below we list the most important of these. Like the class Stream, many of the subclasses of Stream belong to the System.IO namespace.

System.IO.FileStream
    Provides a stream backed by a file from the operating system
System.IO.BufferedStream
    Encapsulates buffering around another stream
System.IO.MemoryStream
    Provides a stream backed by RAM memory
System.Net.Sockets.NetworkStream
    Encapsulates a socket connection as a stream
System.IO.Compression.GZipStream
    Provides stream access to compressed data
System.Security.Cryptography.CryptoStream
    Write encrypts and Read decrypts
And others...

We show example uses of class FileStream in Section 37.4 and Section 37.6. Please notice, however, that file IO is typically handled through one of the reader and writer classes, which behind the scene delegates the work to a Stream class. We have a lot more to say about the reader and writer classes later in this material. Section 37.9 will supply you with an overview of the reader and writer classes in C#.

The class BufferedStream is intended to be used as a so-called decorator of another stream class. In Section 40.1 we discuss the Decorator design pattern. The concrete example of Decorator, which we will discuss in Section 40.2, involves compressed streams. Notice that it is not relevant to use buffering on FileStream, because it natively makes use of buffering.
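
The idea of decorating one stream with a BufferedStream can be sketched as shown below. The names BufferedSketch and WriteBuffered are hypothetical, and the buffer size of 1024 bytes is an arbitrary choice. A MemoryStream is assumed as the decorated stream only to keep the sketch self-contained. In practice the decorated stream would be one with a slow backing store, such as a NetworkStream.

```csharp
using System;
using System.IO;

public static class BufferedSketch {

  // Writes data byte by byte through a BufferedStream that decorates inner.
  // Returns the length of inner after the buffer has been flushed.
  // The caller remains responsible for closing inner.
  public static long WriteBuffered(Stream inner, byte[] data) {
    BufferedStream buffered = new BufferedStream(inner, 1024);
    foreach (byte b in data)
      buffered.WriteByte(b);     // collected in the buffer of the decorator
    buffered.Flush();            // passes the buffered bytes on to inner
    return inner.Length;
  }

  public static void Main() {
    MemoryStream ms = new MemoryStream();
    Console.WriteLine(WriteBuffered(ms, new byte[] { 79, 79, 80 }));   // 3
    ms.Close();
  }
}
```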

37.4. Example: Filestreams

FileStream IO, as illustrated by the examples in this section, is used for binary input and output. It means that the FileStream operations transfer raw chunks of bits between the program and the file. The bits are not interpreted. As a contrast, the reader and writer classes introduced in Section 37.9 interpret and transform the raw binary data to values in C# types.

Let us show a couple of very simple programs that write to and read from filestreams. Program 37.1 writes bytes corresponding to the three ASCII characters 'O', 'O', and 'P' to the file myFile.bin. Notice that we do not write characters, but numbers that belong to the simple type byte. The file opening is done via construction of the FileStream object in Create mode. Create is a value in the enumeration type FileMode in the namespace System.IO. The possible FileMode values are CreateNew, Create, Open, OpenOrCreate, Truncate, and Append. There exist many different overloads of the FileStream constructor, some of which in addition to FileMode also involve a FileAccess enumeration type (with values Read, Write, and ReadWrite). File closing is done by the Close method.

using System.IO;

class WriteProg {
  static void Main() {
    Stream s = new FileStream("myFile.bin", FileMode.Create);
    s.WriteByte(79);  // O    01001111
    s.WriteByte(79);  // O    01001111
    s.WriteByte(80);  // P    01010000
    s.Close();
  }
}

Program 37.1 A program that writes bytes corresponding to 'O' 'O' 'P' to a file stream.

After having executed Program 37.1 the file myFile.bin exists. Program 37.2 reads it. We create a FileStream object in Open mode, and we read the individual bytes with use of the ReadByte method. In lines 11 and 12 we illustrate what happens if we read beyond the end of the file. We see that ReadByte in that case returns -1. The number -1 is not a value in type byte, which supports the range 0..255. Therefore the type of the value returned by ReadByte is int.

using System;
using System.IO;

class ReadProg {
  static void Main() {
    Stream s = new FileStream("myFile.bin", FileMode.Open);
    int i, j, k, m, n;
    i = s.ReadByte();  // O   79  01001111
    j = s.ReadByte();  // O   79  01001111
    k = s.ReadByte();  // P   80  01010000
    m = s.ReadByte();  // -1  EOF
    n = s.ReadByte();  // -1  EOF

    Console.WriteLine("{0} {1} {2} {3} {4}", i, j, k, m, n);
    s.Close();
  }
}

Program 37.2 A program that reads the written file.
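
The FileStream constructor that takes both a FileMode and a FileAccess value can be tried out with a sketch like the following. The names FileModeSketch and CreateThenAppend, as well as the file name fileModeSketch.bin, are hypothetical. The file is placed in the temporary directory of the system, which is assumed to be writable.

```csharp
using System;
using System.IO;

public static class FileModeSketch {

  // Creates the file with two bytes, appends a third byte, and returns
  // the resulting file length as seen through a read-only stream.
  public static long CreateThenAppend(string path) {
    FileStream s = new FileStream(path, FileMode.Create, FileAccess.Write);
    s.WriteByte(79);  // O
    s.WriteByte(79);  // O
    s.Close();

    s = new FileStream(path, FileMode.Append, FileAccess.Write);
    s.WriteByte(80);  // P, added after the existing bytes
    s.Close();

    s = new FileStream(path, FileMode.Open, FileAccess.Read);
    long length = s.Length;
    s.Close();
    return length;
  }

  public static void Main() {
    string path = Path.Combine(Path.GetTempPath(), "fileModeSketch.bin");
    Console.WriteLine(CreateThenAppend(path));   // 3
    File.Delete(path);
  }
}
```

If the second stream had been opened in Create mode instead of Append mode, the existing contents would have been overwritten, and the final length would have been 1.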

37.5. The using control structure

The simple file reading and writing examples in Section 37.4 show that file opening (in terms of creating the FileStream object) and file closing (in terms of sending a Close message to the stream) appear in pairs. This inspires a new control structure which ensures that the file always is closed when we are done with it. The syntax of the using construct is explained below.



using (type variable = initializer)
  body

Syntax 37.1 The syntax of the using statement in C#

The meaning (semantics) of the using construct is the following:

In the scope of using, bind variable to the value of initializer
The type must implement the interface IDisposable
Execute body with the established name binding
At the end of body do variable.Dispose
The Dispose methods in the subclasses of Stream call Close

We encountered the interface IDisposable when we studied the interfaces in the C# libraries, see Section 31.4. The interface IDisposable prescribes a single method, Dispose, which in general is supposed to release resources. The abstract class Stream implements IDisposable, and the Dispose method of class Stream calls the Stream Close method.

Program 37.3 is a reimplementation of Program 37.1 that illustrates the using construct. Notice that we do not explicitly call Close in Program 37.3.

using System.IO;

class WriteProg {
  static void Main() {
    using(Stream s = new FileStream("myFile.txt", FileMode.Create)){
      s.WriteByte(79);  // O   01001111
      s.WriteByte(79);  // O   01001111
      s.WriteByte(80);  // P   01010000
    }
  }
}

Program 37.3 The simple write-program programmed with 'using'.

The following fragment shows what is actually covered by a using construct. Most important, a try-finally construct is involved, see Section 36.9. The use of try-finally implies that Dispose will be called independent of the way we leave body. Even if we attempt to exit body with a jump or via an exception, Dispose will be called.

// The using statement ...

  using (type variable = initializer)
    body

// ... is equivalent to the following try-finally statement

  {type variable = initializer;
    try {
      body
    }
    finally {
      if (variable != null) 
         ((IDisposable)variable).Dispose();
    }
  }

Program 37.4 The control structure 'using' defined by 'try-finally'.
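
The claim that Dispose is called even if we leave body via an exception can be checked with a small experiment. The sketch below assumes a hypothetical class TracedResource, which does nothing but record that Dispose has been called. The names UsingSketch and DisposedDespiteException are hypothetical as well.

```csharp
using System;

// A hypothetical resource that records whether it has been disposed.
public class TracedResource : IDisposable {
  public bool IsDisposed = false;
  public void Dispose() { IsDisposed = true; }
}

public static class UsingSketch {

  // Leaves the body of a using statement via an exception, and
  // reports whether Dispose was called nevertheless.
  public static bool DisposedDespiteException() {
    TracedResource observed = null;
    try {
      using (TracedResource r = new TracedResource()) {
        observed = r;
        throw new InvalidOperationException("leaving body abruptly");
      }
    }
    catch (InvalidOperationException) {
      // Dispose has already been called when we get here.
    }
    return observed.IsDisposed;
  }

  public static void Main() {
    Console.WriteLine(DisposedDespiteException());   // True
  }
}
```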

37.6. More FileStream Examples

We will show yet another simple example of FileStreams, namely a static method that copies one file to another. The example is similar to Program 34.1 which we used to illustrate responsibility and exceptions in Section 34.2.

using System;
using System.IO;

public class CopyApp {

  public static void Main(string[] args) {
    FileCopy(args[0], args[1]);
  }
 
  public static void FileCopy(string fromFile, string toFile){
    try{
      using(FileStream fromStream = 
                         new FileStream(fromFile, FileMode.Open)){
        using(FileStream toStream  = 
                         new FileStream(toFile, FileMode.Create)){
          int c;
      
          do{
            c = fromStream.ReadByte();
            if(c != -1) toStream.WriteByte((byte)c);
          } while (c != -1);
        }
      }
    }
    catch(FileNotFoundException e){
      Console.WriteLine("File {0} not found", e.FileName);
      throw;
    }
    catch(Exception){
      Console.WriteLine("Other file copy exception");
      throw;
    }
 }

}

Program 37.5 A FileCopy method in a source file copy-file.cs - uses two FileStreams.

Notice how the args string array of Main is used for passing input and output file names to the program.

Exercise 10.1. A variant of the file copy program

The purpose of this exercise is to train the use of the Read method in class Stream, and subclasses of class Stream.

Write a variant of the file copy program. Your program should copy the entire file into a byte array. Instead of the method ReadByte you should use the Read method, which reads a number of bytes into a byte array. (Please take a careful look at the documentation of Read in class FileStream before you proceed). After this, write out the byte array to standard output such that you can assure yourself that the file is read correctly.

Are you able to read the entire file with a single call to Read? Or do you prefer to read chunks of a certain (maximum) size?

37.7. The class Encoding

Before we study the reader and writer classes we will clarify one important topic, namely encodings.

The problem is that a byte (as represented by a value of type byte) and a character (as represented by a value of type char) are two different things. In the old days they were basically the same, or it was at least straightforward to convert one to the other. In the old days there were at most 256 different characters available at a given point in time (corresponding to a straightforward encoding of a single character in a single byte). Today, the datatype char should be able to represent a wide variety of different characters that belong to different alphabets in different cultures. We still need to represent a character by means of a number of bytes, because a byte is a fundamental unit in most software, and in most digital hardware.

As a naive approach, we could go for the following solution:

We want to be able to represent a maximum of, say, 200000 different characters. For this purpose we need log₂(200000) ≈ 17.6 bits, which we round up to 18 bits. If we operate in units of 8 bits (= one byte) we see that we need at least 3 bytes per character. Most likely, we will go for 4 bytes per character, because it fits much better with the word length of most computers. Thus, the byte size of a text will now be four times the size of an ASCII text. This is not acceptable because it would bloat the representation of text files on secondary disk storage.

As of 2007, the Unicode standard defines more than 100000 different characters. Unicode organizes characters in a number of planes of up to 2¹⁶ (= 65536) characters. The Basic Multilingual Plane - BMP - contains the most common characters.

Encodings are invented to solve the problem that we have outlined above. An encoding is a mapping from values of type character (a code point number between 0 and 200000 in our case) to sequences of bytes. The naive approach outlined above represents a simple encoding, in which we need 4 bytes even for the original ASCII characters. It is attractive, however, if characters in the original, 7-bit ASCII alphabet can be encoded in a single byte. The price of that may very well be that some rarely used characters will need considerably more bytes for their encoding.

Let us remind ourselves that in C#, the type char is represented as 16-bit entities (Unicode characters) and that a string is a sequence of values of type char. We have already touched on this in Section 6.1. At the time Unicode was designed, it was hypothesized that 16 bits was enough to represent all characters in the world. As mentioned above, this turned out not to be true. Therefore the type char in C# is not big enough to hold all Unicode characters. The remedy is to use multiple char values for representation of a single Unicode character. We see that history repeats itself...
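
The use of two char values for a single Unicode character can be observed directly. The following is a minimal sketch. The names SurrogateSketch and CodePointCount are hypothetical, and U+1D11E is just one arbitrarily chosen character outside the Basic Multilingual Plane.

```csharp
using System;

public static class SurrogateSketch {

  // Counts Unicode characters (code points) rather than char values.
  // (Hypothetical helper for illustration.)
  public static int CodePointCount(string s) {
    int count = 0;
    for (int i = 0; i < s.Length; i++) {
      if (char.IsSurrogatePair(s, i)) i++;   // skip the second char of a pair
      count++;
    }
    return count;
  }

  public static void Main() {
    string clef = "\U0001D11E";   // U+1D11E, a character outside the BMP
    Console.WriteLine(clef.Length);                  // 2 char values
    Console.WriteLine(CodePointCount(clef));         // 1 Unicode character
    Console.WriteLine(CodePointCount("A \u00E6"));   // 3
  }
}
```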

An encoding is a mapping between characters/strings and byte arrays

An object of class System.Text.Encoding represents knowledge about a particular character encoding

Let us now review the operations in class Encoding, which is located in the namespace System.Text :

byte[] GetBytes(string)     Instance method
byte[] GetBytes(char[])     Instance method
    Encodes a string/char array to a byte array relative to the current encoding
char[] GetChars(byte[])     Instance method
    Decodes a byte array to a char array relative to the current encoding
byte[] Convert(Encoding, Encoding, byte[])     Static method
    Converts a byte array from one encoding (first parameter) to another encoding (second parameter)

The method GetBytes implements the encoding in the direction of characters to byte sequences. In concrete terms, the method GetBytes transforms a String or an array of chars to a byte array.

The inverse method, GetChars, converts an array of bytes to the corresponding array of characters. For a given string str and a given encoding e, e.GetChars(e.GetBytes(str)) corresponds to str, provided that e is able to represent all the characters in str.

For given encodings e1 and e2, and for some given byte array ba supposed to be encoded in e1, Convert(e1,e2,ba) is equivalent to e2.GetBytes(e1.GetChars(ba)).
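
The two relationships stated above can be tested on a small string. The sketch below is only an illustration. The names EncodingSketch, SameBytes, RoundTrips, and ConvertAgrees are hypothetical helpers, not operations in class Encoding.

```csharp
using System;
using System.Text;

public static class EncodingSketch {

  // True if the two byte arrays have the same length and contents.
  public static bool SameBytes(byte[] a, byte[] b) {
    if (a.Length != b.Length) return false;
    for (int i = 0; i < a.Length; i++)
      if (a[i] != b[i]) return false;
    return true;
  }

  // True if str survives encoding followed by decoding with e.
  public static bool RoundTrips(Encoding e, string str) {
    return new string(e.GetChars(e.GetBytes(str))) == str;
  }

  // Checks Convert(e1, e2, ba) against e2.GetBytes(e1.GetChars(ba)).
  public static bool ConvertAgrees(Encoding e1, Encoding e2, byte[] ba) {
    return SameBytes(Encoding.Convert(e1, e2, ba),
                     e2.GetBytes(e1.GetChars(ba)));
  }

  public static void Main() {
    string str = "A \u00E6";   // "A æ"
    Console.WriteLine(Encoding.UTF8.GetBytes(str).Length);      // 4
    Console.WriteLine(Encoding.Unicode.GetBytes(str).Length);   // 6
    Console.WriteLine(RoundTrips(Encoding.UTF8, str));          // True
    Console.WriteLine(RoundTrips(Encoding.ASCII, str));         // False
    Console.WriteLine(ConvertAgrees(Encoding.Unicode, Encoding.UTF8,
                                    Encoding.Unicode.GetBytes(str)));   // True
  }
}
```

The ASCII round trip fails because the character æ cannot be represented in ASCII. It is replaced by '?' during encoding, and the original character cannot be recovered by decoding.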

37.8. Sample use of class Encoding

Now that we understand the idea behind encodings, let us play a little with them. In Program 37.6 we make a number of different encodings, and we convert a given string to some of these encodings. We explain the details after the program.

using System;
using System.Text;

/* Adapted from an example provided by Microsoft */
class ConvertExampleClass{
  public static void Main(){
    string unicodeStr =          // "A æ u å æ ø i æ å"
        "A \u00E6 u \u00E5 \u00E6 \u00F8 i \u00E6 \u00E5";    

    // Different encodings.
    Encoding ascii = Encoding.ASCII,                          
             unicode = Encoding.Unicode,
             utf8 = Encoding.UTF8,
             isoLatin1 = Encoding.GetEncoding("iso-8859-1");  

    // Encodes the characters in a string to a byte array:
    byte[] unicodeBytes = unicode.GetBytes(unicodeStr),       
           asciiBytes =   ascii.GetBytes(unicodeStr),         
           utf8Bytes =   utf8.GetBytes(unicodeStr),
           isoLatin1Bytes =   isoLatin1.GetBytes(unicodeStr);

    // Convert from byte array in unicode to byte array in utf8:
    byte[] utf8BytesFromUnicode =                             
      Encoding.Convert(unicode, utf8, unicodeBytes);          
                                                              
    // Convert from byte array in utf8 to byte array in ascii:
    byte[] asciiBytesFromUtf8 =                               
      Encoding.Convert(utf8, ascii, utf8Bytes);
            
    // Decodes the bytes in byte arrays to a char array:
    char[] utf8Chars = utf8.GetChars(utf8BytesFromUnicode);    
    char[] asciiChars = ascii.GetChars(asciiBytesFromUtf8);    

    // Convert char[] to string:
    string utf8String = new string(utf8Chars),                 
           asciiString = new String(asciiChars);               

    // Display the strings created before and after the conversion.
    Console.WriteLine("Original string: {0}", unicodeStr);     
    Console.WriteLine("String via UTF-8: {0}", utf8String);    
                                                               
    Console.WriteLine("Original string: {0}", unicodeStr);               
    Console.WriteLine("ASCII converted string: {0}", asciiString);       
  }                                                                      
}

Program 37.6 Sample encodings, conversions, and decodings of a string of Danish characters.

In line 7 we declare a sample string, unicodeStr, which we initialize to a string with plenty of national Danish characters. We notate the string with escape notation \udddd where d is a hexadecimal digit. We could, as well, have used the string constant in the comment at the end of line 7.

In lines 11-14 we make a number of instances of class Encoding. Some common Encoding objects can be accessed conveniently via static properties of class Encoding. The UTF-8 encoding can in that way be accessed with Encoding.UTF8. The static method GetEncoding accesses an encoding via the name of the encoding. (In order to get access to all supported encodings, the static method GetEncodings (plural) is useful). The ISO Latin 1 encoding is accessed with use of GetEncoding in line 14. There also exist a number of specialized Encoding classes, such as UnicodeEncoding (UTF-16) and UTF8Encoding, in the System.Text namespace.

In line 17-20 we convert the string unicodeStr to byte arrays in different encodings. For this purpose we use the instance method GetBytes.

Next, in line 22-28, we show how to use the static method Convert to convert a byte array in one encoding to a byte array in another encoding.

In line 30-32 it is shown how to convert byte arrays in a particular encoding to a char array. It is done by the instance method GetChars. We most probably wish to obtain a string instead of a char array. For that purpose we just use an appropriate String constructor, as shown in line 34-36.

In line 38-43 we display the values of utf8String and asciiString, and for comparison we also print the original unicodeStr. The printed result is shown in Program 37.7. It is not surprising that the national Danish characters cannot be represented in the ASCII character set. The Danish characters are (ambiguously) translated to '?'.

Original string: A æ u å æ ø i æ å
String via UTF-8: A æ u å æ ø i æ å
Original string: A æ u å æ ø i æ å
ASCII converted string: A ? u ? ? ? i ? ?

Program 37.7 Output from the Encoding program.

Exercise 10.2. Finding the encoding of a given text file

Make a UTF-8 text file with some words in Danish. Be sure to use plenty of special Danish characters. You may consider to write a simple C# program to create the file. You may also create the text file in another way.

In this exercise you should avoid writing a byte order mark (BOM) in your UTF-8 text file. (A BOM in the UTF-8 text file may short circuit the decoding we are asking for later in the exercise). One way to avoid the BOM is to denote the UTF-8 encoding with new UTF8Encoding(), or equivalently new UTF8Encoding(false). You may want to consult the constructors in class UTF8Encoding for more information.

Now write a C# program which systematically - in a loop - reads the text file six times with the following objects of type Encoding: ISO-8859-1, UTF-7, UTF-8, UTF-16 (Unicode), UTF-32, and 7-bit ASCII.

More concretely, I suggest you make a list of six encoding objects. For each encoding, open a TextReader and read the entire file (with ReadToEnd, for instance) with the current encoding. Echo the characters, which you read, to standard output.

You should be able to recognize the correct, matching encoding (UTF-8) when you see it.

37.9. Readers and Writers in C#

In the rest of this chapter we will explore a family of so-called reader and writer classes. In most practical cases one or more of these classes are used for IO purposes instead of a Stream subclass, see Section 37.2.

Table 37.1 provides an overview of the reader and writer classes. In the horizontal dimension we have input (readers) and output (writers). In the vertical dimension we distinguish between text (char/string) IO and binary (bits structured as bytes) IO.

	Input	Output
Text	`TextReader` `StreamReader` `StringReader`	`TextWriter` `StreamWriter` `StringWriter`
Binary	`BinaryReader`	`BinaryWriter`

Table 37.1 An overview of Reader and Writer classes

The class Stream and its subclasses are oriented towards input and output of bytes. In contrast, the reader and writer classes are able to deal with input and output of characters (values of type char) and values of other simple types. Thus, the reader and writer classes operate at a higher level of abstraction than the stream classes.

In Section 37.3 we listed some important subclasses of class Stream. We will now discuss how the reader and writer classes in Table 37.1 are related to the stream classes. None of the classes in Table 37.1 inherit from class Stream. Rather, they delegate part of their work to a Stream class. Thus, the reader and writer classes aggregate (have a) Stream class together with other pieces of data. The classes StreamReader, StreamWriter, BinaryReader, and BinaryWriter all have constructors that take a Stream class as parameter. In that way, it is possible to build such readers and writers on a Stream class.

TextReader and TextWriter in Table 37.1 are abstract classes. Their subclasses StringReader and StringWriter are built on strings rather than on streams. We have more to say about StringReader and StringWriter in Section 37.14.
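
The way a reader and a writer are built on top of a stream can be sketched as follows. The names ReaderWriterSketch and RoundTrip are hypothetical. A MemoryStream is assumed as the underlying stream, and UTF-8 without a byte order mark is an arbitrary choice of encoding.

```csharp
using System;
using System.IO;
using System.Text;

public static class ReaderWriterSketch {

  // Writes text to a memory stream through a StreamWriter, and reads
  // it back from the same stream through a StreamReader.
  public static string RoundTrip(string text) {
    MemoryStream ms = new MemoryStream();
    Encoding enc = new UTF8Encoding();

    StreamWriter writer = new StreamWriter(ms, enc);
    writer.Write(text);
    writer.Flush();                  // move buffered chars to the stream

    ms.Seek(0, SeekOrigin.Begin);    // rewind before reading
    StreamReader reader = new StreamReader(ms, enc);
    string result = reader.ReadToEnd();
    reader.Close();                  // also closes the underlying stream
    return result;
  }

  public static void Main() {
    string str = "A \u00E6 u \u00E5";
    Console.WriteLine(RoundTrip(str) == str);   // True
  }
}
```

The writer and the reader deal with characters and strings. The conversion to and from bytes, and the actual transfer of the bytes, are delegated to the encoding and to the stream.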

In the following sections we will rather systematically describe the reader and writer classes in Table 37.1, and we will show examples of their use.

37.10. The class TextWriter

In this section we discuss the abstract class TextWriter, and not least its non-abstract subclass StreamWriter. We cover the sibling classes StringWriter and StringReader in Section 37.14.

Most important, class TextWriter supports writing of text - characters and strings - via a chosen encoding. Encodings were discussed in Section 37.7. With use of class TextWriter it is also possible to write textual representations of simple types, such as int and double.

We illustrate the use of class StreamWriter in Program 37.8. Recall from Table 37.1 that StreamWriter is a non-abstract subclass of class TextWriter.

In Program 37.8 we write str and strEquiv (in lines 8-9) to three different files. The two strings are identical, they contain a lot of Danish letters, but they are notated differently. It is the same string that we used in Program 37.6 for illustration of encodings. For each of the files we use a particular encoding (see Section 37.7). Notice that we in lines 12, 16 and 20 use a StreamWriter constructor that takes a Stream and an encoding as parameters. There are six other constructors to choose from (see below). In lines 24-26 we write the two strings to each of the three files. Try out the program, and read the three text files with your favorite text editor. Depending on the capabilities of your editor, you may or may not be able to read them all.

using System;
using System.IO;
using System.Text;

public class TextWriterProg{

  public static void Main(){
    string str =      "A æ u å æ ø i æ å",                                   
           strEquiv = "A \u00E6 u \u00E5 \u00E6 \u00F8 i \u00E6 \u00E5";
  
    TextWriter                                                               
      tw1 = new StreamWriter(                         // Iso-Latin-1
             new FileStream("f-iso.txt", FileMode.Create),
             Encoding.GetEncoding("iso-8859-1")),                            

      tw2 = new StreamWriter(                         // UTF-8
             new FileStream("f-utf8.txt", FileMode.Create),
             new UTF8Encoding()),                                            

      tw3 = new StreamWriter(                         // UTF-16              
             new FileStream("f-utf16.txt", FileMode.Create),                 
             new UnicodeEncoding());                                         
                                                                             
    tw1.WriteLine(str);     tw1.WriteLine(strEquiv);                         
    tw2.WriteLine(str);     tw2.WriteLine(strEquiv);                         
    tw3.WriteLine(str);     tw3.WriteLine(strEquiv);

    tw1.Close();                                                             
    tw2.Close();
    tw3.Close();
  }

}

Program 37.8 Writing a text string using three different encodings with StreamWriters.

You may wonder if knowledge about the applied encoding is somehow represented in the text file. The first few bytes in a text file created from a TextWriter may contain some information about the encoding. StreamWriter calls Encoding.GetPreamble() in order to get a byte array that represents knowledge about the encoding. This byte array is written in the beginning of the text file. This preamble is primarily used to determine the byte order of UTF-16 and UTF-32 encodings. (Two different byte orders are widely used on computers from different CPU manufacturers: Big-endian (most significant byte first) and little-endian (least significant byte first)). The preambles of the ASCII and the ISO Latin 1 encodings are empty.
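
The preambles of some of the encodings used in this chapter can be inspected with GetPreamble, as in the following sketch. The names PreambleSketch and PreambleHex are hypothetical. BitConverter.ToString is used to show the bytes in hexadecimal notation.

```csharp
using System;
using System.Text;

public static class PreambleSketch {

  // The preamble of an encoding as a hexadecimal string, e.g. "FF-FE".
  // (Hypothetical helper for illustration.)
  public static string PreambleHex(Encoding e) {
    return BitConverter.ToString(e.GetPreamble());
  }

  public static void Main() {
    Console.WriteLine("[{0}]", PreambleHex(new UnicodeEncoding()));   // [FF-FE]
    Console.WriteLine("[{0}]", PreambleHex(Encoding.UTF8));           // [EF-BB-BF]
    Console.WriteLine("[{0}]", PreambleHex(new UTF8Encoding()));      // []
    Console.WriteLine("[{0}]", PreambleHex(Encoding.ASCII));          // []
  }
}
```

Notice that Encoding.UTF8 and new UTF8Encoding() differ with respect to the preamble. The latter, which is used in Program 37.8, has an empty preamble.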

The next program, shown in Program 37.9, first creates a StreamWriter on a given file path (a text string) "simple-types.txt". No encoding is passed to the constructor, and therefore the StreamWriter uses its default encoding, which is UTF-8 without a byte order mark. (This is not the same as the encoding accessed with the static property Encoding.Default, which on the .NET Framework is system/culture dependent). By use of the heavily overloaded Write method it writes an integer, a double, a decimal, and a boolean to the file.

Next, in lines 15-18, it writes a Point and a Die to a text file named "non-simple-types.txt". As expected, the ToString method is used on the Point and the Die objects. The contents of the two text files are shown in Listing 37.10 (only on web) and Listing 37.11 (only on web).

using System;
using System.IO;

public class TextSimpleTypes{
  
  public static void Main(){
 
    using(TextWriter tw = new StreamWriter("simple-types.txt")){   
      tw.Write(5);  tw.WriteLine();                                
      tw.Write(5.5);  tw.WriteLine();                              
      tw.Write(5555M); tw.WriteLine();
      tw.Write(5==6); tw.WriteLine();
    }

    using(TextWriter twnst = new StreamWriter("non-simple-types.txt")){   
      twnst.Write(new Point(1,2)); twnst.WriteLine();                     
      twnst.Write(new Die(6)); twnst.WriteLine();                         
    }

  }
}

Program 37.9 Writing values of simple types and objects of our own classes.

5
5,5
5555
False

Listing 37.10 The file simple-types.txt.

Point: (1, 2). 
Die[6]: 3

Listing 37.11 The file non-simple-types.txt.

The following items summarize the operations in class StreamWriter:

7 overloaded constructors
  Parameters involved: File name, stream, encoding, buffer size
  StreamWriter(String)
  StreamWriter(Stream)
  StreamWriter(Stream, Encoding)
  others
17/18 overloaded Write / WriteLine operations
  Chars, strings, simple types. Formatted output
Encoding
  A property that gets the encoding used for this TextWriter
NewLine
  A property that gets/sets the applied newline string of this TextWriter
others
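The formatted output and the two properties mentioned above can be tried out in a small sketch. To keep it self-contained we write to a MemoryStream instead of a file. The class name StreamWriterProps and the method WriteSample are hypothetical names chosen for this example.

```csharp
using System;
using System.IO;
using System.Text;

public class StreamWriterProps{

  // Writes formatted text through a StreamWriter on a MemoryStream
  // and returns the written bytes decoded as a string.
  public static string WriteSample(){
    MemoryStream ms = new MemoryStream();
    using(StreamWriter sw = new StreamWriter(ms, new UTF8Encoding())){
      sw.NewLine = "\n";                              // Applied by WriteLine
      sw.WriteLine("{0} + {1} = {2}", 2, 3, 2 + 3);   // Formatted output
      sw.Write("Encoding: {0}", sw.Encoding.WebName); // The Encoding property
    }
    // ToArray can be called even though the stream is closed by the writer
    return new UTF8Encoding().GetString(ms.ToArray());
  }

  public static void Main(){
    Console.WriteLine(WriteSample());
  }
}
```

The sketch writes the two lines "2 + 3 = 5" and "Encoding: utf-8". By setting NewLine explicitly the line termination becomes independent of the platform on which the program runs.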

Exercise 10.3. Die tossing - writing to text file

Write a program that tosses a Die 1000 times, and writes the outcome of the tosses to a textfile. Use a TextWriter to accomplish the task.

Write another program that reads the text file. Report the number of ones, twos, threes, fours, fives, and sixes.

Solution

37.11. The class TextReader

The class TextReader is an abstract class of which StreamReader is a non-abstract subclass. StreamReader is able to read characters from a byte stream relative to a given encoding. In most respects, the class TextReader is symmetric to class TextWriter. However, TextReader has no Read counterparts to the many overloaded Write methods in TextWriter. We will come back to this observation below.

Program 37.12 is a program that reads the text that was produced by Program 37.8. In Program 37.12 we create three TextReader objects. They are all based on file stream objects and encodings similar to the ones used in Program 37.8. From each TextReader we read the two strings that we wrote in Program 37.8. It is hardly surprising that we get six instances of the strange string "A æ u å æ ø i æ å". In lines 19-21 they are all written to standard output via use of Console.WriteLine.

The second half of Program 37.12 (from line 27) reads the three files as binary information (as raw bytes). The purpose of this reading is to examine the actual contents of the three files. This is done by opening each of the files via FileStream objects, see Section 37.4. Recall that FileStream allows for binary reading (in terms of bytes) of a file. The function StreamReport, defined at the end of the program, reads each byte of a given FileStream, and it prints these bytes on the console. The output in Listing 37.13 reveals, as expected, substantial differences between the actual, binary contents of the three files. Notice that the ISO Latin 1 file is the shortest, the UTF-8 file is in between, and the UTF-16 file is the longest.

using System;
using System.IO;
using System.Text;

public class TextReaderProg{

  public static void Main(){
  
    TextReader tr1 = new StreamReader(
                       new FileStream("f-iso.txt", FileMode.Open),
                       Encoding.GetEncoding("iso-8859-1")),
               tr2 = new StreamReader(
                       new FileStream("f-utf8.txt", FileMode.Open),
                       new UTF8Encoding()),
               tr3 = new StreamReader(             // UTF-16
                       new FileStream("f-utf16.txt", FileMode.Open),
                       new UnicodeEncoding());

    Console.WriteLine(tr1.ReadLine());  Console.WriteLine(tr1.ReadLine());  
    Console.WriteLine(tr2.ReadLine());  Console.WriteLine(tr2.ReadLine());  
    Console.WriteLine(tr3.ReadLine());  Console.WriteLine(tr3.ReadLine());  

    tr1.Close();
    tr2.Close();
    tr3.Close();

    // Raw reading of the files to check the contents at byte level
    FileStream  fs1 = new FileStream("f-iso.txt", FileMode.Open),
                fs2 = new FileStream("f-utf8.txt", FileMode.Open),
                fs3 = new FileStream("f-utf16.txt", FileMode.Open);

    StreamReport(fs1, "Iso Latin 1");   
    StreamReport(fs2, "UTF-8");
    StreamReport(fs3, "UTF-16");

    fs1.Close();
    fs2.Close();
    fs3.Close();
  }

  public static void StreamReport(FileStream fs, string encoding){
    Console.WriteLine();
    Console.WriteLine(encoding);
    int ch, i = 0;
    do{
      ch = fs.ReadByte();
      if (ch != -1) Console.Write("{0,4}", ch);
      i++;
      if (i%10 == 0) Console.WriteLine();
    } while (ch != -1);
    Console.WriteLine();
  }

}

Program 37.12 Reading back the text strings encoded in three different ways, with StreamReader.

A æ u å æ ø i æ å
A æ u å æ ø i æ å
A æ u å æ ø i æ å
A æ u å æ ø i æ å
A æ u å æ ø i æ å
A æ u å æ ø i æ å

Iso Latin 1
  65  32 230  32 117  32 229  32 230  32
 248  32 105  32 230  32 229  13  10  65
  32 230  32 117  32 229  32 230  32 248
  32 105  32 230  32 229  13  10

UTF-8
  65  32 195 166  32 117  32 195 165  32
 195 166  32 195 184  32 105  32 195 166
  32 195 165  13  10  65  32 195 166  32
 117  32 195 165  32 195 166  32 195 184
  32 105  32 195 166  32 195 165  13  10


UTF-16
 255 254  65   0  32   0 230   0  32   0
 117   0  32   0 229   0  32   0 230   0
  32   0 248   0  32   0 105   0  32   0
 230   0  32   0 229   0  13   0  10   0
  65   0  32   0 230   0  32   0 117   0
  32   0 229   0  32   0 230   0  32   0
 248   0  32   0 105   0  32   0 230   0
  32   0 229   0  13   0  10   0

Listing 37.13 Output from the program that reads back the strings encoded in three different ways.
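The file sizes observed in Listing 37.13 can also be predicted without writing any files. The sketch below, which is an addition to the original programs, uses Encoding.GetByteCount on one line of the text, including the carriage return and line feed characters added by WriteLine on the machine that produced the listing. The class name EncodingSizes and the method ByteCount are made up for the example, and the Danish letters are written as Unicode escapes to make the source file independent of its own encoding.

```csharp
using System;
using System.Text;

public class EncodingSizes{

  // The number of bytes needed to represent s in the encoding enc
  public static int ByteCount(Encoding enc, string s){
    return enc.GetByteCount(s);
  }

  public static void Main(){
    // "A æ u å æ ø i æ å" followed by CR and LF: 19 characters in total
    string line = "A \u00E6 u \u00E5 \u00E6 \u00F8 i \u00E6 \u00E5\r\n";

    Console.WriteLine(ByteCount(Encoding.GetEncoding("iso-8859-1"), line)); // 19
    Console.WriteLine(ByteCount(new UTF8Encoding(), line));                 // 25
    Console.WriteLine(ByteCount(new UnicodeEncoding(), line));              // 38
  }
}
```

Each of the six Danish letters in the line costs one byte in ISO Latin 1 and two bytes in UTF-8, and every character costs two bytes in UTF-16. The counts do not include the preamble, which explains the two extra bytes at the start of the UTF-16 file.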

Below, in Program 37.14, we show a program that reads the values from the file "simple-types.txt", as written by Program 37.9. Notice that we read a line at a time using the ReadLine method of StreamReader. ReadLine returns a string, which we parse by the static Parse methods in the structs Int32, Double, Decimal, and Boolean respectively. There are no dedicated methods in class StreamReader for reading the textual representations of integers, doubles, decimals, booleans, etc. The output of Program 37.14 is shown in Listing 37.15 (only on web).

using System;
using System.IO;

public class TextSimpleTypes{
  
  public static void Main(){
 
    using(TextReader twst = new StreamReader("simple-types.txt")){
      int i = Int32.Parse(twst.ReadLine()); 
      double d = Double.Parse(twst.ReadLine());  
      decimal m = Decimal.Parse(twst.ReadLine());  
      bool b = Boolean.Parse(twst.ReadLine());

      Console.WriteLine("{0} \n{1} \n{2} \n{3}", i, d, m, b);
    }

  }
}

Program 37.14 A program that reads lines of text and parses them to values of simple types.

5 
5,5 
5555 
False

Listing 37.15 Output from the readline and parsing program.
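The decimal comma in "5,5" deserves a remark. The Write and Parse methods used above format and parse numbers relative to the current culture, and the listings were presumably produced under a culture with decimal comma (such as Danish). A file written under one culture may therefore fail to parse under another. The following sketch illustrates one possible remedy: passing an explicit format provider. The class CultureParsing and the helper CommaFormat are hypothetical names, and the hand-made NumberFormatInfo only imitates a culture with decimal comma.

```csharp
using System;
using System.Globalization;

public class CultureParsing{

  // A number format with decimal comma, imitating e.g. Danish conventions
  public static NumberFormatInfo CommaFormat(){
    NumberFormatInfo nfi = new NumberFormatInfo();
    nfi.NumberDecimalSeparator = ",";
    nfi.NumberGroupSeparator = ".";
    return nfi;
  }

  public static void Main(){
    NumberFormatInfo comma = CommaFormat();
    IFormatProvider invariant = CultureInfo.InvariantCulture;

    Console.WriteLine(5.5.ToString(comma));       // 5,5
    Console.WriteLine(5.5.ToString(invariant));   // 5.5

    double d1 = Double.Parse("5,5", comma);       // Parsed with decimal comma
    double d2 = Double.Parse("5.5", invariant);   // Parsed with decimal point
    Console.WriteLine(d1 == d2);                  // True
  }
}
```

If a text file is meant to be exchanged between machines, it is probably wise to write and read it with CultureInfo.InvariantCulture in both ends.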

As we did for class TextWriter in Section 37.10 we summarize the operations in class TextReader below. More concretely, we summarize the operations in the non-abstract subclass StreamReader:

10 StreamReader constructors
  Similar to the StreamWriter constructors
  StreamReader(String)
  StreamReader(Stream)
  StreamReader(Stream, bool)
  StreamReader(Stream, Encoding)
  others
int Read()
  Reads a single character. Returns -1 if at end of file
int Read(char[], int, int)
  Returns the number of characters read
int Peek()
String ReadLine()
String ReadToEnd()
CurrentEncoding
  A property that gets the encoding of this StreamReader

The method Read reads a single character; it returns -1 if the file is positioned at the end of the file. The Read method that accepts three parameters is similar to the Stream method of the same name, see Section 37.2. As such, it reads a number of characters into an already allocated char array (which is passed as the first parameter of Read). Peek returns the next available character without advancing the file position. You can use the method to look a little ahead of the actual reading. As we have seen, ReadLine reads characters until an end of line character is encountered. Similarly, ReadToEnd reads the rest of the stream, from the current position until the end of the file, and returns it as a string. ReadToEnd is often convenient if you wish to get access to a text file as a (potentially large) text string.
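The interplay between Peek, Read, ReadLine, and ReadToEnd may be illustrated as follows. This is a minimal sketch that is not taken from the original material. In order to avoid dependencies on a file it reads from a StringReader (see Section 37.14), which offers the same TextReader operations as StreamReader. The class ReaderOperations and the method ReadWord are names invented for the example.

```csharp
using System;
using System.IO;
using System.Text;

public class ReaderOperations{

  // Reads characters up to, but not including, the next white space
  // character. Peek looks ahead, Read consumes.
  public static string ReadWord(TextReader tr){
    StringBuilder word = new StringBuilder();
    while (tr.Peek() != -1 && !Char.IsWhiteSpace((char)tr.Peek()))
      word.Append((char)tr.Read());
    return word.ToString();
  }

  public static void Main(){
    string text = "first line\nsecond line\nthird line";

    using(TextReader tr = new StringReader(text)){
      string word = ReadWord(tr);         // "first"
      string restOfLine = tr.ReadLine();  // " line"
      string rest = tr.ReadToEnd();       // "second line\nthird line"
      int atEnd = tr.Read();              // -1, nothing more to read

      Console.WriteLine("[{0}] [{1}] [{2}] {3}",
                        word, restOfLine, rest.Length, atEnd);
    }
  }
}
```

Notice that ReadWord is programmed against the abstract class TextReader. It can therefore be used unchanged on a StreamReader.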

37.12. The class BinaryWriter

In this section we will study a writer class which produces binary data. As such, a binary writer is similar to a FileStream used in write access mode, see Section 37.4. The justification of BinaryWriter is, however, that it supports a heavily overloaded Write method just like the class TextWriter does. The Write methods can be applied on most simple data types. The Write methods of BinaryWriter produce binary data, not characters.

Encodings, see Section 37.7, play important roles for TextReader and TextWriter. Encodings only play a minimal role in BinaryWriter: they are only used when we write characters and strings to the binary file.

Below, in Program 37.16 we show a program similar to Program 37.9. We write four values of different simple types to a file with use of a BinaryWriter. In comments of the program we show the expected number of bytes to be written. With use of a FileInfo object (see Section 38.1) we check our expectations in lines 18-19. The program reports a file length of 29 bytes (4 + 8 + 16 + 1), as expected.

using System;
using System.IO;

public class BinaryWriteSimpleTypes{
  
  public static void Main(){
    string fn = "simple-types.bin";

    using(BinaryWriter bw = 
            new BinaryWriter(
              new FileStream(fn, FileMode.Create))){
      bw.Write(5);      // 4  bytes
      bw.Write(5.5);    // 8  bytes
      bw.Write(5555M);  // 16 bytes
      bw.Write(5==6);   // 1  byte
    }

    FileInfo fi = new FileInfo(fn);
    Console.WriteLine("Length of {0}: {1}", fn, fi.Length);

  }
}

Program 37.16 Use of a BinaryWriter to write some values of simple types.

The following operations are supplied by BinaryWriter:

Two public constructors
  BinaryWriter(Stream)
  BinaryWriter(Stream, Encoding)
18 overloaded Write operations
  One for each simple type
  Write(char), Write(char[]), and Write(char[], int, int) - use Encoding
  Write(string) - use Encoding
  Write(byte[]) and Write(byte[], int, int)
Seek(int offset, SeekOrigin origin)
others

The second constructor allows for registration of an encoding, which is used if we write characters as binary data. The Write methods that accept an array as first parameter, together with two integers as second and third parameters, write a section of the array.
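The role of the encoding, and the writing of an array section, can be observed by counting the bytes that a BinaryWriter produces. The following is a sketch under our own assumptions: the class BinaryWriterSizes and its two methods are hypothetical, and a MemoryStream is used instead of a file. Notice that Write(string) stores a length prefix in front of the encoded characters (a single byte for short strings).

```csharp
using System;
using System.IO;
using System.Text;

public class BinaryWriterSizes{

  // The number of bytes produced when s is written in the encoding enc
  public static int StringSize(string s, Encoding enc){
    MemoryStream ms = new MemoryStream();
    using(BinaryWriter bw = new BinaryWriter(ms, enc)){
      bw.Write(s);                 // Length prefix + encoded characters
    }
    return ms.ToArray().Length;
  }

  // The bytes produced when count bytes of data, from index start, are written
  public static byte[] Section(byte[] data, int start, int count){
    MemoryStream ms = new MemoryStream();
    using(BinaryWriter bw = new BinaryWriter(ms)){
      bw.Write(data, start, count);
    }
    return ms.ToArray();
  }

  public static void Main(){
    Console.WriteLine(StringSize("abc", Encoding.UTF8));     // 4 = 1 + 3
    Console.WriteLine(StringSize("abc", Encoding.Unicode));  // 7 = 1 + 3 * 2

    byte[] section = Section(new byte[]{1, 2, 3, 4, 5, 6}, 2, 3);
    Console.WriteLine(BitConverter.ToString(section));       // 03-04-05
  }
}
```

In contrast to StreamWriter, the BinaryWriter does not write a preamble. The encoding only affects how the characters of the string are represented as bytes.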

Exercise 10.4. Die tossing - writing to a binary file

This exercise is a variant of the die tossing and file writing exercise based on text files.

Modify the program to use a BinaryWriter and a BinaryReader.

Take notice of the different sizes of the text file from the previous exercise and the binary file from this exercise. Explain your observations.

Solution

37.13. The class BinaryReader

The class BinaryReader is the natural counterpart to BinaryWriter. BinaryReader reads the binary data that BinaryWriter writes (in contrast to text in some given encoding).

The following program reads the binary file produced by Program 37.16. It produces the expected output, see Listing 37.18 (only on web).

using System;
using System.IO;

public class BinaryReadSimpleTypes{
  
  public static void Main(){
    string fn = "simple-types.bin";

    using(BinaryReader br = 
            new BinaryReader(
              new FileStream(fn, FileMode.Open))){

      int i = br.ReadInt32(); 
      double d = br.ReadDouble();  
      decimal dm = br.ReadDecimal(); 
      bool b = br.ReadBoolean();

      Console.WriteLine("Integer i: {0}", i);
      Console.WriteLine("Double d: {0}", d);
      Console.WriteLine("Decimal dm: {0}", dm);
      Console.WriteLine("Boolean b: {0}", b);
    }

  }
}

Program 37.17 Use of a BinaryReader to read the values written by means of the BinaryWriter.

Integer i: 5
Double d: 5,5
Decimal dm: 5555
Boolean b: False

Listing 37.18 Output from the BinaryReader program.

The following gives an overview of the operations in the class BinaryReader:

Two public constructors
  BinaryReader(Stream)
  BinaryReader(Stream, Encoding)
15 individually named ReadType operations
  ReadBoolean, ReadChar, ReadByte, ReadDouble, ReadDecimal, ReadInt16, ...
Three overloaded Read operations
  Read() and Read(char[] buffer, int index, int count) read characters - using Encoding
  Read(byte[] buffer, int index, int count) reads bytes

The most noteworthy observation is that there exist a large number of specifically named operations (such as ReadInt32 and ReadDouble) through which it is possible to read the binary representations of values in simple types.
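A binary file carries no information about the types of the values in it. It is therefore up to the reading program to call the ReadType operations in the same order as the values were written. The sketch below, which is our own illustration with the invented names BinaryRoundTrip and RoundTrip, writes a number of integers to a MemoryStream and reads them back. The loop relies on the position and the length of the underlying stream to detect when all values have been read.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class BinaryRoundTrip{

  // Writes values in binary form to memory and reads them back again
  public static int[] RoundTrip(int[] values){
    MemoryStream ms = new MemoryStream();
    BinaryWriter bw = new BinaryWriter(ms);
    foreach(int v in values) bw.Write(v);   // 4 bytes per int
    bw.Flush();
    ms.Position = 0;                        // Rewind before reading

    List<int> result = new List<int>();
    using(BinaryReader br = new BinaryReader(ms)){
      while (br.BaseStream.Position < br.BaseStream.Length)
        result.Add(br.ReadInt32());
    }
    return result.ToArray();
  }

  public static void Main(){
    foreach(int i in RoundTrip(new int[]{3, 6, 1, 4}))
      Console.Write("{0} ", i);             // 3 6 1 4
    Console.WriteLine();
  }
}
```

An attempt to call ReadInt32 beyond the end of the stream causes an EndOfStreamException, which is why the loop checks the position before each read.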

37.14. The classes StringReader and StringWriter

StringReader is a non-abstract subclass of TextReader. Similarly, StringWriter is a non-abstract subclass of TextWriter. Table 37.1 gives you an overview of these classes.

The idea of StringReader and StringWriter is to use traditional stream/file input operations for string access, and to use traditional stream/file output operations for string mutation. Thus, relative to Figure 37.1 the source and destinations of reading and writing will be strings.

A StringReader can be constructed on a string. A StringWriter, however, cannot be constructed on a string, because strings are immutable in C#, see Section 6.4. Therefore a StringWriter object is constructed on an instance of StringBuilder.

In Program 37.19 we illustrate, in concrete terms, how to make a StringWriter on the StringBuilder referred to by the variable sb (see line 9). In lines 11-17 we iterate five times through the for loop, with increasing integer values in the variable i. In total, the textual representations of 20 simple values are written to the StringBuilder object. The content of the StringBuilder object is printed in line 19. The output of Program 37.19 is shown in Program 37.20 (only on web).

using System;
using System.IO;
using System.Text;

public class TextSimpleTypes{
  
  public static void Main(){

    StringBuilder sb = new StringBuilder();   // A mutable string
 
    using(TextWriter tw = new StringWriter(sb)){
      for (int i = 0; i < 5; i++){
        tw.Write(5 * i);  tw.WriteLine();
        tw.Write(5.5 * i);  tw.WriteLine();
        tw.Write(5555M * i); tw.WriteLine();
        tw.Write(5 * i == 6); tw.WriteLine();}
    }

    Console.WriteLine(sb);

  }
}

Program 37.19 A StringWriter program similar to the StreamWriter program shown earlier.

0
0
0
False
5
5,5
5555
False
10
11
11110
False
15
16,5
16665
False
20
22
22220
False

Program 37.20 Output of the StringWriter program.

Symmetrically, we illustrate how to read from a string. In Program 37.21 we make a multi-line string str in lines 8-11. With use of a StringReader built on str we read an integer, a double, a decimal, and a boolean value. The output is shown in Program 37.22 (only on web).

using System;
using System.IO;

public class TextSimpleTypes{
  
  public static void Main(){
 
    string str = "5" + "\n" + 
                 "5,5" + "\n" +
                 "5555,0" + "\n" +
                 "false";

    using(TextReader tr = new StringReader(str)){
      int i = Int32.Parse(tr.ReadLine()); 
      double d = Double.Parse(tr.ReadLine());  
      decimal m = Decimal.Parse(tr.ReadLine());  
      bool b = Boolean.Parse(tr.ReadLine());

      Console.WriteLine("{0} \n{1} \n{2} \n{3}", i, d, m, b);
    }

  }
}

Program 37.21 A StringReader program.

5 
5,5 
5555,0 
False

Program 37.22 Output of the StringReader program.

The use of StringWriter and StringReader objects for accessing the characters in strings is an attractive alternative to use of the native String and StringBuilder operations. It is, in particular, attractive and convenient that we can switch from a file source/destination to a string source/destination. In that way existing file manipulation programs may be used directly as string manipulation programs. The only necessary modification of the program is a replacement of a StreamReader with StringReader, or a replacement of StreamWriter with a StringWriter.

Be sure to use the abstract classes TextReader and TextWriter as much as possible, for instance as the types of variables and parameters. You should only mention StreamReader/StringReader and StreamWriter/StringWriter when you instantiate objects by calling a constructor (such as in line 11 of Program 37.19 and line 13 of Program 37.21).
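The advice can be illustrated by a method that only depends on TextWriter. The following is a minimal sketch with hypothetical names (AbstractWriterDemo, WriteReport, and the file name "tosses.txt"). The same method writes to a string, to a file, and to standard output. Only the constructor calls reveal the concrete classes.

```csharp
using System;
using System.IO;
using System.Text;

public class AbstractWriterDemo{

  // Depends only on the abstract class TextWriter
  public static void WriteReport(TextWriter tw, int[] tosses){
    foreach(int t in tosses) tw.WriteLine(t);
  }

  public static void Main(){
    int[] tosses = {3, 6, 1};

    // String destination
    StringBuilder sb = new StringBuilder();
    using(TextWriter tw = new StringWriter(sb))
      WriteReport(tw, tosses);

    // File destination
    using(TextWriter tw = new StreamWriter("tosses.txt"))
      WriteReport(tw, tosses);

    // Standard output, which is also a TextWriter
    WriteReport(Console.Out, tosses);

    Console.Write(sb);
  }
}
```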

37.15. The Console class

We have used static methods in the Console class in almost all our programs. It is now time to examine the Console class a little closer. In contrast to most other IO related classes, the Console class resides in the System namespace, and not in System.IO. The Console class encapsulates three streams: standard input, standard output, and standard error. The static property In, of type TextReader, represents standard input. The static properties Out and Error represent standard output and standard error respectively, and they are both of type TextWriter. Recall in this context that TextReader and TextWriter are both abstract classes, see Section 37.9.

using System;
using System.IO;

class App{

  public static void Main(string[] args){

     TextWriter standardOutput = Console.Out;
     StreamWriter myOut = null,
                  myError = null;

     if (args.Length == 2) {
        Console.Out.WriteLine("Redirecting std output and error to files");
        myOut = new StreamWriter(args[0]);
        Console.SetOut(myOut);
        myError = new StreamWriter(args[1]);
        Console.SetError(myError);
     } else {
        Console.Out.WriteLine("Keeping standard output and error unchanged");
     }

     // Output from this section of the program may be redirected
     Console.Out.WriteLine("Text to std output - by Console.Out.WriteLine");
     Console.WriteLine("Text to standard output -  by Console.WriteLine(...)");
     Console.Error.WriteLine("Error msg - by Console.Error.WriteLine(...)");

     if (args.Length == 2) {
       myOut.Close(); myError.Close();
     }     

     Console.SetOut(standardOutput);
     Console.Out.WriteLine("Now we are back again");
     Console.Out.WriteLine("Good Bye");
  }
}

Program 37.23 A program that redirects standard output and standard error to files.

The program shown above demonstrates how to control standard output and standard error. If we pass two program arguments (args in line 6) to Program 37.23 we redirect standard output and standard error to specific files (instances of StreamWriter) in lines 13-17. That is the main point, which we wish to illustrate in Program 37.23.
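Console.SetOut accepts an arbitrary TextWriter, not only a StreamWriter. Standard output can therefore also be redirected to a string by means of a StringWriter from Section 37.14. The sketch below shows one way to do it. The class CaptureOutput and the method Capture are names of our own, and the try-finally statement is added to make sure that the original standard output is restored.

```csharp
using System;
using System.IO;

public class CaptureOutput{

  // Runs action and returns the text it writes to standard output
  public static string Capture(Action action){
    TextWriter original = Console.Out;
    StringWriter captured = new StringWriter();
    Console.SetOut(captured);
    try{
      action();
    }
    finally{
      Console.SetOut(original);   // Restore, also in case of exceptions
    }
    return captured.ToString();
  }

  public static void Main(){
    string text = Capture(() => Console.Write("Not shown directly"));
    Console.WriteLine("Captured: {0}", text);
  }
}
```

This technique is, for instance, useful if we wish to test a method that writes directly to the console.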

Below we supply an overview of the methods and properties of the Console class. The Console class is static. As such, all methods and properties in class Console are static. There will never be objects of type Console around. The Console class offers the following operations:

Access to and control of In, Out, and Error
Write, WriteLine, Read, and ReadLine methods
  Shortcuts to Out.Write, Out.WriteLine, In.Read, and In.ReadLine
Many properties and methods that control the underlying buffer and window
  Size, colors, and positions
Immediate, non-blocking input from the Console
  The property KeyAvailable tells whether a key press is available (non-blocking)
  ReadKey() returns info about the pressed key (blocking)
Other operations
  Clear(), Beep(), and Beep(int, int) methods.

Generated: Monday February 7, 2011, 12:20:03