
CS102: Streams & Files
Copyright © 1999,
Kenneth J. Goldman
Persistent Storage
- So far, we've treated computer software as something that is executed, "lives"
for some period of time, and then terminates. The state of the program has
been transient, meaning that the information (data) of the application
exists only for the duration of execution and then goes away.
- For applications like a calculator or a computer game, transient state may
suffice, but most computer applications would be essentially useless if you
couldn't exit the program and then later resume where you left off.
- For example,
- text editors need to save the text and load it back
- spreadsheet applications need to save and load data
- database applications must preserve updates for future queries, etc.
- In other words, we would like to make (at least some of) the state of an
application persist even after the program exits, and we want to be
able to use that state in a future execution of the program.
- Therefore, it is essential that any computer system provide some means of
persistent storage -- data that survives program termination,
crashes and power outages.
- Persistent data is generally kept on magnetic disks or similar devices. A disk
has:
- a surface organized into sectors and tracks
- a read/write head that gets and sets the magnetic orientation
of the particles on the disk surface

- The chunk of data in a given track at a given sector is called a
block. Data is read from and written to the disk in whole blocks
at a time.
- How it works (a simplified view):
- When disk I/O (input or output) is requested, the requesting thread
is suspended and the operating system calls some disk driver
software to activate the motors in the disk drive to position the head
at the appropriate track. This is called a head seek.
- The driver waits for the appropriate sector to spin under the head
and then reads or writes the data. A driver may read after writing to
make sure there aren't errors.
- In the case of a read, the resulting data is stored into memory.
- The operating system is notified so that the requesting thread can
be resumed.
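To make the "whole blocks at a time" point concrete, here is a toy sketch. It is not how a real driver works: the class name BlockDeviceSketch, the 512-byte block size, and the in-memory array standing in for the disk are all assumptions made for illustration. It shows that changing even one byte costs a full block read followed by a full block write.

```java
// Toy model only: an array of byte arrays stands in for the disk surface.
class BlockDeviceSketch {
    static final int BLOCK_SIZE = 512; // assumed size; real devices vary

    private final byte[][] blocks;
    int blockReads = 0;  // how many whole-block reads have been issued
    int blockWrites = 0; // how many whole-block writes have been issued

    BlockDeviceSketch(int blockCount) {
        blocks = new byte[blockCount][BLOCK_SIZE];
    }

    // The only transfers the "device" supports move one entire block.
    byte[] readBlock(int n) {
        blockReads++;
        return blocks[n].clone();
    }

    void writeBlock(int n, byte[] data) {
        blockWrites++;
        blocks[n] = data.clone();
    }

    // Changing a single byte: read the block, patch it in memory, write it back.
    void writeByte(long offset, byte value) {
        int n = (int) (offset / BLOCK_SIZE);
        byte[] block = readBlock(n);
        block[(int) (offset % BLOCK_SIZE)] = value;
        writeBlock(n, block);
    }

    // Reading a single byte still fetches the whole block that contains it.
    byte readByte(long offset) {
        byte[] block = readBlock((int) (offset / BLOCK_SIZE));
        return block[(int) (offset % BLOCK_SIZE)];
    }

    public static void main(String[] args) {
        BlockDeviceSketch device = new BlockDeviceSketch(4);
        device.writeByte(1000, (byte) 42);   // offset 1000 falls in block 1
        System.out.println(device.readByte(1000)); // prints 42
        System.out.println(device.blockReads);     // prints 2
        System.out.println(device.blockWrites);    // prints 1
    }
}
```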
- This is all fine, except that it would be very awkward if our application
saved and loaded data by specifying particular blocks (tracks and sectors)
into which data would be saved. Why?
- Hard to move data between machines -- the program has to name the blocks,
and those blocks might already be in use on the other machine.
- Program errors could erase data from other programs, or erase operating
system files.
- Applications would be hard to write
- Users would have to remember where the data was stored so they could
tell the program where to get their saved data -- or the application
would have to be hard-coded to use certain disk locations (resulting
in possible conflicts)
- Therefore, it is necessary to have some kind of abstraction of persistent
storage that insulates the user and application from these low level details.
Typically, this abstraction is provided by the operating system as a file
system consisting of named files and directories (folders).
How the OS Stores Files and Directories
- Typically, operating systems organize the disk by partitioning it into
regions for different purposes. One partition is generally reserved for
the directory structure, which is a pointer-based data structure stored on
the disk. Here, the pointers are disk block addresses (instead of memory
addresses), but the principle is the same. The structure in most operating
systems is modeled after the inode structure used by UNIX. Think of
a directory structure as a tree.
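The idea that "the pointers are disk block addresses" can be sketched with a toy model. This is not the real UNIX inode layout: the class ToyDisk, its Node type, and the block numbering are invented for illustration, and a HashMap from block address to contents stands in for the disk. Each directory stores names paired with the block address of the child, and looking up a path means following those addresses from the root.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: a Map from block address to block contents stands in for the disk.
class ToyDisk {
    // A node is either a directory (name -> child block address) or a file (contents).
    static final class Node {
        final Map<String, Integer> children; // non-null for directories
        final String contents;               // non-null for files

        Node(Map<String, Integer> children, String contents) {
            this.children = children;
            this.contents = contents;
        }

        boolean isDirectory() { return children != null; }
    }

    private final Map<Integer, Node> blocks = new HashMap<>();
    private int nextFree = 0;

    // Store a node in the next free block and return that block's address.
    private int allocate(Node node) {
        blocks.put(nextFree, node);
        return nextFree++;
    }

    int newDirectory() { return allocate(new Node(new HashMap<>(), null)); }

    int newFile(String contents) { return allocate(new Node(null, contents)); }

    // Add a directory entry: a name paired with the child's block address.
    void link(int dirBlock, String name, int childBlock) {
        blocks.get(dirBlock).children.put(name, childBlock);
    }

    // Follow block addresses from the root, one path component at a time.
    // Returns the target's block address, or -1 if the path does not exist.
    int resolve(int rootBlock, String path) {
        int current = rootBlock;
        for (String part : path.split("/")) {
            if (part.isEmpty()) continue;       // skip leading or doubled slashes
            Node node = blocks.get(current);
            if (!node.isDirectory()) return -1; // cannot descend into a file
            Integer next = node.children.get(part);
            if (next == null) return -1;
            current = next;
        }
        return current;
    }

    String read(int block) { return blocks.get(block).contents; }

    public static void main(String[] args) {
        ToyDisk disk = new ToyDisk();
        int root = disk.newDirectory();
        int home = disk.newDirectory();
        int notes = disk.newFile("hello");
        disk.link(root, "home", home);
        disk.link(home, "notes.txt", notes);
        System.out.println(disk.read(disk.resolve(root, "/home/notes.txt"))); // prints hello
    }
}
```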

The File System Abstraction
- When users and applications programmers think about files, we don't think
about blocks, and we don't think about inodes. Instead, all of that is hidden
from us by an abstraction barrier that includes:
- a user interface (like a file manager with named folders and files that
can be selected, created, destroyed, renamed and manipulated)
- an API for using files and directories from within the applications
programs you write.
- User Interface:
- Under the covers, the operating system manages the inodes and
the disk to provide the illusion of a nice hierarchical file
system to users and programmers.
- Programmer API:
- To make code portable across different operating systems, Java
provides the package java.io, which is a general API for a file
system. Underneath, java.io is implemented on each operating system
using the file system API provided by that OS. We'll learn about
the java.io package. The native file APIs of the operating systems
offer similar basic functionality, but generally with fewer features.
- The goal when working with files is usually to
- open a file (possibly creating it)
- read and/or write data from/to the file
- close the file
- Reading and writing files can be accomplished by either:
- sequential access -- (stream abstraction) read (or write) the file
from beginning to end, or
- random access -- seeking to a particular place in the file and reading
or writing at that location
- We'll start with an example of sequential access for a file, using the classes
DataInputStream and DataOutputStream.
// requires: import java.io.*;
void writeFileExample(String filename, int someData, String myString) throws IOException {
    File f = new File(filename);
    // creates a File object but doesn't actually create a file on disk
    OutputStream out = new FileOutputStream(f);  // creates (or truncates) the file on disk
    DataOutputStream dataOut = new DataOutputStream(out);
    dataOut.writeInt(someData);
    dataOut.writeUTF(myString);  // length-prefixed, so readUTF knows where the string ends
    dataOut.close();             // closing the outer stream also closes out
}
void readFileExample(String filename) throws IOException {
    File f = new File(filename);
    InputStream in = new FileInputStream(f);
    DataInputStream dataIn = new DataInputStream(in);
    int someData = dataIn.readInt();     // read values back in the order they were written
    String myString = dataIn.readUTF();  // DataInputStream has no readChars method
    dataIn.close();
    System.out.println("Read data: " + someData + " " + myString);
}
Shortcuts:
DataOutputStream dataOut = new DataOutputStream(new FileOutputStream(filename));
DataInputStream dataIn = new DataInputStream(new FileInputStream(filename));
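For contrast with the sequential streams above, the random access style mentioned earlier can be sketched with java.io.RandomAccessFile, which lets the program seek to a byte offset before reading or writing. This is a minimal sketch, not part of the original notes: the class name RandomAccessDemo and its helper methods are hypothetical, and the file is assumed to hold nothing but fixed-size 4-byte ints so that the offset of the i-th value is simply 4 * i.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

class RandomAccessDemo {
    static final int INT_SIZE = 4; // writeInt always writes 4 bytes

    // Write the values as consecutive 4-byte ints, replacing any old contents.
    static void writeInts(File file, int[] values) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0); // discard previous contents
            for (int v : values) {
                raf.writeInt(v);
            }
        }
    }

    // Jump straight to the index-th int without reading the ones before it.
    static int readIntAt(File file, int index) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek((long) index * INT_SIZE);
            return raf.readInt();
        }
    }

    // Overwrite the index-th int in place, leaving the rest of the file untouched.
    static void writeIntAt(File file, int index, int value) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.seek((long) index * INT_SIZE);
            raf.writeInt(value);
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("random-access", ".data");
        file.deleteOnExit();
        writeInts(file, new int[] {10, 20, 30, 40});
        writeIntAt(file, 2, 99);
        System.out.println(readIntAt(file, 2)); // prints 99
        System.out.println(readIntAt(file, 3)); // prints 40
    }
}
```

Fixed-size records are what make the offset arithmetic possible; with variable-length data such as strings, the program would need an index or some other way to find where each item starts.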
Serializable Objects
DataInputStream and DataOutputStream are fine when you only want to save
primitive data to persistent storage, but what if you want to save an object for
a class you have defined?
In that case, you can use ObjectInputStream and ObjectOutputStream, but the
objects you plan to save must implement the Serializable interface. For
example,
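import java.io.Serializable;
import java.util.Vector;

public class Foo implements Serializable {
    int x, y;
    String myString;
    Vector<Object> v;  // saved along with everything it contains

    public Foo(int x, int y, String myString) {
        this.x = x;
        this.y = y;
        this.myString = myString;
        v = new Vector<>();
    }

    // Objects added here must also be serializable if Foo is to be saved.
    public void insert(Object obj) {
        v.addElement(obj);
    }
}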
Oddly enough, the Serializable interface contains no methods -- it simply
indicates that the programmer wishes objects of this type to be able to
be written into and read from streams. Data that is not to be saved can
be marked transient.
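As a small illustration of the transient keyword, the sketch below writes an object to a file with the object streams mentioned above and reads a copy back. The Session class, its fields, and the saveAndLoad helper are hypothetical names invented for this example. The transient field is skipped on the way out, so the copy gets the default value (null for an object reference).

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {
    // Hypothetical class, invented for this sketch.
    static class Session implements Serializable {
        String user;                // saved
        transient String password;  // skipped when the object is written

        Session(String user, String password) {
            this.user = user;
            this.password = password;
        }
    }

    // Write the object to a file, then read a fresh copy back from that file.
    static Session saveAndLoad(Session s, File file)
            throws IOException, ClassNotFoundException {
        try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(file))) {
            oos.writeObject(s);
        }
        try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream(file))) {
            return (Session) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("session", ".data");
        file.deleteOnExit();
        Session copy = saveAndLoad(new Session("alice", "secret"), file);
        System.out.println(copy.user);      // prints alice
        System.out.println(copy.password);  // prints null
    }
}
```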
Saving and loading serializable objects using files:
.
.
Foo f = new Foo(3,4,"testing");
f.insert(myObject);
String filename = "testfile.data";
ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(filename));
oos.writeObject(f);
oos.close();
.
.
All of the data in the instance variables of f will be saved to the file. This
happens recursively, so all objects referenced by f's instance
variables are also saved. Every object reached this way must be serializable.
For example, if myObject is not serializable, writeObject throws a
NotSerializableException. (The algorithm detects cycles, so an object that is
reachable along several paths is written only once.)
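The recursive behavior described above can be checked with a small experiment. This is a sketch under stated assumptions: the Node and Holder classes and the save/load helpers are hypothetical, and an in-memory byte array is used in place of a file to keep the example short. It builds a two-object cycle, confirms the cycle survives a round trip, and then shows the exception raised when a referenced object is not serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ObjectGraphDemo {
    // Hypothetical classes, invented for this sketch.
    static class Node implements Serializable {
        String name;
        Node next;
        Node(String name) { this.name = name; }
    }

    static class Holder implements Serializable {
        Object payload; // must itself be serializable at run time
        Holder(Object payload) { this.payload = payload; }
    }

    // Serialize obj, and everything reachable from it, into a byte array.
    static byte[] save(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            oos.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Rebuild the object graph from the bytes produced by save.
    static Object load(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Node a = new Node("a");
        Node b = new Node("b");
        a.next = b;
        b.next = a; // a -> b -> a is a cycle

        Node copy = (Node) load(save(a));
        System.out.println(copy.next.name);         // prints b
        System.out.println(copy.next.next == copy); // prints true: the cycle is preserved

        try {
            save(new Holder(new Object())); // java.lang.Object is not Serializable
        } catch (NotSerializableException e) {
            System.out.println("cannot save: " + e.getMessage());
        }
    }
}
```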
At a later time, we could retrieve the data as follows:
.
.
ObjectInputStream ois = new ObjectInputStream(new FileInputStream("testfile.data"));
Foo g = (Foo) ois.readObject();
ois.close();
.
.
Now the variable g will refer to a fully initialized instance of Foo with all
the data. This is a convenient way to make objects persistent. Note that if
you modify the class Foo and try to read the data saved from the old version,
an exception (InvalidClassException) will occur unless the static final long
serialVersionUID values of the two versions are the same.
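A minimal sketch of declaring the version field follows. The classes Point and Unversioned are hypothetical names used only for illustration. When a class does not declare serialVersionUID, the runtime computes one from the class's structure, which is why editing the class usually changes the value. ObjectStreamClass.lookup is used here only to display the value the serialization machinery will use.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class VersionDemo {
    // Hypothetical class with an explicit, programmer-chosen version number.
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x, y;
    }

    // Hypothetical class without one: the runtime derives a value from the class itself.
    static class Unversioned implements Serializable {
        int x;
    }

    // Report the version number that serialization will record for a class.
    static long versionOf(Class<?> cls) {
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(versionOf(Point.class));       // prints 1
        System.out.println(versionOf(Unversioned.class)); // a computed value
    }
}
```

Keeping the declared value unchanged tells the runtime that the old and new versions of the class are meant to be compatible; it is then up to the programmer to make sure the changes really are.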
