Chapter 2: Technological Background
Certain changes in technology over the past two decades greatly influenced the nature of computational resources and the manner in which they were used. These developments created the conditions under which the notion of a distributed file system (DFS) was born. This chapter describes these technological changes and explores how a distributed file system attempts to capitalize on the new computing environment's strengths and minimize its disadvantages.

Section 2.1: Shift in Computational Idioms

By the beginning of the 1980s, new classes of computing engines and new methods by which they could be interconnected were becoming firmly established. At this time, a shift was occurring away from the conventional mainframe-based, timeshared computing environment to one in which both workstation-class machines and smaller personal computers (PCs) had a strong presence.
The new environment offered many benefits to its users compared with timesharing. These smaller, self-sufficient machines placed dedicated computing cycles directly on people's desks. Personal machines were powerful enough to support a wide variety of applications and allowed a richer, more intuitive, more graphical interface to them. Learning curves were greatly reduced, cutting training costs and increasing new-employee productivity. In addition, these machines provided a constant level of service throughout the day: since a personal machine typically executed programs for only a single human user, it did not suffer from timesharing's load-based degradation in response time. Expanding an organization's computing capacity was often accomplished simply by purchasing more of the relatively cheap machines. Even small organizations could now afford their own computing resources, over which they exercised full control, giving them more freedom to tailor computing services to the specific needs of particular groups.
However, many of the benefits offered by timesharing systems were lost when the computing idiom first shifted to include personal-style machines. One of the prime casualties of this shift was the notion of a single name space for all files. Instead, workstation- and PC-based environments each had independent and completely disconnected file systems. The standardized mechanisms through which files could be transferred between machines (e.g., FTP) were largely designed at a time when relatively few large machines were connected over slow links. Although newer multi-megabit-per-second communication pathways allowed faster transfers, the problem of resource location in this environment remained unaddressed: with no system-wide file system, or even a file location service, individual users were more isolated from the organization's collective data. Disk requirements ballooned, since the lack of a shared file system was often resolved by replicating all programs and data onto each machine that needed them. This proliferation of independent copies further complicated version control and management in the distributed world. Since computers were often no longer behind locked doors at a computer center, user authentication and authorization became more complex. And since organizational managers were now in direct control of their computing facilities, they also had to actively manage the hardware and software upon which they depended.
Overall, many of the benefits of the proliferation of independent, personal-style machines were partially offset by the communication and organizational penalties they imposed. Collaborative work and dissemination of information became more difficult now that the previously unified file system was fragmented among hundreds of autonomous machines.

Section 2.2: Distributed File Systems

As a response to the situation outlined above, the notion of a distributed file system (DFS) was developed. Basically, a DFS provides a framework in which access to files is permitted regardless of their locations. Specifically, a distributed file system offers a single, common set of file system operations through which those accesses are performed.
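The short program below sketches what such location transparency looks like from an application's point of view. It is a minimal illustration rather than an excerpt from any particular system: the mount point /dfs and the file name it references are hypothetical, and the sketch assumes only that the DFS is spliced into the normal UNIX name space so that the standard open/read/close operations apply unchanged to remote files.

    /* A minimal sketch of location-transparent file access.  The path below
     * is hypothetical; whether the file resides on the local disk or on a
     * machine across the network is resolved by the DFS, not the program. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        char buf[512];
        int fd = open("/dfs/projects/report.txt", O_RDONLY);  /* same call, local or remote */
        if (fd < 0) {
            perror("open");
            return 1;
        }
        ssize_t n = read(fd, buf, sizeof buf);  /* ordinary read(2) */
        if (n > 0)
            fwrite(buf, 1, (size_t) n, stdout);
        close(fd);
        return 0;
    }

Note that the program contains no networking code at all; the single, common set of file operations hides the file's location.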
There are two major variations on the core DFS concept, classified according to the way in which file storage is managed. These high-level models are defined below; a short sketch contrasting them follows the list.
  • Peer-to-peer: In this symmetrical model, each participating machine provides storage for a specific set of files on its own attached disk(s), and allows others to access them remotely. Thus, each node in the DFS is capable of both importing files (making reference to files resident on foreign machines) and exporting files (allowing other machines to reference files located locally).
  • Server-client: In this model, a set of machines designated as servers provide the storage for all of the files in the DFS. All other machines, known as clients, must direct their file references to these machines. Thus, servers are the sole exporters of files in the DFS, and clients are the sole importers.
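The following fragment, a schematic sketch rather than a description of any real system, contrasts the two models by the roles a node may assume: in the peer-to-peer model every node carries both capabilities, while in the server-client model each capability is restricted to one class of machine.

    #include <stdbool.h>
    #include <stdio.h>

    /* Each DFS node is characterized by whether it exports files (serves
     * them from its own disks) and/or imports files (references files
     * stored elsewhere). */
    struct dfs_node {
        const char *role;
        bool        exports;
        bool        imports;
    };

    int main(void)
    {
        struct dfs_node nodes[] = {
            { "peer",   true,  true  },  /* peer-to-peer: both roles       */
            { "server", true,  false },  /* server-client: exporter only   */
            { "client", false, true  },  /*                importer only   */
        };
        for (int i = 0; i < 3; i++)
            printf("%-6s  exports=%d  imports=%d\n",
                   nodes[i].role, nodes[i].exports, nodes[i].imports);
        return 0;
    }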
The notion of a DFS, whether organized using the peer-to-peer or server-client discipline, may be used as a conceptual base upon which the advantages of personal computing resources can be combined with the single-system benefits of classical timeshared operation.
Many distributed file systems have been designed and deployed, operating over the fast local area networks that connect machines within a single site. These systems include DOMAIN [9], DS [15], RFS [16], and Sprite [10]. Perhaps the most widespread distributed file system to date is NFS [13] [14], a product from Sun Microsystems that extends the popular UNIX file system to operate over local networks.

Section 2.3: Wide-Area Distributed File Systems

Improvements in long-haul network technology are providing higher interconnection bandwidths and smaller latencies between distant sites. Backbone services have been set up across the country, and T1 (1.5 megabit/second) links are available to an increasing number of locations. Long-distance channels are still at best roughly an order of magnitude slower than the typical local area network, and often two orders of magnitude slower. Nevertheless, the narrowing difference between local-area and wide-area data paths opens the door to the notion of a wide-area distributed file system (WADFS). In a WADFS, the transparency of file access offered by a local-area DFS is extended to cover machines separated by much larger distances. Wide-area file system functionality facilitates collaborative work and the dissemination of information in this larger theater of operation.
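A back-of-the-envelope calculation illustrates the remaining gap. Assuming a 10 megabit/second Ethernet as the typical local area network of the period (an assumed figure, not one taken from the text), transferring a one-megabyte file takes under a second locally but several seconds over a T1 link:

    #include <stdio.h>

    /* Rough transfer-time comparison; the 10 Mbit/s LAN rate is an assumed
     * value typical of Ethernet, while 1.5 Mbit/s is the T1 rate above. */
    int main(void)
    {
        const double file_bits = 1.0e6 * 8.0;  /* one-megabyte file    */
        const double lan_bps   = 10.0e6;       /* assumed Ethernet LAN */
        const double t1_bps    = 1.5e6;        /* T1 long-haul link    */

        printf("LAN: %.1f seconds\n", file_bits / lan_bps);  /* ~0.8 s */
        printf("T1:  %.1f seconds\n", file_bits / t1_bps);   /* ~5.3 s */
        return 0;
    }

The factor of roughly seven in this best case, and far worse over slower long-haul links, is the order-of-magnitude penalty referred to above.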