Doctor of Philosophy (Ph.D.)
Concurrent systems are used in applications where multiple processors are needed to complete tasks within a reasonable amount of time, or where the data sets involved will not fit within the main memory of a single computer. Because of their reliance on multiple machines, such systems are proportionally more vulnerable to both hardware- and software-induced failures. Fault-tolerance schemes are used to recover some earlier consistent state of the system after such a failure.

One important technique used to achieve fault-tolerance is checkpointing and rollback-recovery. In this thesis, we present a method for efficiently and transparently incorporating the part of the process state contained in the file system into process checkpoints, and we show how consistent versions of the file system and processes may be recovered after a failure. We present the details of a prototype system that implements our method.

We show that by using the special properties of the log-structured file system, the class of programs that are amenable to checkpointing and rollback-recovery schemes can be expanded to include those that use files. We impose no a priori restriction on the types of file system operations that can be done, and we demonstrate that our scheme does not impose significant failure-free overhead on the computation.
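The general idea of treating file state as part of a checkpoint can be illustrated with a toy sketch. This is not the thesis's prototype: the classes `LogStructuredStore` and `Process`, and every method name below, are hypothetical and chosen only for illustration. The sketch assumes an in-memory, append-only log standing in for a log-structured file system, so that a checkpoint only needs to remember a log position and rollback only needs to discard the records written after it.

```python
import copy


class LogStructuredStore:
    """Toy append-only store (hypothetical): writes are appended, never overwritten."""

    def __init__(self):
        self.log = []  # list of (filename, contents) records, oldest first

    def write(self, name, data):
        # A new version of the file is appended; older versions stay in the log.
        self.log.append((name, data))

    def read(self, name):
        # The most recent record for a name is its current contents.
        for fname, data in reversed(self.log):
            if fname == name:
                return data
        raise FileNotFoundError(name)

    def position(self):
        # The log length identifies a consistent version of the whole store.
        return len(self.log)

    def truncate(self, pos):
        # Discard every record written after the given position.
        del self.log[pos:]


class Process:
    """Toy process (hypothetical) whose checkpoint covers memory and file state."""

    def __init__(self, store):
        self.store = store
        self.state = {"step": 0}

    def checkpoint(self):
        # Save a copy of the in-memory state together with the log position.
        return copy.deepcopy(self.state), self.store.position()

    def rollback(self, ckpt):
        # Restore memory and file state to the same earlier point.
        state, pos = ckpt
        self.state = copy.deepcopy(state)
        self.store.truncate(pos)


def demo():
    store = LogStructuredStore()
    proc = Process(store)

    store.write("out.txt", "v1")
    proc.state["step"] = 1
    ckpt = proc.checkpoint()

    # Work done after the checkpoint, lost when a failure forces a rollback.
    store.write("out.txt", "v2")
    proc.state["step"] = 2

    proc.rollback(ckpt)
    return proc.state["step"], store.read("out.txt")
```

Calling `demo()` rolls both the process state and the file contents back to the checkpointed versions. A real system would also have to coordinate checkpoints across multiple processes and persist them to stable storage, which this sketch leaves out.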
© The Author
Matthews, Robert Edwin, "Files as first-class objects in fault-tolerant concurrent systems" (2004). Dissertations, Theses, and Masters Projects. Paper 1539623456.