Jennifer L. Welch

Jennifer L. Welch
Research Summary and List of Papers

[Clock Synchronization] [Timing Issues] [Communication Protocols] [Shared Objects] [Modularity] [Mobile Ad Hoc Networks] [Metamorphic Robots] [Randomized Distributed Data Structures]

Clock Synchronization: Synchronized clocks are very useful in a distributed system. However, getting and keeping clocks synchronized in a distributed system is challenging due to uncertainties in message delays, failures, and clock drift. The first two papers give lower bounds on how closely clocks can be synchronized even in the well-behaved case of no failures and no clock drift.

Biaz, S. and Welch, J. L., "Closed Form Bounds for Clock Synchronization Under Simple Uncertainty Assumptions," Information Processing Letters, to appear. abstract, ps, pdf.
Lundelius, J. and Lynch, N., "An Upper and Lower Bound for Clock Synchronization," Information and Control, Vol. 62, Nos. 2/3, pp. 190-204, 1984. abstract.

The next group of papers give algorithms for synchronizing clocks under various failure models.

Dolev, S. and Welch, J. L., "Self-Stabilizing Clock Synchronization in the Presence of Byzantine Faults," 2nd Workshop on Self-Stabilizing Systems, May 1995. abstract, ps, pdf.
Dolev, S. and Welch, J. L., "Wait-free Clock Synchronization," Algorithmica, Vol. 18, pp. 486-511, 1997. abstract, ps, pdf.
An earlier version appeared as:
Dolev, S. and Welch, J. L., "Wait-Free Clock Synchronization," Proc. 12th ACM Symposium on Principles of Distributed Computing (PODC), pp. 97-108, August 1993.
Welch, J. L. and Lynch, N., "A New Fault-Tolerant Algorithm for Clock Synchronization," Information and Computation, Vol. 77, No. 1, pp. 1-36, 1988. abstract.
An earlier version appeared as:
Lundelius, J. and Lynch, N., "A New Fault-Tolerant Algorithm for Clock Synchronization," Proc. 3rd ACM Symposium on Principles of Distributed Computing (PODC), pp. 75-88, August 1984.

An old survey:

Simons, B., Welch, J. L. and Lynch, N., "An Overview of Clock Synchronization," (pp. 84-96) in Simons, B. and Spector, A. (Eds.), Fault-Tolerant Distributed Computing, Lecture Notes in Computer Science 448, Springer-Verlag, New York, 1990. abstract, ps, pdf.

Back to beginning

Timing Issues: The level of synchrony available in a system can impact what problems can be solved and how efficiently they can be solved. The first two papers show how the time required to solve a synchronization problem called the session problem is affected by the kind of timing behavior exhibited by the system.

Rhee, I. and Welch, J. L., "Time Bounds on Synchronization in a Periodic Distributed System," Information Processing Letters, Vol. 64, No. 2, pp. 87-93, October 1997. abstract, ps, pdf.
Rhee, I. and Welch, J. L., "The Impact of Time on the Session Problem," Proc. 11th ACM Symposium on Principles of Distributed Computing (PODC), pp. 191-202, August 1992. abstract, ps, pdf.

This paper analyzes the response time for the dining philosophers resource allocation problem.

Rhee, I. and Welch, J. L., "Time Bounds on the Response Time for the Dining Philosophers Problem," Proc. International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), November 1995.

The next paper provides an algorithm and two related lower bounds for a version of the transaction commit problem in a partially synchronous model.

Coan, B. A. and Welch, J. L., "Transaction Commit in a Realistic Timing Model," Distributed Computing, Vol. 4, No. 2, pp. 87-103, 1990. abstract.
An earlier version appeared as:
Coan, B. A. and Lundelius, J., "Transaction Commit in a Realistic Fault Model, Proc. 5th ACM Symposium on Principles of Distributed Computing (PODC), pp. 40-51, August 1986.

This paper shows that a system with asynchronous processors and messages can simulate, in the presence of various kinds of processor failures, a system with synchronous processors but asynchronous failures. This result implies that the impossibility of fault-tolerant consensus in the synchronous processors model follows immediately from the impossibility of fault-tolerant consensus in the totally asynchronous model.

Welch, J. L., "Simulating Synchronous Processors," Information and Computation, Vol. 74, No. 2, pp. 159-171, 1987. abstract.

Back to beginning

Communication Protocols: These papers are all concerned with some aspect of interprocessor communication using messages. The first paper analyzed how to choose how long to wait before deciding to retransmit a (possibly lost) message.

Dolev, S., Kate, M., and Welch, J. L., "A Competitive Analysis for Retransmission Timeout," Networks, Vol. 34, No. 1, pp. 73-80, 1999. abstract, ps, pdf.
An earlier version appeared as:
Dolev, S., Kate, M., and Welch, J. L., "A Competitive Analysis for Retransmission Timeout," Proc. 15th IEEE International Conference on Distributed Computing Systems (ICDCS), pp. 450-455, May/June 1995.

The next papers give algorithms and impossibility results for setting up and tearing down connections and delivering messages within connections, under various system assumptions.

Chaudhuri, S., Coan, B. A. and Welch, J. L., "Using Adaptive Timeouts to Achieve At-Most-Once Message Delivery," Distributed Computing, Vol. 9, pp. 109-117, 1995. abstract, ps, pdf.
An earlier version appeared as:
Chaudhuri, S., Coan, B. A. and Welch, J. L., "Using Adaptive Timeouts to Achieve At-Most-Once Message Delivery," Proc. 5th International Workshop on Distributed Algorithms (WDAG) (Springer-Verlag LNCS 579), pp. 151-166, October 1991.
Walter, J. and Welch, J. L., "Hazard-Free Connection Release," Proc. International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA), pp. 1668-1672, June/July 1997. abstract, ps, pdf.
Attiya, H., Dolev, S., and Welch, J. L., "Connection Management Without Retaining Information," Information and Computation, Vol. 123, No. 2, pp. 155-171, December 1995. abstract, ps, pdf.
An earlier version appeared as:
Attiya, H., Dolev, S., and Welch, J. L., "Connection Management Without Retaining Information," Proc. 28th Hawaii International Conference on System Sciences, pp. 622-631, January 1995.
Dolev, S. and Welch, J. L., "Crash-Resilient Communication in Dynamic Networks," IEEE Transactions on Computers, Vol. 46, No. 1, pp. 14-26, January 1997. abstract, ps, pdf.
An earlier version appeared as:
Dolev, S. and Welch, J. L., "Crash Resilient Communication in Dynamic Networks," Proc. 7th International Workshop on Distributed Algorithms (WDAG) (Springer-Verlag LNCS 725), pp. 129-144, September 1993.

Other topics:

Abu-Amara, H., Coan, B. A., Dolev, S., Kanevsky, A., and Welch, J. L., "Self-Stabilizing Topology Maintenance Protocols for High-Speed Networks," IEEE/ACM Transactions on Networking, Vol. 4, No. 6, pp. 902-912, December 1996. abstract, ps, pdf.
An earlier version appeared as:
Abu-Amara, H., Coan, B. A., Dolev, S., Kanevsky, A., and Welch, J. L., "A Fault-Tolerant Layered Approach to Fiber Optic Networks," Proc. Conference on High-Speed Networking and Multimedia Computing, IS&T/SPIE Symposium on Electronic Imaging Science & Technology, pp. 380-391, February 1994.
Sistla, A. P. and Welch, J. L., "Efficient Distributed Recovery Using Message Logging," Proc. 8th ACM Symposium on Principles of Distributed Computing (PODC), pp. 223-238, August 1989. abstract,

Back to beginning

Shared Objects: The first two papers explored the complexity of implementing, in a wait-free manner, multi-valued shared registers (the "logical" registers) out of binary-valued shared registers (the "physical" registers). Wait-free means that any operation of a user of the implemented register will terminate successfully even if all the other users crash. The complexity measures studied were the number of physical reads per logical read, the number of physical writes per logical write, and the number of physical registers required.

Chaudhuri, S., Kosa, M. J. and Welch, J. L., "One-Write Algorithms for Multivalued Regular and Atomic Registers," Acta Informatica, Vol. 37, pp. 161-192, 2000. abstract, ps, pdf.
An earlier version appeared as:
Chaudhuri, S., Kosa, M. J. and Welch, J. L., "Upper and Lower Bounds for One-Write Multivalued Regular Registers," Proc. 3rd IEEE Symposium on Parallel and Distributed Processing (SPDP), pp. 134-141, December 1991.
Chaudhuri, S. and Welch, J. L., "Bounds on the Costs of Multivalued Register Implementations, SIAM Journal on Computing, Vol. 23, No. 2, pp. 335-354, April 1994. abstract, ps, pdf.
An earlier version appeared as:
Chaudhuri, S. and Welch, J. L., "Bounds on the Costs of Register Implementations," Proc. 4th International Workshop on Distributed Algorithms (WDAG) (Springer-Verlag LNCS 486), pp. 402-421, September 1990.

The next two papers concern implementing a shared object on top of message-passing, with emphasis on the consistency conditions provided (not on fault-tolerance). The first paper presents a framework for defining consistency conditions that allows optimized, non-sequential memory accesses and describes, and proves correct, several programming paradigms for use with such conditions. The second paper proves that implementing linearizability is strictly more expensive than implementing sequential consistency, in terms of elapsed time per operation, for three common shared data types.

Attiya, H., Chaudhuri, S., Friedman, R. and Welch, J. L., "Shared Memory Consistency Conditions for Non-Sequential Execution: Definitions and Programming Strategies," SIAM Journal on Computing, Vol. 27, No. 1, pp. 65-89, February 1998. abstract, ps, pdf.
An earlier version appeared as:
Attiya, H., Chaudhuri, S., Friedman, R. and Welch, J. L., "Shared Memory Consistency Conditions for Non-Sequential Execution: Definitions and Programming Strategies," Proc. 5th ACM Symposium on Parallel Algorithms and Architectures, pp. 241-250, July 1993.
Attiya, H. and Welch, J. L., "Sequential Consistency versus Linearizability," ACM Transactions on Computer Systems, Vol. 12, No. 1, pp. 91-122, May 1994. abstract, ps, pdf.
An earlier version appeared as:
Attiya, H. and Welch, J. L., "Sequential Consistency vs. Linearizability," Proc. 3rd ACM Symposium on Parallel Algorithms and Architectures, pp. 304-315, July 1991.

The last two papers studied how to implement a shared object on top of message passing in the presence of crash failures.

Kanthadai, S. and Welch, J. L., "Implementation of Recoverable Distributed Shared Memory by Logging Writes," Proc. 16th IEEE International Conference on Distributed Computing Systems (ICDCS), pp. 116-124, May 1996. abstract.
Chaudhuri, S., Kanthadai, S., and Welch, J. L., "The Role of Data-Race-Free Programs in Recoverable DSM," brief announcement, Proc. 15th ACM Symposium on Principles of Distributed Computing (PODC), p. 245, May 1996.

Back to beginning

Modularity: These two papers constructed efficient algorithms for Byzantine agreement (getting processors to agree on a decision value even when some of them can exhibit arbitrary, or Byzantine, failures); using subroutines in a clever way allowed us to achieve good complexity bounds.

Coan, B. A. and Welch, J. L., "Modular Construction of an Efficient 1-Bit Byzantine Agreement Protocol," Mathematical Systems Theory, Vol. 26, pp. 131-154, 1993. abstract.
An earlier version appeared as:
Coan, B. A. and Welch, J. L., "Modular Construction of Efficient Byzantine Agreement Protocols," Proc. 8th ACM Symposium on Principles of Distributed Computing (PODC), pp. 295-306, August 1989.
Coan, B. A. and Welch, J. L., "Modular Construction of a Byzantine Agreement Protocol with Optimal Message Bit Complexity," Information and Computation, Vol. 97, No. 1, pp. 61-85, 1992. abstract.
An earlier version appeared as:
Coan, B. A. and Welch, J. L., "A Byzantine Agreement Protocol with Optimal Message Bit Complexity," Proc. 27th Allerton Conference on Communication, Control and Computing, pp. 1062-1071, September 1989.

The next two papers described existing distributed algorithms (one for finding a minimum spanning tree and another for solving a resource allocation problem) in modular ways and used that modularity to understand more about the algorithms and give rigorous correctness proofs.

Welch, J. L., Lamport, L. and Lynch, N., "A Lattice-Structured Proof Technique Applied to a Minimum Spanning Tree Algorithm," Proc. 7th ACM Symposium on Principles of Distributed Computing (PODC), pp. 28-43, August 1988. abstract.
Welch, J. L. and Lynch, N. A., "A Modular Drinking Philosophers Algorithm," Distributed Computing, Vol. 6, pp. 233-244, 1993. abstract, ps, pdf.

Back to beginning

Mobile Ad Hoc Networks: Mobile ad hoc networks are formed by a collection of, potentially mobile, wireless nodes; communication links form and disappear as nodes come into and go out of each other's communication range. Such networks have many practical applications, including home networking, personal area networking, search-and-rescue, and military operations. There has been a significant amount of activity in the area of routing and medium access control for mobile ad hoc networks. In contrast, we are investigating the design of distributed algorithms, and services based on these distributed algorithms. The eventual objective is to develop a middleware that provides services useful to upper layers.

Malpani, N., Vaidya, N., and Welch, J. L., "Distributed Token Circulation in Mobile Ad Hoc Networks," to appear, Proc. 9th International Conference on Network Protocols (ICNP), November 2001. abstract, ps, pdf.
Walter, J., Welch, J. L. and Vaidya, N., "A Mutual Exclusion Algorithm for Ad Hoc Mobile Networks," accepted to Wireless Networks. abstract, ps, pdf.
An earlier version appeared as:
Walter, J., Welch, J. L. and Vaidya, N., "A Mutual Exclusion Algorithm for Ad Hoc Mobile Networks," presented at 2nd International Workshop on Discrete Algorithms and Methods for Mobile Computing and Communications (DIAL M for Mobility), October 1998.
Malpani, N., Welch, J. L., and Vaidya, N., "Leader Election Algorithms for Mobile Ad Hoc Networks," Proc. 4th International Workshop on Discrete Algorithms and Methods for Mobile Computing and Communications (DIAL M for Mobility), pp. 96-103, August 2000. This is actually a revised version with a simplified algorithm for the case of multiple topology changes. abstract, ps, pdf.

The next paper is grouped here since it concerns mobile computing but it is for the cellular model, not the ad hoc network model.

Dolev, S., Pradhan, D. K., and Welch, J. L., "Modified Tree Structure for Location Management in Mobile Environments," Computer Communications, Vol. 19, pp. 335-345, 1996. abstract, ps, pdf.
An earlier version appeared as:
Dolev, S., Pradhan, D. K., and Welch, J. L., "Modified Tree Structure for Location Management in Mobile Environments," Proc. 14th Annual Joint Conference of IEEE Computer and Communication Societies (INFOCOM), pp. 530-537, April 1995.

Back to beginning

Metamorphic Robots: The robotics community has been exploring how to get a collection of homogeneous interlocking robots to change their global shape. Much previous work focused on centralized algorithms for the problem. We found some simple distributed algorithms for some simple cases. We hope to use them as building blocks to achieve more complex reconfigurations.

Walter, J., Welch, J. L., and Amato, N. A., "Distributed Reconfiguration of Hexagonal Robots in Two Dimensions," in Sensor Fusion and Decentralized Control in Robotic Systems III, Gerard T. McKee, Paul S. Schenker, Editors, Proceedings of SPIE Vol. 4196, pp. 441-453, November 2000. abstract, ps, pdf. Full version submitted for journal publication: ps, pdf.
Walter, J., Welch, J. L., and Amato, N. A., "Distributed Reconfiguration of Metamorphic Robot Chains," Proc. 19th ACM Symposium on Principles of Distributed Computing (PODC), pp. 171-180, July 2000. abstract, ps, pdf.

Back to beginning

Randomized Distributed Data Structures: Randomized algorithms can be simpler and more efficient than deterministic algorithms for the same problem and can even allow the existence of solutions to some problems for which there are no deterministic algorithms. The drawback is that absolute correctness cannot be guaranteed in all situations; typically a randomized algorithm has some, preferably minuscule, probability of either not terminating or returning an incorrect answer. We are interested in developing useful and rigorous specifications of distributed data structures that have probabilistic behavior; finding efficient distributed algorithms to implement the shared objects so specified; and identifying classes of distributed applications that can tolerate probabilistic data structures.

Lee, H. and Welch, J. L., "Randomized Shared Queues Applied to Distributed Optimization Algorithms," to appear, Proc. 12th International Symposium on Algorithms and Computation (ISAAC), December 2001. abstract, ps, pdf.
An earlier version appears as:
Lee, H. and Welch, J. L., "Brief Announcement: Randomized Shared Queues", Proc. 20th ACM Symposium on Principles of Distributed Computing (PODC), August 2001.
Lee, H. and Welch, J. L., "Applications of Probabilistic Quorums to Iterative Algorithms," Proc. 21st IEEE International Conference on Distributed Computing Systems (ICDCS), pp. 21-30, April 2001. Nominated for best paper award. abstract, ps, pdf.
An earlier version appeared as:
Lee, H. and Welch, J. L., "Brief Announcement: Specification, Implementation and Application of Randomized Regular Registers," Proc. 19th ACM Symposium on Principles of Distributed Computing (PODC), p. 338, July 2000.

Back to beginning