Jennifer L. Welch
Research Summary and List of Papers
[Clock Synchronization]
[Timing Issues]
[Communication Protocols]
[Shared Objects]
[Modularity]
[Mobile Ad Hoc Networks]
[Metamorphic Robots]
[Randomized Distributed Data Structures]
Clock Synchronization:
Synchronized clocks are very useful in a distributed system.
However, getting and keeping clocks synchronized in a distributed system is
challenging due to uncertainties in message delays, failures, and clock drift.
The first two papers give lower bounds on how closely clocks
can be synchronized even in the well-behaved case of no failures and
no clock drift.
- Biaz, S. and Welch, J. L.,
"Closed Form Bounds for Clock Synchronization Under Simple Uncertainty
Assumptions,"
Information Processing Letters, to appear.
abstract,
ps,
pdf.
- Lundelius, J. and Lynch, N.,
"An Upper and Lower Bound for Clock Synchronization,"
Information and Control, Vol. 62, Nos. 2/3, pp. 190-204, 1984.
abstract.
The next group of papers give algorithms for synchronizing clocks
under various failure models.
- Dolev, S. and Welch, J. L.,
"Self-Stabilizing Clock Synchronization in the Presence of Byzantine
Faults,"
2nd Workshop on Self-Stabilizing Systems, May 1995.
abstract,
ps,
pdf.
- Dolev, S. and Welch, J. L.,
"Wait-free Clock Synchronization,"
Algorithmica, Vol. 18, pp. 486-511, 1997.
abstract,
ps,
pdf.
An earlier version appeared as:
Dolev, S. and Welch, J. L.,
"Wait-Free Clock Synchronization,"
Proc. 12th ACM Symposium on Principles of Distributed Computing (PODC),
pp. 97-108, August 1993.
- Welch, J. L. and Lynch, N.,
"A New Fault-Tolerant Algorithm for Clock Synchronization,"
Information and Computation,
Vol. 77, No. 1, pp. 1-36, 1988.
abstract.
An earlier version appeared as:
Lundelius, J. and Lynch, N.,
"A New Fault-Tolerant Algorithm for Clock Synchronization,"
Proc. 3rd ACM Symposium on Principles of Distributed Computing (PODC),
pp. 75-88, August 1984.
An old survey:
-
Simons, B., Welch, J. L. and Lynch, N.,
"An Overview of Clock Synchronization,"
(pp. 84-96)
in
Simons, B. and Spector, A. (Eds.),
Fault-Tolerant Distributed Computing,
Lecture Notes in Computer Science 448, Springer-Verlag, New York, 1990.
abstract,
ps,
pdf.
Back to beginning
Timing Issues:
The level of synchrony available in a system can impact what
problems can be solved and how efficiently they can be solved.
The first two papers show how the time required to solve a synchronization
problem called the session problem is affected by the kind of timing
behavior exhibited by the system.
- Rhee, I. and Welch, J. L.,
"Time Bounds on Synchronization in a Periodic Distributed System,"
Information Processing Letters, Vol. 64, No. 2, pp. 87-93,
October 1997.
abstract,
ps,
pdf.
- Rhee, I. and Welch, J. L.,
"The Impact of Time on the Session Problem,"
Proc. 11th ACM Symposium on Principles of Distributed Computing
(PODC),
pp. 191-202, August 1992.
abstract,
ps,
pdf.
This paper analyzes the response time for the dining philosophers
resource allocation problem.
- Rhee, I. and Welch, J. L.,
"Time Bounds on the Response Time for the Dining Philosophers Problem,"
Proc. International Conference on Parallel and Distributed
Processing Techniques and Applications (PDPTA),
November 1995.
The next paper provides an algorithm and two related lower bounds
for a version of the transaction commit problem in a partially
synchronous model.
- Coan, B. A. and Welch, J. L.,
"Transaction Commit in a Realistic Timing Model,"
Distributed Computing, Vol. 4, No. 2, pp. 87-103, 1990.
abstract.
An earlier version appeared as:
Coan, B. A. and Lundelius, J.,
"Transaction Commit in a Realistic Fault Model,
Proc. 5th ACM Symposium on Principles of Distributed Computing (PODC),
pp. 40-51, August 1986.
This paper shows that a system with asynchronous processors and
messages can simulate, in the presence of various kinds of processor
failures, a system with synchronous processors but asynchronous
failures. This result implies that the impossibility of fault-tolerant
consensus in the synchronous processors model follows immediately from
the impossibility of fault-tolerant consensus in the totally asynchronous
model.
- Welch, J. L.,
"Simulating Synchronous Processors,"
Information and Computation, Vol. 74, No. 2, pp. 159-171, 1987.
abstract.
Back to beginning
Communication Protocols:
These papers are all concerned with some aspect of interprocessor
communication using messages.
The first paper analyzed how to choose how long to wait before deciding
to retransmit a (possibly lost) message.
- Dolev, S., Kate, M., and Welch, J. L.,
"A Competitive Analysis for Retransmission Timeout,"
Networks, Vol. 34, No. 1, pp. 73-80, 1999.
abstract,
ps,
pdf.
An earlier version appeared as:
Dolev, S., Kate, M., and Welch, J. L.,
"A Competitive Analysis for Retransmission Timeout,"
Proc. 15th IEEE International Conference on Distributed Computing
Systems (ICDCS),
pp. 450-455, May/June 1995.
The next papers give algorithms and impossibility results for setting
up and tearing down connections and delivering messages within
connections, under various system assumptions.
- Chaudhuri, S., Coan, B. A. and Welch, J. L.,
"Using Adaptive Timeouts to Achieve At-Most-Once Message Delivery,"
Distributed Computing, Vol. 9, pp. 109-117, 1995.
abstract,
ps,
pdf.
An earlier version appeared as:
Chaudhuri, S., Coan, B. A. and Welch, J. L.,
"Using Adaptive Timeouts to Achieve At-Most-Once Message Delivery,"
Proc. 5th International Workshop on Distributed Algorithms (WDAG)
(Springer-Verlag LNCS 579), pp. 151-166, October 1991.
- Walter, J. and Welch, J. L.,
"Hazard-Free Connection Release,"
Proc. International Conference on Parallel and Distributed
Processing Techniques and Applications (PDPTA),
pp. 1668-1672, June/July 1997.
abstract,
ps,
pdf.
- Attiya, H., Dolev, S., and Welch, J. L.,
"Connection Management Without Retaining Information,"
Information and Computation, Vol. 123, No. 2, pp. 155-171,
December 1995.
abstract,
ps,
pdf.
An earlier version appeared as:
Attiya, H., Dolev, S., and Welch, J. L.,
"Connection Management Without Retaining Information,"
Proc. 28th Hawaii International Conference on System Sciences,
pp. 622-631, January 1995.
- Dolev, S. and Welch, J. L.,
"Crash-Resilient Communication in Dynamic Networks,"
IEEE Transactions on Computers, Vol. 46, No. 1, pp. 14-26,
January 1997.
abstract,
ps,
pdf.
An earlier version appeared as:
Dolev, S. and Welch, J. L.,
"Crash Resilient Communication in Dynamic Networks,"
Proc. 7th International Workshop on Distributed Algorithms (WDAG)
(Springer-Verlag LNCS 725),
pp. 129-144, September 1993.
Other topics:
- Abu-Amara, H., Coan, B. A., Dolev, S., Kanevsky, A., and Welch, J. L.,
"Self-Stabilizing Topology Maintenance Protocols for High-Speed Networks,"
IEEE/ACM Transactions on Networking, Vol. 4, No. 6, pp. 902-912,
December 1996.
abstract,
ps,
pdf.
An earlier version appeared as:
Abu-Amara, H., Coan, B. A., Dolev, S., Kanevsky, A., and Welch, J. L.,
"A Fault-Tolerant Layered Approach to Fiber Optic Networks,"
Proc. Conference on High-Speed Networking and Multimedia Computing,
IS&T/SPIE Symposium on Electronic Imaging Science & Technology,
pp. 380-391, February 1994.
- Sistla, A. P. and Welch, J. L.,
"Efficient Distributed Recovery Using Message Logging,"
Proc. 8th ACM Symposium on Principles of Distributed Computing (PODC),
pp. 223-238, August 1989.
abstract,
Back to beginning
Shared Objects:
The first two papers explored the complexity of implementing, in a
wait-free manner, multi-valued shared registers (the "logical"
registers) out of binary-valued shared registers (the "physical"
registers). Wait-free means that any operation of a user of the
implemented register will terminate successfully even if all the other
users crash. The complexity measures studied were the number of
physical reads per logical read, the number of physical writes per
logical write, and the number of physical registers required.
- Chaudhuri, S., Kosa, M. J. and Welch, J. L.,
"One-Write Algorithms for Multivalued Regular and Atomic Registers,"
Acta Informatica, Vol. 37, pp. 161-192, 2000.
abstract,
ps,
pdf.
An earlier version appeared as:
Chaudhuri, S., Kosa, M. J. and Welch, J. L.,
"Upper and Lower Bounds for One-Write Multivalued Regular Registers,"
Proc. 3rd IEEE Symposium on Parallel and Distributed Processing (SPDP),
pp. 134-141, December 1991.
- Chaudhuri, S. and Welch, J. L.,
"Bounds on the Costs of Multivalued Register Implementations,
SIAM Journal on Computing, Vol. 23, No. 2, pp. 335-354, April 1994.
abstract,
ps,
pdf.
An earlier version appeared as:
Chaudhuri, S. and Welch, J. L.,
"Bounds on the Costs of Register Implementations,"
Proc. 4th International Workshop on Distributed Algorithms (WDAG)
(Springer-Verlag LNCS 486), pp. 402-421,
September 1990.
The next two papers concern implementing a shared object on top of
message-passing, with emphasis on the consistency conditions provided
(not on fault-tolerance). The first paper presents a framework for
defining consistency conditions that allows optimized, non-sequential
memory accesses and describes, and proves correct, several programming
paradigms for use with such conditions. The second paper proves that
implementing linearizability is strictly more expensive than
implementing sequential consistency, in terms of elapsed time per
operation, for three common shared data types.
- Attiya, H., Chaudhuri, S., Friedman, R. and Welch, J. L.,
"Shared Memory Consistency Conditions for Non-Sequential Execution:
Definitions and Programming Strategies,"
SIAM Journal on Computing, Vol. 27, No. 1, pp. 65-89,
February 1998.
abstract,
ps,
pdf.
An earlier version appeared as:
Attiya, H., Chaudhuri, S., Friedman, R. and Welch, J. L.,
"Shared Memory Consistency Conditions for Non-Sequential Execution:
Definitions and Programming Strategies,"
Proc. 5th ACM Symposium on Parallel Algorithms and Architectures,
pp. 241-250, July 1993.
- Attiya, H. and Welch, J. L.,
"Sequential Consistency versus Linearizability,"
ACM Transactions on Computer Systems, Vol. 12, No. 1, pp. 91-122,
May 1994.
abstract,
ps,
pdf.
An earlier version appeared as:
Attiya, H. and Welch, J. L.,
"Sequential Consistency vs. Linearizability,"
Proc. 3rd ACM Symposium on Parallel Algorithms and Architectures,
pp. 304-315, July 1991.
The last two papers studied how to implement a shared object on top
of message passing in the presence of crash failures.
- Kanthadai, S. and Welch, J. L.,
"Implementation of Recoverable Distributed Shared Memory by Logging
Writes,"
Proc. 16th IEEE International Conference on Distributed Computing
Systems (ICDCS),
pp. 116-124, May 1996.
abstract.
- Chaudhuri, S., Kanthadai, S., and Welch, J. L.,
"The Role of Data-Race-Free Programs in Recoverable DSM,"
brief announcement,
Proc. 15th ACM Symposium on Principles of Distributed Computing (PODC),
p. 245, May 1996.
Back to beginning
Modularity:
These two papers constructed efficient algorithms for Byzantine
agreement (getting processors to agree on a decision value even when
some of them can exhibit arbitrary, or Byzantine, failures);
using subroutines in a clever way allowed us to achieve good complexity
bounds.
- Coan, B. A. and Welch, J. L.,
"Modular Construction of an Efficient 1-Bit Byzantine Agreement Protocol,"
Mathematical Systems Theory, Vol. 26, pp. 131-154, 1993.
abstract.
An earlier version appeared as:
Coan, B. A. and Welch, J. L.,
"Modular Construction of Efficient Byzantine Agreement Protocols,"
Proc. 8th ACM Symposium on Principles of Distributed Computing (PODC),
pp. 295-306, August 1989.
- Coan, B. A. and Welch, J. L.,
"Modular Construction of a Byzantine Agreement Protocol with Optimal Message
Bit Complexity,"
Information and Computation, Vol. 97, No. 1, pp. 61-85, 1992.
abstract.
An earlier version appeared as:
Coan, B. A. and Welch, J. L.,
"A Byzantine Agreement Protocol with Optimal Message Bit Complexity,"
Proc. 27th Allerton Conference on Communication, Control and Computing,
pp. 1062-1071, September 1989.
The next two papers described existing distributed algorithms (one for
finding a minimum spanning tree and another for solving a resource
allocation problem) in modular ways and used that modularity to
understand more about the algorithms and give rigorous correctness
proofs.
- Welch, J. L., Lamport, L. and Lynch, N.,
"A Lattice-Structured Proof Technique Applied to a Minimum Spanning
Tree Algorithm,"
Proc. 7th ACM Symposium on Principles of Distributed Computing (PODC),
pp. 28-43, August 1988.
abstract.
- Welch, J. L. and Lynch, N. A.,
"A Modular Drinking Philosophers Algorithm,"
Distributed Computing, Vol. 6, pp. 233-244, 1993.
abstract,
ps,
pdf.
Back to beginning
Mobile Ad Hoc Networks:
Mobile ad hoc networks are formed by a collection of,
potentially mobile, wireless nodes; communication links form and disappear
as nodes come into and go out of each other's communication range.
Such networks have many practical
applications, including home networking, personal area networking,
search-and-rescue, and military operations.
There has been a significant amount of activity in the area of
routing and medium access control for
mobile ad hoc
networks.
In contrast, we are investigating the design of
distributed algorithms, and services based on these distributed
algorithms. The eventual objective is to develop a middleware that
provides services useful to upper layers.
- Malpani, N., Vaidya, N., and Welch, J. L.,
"Distributed Token Circulation in Mobile Ad Hoc Networks," to appear,
Proc. 9th International Conference on Network Protocols (ICNP),
November 2001.
abstract,
ps,
pdf.
- Walter, J., Welch, J. L. and Vaidya, N.,
"A Mutual Exclusion Algorithm for Ad Hoc Mobile Networks,"
accepted to Wireless Networks.
abstract,
ps,
pdf.
An earlier version appeared as:
Walter, J., Welch, J. L. and Vaidya, N.,
"A Mutual Exclusion Algorithm for Ad Hoc Mobile Networks,"
presented at 2nd International Workshop on Discrete Algorithms
and Methods for Mobile Computing and Communications (DIAL M
for Mobility), October 1998.
- Malpani, N., Welch, J. L., and Vaidya, N.,
"Leader Election Algorithms for Mobile Ad Hoc Networks,"
Proc. 4th International Workshop on Discrete Algorithms
and Methods for Mobile Computing and Communications (DIAL M
for Mobility), pp. 96-103, August 2000.
This is actually a revised version with a simplified algorithm
for the case of multiple topology changes.
abstract,
ps,
pdf.
The next paper is grouped here since it concerns mobile computing
but it is for the cellular model, not the ad hoc network model.
- Dolev, S., Pradhan, D. K., and Welch, J. L.,
"Modified Tree Structure for Location Management in Mobile Environments,"
Computer Communications, Vol. 19, pp. 335-345, 1996.
abstract,
ps,
pdf.
An earlier version appeared as:
Dolev, S., Pradhan, D. K., and Welch, J. L.,
"Modified Tree Structure for Location Management in Mobile Environments,"
Proc. 14th Annual Joint Conference of IEEE Computer and Communication
Societies (INFOCOM),
pp. 530-537, April 1995.
Back to beginning
Metamorphic Robots:
The robotics community has been exploring how to
get a collection of homogeneous interlocking robots to change
their global shape. Much previous work focused on centralized algorithms
for the problem. We found some simple distributed algorithms for
some simple cases. We hope to use them as building blocks to achieve
more complex reconfigurations.
- Walter, J., Welch, J. L., and Amato, N. A.,
"Distributed Reconfiguration of Hexagonal Robots in Two Dimensions,"
in Sensor Fusion and
Decentralized Control in Robotic Systems III, Gerard T. McKee, Paul S.
Schenker, Editors, Proceedings of SPIE Vol. 4196, pp. 441-453,
November 2000.
abstract,
ps,
pdf.
Full version submitted for journal publication:
ps,
pdf.
- Walter, J., Welch, J. L., and Amato, N. A.,
"Distributed Reconfiguration of Metamorphic Robot Chains,"
Proc. 19th ACM Symposium on Principles of Distributed Computing (PODC),
pp. 171-180, July 2000.
abstract,
ps,
pdf.
Back to beginning
Randomized Distributed Data Structures:
Randomized algorithms can be simpler and more efficient than
deterministic algorithms for the same problem and can even allow the
existence of solutions to some problems for which there are no
deterministic algorithms. The drawback is that absolute correctness
cannot be guaranteed in all situations; typically a randomized
algorithm has some, preferably minuscule, probability of either not
terminating or returning an incorrect answer. We are interested in
developing useful and rigorous specifications of distributed data
structures that have probabilistic behavior; finding efficient
distributed algorithms to implement the shared objects so specified;
and identifying classes of distributed applications that can tolerate
probabilistic data structures.
- Lee, H. and Welch, J. L.,
"Randomized Shared Queues Applied to Distributed Optimization Algorithms,"
to appear,
Proc. 12th International Symposium on Algorithms and Computation
(ISAAC),
December 2001.
abstract,
ps,
pdf.
An earlier version appears as:
Lee, H. and Welch, J. L.,
"Brief Announcement: Randomized Shared Queues",
Proc. 20th ACM Symposium on Principles of Distributed Computing (PODC),
August 2001.
- Lee, H. and Welch, J. L.,
"Applications of Probabilistic Quorums to Iterative Algorithms,"
Proc. 21st IEEE International Conference on Distributed Computing Systems
(ICDCS),
pp. 21-30, April 2001. Nominated for best paper award.
abstract,
ps,
pdf.
An earlier version appeared as:
Lee, H. and Welch, J. L.,
"Brief Announcement: Specification, Implementation and Application
of Randomized Regular Registers,"
Proc. 19th ACM Symposium on Principles of Distributed Computing (PODC),
p. 338, July 2000.
Back to beginning