Consider a common built-in programming type such as Integer. How, for example, do you reason about the effect of code involving objects (variables) of type Integer? A hardware engineer might view the value of an Integer object as a boolean vector, and the high-level-language operators "+" and "-" as macros that stand for hardwired control sequences which manipulate boolean vectors. "Boolean vector" is an example of a mathematical model for the value of an Integer object, i.e., something that defines a mental image for the object's value and provides a machine-processable notation that supports formal reasoning about that object's behavior.
The boolean vector model for programming type Integer works well for the hardware designer who is implementing arithmetic circuits, but it is at best unnecessarily complex for the software engineer who is a client of that hardware. For a software engineering task, you normally view the value of an Integer object according to a more appropriate mathematical model: a mathematical integer. You also picture Integer operators such as "+" and "-" as performing additions and subtractions of mathematical integers. In other words, you don't think about Integer objects in terms of internal representations, but in terms of their representation-neutral (i.e., "abstract") mathematical models.
The burgeoning popularity of component technologies, from the early and influential Booch components [1] through such distributed object technology contenders [7] as CORBA, DCOM, and JavaBeans, makes it imperative that reasoning difficulties with component-based software be dealt with before they lead to a software disaster. Fortunately, software components present an opportunity along with the reasoning challenge. Every programming type gives you something to "wrap" with an appropriate mathematical model. In fact, researchers have already used this idea to tie formal mathematical models to some popular-technology components [6]. The models involved are more complex than simple mathematical integers. But they are far less complex than the underlying bits used in computer representations and the code that transforms them, which must remain the last resort for understanding program behavior.
Mathematical modeling also provides guidance when trying to identify and design new domain-specific software components. Textbooks on the subject usually stop short of detailed component designs. They assume that the domain-specific concepts identified by analysis, if named appropriately, will be intrinsically understandable to domain experts through intuitive or metaphorical models (e.g., "a stack is like a stack of cafeteria trays"). But in complex domains where system correctness is very important -- such as air-traffic control -- the precise behavioral details of software components must be so well-understood that specification by wishful naming and content-free explanations such as "a stack is a stack" cannot suffice. Moreover, the software objects in a system often do not correspond one-to-one with actual physical objects, making it impossible to explain the behavior of software objects by appealing to physical analogies. Furthermore, implementations of complex domain-specific components usually are layered over other complex components, making it practically impossible to understand their behavior by sifting through their implementation code in isolation. Finally, in many component technologies no source code is available for some or all components. Mathematical modeling is the only rational thing to do.
    procedure Reverse (alters s: List)
        variable temp: Item
    begin
        if Length (s) > 0 then
            Remove (s, temp)
            Reverse (s)
            Insert (s, temp)
        end if
    end Reverse

Assuming you understand informally that the intended behavior of Reverse is to "reverse" a List object, how do you reason soundly about whether this body actually accomplishes that? You need to know exactly what a List object is, exactly what each of the operations Length, Remove, and Insert does, and exactly what Reverse is supposed to do. Mathematical modeling seems like an obvious approach.
But is this answer really so obvious? To see how such a question might be answered in traditional documentation for clients of a "List" component, we examined several descriptions of off-the-shelf components involving "List" and "Insert". We found a wide range of explanations ranging from the content-free to the cryptic to the implementation-dependent to the nearly acceptable (i.e., the best we could find). Here, quoted directly but without attribution, are a few of the explanations we found for the behavior of an Insert operation for a List object:
Without an explicit mathematical model that abstractly specifies the state of a List object and the behavior of List operations, reasoning about List objects is reduced to speculation and guesswork. How should objects and their operations be explained, given that a basic objective of software engineering is to be able to reason about and understand the software? The rest of this article illustrates an answer to this fundamental question using the List example. The issue at hand is one that you must address no matter which programming language or paradigm you use. But it is especially important for component-based software development, where source code for the components used often is not available to the client programmer.
Figure 1 -- A typical singly linked list representation
What is the essence of the information captured in this data structure, independent of its representation? We claim that it is simply the string of items already visited, namely <3,4,5>, and the string of items yet to be visited, namely <1,4>. That is, you can view the value of a List object as an ordered pair of mathematical strings of items.
As the Integer-as-boolean-vector example suggests, mathematical modeling does not by itself guarantee understandable specifications or ease of reasoning. Choosing a good mathematical model is a crucial but sometimes difficult task. For example, you might choose to think of the value of a List object as a single string of items (e.g., <3,4,5,1,4>) along with an integer current position (e.g., 3); as a function from integer positions to items along with a current position; or even as a complex mathematical structure that captures the links and nodes of the above representation. Selection of a good mathematical model depends heavily on the operations to be specified, the choice of which should be guided by considerations of observability, controllability, and performance-influenced pragmatism [2, 11]. The pair-of-strings model suggested above leads (in our opinion) to the most understandable specification of the concept and makes it easy to reason about programs that use List objects, as we will see.
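To make the comparison of candidate models concrete, the following minimal Python sketch converts between the single-string-plus-position model and the pair-of-strings model. The function names to_pair and to_positional are hypothetical, and Python tuples stand in for mathematical strings; none of this is part of RESOLVE.

```python
from typing import Any, Tuple

String = Tuple[Any, ...]  # a Python tuple stands in for a mathematical string


def to_pair(items: String, position: int) -> Tuple[String, String]:
    """Map (single string, current position) to (left string, right string)."""
    if not 0 <= position <= len(items):
        raise ValueError("position must lie between 0 and the string length")
    return items[:position], items[position:]


def to_positional(left: String, right: String) -> Tuple[String, int]:
    """Map (left string, right string) back to (single string, position)."""
    return left + right, len(left)


# The example values from the text: <3,4,5,1,4> with current position 3.
pair = to_pair((3, 4, 5, 1, 4), 3)
print(pair)
print(to_positional(*pair))
```

Both models carry the same information. The pair-of-strings form simply avoids having to state, in every specification, that the position stays within the bounds of the string.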
Figure 2 shows the specification of a List component in a dialect of the RESOLVE language [10]. List_Template is a generic concept (specification template) which is parameterized by the programming type of items in the lists. As just stated, each List object is modeled by an ordered pair of mathematical strings of items. The operator "*" denotes string concatenation; "<x>" denotes the string consisting of the single item x; and "|s|" denotes the length of string s.
    concept List_Template (type Item)

        type List is modeled by (left: string of Item, right: string of Item)
            exemplar s
            initialization ensures |s.left| = 0 and |s.right| = 0

        operation Insert (
                alters   s: List
                consumes x: Item
            )
            ensures s.left = #s.left and s.right = <#x> * #s.right

        operation Remove (
                alters   s: List
                produces x: Item
            )
            requires |s.right| > 0
            ensures s.left = #s.left and #s.right = <x> * s.right

        operation Advance (
                alters s: List
            )
            requires |s.right| > 0
            ensures s.left * s.right = #s.left * #s.right and
                    |s.left| = |#s.left| + 1

        operation Reset (
                alters s: List
            )
            ensures |s.left| = 0 and s.right = #s.left * #s.right

        operation Advance_To_End (
                alters s: List
            )
            ensures |s.right| = 0 and s.left = #s.left * #s.right

        operation Left_Length (
                preserves s: List
            ) returns length: Integer
            ensures length = |s.left|

        operation Right_Length (
                preserves s: List
            ) returns length: Integer
            ensures length = |s.right|

    end List_Template
Conceptualizing a List object as a pair of strings makes it easy to explain the behavior of operations that insert or remove from the "middle". A sample value of a List_Of_Integers object, for example, is the ordered pair (<3,4,5>,<1,4>). Insertions and removals can be explained as taking place between the two strings, i.e., either at the right end of the left string or at the left end of the right string.
The declaration of programming type List introduces the mathematical model and says that a List object initially (i.e., upon declaration) is "empty": both its left and right strings are empty strings. Each operation is specified by a requires clause (precondition), which is an obligation for the caller; and an ensures clause (postcondition), which is a guarantee from a correct implementation. In the postcondition of Insert, for example, #s and #x denote the incoming values of s and x, respectively, and s and x denote the outgoing values. Insert has no precondition, and it ensures that the incoming value of x is concatenated onto the left end of the right string of the incoming value of s; the left string is not affected. Notice that the postcondition describes how the operation alters the value of s, but the return value of parameter x (which has the mode consumes) remains otherwise unspecified; consumes means it gets an initial value for its type. For example, an Integer object has an initial value of 0.
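As an informal illustration of how the requires and ensures clauses read, here is a small executable Python model of the List_Template operations. It is only a sketch under assumptions: the class name ListModel and its method names are hypothetical, tuples stand in for mathematical strings, a violated precondition raises an exception, and the consumes and produces parameter modes are approximated by an ordinary argument and a return value.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass
class ListModel:
    """Abstract value of a List object: an ordered pair of strings."""
    left: Tuple[Any, ...] = ()    # initially |s.left| = 0
    right: Tuple[Any, ...] = ()   # initially |s.right| = 0

    def insert(self, x: Any) -> None:
        # ensures s.left = #s.left and s.right = <#x> * #s.right
        # (the caller's x is not reset here, unlike a consumes parameter)
        self.right = (x,) + self.right

    def remove(self) -> Any:
        # requires |s.right| > 0
        # ensures s.left = #s.left and #s.right = <x> * s.right
        if not self.right:
            raise ValueError("Remove requires |s.right| > 0")
        x, self.right = self.right[0], self.right[1:]
        return x

    def advance(self) -> None:
        # requires |s.right| > 0; moves one item from right to left
        if not self.right:
            raise ValueError("Advance requires |s.right| > 0")
        self.left, self.right = self.left + self.right[:1], self.right[1:]

    def reset(self) -> None:
        # ensures |s.left| = 0 and s.right = #s.left * #s.right
        self.left, self.right = (), self.left + self.right

    def advance_to_end(self) -> None:
        # ensures |s.right| = 0 and s.left = #s.left * #s.right
        self.left, self.right = self.left + self.right, ()

    def left_length(self) -> int:
        return len(self.left)

    def right_length(self) -> int:
        return len(self.right)


s = ListModel((3, 4, 5), (1, 4))   # the sample value (<3,4,5>, <1,4>)
s.insert(9)                        # s is now (<3,4,5>, <9,1,4>)
x = s.remove()                     # x is 9 and s is (<3,4,5>, <1,4>) again
s.advance()                        # s is now (<3,4,5,1>, <4>)
print(s, x)
```

A model like this is useful for experimenting with the specification, but it is one possible implementation of the concept, not a substitute for the specification itself.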
RESOLVE specifications use a combination of standard mathematical models such as integers, sets, functions, and relations, in addition to tuples and strings. The explicit introduction of mathematical models allows the use of standard notations associated with those models in explaining the operations. Our experience is that this notation, while precise and formal, is nonetheless fairly easy to learn, even for beginning computer science students.
We leave to the reader the task of understanding the other List_Template operations. List_Template is merely an example chosen to illustrate the features of explicit mathematical modeling as a specification approach. Other RESOLVE components include general-purpose ones defining queues, stacks, bags, partial maps, sorting machines, solvers for graph optimization problems, etc.; and more complex domain-specific components.
    operation Reverse (alters s: List)
        requires |s.left| = 0
        ensures s.left = reverse (#s.right) and |s.right| = 0

The only new notation here is reverse, a built-in mathematical function in the specification notation. Formally, its meaning is:

    reverse (empty_string) = empty_string
    reverse (a * <x>) = <x> * reverse (a)

Informally, its meaning is that, if s is a string (e.g., <1,2,3>), then reverse(s) is the string whose items are the same as those in s but in the opposite order (e.g., <3,2,1>).
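The two defining equations for reverse translate directly into a recursive function. The sketch below follows them clause by clause, again assuming Python tuples as a stand-in for mathematical strings.

```python
from typing import Any, Tuple

String = Tuple[Any, ...]


def reverse(s: String) -> String:
    """Mirror the two defining equations of the mathematical reverse."""
    if len(s) == 0:
        return ()                 # reverse (empty_string) = empty_string
    a, x = s[:-1], s[-1]          # decompose s as a * <x>
    return (x,) + reverse(a)      # reverse (a * <x>) = <x> * reverse (a)


print(reverse((1, 2, 3)))  # (3, 2, 1)
```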
Let's reconsider the reasoning question raised earlier (where Length has been replaced in the code with Right_Length to match exactly the component interface defined in Figure 2). Is the following implementation correct for the above specification of Reverse?
    procedure Reverse (alters s: List)
        variable temp: Item
    begin
        if Right_Length (s) > 0 then
            Remove (s, temp)
            Reverse (s)
            Insert (s, temp)
        end if
    end Reverse

You can reason about the correctness of this code with varying degrees of confidence through testing (automated execution on sample inputs), tracing (manual execution on sample inputs), and/or formal symbolic reasoning (manual or automated proof of correctness). But all of these must be based on mathematical modeling of Lists. Although testing is clearly important, here we illustrate only the last two approaches to show the power of mathematical modeling for human reasoning about program behavior.
State 0:  s = (<>, <3, 4, 6, 2>) and temp = 0
    if Right_Length (s) > 0 then
State 1:  s = (<>, <3, 4, 6, 2>) and temp = 0
        Remove (s, temp)
State 2:  s = (<>, <4, 6, 2>) and temp = 3
        Reverse (s)
State 3:  s = (<2, 6, 4>, <>) and temp = 3
        Insert (s, temp)
State 4:  s = (<2, 6, 4>, <3>) and temp = 0
    end if
State 5:  s = (<2, 6, 4>, <3>) and temp = 0
Table 1 -- A tracing table for Reverse
There are two states in Table 1 where the recording of facts calls for some explanation. The facts at state 2 are based on the postcondition of the Remove operation. However, you can assume the postcondition of Remove only if the precondition of Remove is satisfied before the call, i.e., in state 1. In this case, object values at state 1 can be seen by inspection to satisfy the precondition of Remove, so appealing to the postcondition of Remove to characterize state 2 represents valid reasoning. Also, the facts at state 3 use the postcondition of Reverse. Assuming the postcondition of Reverse when tracing Reverse would represent circular, invalid reasoning without first verifying that the recursion is "making progress". In this case, progress is evident because the length of s.right, at state 2, is less than the length of s.right at state 0. Again you can see this by inspection, and the justification for appealing to the postcondition of Reverse in state 3 is mathematical induction. (Note also that the precondition of Reverse holds at state 2.)
Details of the remaining entries of the table are straightforward. Examination of the facts at state 5 reveals whether this implementation of Reverse is correct for the specific input value s = (<>, <3,4,6,2>). You should be able to see from this trace and the specification that it is not correct.
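The trace in Table 1 can be reproduced mechanically. The sketch below is an illustration under assumptions, not part of the RESOLVE tooling: the names spec_reverse and trace_body are hypothetical, tuples stand in for strings, and, as in the table, the recursive call is replaced by what the specification of Reverse promises rather than by actually re-running the body.

```python
from typing import Any, List, Tuple

String = Tuple[Any, ...]
ListValue = Tuple[String, String]   # (left, right)


def spec_reverse(s: ListValue) -> ListValue:
    """The outgoing value promised by the specification of Reverse."""
    left, right = s
    assert len(left) == 0, "Reverse requires |s.left| = 0"
    return tuple(reversed(right)), ()


def trace_body(s: ListValue) -> List[Tuple[ListValue, int]]:
    """Record (s, temp) at states 0 through 5 of the body of Reverse."""
    temp = 0                                    # Integer initial value
    states = [(s, temp)]                        # state 0
    if len(s[1]) > 0:                           # if Right_Length (s) > 0 then
        states.append((s, temp))                # state 1
        temp, s = s[1][0], (s[0], s[1][1:])     # Remove (s, temp)
        states.append((s, temp))                # state 2
        s = spec_reverse(s)                     # Reverse (s), by its postcondition
        states.append((s, temp))                # state 3
        s, temp = (s[0], (temp,) + s[1]), 0     # Insert (s, temp); temp is consumed
        states.append((s, temp))                # state 4
    states.append((s, temp))                    # state 5
    return states


start = ((), (3, 4, 6, 2))
states = trace_body(start)
for number, facts in enumerate(states):
    print(number, facts)

final_s = states[-1][0]
print("meets specification:", final_s == spec_reverse(start))  # False
```

The final value of s is (<2,6,4>, <3>), whereas the specification calls for (<2,6,4,3>, <>), which is the discrepancy visible at state 5 of the table.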
Our approach to symbolic reasoning is called natural reasoning, a technique proposed by Heym [4], who also proved conditions for its soundness and relative completeness. The method is called natural reasoning, like natural deduction in mathematics, because it is an operationally-based approach that is intuitively appealing to computer science students and experienced software engineers alike. It lets you formally represent the informal reasoning used by the author of the code, effectively encoding why he/she thinks the code "works".
Natural reasoning can be viewed as a two-step process in which you:
Figure 3 -- Relationships in symbolic reasoning
In addition to those arising from the procedure body statements, step 1 produces two special assertions. One is a fact (an assertion to be assumed in step 2 of natural reasoning): the precondition of Foo holds in state 0, i.e., pre[x0,y0]. Another is an obligation (an assertion to be proved in step 2): the postcondition of Foo holds in state 4 with respect to state 0, i.e., post[x0,y0,x4,y4]. Intuitively, this says that if you view the effect of the operation from the client program, as control appears to jump directly from state 0 to state 4, the net effect of the individual statements in the body is consistent with the specification.
Step 2 of natural reasoning involves combining the assertions recorded in step 1 to show that all the obligations can be proved from the available facts. This task is generally an intellectually challenging activity in which computer-based theorem proving helps but which, given the current state of the art, is far from entirely automatic.
The assertions recorded in step 1 arise from three questions about every state:
State 0
    Path condition:
    Facts:           |s0.left| = 0 and is_initial (temp0)
    Obligations:

    if Right_Length (s) > 0 then

State 1
    Path condition:  |s0.right| > 0
    Facts:           s1 = s0 and temp1 = temp0
    Obligations:     |s1.right| > 0

        Remove (s, temp)

State 2
    Path condition:  |s0.right| > 0
    Facts:           s2.left = s1.left and s1.right = <temp2> * s2.right
    Obligations:     |s2.left| = 0 and |s2.right| < |s0.right|

        Reverse (s)

State 3
    Path condition:  |s0.right| > 0
    Facts:           s3.left = reverse (s2.right) and |s3.right| = 0 and temp3 = temp2
    Obligations:

        Insert (s, temp)

State 4
    Path condition:  |s0.right| > 0
    Facts:           s4.left = s3.left and s4.right = <temp3> * s3.right and is_initial (temp4)
    Obligations:

    end if

State 5
    Path condition:  |s0.right| = 0
    Facts:           s5 = s0 and temp5 = temp0
    Obligations:     s5.left = reverse (s0.right) and |s5.right| = 0

    Path condition:  |s0.right| > 0
    Facts:           s5 = s4 and temp5 = temp4
    Obligations:     s5.left = reverse (s0.right) and |s5.right| = 0
Table 2 -- A symbolic reasoning table for Reverse
In Table 2, si.left and si.right are the symbolic denotations of values for object s in state i; similarly for object temp. The facts at state 0 are obtained by substituting the symbolic value of object s at state 0, namely s0, into the precondition of Reverse, and by recording initial values for all local objects. The obligation at state 5 is obtained by substituting the symbolic values of s at state 0 and at state 5 into the postcondition of Reverse. This is the goal obligation -- once it is proved, the correctness of Reverse is established. Notice how the path condition |s0.right| > 0 for states 1-4 records when these states are reached. Facts recorded for states 1-5 are based on the postconditions of operations and on the flow of control for an if statement. Obligations arise in state 1, because of the precondition of Remove, and in state 2, because of the precondition of Reverse and because Reverse is being called recursively. Natural reasoning includes a built-in induction argument here so recursion is nothing special, except that before a recursive call there is an obligation to show termination: the recursive operation's progress metric has decreased, i.e., |s2.right| < |s0.right|.
Once all these assertions are recorded, you solve the reasoning problem by composing them appropriately to form the verification conditions and then showing that each of these conditions is satisfied. There is one verification condition for each obligation, of the form:
    (true implies (|s0.left| = 0 and is_initial (temp0))) and
    (|s0.right| > 0 implies (s1 = s0 and temp1 = temp0)) and
    |s0.right| > 0

The first two lines above are the assumptions of the first form for states 0 and 1, respectively, and the last line is the assumption of the second form for state 1.
The proof of the obligation in state 1 is easy for humans who have had a bit of practice with such things. Assuming that |s0.right| > 0, you conclude from the second line that s1 = s0 and, therefore, s1.right = s0.right. Then since |s0.right| > 0 you conclude by substitution |s1.right| > 0, i.e., the assertion to be proved. In a similar manner, you can easily prove the obligation at state 2.
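As a sanity check on this argument (not a proof), the verification condition for state 1 can be evaluated exhaustively over a small finite universe of values. The sketch below makes several assumptions that are not in the original: items are drawn from {0, 1}, strings have length at most 2, is_initial (temp0) is read as temp0 = 0 for Integer items, and the names implies and vc_state1 are hypothetical.

```python
from itertools import product

ITEMS = (0, 1)
# All strings over ITEMS of length 0, 1, or 2, represented as tuples.
STRINGS = [t for n in range(3) for t in product(ITEMS, repeat=n)]
# All List values (left, right) built from those strings.
LISTS = list(product(STRINGS, STRINGS))


def implies(p: bool, q: bool) -> bool:
    return (not p) or q


def vc_state1(s0, temp0, s1, temp1) -> bool:
    """Assumptions for states 0 and 1 imply the obligation |s1.right| > 0."""
    assumptions = (
        implies(True, len(s0[0]) == 0 and temp0 == 0)
        and implies(len(s0[1]) > 0, s1 == s0 and temp1 == temp0)
        and len(s0[1]) > 0
    )
    obligation = len(s1[1]) > 0
    return implies(assumptions, obligation)


all_hold = all(
    vc_state1(s0, temp0, s1, temp1)
    for s0, s1 in product(LISTS, LISTS)
    for temp0, temp1 in product(ITEMS, ITEMS)
)
print(all_hold)  # True
```

An exhaustive check over a bounded universe can expose a false verification condition quickly, but only the symbolic argument establishes it for all values.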
Is Reverse correct? Table 1 shows a counterexample to any claim of correctness, and indeed the obligation at state 5 cannot be proved from the allowed assumptions. If the code were correct, however, tracing on sample inputs could not establish that fact, whereas symbolic reasoning could. Fixing the program is left as an exercise for the reader, as we would leave it for our students.
Perhaps this situation is tolerable if software components are to be used only for prototyping and non-safety-critical applications. But for "industrial strength" software systems where there can be serious consequences to software failures, the ability to reason soundly about software behavior is undeniably critical. The implications of unsound reasoning for productivity and quality -- the very attributes component-based software is supposed to improve -- are ominous. Fortunately, introductory CS students can learn to read and use specifications based on mathematical modeling and appreciate the significance of appropriate modeling in developing correct software. With open minds, a bit of continuing education, and tool support, software professionals also should be able to understand and appreciate this important technique.
[2] Fleming, D. Foundations of Object-Based Specification Design. Ph.D. diss., Dept. Comp. Sci. and Elec. Eng., West Virginia University, Morgantown, WV, 1997.
[3] Freedman, D.P., and Weinberg, G.M. Handbook of Walkthroughs, Inspections, and Technical Reviews: Evaluating Programs, Projects, and Products, 3rd ed. Dorset House, New York, 1990.
[4] Heym, W.D. Computer Program Verification: Improvements for Human Reasoning. Ph.D. diss., Dept. of Comp. and Inf. Sci., The Ohio State Univ., Columbus, OH, 1995.
[5] Knuth, D. Interviewed by D. Andrews, Byte (Sep. 1996); also available from http://www.byte.com/art/9609/sec3/art19.htm.
[6] Leavens, G.T., and Cheon, Y. Extending CORBA IDL to specify behavior with Larch. In OOPSLA '93 Workshop Proc.: Specification of Behavioral Semantics in OO Info. Modeling, pp. 77-80; also TR #93-20, Dept. of Comp. Sci., Iowa State Univ., Ames, IA, 1993.
[7] Orfali, R., Harkey, D., and Edwards, J. The Essential Distributed Objects Survival Guide. J. Wiley, New York, 1996.
[8] Owre, S., Rushby, J., Shankar, N., von Henke, F. Formal verification of fault-tolerant architectures: prolegomena to the design of PVS. IEEE Trans. on Soft. Eng. 21, 2 (Feb. 1995), 107-125.
[9] Sitaraman, M. An Introduction to Software Engineering Using Properly Conceptualized Objects. WVU Publications, Morgantown, WV, 1997.
[10] Sitaraman, M., and Weide, B.W., eds. Component-based software using RESOLVE. ACM Software Eng. Notes 19, 4 (1994), 21-67.
[11] Weide, B. W., Edwards, S. H., Heym, W. D., Long, T. J., and Ogden, W.F. Characterizing observability and controllability of software components. In Proc. 4th Intl. Conf. on Software Reuse, IEEE CS Press, 1996, pp. 62-71.
[12] Weide, B.W. Software Component Engineering. OSU Reprographics, Columbus, OH, 1997.