Fundamentals of Data Structures, by Ellis Horowitz and Sartaj Sahni.

Author: | MELISSA HARTSE |

Language: | English, Spanish, Hindi |

Country: | Guyana |

Genre: | Academic & Education |

Pages: | 580 |

Published (Last): | 26.10.2015 |

ISBN: | 215-9-58904-864-5 |



The form in which we choose to write the axioms is important. Our goal here is to write the axioms in a representation independent way. Then, we discuss ways of implementing the functions using a conventional programming language. An implementation of a data structure d is a mapping from d to a set of other data structures e. This mapping specifies how every object of d is to be represented by the objects of e. Secondly, it requires that every function of d must be written using the functions of the implementing data structures e. Thus we say that integers are represented by bit strings, boolean is represented by zero and one, an array is represented by a set of consecutive words in memory.

Below is one complete version. This, combined with the above assertion, implies that x is not present. Unfortunately, a complete proof takes us beyond our scope, but those who wish to pursue program proving should consult the references at the end of this chapter.

Recursion

We have tried to emphasize the need to structure a program to make it easier to achieve the goals of readability and correctness.

Actually one of the most useful syntactical features for accomplishing this is the procedure. Given a set of instructions which perform a logical operation, perhaps a very complex and long operation, they can be grouped together as a procedure.

Given the input-output specifications of a procedure, we don't even have to know how the task is accomplished, only that it is available.

This view of the procedure implies that it is invoked, executed and returns control to the appropriate place in the calling procedure. What this fails to stress is the fact that procedures may call themselves (direct recursion) before they are done, or they may call other procedures which again invoke the calling procedure (indirect recursion). These recursive mechanisms are extremely powerful, but even more importantly, many times they can express an otherwise complex process very clearly.

For these reasons we introduce recursion here. Most students of computer science view recursion as a somewhat mystical technique which only is useful for some very special class of problems such as computing factorials or Ackermann's function. This is unfortunate because any program that can be written using assignment, the if-then-else statement and the while statement can also be written using assignment, if-then-else and recursion.

Of course, this does not say that the resulting program will necessarily be easier to understand. However, there are many instances when this will be the case. When is recursion an appropriate mechanism for algorithm exposition? One instance is when the problem itself is recursively defined.

Given a set of n ≥ 1 elements the problem is to print all possible permutations of this set. It is easy to see that given n elements there are n! different permutations. A simple algorithm can be achieved by looking at the case of four elements (a,b,c,d). The answer is obtained by printing
(i) a followed by all permutations of (b,c,d)
(ii) b followed by all permutations of (a,c,d)
(iii) c followed by all permutations of (b,a,d)
(iv) d followed by all permutations of (b,c,a)
The expression "followed by all permutations" is the clue to recursion.

It implies that we can solve the problem for a set with n elements if we had an algorithm which worked on n - 1 elements. A is a character string e. Then try to do one or more of the exercises at the end of this chapter which ask for recursive procedures.
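The recursive idea above can be sketched in Python. This is a minimal illustration, not the book's procedure: the names `perm` and `all_permutations` are hypothetical, and the sketch assumes the input is a string whose characters are swapped in place so that each one takes a turn in position k.

```python
def perm(a, k=0):
    """Yield every permutation of list a that keeps the prefix a[:k] fixed."""
    if k >= len(a) - 1:
        # Nothing left to rearrange: the current order is one permutation.
        yield "".join(a)
        return
    for i in range(k, len(a)):
        a[k], a[i] = a[i], a[k]      # bring the i-th element to position k
        yield from perm(a, k + 1)    # permute the remaining elements
        a[k], a[i] = a[i], a[k]      # restore the original order


def all_permutations(s):
    """Return a list of all permutations of the characters of s."""
    return list(perm(list(s)))


if __name__ == "__main__":
    for p in all_permutations("abcd"):
        print(p)
```

Each level of recursion fixes one more position, so a string of n characters produces n! results.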

We will see several important examples of such structures, especially lists in section 4. Another instance when recursion is invaluable is when we want to describe a backtracking procedure. But for now we will content ourselves with examining some simple, iterative programs and show how to eliminate the iteration statements and replace them by recursion. This may sound strange, but the objective is not to show that the result is simpler to understand nor more efficient to execute.

The main purpose is to make one more familiar with the execution of a recursive procedure. Suppose we start with the sorting algorithm presented in this section. To rewrite it recursively the first thing we do is to remove the for loops and express the algorithm using assignment, if-then-else and the go-to statement. Every place where a ''go to label'' appears, we replace that statement by a call of the procedure associated with that label.

This gives us the following set of three procedures. Procedure MAXL2 is also directly recursive. These two procedures use eleven lines while the original iterative version was expressed in nine lines; not much of a difference. Notice how in MAXL2 the fourth parameter k is being changed. Increasing k by one and restarting the procedure has essentially the same effect as the for loop. Now let us trace the action of these procedures as they sort a set of five integers. When a procedure is invoked an implicit branch to its beginning is made.
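To make the loop-to-recursion transformation concrete, here is a hedged Python sketch. It is not the book's SORT and MAXL2 procedures: the names `sort_from` and `index_of_min` are hypothetical, and the sketch assumes the sort selects the minimum of the unsorted part. What it does share with the text is that each loop becomes a procedure that calls itself with its counter increased by one.

```python
def index_of_min(a, j, k):
    """Return the index of the smallest value among a[j] and a[k:].

    j is the best index found so far; advancing k by one in the recursive
    call plays the role of one step of the inner for loop.
    """
    if k >= len(a):
        return j
    if a[k] < a[j]:
        j = k
    return index_of_min(a, j, k + 1)


def sort_from(a, i=0):
    """Sort a[i:] in place; the recursion on i replaces the outer for loop."""
    if i >= len(a) - 1:
        return a
    m = index_of_min(a, i, i + 1)
    a[i], a[m] = a[m], a[i]
    return sort_from(a, i + 1)


if __name__ == "__main__":
    print(sort_from([5, 2, 4, 1, 3]))
```

As the text notes, the point is familiarity with recursive execution, not efficiency: Python's recursion limit makes this unsuitable for long lists.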

The parameter mechanism of the procedure is a form of assignment. In section 4. Also in that section are several recursive procedures, followed in some cases by their iterative equivalents. Rules are also given there for eliminating recursion. There are many criteria upon which we can judge a program, for instance: i Does it do what we want it to do?

The above criteria are all vitally important when it comes to writing software, most especially for large systems. Though we will not be discussing how to reach these goals, we will try to achieve them throughout this book with the programs we write. Hopefully this more subtle approach will gradually infect your own program writing habits so that you will automatically strive to achieve these goals.

There are other criteria for judging programs which have a more direct relationship to performance. These have to do with computing time and storage requirements of the algorithms. Performance evaluation can be loosely divided into 2 major phases: (a) a priori estimates and (b) a posteriori testing. Both of these are equally important. First consider a priori estimation. We would like to determine two numbers for this statement.

The first is the amount of time a single execution will take; the second is the number of times it is executed, which is called the frequency count. The product of these numbers will be the total time taken by this statement. One of the hardest tasks in estimating frequency counts is to choose adequate samples of data.

It is impossible to determine exactly how much time it takes to execute any command unless we have the following information:
(i) the machine we are executing on;
(ii) its machine language instruction set;
(iii) the time required by each machine instruction;
(iv) the translation a compiler will make from the source to the machine language.

It is possible to determine these figures by choosing a real machine and an existing compiler. Another approach would be to define a hypothetical machine (with imaginary execution times), but make the times reasonably close to those of existing hardware so that resulting figures would be representative.

Neither of these alternatives seems attractive. In both cases the exact times we would determine would not apply to many machines or to any machine. Also, there would be the problem of the compiler, which could vary from machine to machine. Moreover, it is often difficult to get reliable timing figures because of clock limitations and a multi-programming or time sharing environment. Finally, the difficulty of learning another machine language outweighs the advantage of finding "exact" fictitious times.

All these considerations lead us to limit our goals for an a priori analysis. Instead, we will concentrate on developing only the frequency count for all statements. The anomalies of machine configuration and language will be lumped together when we do our experimental studies. Parallelism will not be considered. Consider the three examples of Figure 1. Then its frequency count is one.

In program b the same statement will be executed n times and in program c n^2 times (assuming n ≥ 1). In our analysis of execution we will be concerned chiefly with determining the order of magnitude of an algorithm. This means determining those statements which may have the greatest frequency count. To determine the order of magnitude, formulas for sums often occur.
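The three frequency counts can be checked by running the fragments and counting. The Python sketch below is illustrative only: the function name `frequency_counts` is hypothetical, and it assumes the counted statement is a simple increment executed once, inside one loop, and inside two nested loops.

```python
def frequency_counts(n):
    """Return how often the counted statement runs in programs (a), (b), (c)."""
    a = 0
    a += 1                      # program (a): executed once

    b = 0
    for _ in range(n):          # program (b): one loop of n iterations
        b += 1

    c = 0
    for _ in range(n):          # program (c): two nested loops
        for _ in range(n):
            c += 1

    return a, b, c


if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, frequency_counts(n))
```

The counts 1, n and n^2 differ by orders of magnitude as n grows, which is why the analysis concentrates on the most frequently executed statement.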

In the program segment of figure 1. The Fibonacci sequence starts as 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ... Each new term is obtained by taking the sum of the two previous terms. The program on the following page takes any non-negative integer n and prints the value Fn.
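A short Python sketch of such a program follows. It is an assumption-laden stand-in rather than the book's listing: the name `fibonacci` and the variable names are hypothetical, and the function returns the value instead of printing it.

```python
def fibonacci(n):
    """Return F(n), where F(0) = 0, F(1) = 1 and F(n) = F(n-1) + F(n-2)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    previous, current = 0, 1            # F(0) and F(1)
    for _ in range(2, n + 1):           # runs n - 1 times when n > 1
        previous, current = current, previous + current
    return current


if __name__ == "__main__":
    print([fibonacci(i) for i in range(11)])
```

The loop body runs n - 1 times for n > 1, so the frequency count of the work inside the loop grows linearly with n.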

Below is a table which summarizes the frequency counts for the first three cases. None of them exercises the program very much. Notice, though, how each if statement has two parts: the if condition and the then clause.

These may have different execution counts. At this point the for loop will actually be entered. Steps 1, 2, 3, 5, 7 and 9 will be executed once, but steps 4, 6 and 8 not at all. Both commands in step 9 are executed once.

If a correct proof can be obtained. The larger the software. Testing is the art of creating sample data upon which to run your program. One such tool instruments your source code and then tells you for every data set: This shifts our emphasis away from the management and integration of the file: If you note them down with the code. This is another use of program proving. For each subsequent compiler their estimates became closer to the truth. In fact you may save as much debugging time later on by doing a new version now.

Proofs about programs are really no different from any other kinds of proofs. One thing you have forgotten to do is to document. If you do decide to scrap your work and begin again. The proof can't be completed until these are changed. If the program fails to respond correctly then debugging is needed to determine what went wrong and how to correct it.

Unwarranted optimism is a familiar disease in computing. This is referred to as the bottom-up approach. Experience suggests that the top-down approach should be followed when creating a program. This method of design is called the top-down approach. One solution is to store the values in an array in such a way that the i-th integer is stored in the i-th array position. One of the simplest solutions is given by the following: "from those integers which remain unsorted, find the smallest and place it next in the sorted list."

At this level the formulation is said to be abstract because it contains no details regarding how the objects will be represented and manipulated in a computer. Suppose we devise a program for sorting a set of n ≥ 1 distinct integers. There now remain two clearly defined subtasks. Each subtask is similarly decomposed until all tasks are expressed within a programming language.

Let us examine two examples of top-down program development. The initial solution may be expressed in English or some form of mathematical notation. Underlying all of these strategies is the assumption that a language exists for adequately describing the processing of data at several abstract levels.

The design process consists essentially of taking a proposed solution and successively refining it until an executable program is achieved. A look ahead to problems which may arise later is often useful. This latter problem can be solved by the code file: We are now ready to give a second refinement of the solution: If possible the designer attempts to partition the solution into logical subtasks. We observe at this point that the upper limit of the for-loop in line 1 can be changed to n.

The first subtask can be solved by assuming the minimum is A(i). Eventually A(n) is compared to the current minimum and we are done. From the standpoint of readability we can ask if this program is good. Is there a more concise way of describing this algorithm which will still be as easy to comprehend? Substituting while statements for the for loops doesn't significantly change anything.
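The refinement described above (assume the minimum is at position i, compare it with every later element, then interchange) can be sketched in Python. The name `selection_sort` is hypothetical, indices are zero-based rather than the text's A(1:n), and the outer loop stops one position early because a single remaining element is already in place.

```python
def selection_sort(a):
    """Sort list a in place by repeatedly selecting the smallest unsorted value."""
    n = len(a)
    for i in range(n - 1):
        j = i                        # assume the minimum is at position i
        for k in range(i + 1, n):    # compare with every remaining element
            if a[k] < a[j]:
                j = k
        a[i], a[j] = a[j], a[i]      # interchange the minimum into position i
    return a


if __name__ == "__main__":
    print(selection_sort([31, 4, 15, 9, 26]))
```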

Let us develop another program. We assume that we have n ≥ 1 distinct integers which are already sorted and stored in the array A(1:n). By making use of the fact that the set is sorted we conceive of an efficient method. There are three possibilities. Continue in this way by keeping two pointers.

Part of the freedom comes from the initialization step. Below is one complete version. In fact there are at least six different binary search programs that can be produced which are all correct. Note how at each stage the number of elements in the remaining set is decreased by about one half.

This method is referred to as binary search. There are many more that we might produce which would be incorrect. For instance we could replace the while loop by a repeat-until statement with the same English condition.
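One correct variant can be sketched in Python as follows. This is an illustration under stated assumptions, not the book's version: the name `binary_search` is hypothetical, indices are zero-based, and -1 signals that x is not present. The two pointers `lower` and `upper` bound the part of the array that may still contain x.

```python
def binary_search(a, x):
    """Return an index i with a[i] == x in the sorted list a, or -1 if absent."""
    lower, upper = 0, len(a) - 1
    while lower <= upper:
        mid = (lower + upper) // 2
        if x > a[mid]:
            lower = mid + 1          # x can only be in the upper half
        elif x < a[mid]:
            upper = mid - 1          # x can only be in the lower half
        else:
            return mid               # x found
    return -1                        # the candidate range is empty


if __name__ == "__main__":
    data = [2, 3, 5, 7, 11, 13]
    print(binary_search(data, 7), binary_search(data, 4))
```

Each pass through the loop discards about half of the remaining candidates, so the loop runs roughly log2(n) times.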



Figure: Three simple programs for frequency counting.

We can summarize all of this with a table. Each statement is counted once.

Execution Count for Computing Fn

Step   Frequency     Step   Frequency
1      1             9      2
2      1             10     n
3      1             11     n-1
4      0             12     n-1
5      1             13     n-1
6      0             14     n-1
7      1             15     1

The for statement is really a combination of several statements. When we say that the computing time of an algorithm is O(g(n)) we mean that its execution takes no more than a constant times g(n). For example n might be the number of inputs or the number of outputs or their sum or the magnitude of one of them. When the order of magnitude is proportional to n, we will often write this as O(n).

We write O(1) to mean a computing time which is a constant. O(n) is called linear, O(n^2) is called quadratic and O(n^3) is called cubic. If an algorithm takes time O(log n) it is faster, for sufficiently large n, than if it had taken O(n). Similarly, O(n log n) is better than O(n^2) but not as good as O(n). These are among the seven computing times O(1), O(log n), O(n), O(n log n), O(n^2), O(n^3) and O(2^n).

If we have two algorithms which perform the same task, we choose the algorithm with the smaller order of magnitude. The reason for this is that as n increases the time for the algorithm with the larger order of magnitude will get far worse than the time for the other. Notice how the times O(n) and O(n log n) grow much more slowly than the others. An algorithm which is exponential will work only for very small inputs. A performance profile can be gathered using real time calculation.

Another valid performance measure of an algorithm is the space it requires. Often one can trade space for time. We will see cases of this in subsequent chapters.
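To see how quickly these functions separate, the following Python sketch tabulates them for a few values of n. It is only an illustration: the name `growth_row` is hypothetical, logarithms are taken base 2 by assumption, and constant factors are ignored.

```python
import math


def growth_row(n):
    """Return the values of the common growth functions at n (n >= 1)."""
    return {
        "log n": math.log2(n),
        "n": n,
        "n log n": n * math.log2(n),
        "n^2": n ** 2,
        "n^3": n ** 3,
        "2^n": 2 ** n,
    }


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16, 32):
        row = growth_row(n)
        print(n, [round(v, 1) for v in row.values()])
```

By n = 32 the exponential column already exceeds four billion while n log n is only 160, which is the practical reason exponential algorithms are limited to very small inputs.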

In practice these constants depend on many factors.

A magic square is an n x n matrix of the integers 1 to n^2 such that the sum of every row, column and diagonal is the same. When n is odd, H. Coxeter has given a simple rule for generating a magic square. The magic square is represented using a two dimensional array having n rows and n columns. For this application it is convenient to number the rows and columns from zero to n - 1. The while loop is governed by the variable key, which is an integer variable initialized to 2 and increased by one each time through the loop. Since there are n^2 positions in which the algorithm must place a number, each statement within the while loop will be executed no more than n^2 - 1 times. Assigning to a pair of variables in one statement emphasizes that the variables are thought of as pairs and are changed as a unit.

For a discussion of tools and procedures for developing very large software systems see Practical Strategies for Developing Large Software Systems. For a discussion of good programming techniques see Structured Programming (Academic Press) and The Elements of Programming Style by B. Kernighan and P. Plauger.
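The text above does not reproduce the rule itself, so the Python sketch below assumes the familiar odd-order construction: start with 1 in the middle of the top row, move up and to the left (wrapping around the edges) for each next number, and move down one square instead when the target is already occupied. The name `magic_square` and the variable names are hypothetical.

```python
def magic_square(n):
    """Return an n x n magic square for odd n as a list of rows."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    square = [[0] * n for _ in range(n)]   # 0 marks an empty position
    i, j = 0, (n - 1) // 2                 # middle of the top row
    square[i][j] = 1
    key = 2
    while key <= n * n:
        k, l = (i - 1) % n, (j - 1) % n    # up and to the left, with wraparound
        if square[k][l] != 0:
            i = (i + 1) % n                # occupied: move down instead
        else:
            i, j = k, l                    # the pair changes as a unit
        square[i][j] = key
        key += 1
    return square


if __name__ == "__main__":
    for row in magic_square(5):
        print(row)
```

The loop body runs once for each key from 2 to n^2, matching the bound of n^2 - 1 executions, so the running time is O(n^2).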

Both do not satisfy one of the five criteria of an algorithm. Describe the flowchart in figure 1. Can you think of a clever meaning for S. Concentrate on the letter K first.

American Mathematical Society. Discuss how you would actually represent the list of name and telephone number pairs in a real machine. Consider the two statements: Which criteria do they violate?

Look up the word algorithm or its older form algorism in the dictionary. Can you do this without using the go to? Now make it into an algorithm. How would you handle people with the same last name. Determine how many times each statement is executed. Determine when the second becomes larger than the first..

If x occurs. For instance. Given n boolean variables x1. Try writing this without using the go to statement. Implement these procedures using the array facility The rule is: String x is unchanged. What is the computing time of your method? Strings x and y remain unchanged. NOT X:: Prove by induction: Trace the action of the procedure below on the elements 2. Using the notation introduced at the end of section 1. List as many rules of style in programming that you can think of that you would be willing to follow yourself.

Represent your answer in the array ANS 1: Take any version of binary search. If S is a set of n elements the powerset of S is the set of all possible subsets of S. This function is studied because it grows very fast for small values of m and n. Write a recursive procedure for computing this function. Write a recursive procedure to compute powerset S.

Tower of Hanoi There are three towers and sixty four disks of different diameters placed on the first tower. Monks were reputedly supposed to move the disks from tower 1 to tower 3 obeying the rules: Write a recursive procedure for computing the binomial coefficient as defined in section 1. Analyze the time and space requirements of your algorithm.. Given n. Ackermann's function A m. The pigeon hole principle states that if a function f has n distinct inputs but less than n distinct outputs then there exists two inputs a.

Analyze the computing time of procedure SORT as given in section 1. Then write a nonrecursive algorithm for computing Ackermann's function. The disks are in order of decreasing diameter as one scans up the tower. Give an algorithm which finds the values a.

Write a recursive procedure which prints the sequence of moves which accomplish this task.

The array is often the only means for structuring data which is provided in a programming language. Therefore it deserves a significant amount of attention.

If one asks a group of programmers to define an array, the most common answer is a consecutive set of memory locations. This is unfortunate because it clearly reveals a common point of confusion between a data structure and its representation. It is true that arrays are almost always implemented by using consecutive memory. For each index which is defined, there is a value associated with that index. In mathematical terms we call this a correspondence or a mapping. For arrays this means we are concerned with only two operations which retrieve and store values. STORE is used to enter new index-value pairs. The second axiom is read as "to retrieve the j-th item where x has already been stored at index i in A is equivalent to checking if i and j are equal and if so, the result is x"; otherwise the j-th item is retrieved from the remaining array A.
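The retrieve and store operations can be modelled directly from that reading of the axiom, without consecutive memory at all. The Python sketch below is one possible encoding and an assumption throughout: the lowercase names `create`, `store` and `retrieve`, the use of `None` for the empty array, and the nested-tuple representation are all hypothetical.

```python
EMPTY = None  # hypothetical encoding of the empty array


def create():
    """Return a new, empty array."""
    return EMPTY


def store(array, i, x):
    """Return a new array in which index i maps to x (older pairs are kept)."""
    return (i, x, array)


def retrieve(array, j):
    """Return the value most recently stored at index j, or raise KeyError."""
    if array is EMPTY:
        raise KeyError(j)              # nothing was ever stored at j
    i, x, rest = array
    if i == j:
        return x                       # i and j are equal: the result is x
    return retrieve(rest, j)           # otherwise look in the remaining array


if __name__ == "__main__":
    a = store(store(create(), 1, "a"), 2, "b")
    print(retrieve(a, 1), retrieve(a, 2))
```

Nothing here depends on how the pairs are laid out, which is the sense in which the definition is independent of any representation.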

There are a variety of operations that are performed on these lists. Notice how the axioms are independent of any representation scheme. If we restrict the index values to be integers. These operations include: Ace or the floors of a building basement. If we consider an ordered list more abstractly. If we interpret the indices to be n-dimensional.

It is only operations v and vi which require real effort. It is not always necessary to be able to perform all of these operations. In the study of data structures we are interested in ways of representing ordered lists so that these operations can be carried out efficiently. By "symbolic. Let us jump right into a problem requiring ordered lists which we will solve by using one dimensional arrays.

This problem has become the classical example for motivating the use of list processing techniques which we will see in later chapters.

This we will refer to as a sequential mapping.. We can access the list element values in either direction by changing the subscript values in a controlled way The problem calls for building a set of subroutines which allow for the manipulation of symbolic polynomials. See exercise 24 for a set of axioms which uses these operations to abstractly define an ordered list..

Insertion and deletion using sequential allocation forces us to move some of the remaining elements so the sequential mapping is preserved in its proper form.. Perhaps the most common way to represent an ordered list is by an array where we associate the list element ai with the array index i. It is precisely this overhead which leads us to consider nonsequential mappings of ordered lists into arrays in Chapter 4.

This gives us the ability to retrieve or modify the values of random elements in the list in a constant amount of time. A complete specification of the data structure polynomial is now given. We will also need input and output routines and some suitable format for preparing polynomials as input.

The first step is to consider how to define polynomials as a computer structure. For a mathematician a polynomial is a sum of terms where each term has the form ax^e. However this is not an appropriate definition for our purposes. When defining a data object one must decide what functions will be available.

MULT poly. Then we would write REM P. Notice the absence of any assumptions about the order of exponents. Suppose we wish to remove from P those terms having exponent one. These assumptions are decisions of representation. COEF B. These axioms are valuable in that they describe the meaning of each operation concisely and without implying an implementation. Now we can make some representation decisions. Exponents should be unique and in decreasing order is a very reasonable first decision.

Note how trivial the addition and multiplication operations have become. EXP B. Now assuming a new function EXP poly exp which returns the leading exponent of poly. EXP B file: B REM B.

We have avoided the need to explicitly store the exponent of each term and instead we can deduce its value by knowing our position in the list and the degree. But are there any disadvantages to this representation? Hopefully you have already guessed the worst one. The case statement determines how the exponents are related and performs the proper action..

With these insights. EXP B: EXP A. EXP A end end insert any remaining terms in A or B into C The basic loop of this algorithm consists of merging the terms of the two polynomials. Since the tests within the case statement require two terms.
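The merging loop described above can be sketched in Python. This is a hedged illustration rather than the book's procedure: the name `padd`, the use of a list of (exponent, coefficient) pairs, and the zero-based pointers p and q are assumptions. It relies on the representation decision that exponents are unique and in decreasing order.

```python
def padd(a, b):
    """Add two polynomials given as lists of (exponent, coefficient) pairs.

    Both inputs must have unique exponents in decreasing order; the result
    has the same form and omits terms whose coefficients cancel to zero.
    """
    c = []
    p = q = 0
    while p < len(a) and q < len(b):
        (ea, ca), (eb, cb) = a[p], b[q]
        if ea == eb:                    # equal exponents: add coefficients
            if ca + cb != 0:
                c.append((ea, ca + cb))
            p += 1
            q += 1
        elif ea > eb:                   # a's leading term comes first
            c.append(a[p])
            p += 1
        else:                           # b's leading term comes first
            c.append(b[q])
            q += 1
    c.extend(a[p:])                     # insert any remaining terms of a
    c.extend(b[q:])                     # insert any remaining terms of b
    return c


if __name__ == "__main__":
    A = [(1000, 2), (0, 1)]                    # 2x^1000 + 1
    B = [(4, 1), (3, 10), (2, 3), (0, 1)]      # x^4 + 10x^3 + 3x^2 + 1
    print(padd(A, B))
```

Each pass through the loop consumes at least one term, so with m and n nonzero terms in the inputs the loop runs at most m + n times.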

COEF A. EXP A.. This representation leads to very simple algorithms for addition and multiplication. But scheme 1 could be much more wasteful. The first entry is the number of nonzero terms.

It will require a vector of length In the worst case. As for storage. Then for each term there are two entries representing an exponent-coefficient pair. Is this method any better than the first scheme? In general.

Suppose we take the polynomial A x above and keep only its nonzero coefficients. Basic algorithms will need to be more complex because we must check each exponent before we handle its coefficient. If all of A's coefficients are nonzero. The assignments of lines 1 and 2 are made only once and hence contribute O 1 to the overall computing time. This is a practice you should adopt in your own coding. The procedure has parameters which are polynomial or array names.

The code is indented to reinforce readability and to reveal more clearly the scope of reserved words. Statement two is a shorthand way of writing r Notice how closely the actual program matches with the original design. Comments appear to the right delimited by double slashes. Three pointers p. Blocks of statements are grouped together using square brackets. The basic iteration step is governed by a while loop. It is natural to carry out this analysis in terms of m and n.

To make this problem more concrete. Returning to the abstract object--the ordered list--for a moment. These are defined by the recurrence relation file: A two dimensional array could be a poor way to represent these lists because we would have to declare it as A m.

This hypothetical user may have many polynomials he wants to compute and he may not know their sizes.. This worst case is achieved. Suppose in addition to PADD. Otherwise, they are either historically significant or develop the material in the text somewhat further. Many people have contributed their time and energy to improve this book.

For this we would like to thank them. We wish to thank Arvind [sic], T. Gonzalez, L. Landweber, J. Misra, and D. Wilczynski, who used the book in their own classes and gave us detailed reactions.

Thanks are also due to A. Agrawal, M. Cohen, A. Howells, R. Istre, D. Ledbetter, D. Musser and to our students in CS , CSci and who provided many insights. For administrative and secretarial help we thank M. Eul, G. Lum, J.

Matheson, S. Moody, K. Pendleton, and L. To the referees for their pungent yet favorable comments we thank S. Gerhart, T. Standish, and J. Finally, we would like to thank our institutions, the University of Southern California and the University of Minnesota, for encouraging in every way our efforts to produce this book.

Ellis Horowitz Sartaj Sahni Preface to the Ninth Printing We would like to acknowledge collectively all of the individuals who have sent us comments and corrections since the book first appeared. For this printing we have made many corrections and improvements.

October l

One often quoted definition views computer science as the study of algorithms. This study encompasses four distinct areas: machines for executing algorithms, languages for describing algorithms, foundations of algorithms, and analysis of algorithms. For machines, the goal is to study various forms of machine fabrication and organization so that algorithms can be effectively carried out.

Languages can be placed on a continuum. At one end are the languages which are closest to the physical machine and at the other end are languages designed for sophisticated problem solving. One often distinguishes between two phases of this area: language design and translation. The first calls for methods for specifying the syntax and semantics of a language. The second requires a means for translation into a more basic set of commands. For foundations, abstract models of computers are devised so that properties such as whether a task can be computed at all can be studied.

For analysis, whenever an algorithm can be specified it makes sense to ask about its behavior. This was realized long ago by Charles Babbage, the father of computers. An algorithm's behavior pattern or performance profile is measured in terms of the computing time and space that are consumed while the algorithm is processing. Questions such as the worst and average time and how often they occur are typical. We see that in this definition of computer science, "algorithm" is a fundamental notion. Thus it deserves a precise definition.

The dictionary's definition "any mechanical or recursive computational procedure" is not entirely satisfying since these terms are not basic enough.

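An algorithm is a finite set of instructions which, if followed, accomplish a particular task. In addition every algorithm must satisfy the following criteria: (i) input: there are zero or more quantities which are externally supplied; (ii) output: at least one quantity is produced; (iii) definiteness: each instruction must be clear and unambiguous; (iv) finiteness: the algorithm terminates after a finite number of steps in all cases; (v) effectiveness: every instruction must be basic enough to be carried out in principle. It is not enough that each operation be definite as in (iii), but it must also be feasible. In formal computer science, one distinguishes between an algorithm and a program.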

A program does not necessarily satisfy condition iv. One important example of such a program for a computer is its operating system which never terminates except for system crashes but continues in a wait loop until more jobs are entered.

In this book we will deal strictly with programs that always terminate. Hence, we will use these terms interchangeably. An algorithm can be described in many ways. A natural language such as English can be used but we must be very careful that the resulting instructions are definite (condition iii).

An improvement over English is to couple its use with a graphical form of notation such as flowcharts. This form places each processing step in a "box" and uses arrows to indicate the next step.

Different shaped boxes stand for different kinds of operations. All this can be seen in figure 1. The point is that algorithms can be devised for many common activities. Have you studied the flowchart? Then you probably have realized that it isn't an algorithm at all! Which properties does it lack?

Returning to our earlier definition of computer science, we find it extremely unsatisfying as it gives us no insight as to why the computer is revolutionizing our society nor why it has made us re-examine certain basic assumptions about our own role in the universe.

While this may be an unrealistic demand on a definition even from a technical point of view it is unsatisfying. The definition places great emphasis on the concept of algorithm, but never mentions the word "data". If a computer is merely a means to an end, then the means may be an algorithm but the end is the transformation of data.

That is why we often hear a computer referred to as a data processing machine. Raw data is input and algorithms are used to transform it into refined data. So, instead of saying that computer science is the study of algorithms, alternatively, we might say that computer science is the study of data.

Figure 1. Flowchart for obtaining a Coca-Cola

There is an intimate connection between the structuring of data, and the synthesis of algorithms. In fact, a data structure and an algorithm should be thought of as a unit, neither one making sense without the other. For instance, suppose we have a list of n pairs of names and phone numbers (a1,b1), (a2,b2), ..., (an,bn), and we want to find the phone number that goes with a given name. This task is called searching.

Just how we would write such an algorithm critically depends upon how the names and phone numbers are stored or structured. One algorithm might just forge ahead and examine names, a1, a2, a3, ... and so on, until the correct name is found. This might be fine in Oshkosh, but in Los Angeles, with hundreds of thousands of names, it would not be practical. If, however, we knew that the data was structured so that the names were in alphabetical order, then we could do much better. We could make up a second list which told us for each letter in the alphabet, where the first name with that letter appeared.

For a name beginning with, say, S, we would avoid having to look at names beginning with other letters. So because of this new structure, a very different algorithm is possible.
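The two searching strategies just described can be sketched in Python. This is a minimal illustration, assuming the directory is a list of (name, number) pairs with non-empty names; the function names linear_search, build_letter_index and indexed_search are hypothetical and chosen only for this example.

```python
def linear_search(entries, name):
    """Examine a1, a2, a3, ... in turn until the name is found."""
    for a, b in entries:
        if a == name:
            return b
    return None


def build_letter_index(entries):
    """For alphabetically sorted entries, record for each first letter
    the position of the first name beginning with that letter."""
    index = {}
    for pos, (a, _) in enumerate(entries):
        index.setdefault(a[0], pos)
    return index


def indexed_search(entries, index, name):
    """Jump to the first name with the right letter, then scan only
    the names that begin with that letter."""
    start = index.get(name[0])
    if start is None:
        return None
    for pos in range(start, len(entries)):
        a, b = entries[pos]
        if a[0] != name[0]:      # left the block for this letter
            break
        if a == name:
            return b
    return None
```

The second list costs extra storage and only works because the names are kept in alphabetical order, which is the point of the passage: the structure of the data determines which algorithms are available.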

Other ideas for algorithms become possible when we realize that we can organize the data as we wish. We will discuss many more searching strategies in Chapters 7 and 9. Therefore, computer science can be defined as the study of data, its representation and transformation by a digital computer. The goal of this book is to explore many different kinds of data objects.

For each object, we consider the class of operations to be performed and then the way to represent this object so that these operations may be efficiently carried out. This implies a mastery of two techniques: the ability to devise alternative forms of data representation, and the ability to analyze the algorithm which operates on that structure. The pedagogical style we have chosen is to consider problems which have arisen often in computer applications. For each problem we will specify the data object or objects and what is to be accomplished.

After we have decided upon a representation of the objects, we will give a complete algorithm and analyze its computing time. After reading through several of these examples you should be confident enough to try one on your own.

There are several terms we need to define carefully before we proceed. These include data structure, data object, data type and data representation. These four terms have no standard meaning in computer science circles, and they are often used interchangeably. A data type is a term which refers to the kinds of data that variables may "hold" in a programming language.

With every programming language there is a set of built-in data types. This means that the language allows variables to name data of that type and provides a set of operations which meaningfully manipulate these variables. Some data types are easy to provide because they are already built into the computer's machine language instruction set. Integer and real arithmetic are examples of this. Other data types require considerably more effort to implement. In some languages, there are features which allow one to construct combinations of the built-in types.

However, it is not necessary to have such a mechanism. All of the data structures we will see here can be reasonably built within a conventional programming language. Data object is a term referring to a set of elements, say D. Thus, D may be finite or infinite and if D is very large we may need to devise special ways of representing its elements in our computer. The notion of a data structure as distinguished from a data object is that we want to describe not only the set of objects, but the way they are related.

Saying this another way, we want to describe the set of operations which may legally be applied to elements of the data object. This implies that we must specify the set of operations and show how they work. To be more precise let's examine a modest example. The following notation can be used: SUCC stands for successor. The rules on line 8 tell us exactly how the addition operation works. For example if we wanted to add two and three we would get the following sequence of expressions: In practice we use bit strings, a data structure that is usually provided on our computers.

But however the ADD operation is implemented, it must obey these rules. Hopefully, this motivates the following definition. A data structure is a set of domains D, a designated domain d in D, a set of functions F and a set of axioms A. It is called abstract precisely because the axioms do not imply a form of representation.
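The successor-based rules for ADD are not reproduced in the text above, so the following Python sketch rests on an assumption: that they are the usual pair ADD(ZERO, y) = y and ADD(SUCC(x), y) = SUCC(ADD(x, y)). The nested-tuple representation and the helper to_int are likewise hypothetical, chosen only to show that the rules alone determine the result, whatever representation is picked.

```python
ZERO = ()                      # assumed representation of zero


def succ(x):
    """SUCC: wrap a natural number one level deeper."""
    return (x,)


def is_zero(x):
    return x == ZERO


def add(x, y):
    """ADD defined only through the two assumed rules:
       ADD(ZERO, y)    = y
       ADD(SUCC(x), y) = SUCC(ADD(x, y))"""
    if is_zero(x):
        return y
    return succ(add(x[0], y))


def to_int(x):
    """Count the SUCC applications, for display only."""
    n = 0
    while not is_zero(x):
        x = x[0]
        n += 1
    return n


two = succ(succ(ZERO))
three = succ(two)
five = add(two, three)
```

Adding two and three unwinds the second rule twice and then applies the first rule once, leaving five nested applications of SUCC around ZERO. A bit-string implementation would compute the same value far faster, but it would still have to agree with these rules.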

Our goal here is to write the axioms in a representation independent way. Thus we say that integers are represented by bit strings. We might begin by considering using some existing language. But at the first stage a data structure should be designed so that we know what it does.

Another way of viewing the implementation of a data structure is that it is the process of refining an abstract data type until all of the operations are expressible in terms of directly executable functions. An implementation of a data structure d is a mapping from d to a set of other data structures e. Though some of these are more preferable than others. This division of tasks. This mapping specifies how every object of d is to be represented by the objects of e. In current parlance the triple is referred to as an abstract data type.

Furthermore it is not really necessary to write programs in a language for which a compiler exists. We would rather not have any individual rule us out simply because he did not know a particular language. The triple (D, F, A) denotes the data structure d and it will usually be abbreviated by writing d. The set of axioms describes the semantics of the operations.

Thus we would have to make pretense to build up a capability which already exists. First of all. The form in which we choose to write the axioms is important. Instead we choose to use a language which is tailored to describing the algorithms we want to write. The way to assign values is by the assignment statement variable ← expression. Several cute ideas have been suggested.

In addition to the assignment statement. Several such statements can be combined on a single line if they are separated by a semi-colon.

Expressions can be either arithmetic, boolean or of character type. In order to produce these values. In the boolean case there can be only one of two values, true or false. Most importantly. The meaning of this statement is given by the flow charts: Though this is very interesting from a theoretical viewpoint.

S is as S1 before and the meaning is given by It is well known that all "proper" programs can be written using only the assignment. To accomplish iteration.. If S1 or S2 contains more than one statement.

Brackets must be used to show how each else corresponds to one if. So we will provide other statements such as a second iteration statement. On the contrary. This result was obtained by Bohm and Jacopini.

One of them is while cond do S end where cond is as before. Another iteration statement is loop S forever. As it stands, this describes an infinite loop, so S must contain some test that causes an exit. One way of exiting such a loop is by using a go to label statement which transfers control to "label." A more restricted form of the go to is the command exit which will cause a transfer of control to the first statement after the innermost loop which contains it.
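The loop-forever statement with exit has a close analogue in most modern languages. As a rough sketch, Python's while True plays the role of loop ... forever and break plays the role of exit, leaving only the innermost enclosing loop. The function first_negative is a hypothetical example, not something taken from the text.

```python
def first_negative(values):
    """Return the position of the first negative value, or None."""
    i = 0
    found = None
    while True:                  # loop ... forever
        if i >= len(values):
            break                # exit: nothing left to examine
        if values[i] < 0:
            found = i
            break                # exit: control passes to the statement after the loop
        i += 1
    return found
```

Both exits land on the return statement, the first statement after the loop, which matches the restricted transfer of control described for exit.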

This looping statement may be a while, a repeat, a for or a loop-forever. The semantics of the for statement is easily described in terms of while. We can write the meaning of for vble ← start to finish by increment do S end in SPARKS as a while loop that uses the auxiliary variables fin and incr to hold the values of finish and increment. It has the form where the Si. A variable or a constant is a simple form of an expression. A procedure may be invoked by using a call statement call NAME (parameter list). Procedures may call themselves. The else clause is optional. The execution of an end at the end of procedure implies a return.
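The translation of a for statement into a while loop can be sketched in Python. The variable names vble, fin and incr come from the text; the loop test (vble - fin) * incr <= 0, which handles both upward and downward counting, is an assumption about how the direction is treated, and the guard against a zero increment is an addition made here so the sketch cannot loop forever.

```python
def sparks_for(start, finish, increment, body):
    """Run body(vble) for vble = start, start + increment, ...
    while vble has not passed finish."""
    if increment == 0:
        # Added guard (not from the text): a zero step would never terminate.
        raise ValueError("increment must be nonzero")
    vble = start
    fin = finish          # evaluated once, before the loop
    incr = increment      # evaluated once, before the loop
    while (vble - fin) * incr <= 0:
        body(vble)
        vble += incr


up = []
sparks_for(1, 10, 3, up.append)      # visits 1, 4, 7, 10

down = []
sparks_for(10, 1, -4, down.append)   # visits 10, 6, 2
```

Copying finish and increment into fin and incr means that changes made to them inside the body do not affect how many times the loop runs.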

Though recursion often carries with it a severe penalty at execution time. This may be somewhat restrictive in practice. All procedures are treated as external. The expr may be omitted in which case a return is made to the calling procedure. Parameters which are constants or values of expressions are stored into internally generated words whose addresses are then passed to the procedure.

Many such programs are easily translatable so that the recursion is removed and efficiency achieved. The association of actual to formal parameters will be handled using the call by reference rule. This penalty will not deter us from using recursion. This means that at run time the address of each parameter is passed to the called procedure.

This is a goal which should be aimed at by everyone who writes programs. These are often useful features and when available they should be used. We avoid the problem of defining a "format" statement as we will need only the simplest form of input and output. See the book The Elements of Programming Style by Kernighan and Plauger for more examples of good rules of programming. The command stop halts execution of the currently executing procedure.

Comments may appear anywhere on a line enclosed by double slashes.. An n-dimensional array A with lower and upper bounds li.. We have avoided introducing the record or structure concept. The SPARKS language is rich enough so that one can create a good looking program by applying some simple rules of style.

Avoid sentences like ''i is increased by one.'' It is often at this point that one realizes that a much better program could have been built. But to improve requires that you apply some discipline to the process of creating programs.

Assume that these operations already exist in the form of procedures and write an algorithm which solves the problem according to the requirements. Designing an algorithm is a task which can be done independently of the programming language you eventually plan to use.

One of the criteria of a good design is that it can absorb changes relatively easily. Use a notation which is natural to the way you wish to describe the order of processing. If you have been careful about keeping track of your previous work it may not be too difficult to make changes. You should consider alternatives.

In fact. Can you think of another algorithm? If so. Make sure you understand the information you are given the input and what results you are to produce the output.

You are now ready to proceed to the design phase. This method uses the philosophy: It may already be possible to tell if one will be more desirable than the other. To understand this process better. The order in which you do this may be crucial.

Modern pedagogy suggests that all processing which is independent of the data representation be written out first. Perhaps you should have chosen the second design alternative or perhaps you have spoken to a friend who has done it better. By postponing the choice of how the data is stored we can try to isolate what operations depend upon the choice of data representation. For each object there will be some basic operations to perform on it such as print the maze.

Try to write down a rigorous description of the input and output which covers all cases. You must now choose representations for your data objects a maze as a two dimensional array of zeros and ones.

Finally you produce a complete version of your first program. If you can't distinguish between the two. You may have several data objects such as a maze. We hope your productivity will be greater. This happens to industrial programmers as well. The larger the software. The proof can't be completed until these are changed. The previous discussion applies to the construction of a single procedure as well as to the writing of a large software system.

Verification consists of three distinct aspects: program proving, testing and debugging. Before executing your program you should attempt to prove it is correct. Unwarranted optimism is a familiar disease in computing. This shifts our emphasis away from the management and integration of the various procedures. One thing you have forgotten to do is to document. Different situations call for different decisions. One such tool instruments your source code and then tells you for every data set: As a minimal requirement.

Proofs about programs are really no different from any other kinds of proofs. B and C. If the program fails to respond correctly then debugging is needed to determine what went wrong and how to correct it.

Testing is the art of creating sample data upon which to run your program. For each subsequent compiler their estimates became closer to the truth. It is usually hard to decide whether to sacrifice this first attempt and begin again or just continue to get the first version working. But why bother to document until the program is entirely finished and correct? Because for each procedure you made some assumptions about its input and output. The graph in figure 1. In fact you may save as much debugging time later on by doing a new version now.

If you note them down with the code. Each of these is an art in itself. If you do decide to scrap your work and begin again. For each compiler there is the time they estimated it would take them and the time it actually took. Finally there may be tools available at your computing center to aid in the testing process. Let us concentrate for a while on the question of developing a single procedure which solves a specific task.

Many times during the proving process errors are discovered in the code. This is a phenomenon which has been observed in practice. If you have written more than a few procedures. This is another use of program proving. But prior experience is definitely helpful and the time to build the third compiler was less than one fifth that for the first one. One proof tells us more than any finite amount of testing.

If a correct proof can be obtained. This latter problem can be solved by the code. Each subtask is similarly decomposed until all tasks are expressed within a programming language. This is referred to as the bottom-up approach. The design process consists essentially of taking a proposed solution and successively refining it until an executable program is achieved. The initial solution may be expressed in English or some form of mathematical notation.

Suppose we devise a program for sorting a set of n ≥ 1 distinct integers. A look ahead to problems which may arise later is often useful. This method of design is called the top-down approach. At this level the formulation is said to be abstract because it contains no details regarding how the objects will be represented and manipulated in a computer. If possible the designer attempts to partition the solution into logical subtasks. We are now ready to give a second refinement of the solution: One solution is to store the values in an array in such a way that the i-th integer is stored in the i-th array position.

One of the simplest solutions is "from those integers which remain unsorted, find the smallest and place it next in the sorted list." Underlying all of these strategies is the assumption that a language exists for adequately describing the processing of data at several abstract levels. Let us examine two examples of top-down program development. Experience suggests that the top-down approach should be followed when creating a program.

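There now remain two clearly defined subtasks: (i) to find the minimum integer and (ii) to interchange it with A(i), which takes three assignments through a temporary t. The first subtask can be solved by assuming the minimum is A(i), comparing it with A(i+1), A(i+2), ... and, whenever a smaller element is found, regarding it as the new minimum. Eventually A(n) is compared to the current minimum and we are done. We first note that for any i. We observe at this point that the upper limit of the for-loop in line 1 can be changed to n - 1.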
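The two subtasks can be put together in a short Python sketch. It is an illustration of the idea rather than the book's SPARKS program: indices run from 0 instead of 1, the name selection_sort is chosen here, and the interchange uses Python's tuple assignment in place of a temporary t.

```python
def selection_sort(a):
    """Sort the list a in place and return it."""
    n = len(a)
    for i in range(n - 1):           # the last position needs no pass
        j = i                        # assume the minimum is a[i]
        for k in range(i + 1, n):    # compare with a[i+1], ..., a[n-1]
            if a[k] < a[j]:
                j = k                # a smaller element is the new minimum
        a[i], a[j] = a[j], a[i]      # interchange the minimum with a[i]
    return a
```

After pass i the first i + 1 positions hold the smallest elements in order, which is why the outer loop can stop one position early: the single remaining element is already the largest.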

From the standpoint of readability we can ask if this program is good. Is there a more concise way of describing this algorithm which will still be as easy to comprehend? Substituting while statements for the for loops doesn't significantly change anything. Let us develop another program. By making use of the fact that the set is sorted we conceive of the following efficient method: There are three possibilities.

Continue in this way by keeping two pointers. We assume that we have n ≥ 1 distinct integers which are already sorted and stored in the array A(1:n). Below is one complete version. This method is referred to as binary search. Note how at each stage the number of elements in the remaining set is decreased by about one half.
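One way to write binary search with two pointers is sketched below in Python. It is one of several correct variants, not necessarily the version the text refers to; the names binsrch, lower and upper are chosen for this example, indices start at 0, and -1 is returned as an assumed convention for "not present".

```python
def binsrch(a, x):
    """Return an index j with a[j] == x in the sorted list a, or -1."""
    lower, upper = 0, len(a) - 1
    while lower <= upper:
        mid = (lower + upper) // 2
        if x > a[mid]:
            lower = mid + 1      # x can only be in the upper half
        elif x < a[mid]:
            upper = mid - 1      # x can only be in the lower half
        else:
            return mid           # found
    return -1                    # pointers crossed: x is not present
```

The three branches correspond to the three possible outcomes of comparing x with the middle element, and each unsuccessful comparison discards about half of the remaining elements.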

There are many more that we might produce which would be incorrect. In fact there are at least six different binary search programs that can be produced which are all correct. Whichever version we choose. For instance we could replace the while loop by a repeat-until statement with the same English condition.

Part of the freedom comes from the initialization step. Given a set of instructions which perform a logical operation.. Recursion We have tried to emphasize the need to structure a program to make it easier to achieve the goals of readability and correctness. As we enter this loop and as long as x is not found the following holds: Unfortunately a complete proof takes us beyond our scope but for those who wish to pursue program proving they should consult our references at the end of this chapter.

Actually one of the most useful syntactical features for accomplishing this is the procedure. The procedure name and its parameters are viewed as a new instruction which can be used in other programs. When is recursion an appropriate mechanism for algorithm exposition? One instance is when the problem itself is recursively defined.

For these reasons we introduce recursion here. This view of the procedure implies that it is invoked. What this fails to stress is the fact that procedures may call themselves (direct recursion) before they are done or they may call other procedures which again invoke the calling procedure (indirect recursion).

Factorial fits this category. These recursive mechanisms are extremely powerful. This is unfortunate because any program that can be written using assignment. Most students of computer science view recursion as a somewhat mystical technique which only is useful for some very special class of problems such as computing factorials or Ackermann's function.

Given the input-output specifications of a procedure. The answer is obtained by printing (i) a followed by all permutations of (b,c,d), and similarly for each of the other elements in the leading position. Given a set of n ≥ 1 elements the problem is to print all possible permutations of this set.

Then try to do one or more of the exercises at the end of this chapter which ask for recursive procedures. It is easy to see that given n elements there are n! different permutations. A is a character string. A simple algorithm can be achieved by looking at the case of four elements (a,b,c,d).
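The recursive idea for permutations (each element in turn, followed by all permutations of the rest) can be sketched in Python. This version returns the permutations as a list instead of printing them, and the function name permutations is chosen here; it is a sketch of the idea, not the book's procedure.

```python
def permutations(items):
    """Return all permutations of the list items, each as a list."""
    if len(items) <= 1:
        return [list(items)]             # zero or one element: one arrangement
    result = []
    for i, first in enumerate(items):
        rest = items[:i] + items[i + 1:] # everything except the chosen element
        for tail in permutations(rest):  # solve the problem on n - 1 elements
            result.append([first] + tail)
    return result


perms = permutations(["a", "b", "c", "d"])
```

For four elements the call produces 4! = 24 arrangements, beginning with those that start with a, then those that start with b, and so on.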

It implies that we can solve the problem for a set with n elements if we had an algorithm which worked on n - 1 elements. This may sound strange. But for now we will content ourselves with examining some simple. This gives us the following set of three procedures.

To rewrite it recursively the first thing we do is to remove the for loops and express the algorithm using assignment. The main purpose is to make one more familiar with the execution of a recursive procedure.

Suppose we start with the sorting algorithm presented in this section. We will see several important examples of such structures.. Another instance when recursion is invaluable is when we want to describe a backtracking procedure. Every place where a ''go to label'' appears. The effect of increasing k by one and restarting the procedure has essentially the same effect as the for loop. Now let us trace the action of these procedures as they sort a set of five integers When a procedure is invoked an implicit branch to its beginning is made.

These two procedures use eleven lines while the original iterative version was expressed in nine lines. Notice how in MAXL2 the fourth parameter k is being changed. Procedure MAXL2 is also directly recursive. Thus a recursive call of a program can be made to simulate a go to statement. There are other criteria for judging programs which have a more direct relationship to performance.

In section 4. The above criteria are all vitally important when it comes to writing software. Also in that section are several recursive procedures. The first is the amount of time a single execution will take.

The parameter mechanism of the procedure is a form of assignment.

These have to do with computing time and storage requirements of the algorithms. The product of these numbers will be the total time taken by this statement. Performance evaluation can be loosely divided into 2 major phases: The second statistic is called the frequency count.
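A frequency count can be obtained by hand or, as a check, by instrumenting the code. The Python sketch below counts how often the innermost statement of a doubly nested loop is executed and compares it with the closed form n(n - 1)/2; the loop shape and the name inner_frequency are assumptions made for this example. Multiplying such a count by the time of one execution gives the total time attributed to that statement.

```python
def inner_frequency(n):
    """Count executions of the innermost statement of the nested loop
    for i in 0..n-1, for k in i+1..n-1."""
    count = 0
    for i in range(n):
        for k in range(i + 1, n):
            count += 1           # stands in for the statement being counted
    return count


def predicted(n):
    """Closed form for the same count: n(n - 1)/2."""
    return n * (n - 1) // 2
```

Because the count depends only on n and not on the machine, it lets two algorithms be compared before either is run.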

Hopefully this more subtle approach will gradually infect your own program writing habits so that you will automatically strive to achieve these goals. Both of these are equally important. Though we will not be discussing how to reach these goals.