CS260 Notes

Algorithm Efficiency


1. Introduction

What makes for a GOOD algorithm?

Two or more algorithms that solve the same problem can be very different, yet each may still qualify as a good algorithm. Therefore, the next step is to determine which algorithm is "best." There are generally two criteria used to decide whether one algorithm is "better" than another: its space (memory) requirements and its time requirements.

A third criterion that should be considered is the cost of human time, that is, the time to develop and maintain the program. A clever coding trick that improves the space and/or time requirements can often be offset by the loss of program readability, which increases the human cost of maintaining the program. Therefore, this course and the previous courses have put great emphasis on good programming style and conventions.

Because today's computers have vast amounts of memory (both internal and secondary), this course will mainly be concerned with time requirements. In most cases, the differences in the space requirements of competing algorithms are insignificant.


2. Measuring Time Requirements

How do you determine the time requirements of two algorithms?

Measuring actual execution time is unreliable, because the result depends on the particular machine, compiler, and system load rather than on the algorithm itself. Therefore, our primary measure of algorithm efficiency will be a mathematical analysis of the number of comparisons and/or assignments required by the algorithm to complete its task. In most cases, only comparisons will be counted, because a comparison (i.e., a decision) is much more costly in time than an assignment or a calculation.
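As an illustration (a minimal C++ sketch, not part of the original notes; the function name and the int element type are just assumptions for the example), the following finds the largest value in an array while counting the comparisons it makes. The count depends only on the number of items, not on the machine, the compiler, or the particular values:

    #include <cstddef>

    // Find the largest value in data[0..n-1], counting the comparisons made.
    // For an array of n items this always performs n - 1 comparisons.
    int findMax(const int data[], std::size_t n, std::size_t& comparisons)
    {
        comparisons = 0;
        int largest = data[0];                 // assumes n >= 1
        for (std::size_t i = 1; i < n; ++i) {
            ++comparisons;                     // one comparison per remaining item
            if (data[i] > largest) {
                largest = data[i];
            }
        }
        return largest;
    }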


3. Growth Rates of Algorithms

Most algorithms we will consider depend on the amount of data they must process. With a small amount of data there is rarely any significant difference between algorithms, so you could use whichever algorithm you prefer. Therefore, we will be most concerned with large amounts of data. That is, what happens to the speed of the algorithm as the number of data elements grows large?

In most cases, the number of time units can be reduced to being proportional to the size of the data set. That is, if the algorithm must handle n pieces of data, then the time required turns out to be proportional to some function of n. Mathematically, this gives:

The number of time units is some constant multiple of a function of the size of the data set.

or

time = k * f(n)

Therefore, the behavior of the function f(n) as n gets large is an indication of the efficiency of an algorithm. In most cases, f(n) will reduce to one of the following (given in order of speed, with the fastest first): 1 < log2(n) < n < n * log2(n) < n^2 < n^3 < 2^n.

The following table shows the values of these functions for several values of n:

f(n)            n = 10      n = 100     n = 1,000    n = 10,000    n = 100,000    n = 1,000,000

1               1           1           1            1             1              1
log2(n)         3           6           9            13            16             19
n               10          100         1,000        10,000        100,000        1,000,000
n * log2(n)     30          664         9,965        100,000       1,000,000      10^7
n^2             100         10,000      1,000,000    10^8          10^10          10^12
n^3             1,000       1,000,000   10^9         10^12         10^15          10^18
2^n             1,000       10^30       10^301       10^3,010      10^30,103      10^301,030
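The entries above can be approximately reproduced with a short program (a C++ sketch for illustration only; values are rounded, so the log2(n) entries may differ by one from the table). Note that 2^n is far too large to compute directly for these values of n, so the sketch prints its size as a power of ten using 2^n = 10^(n * log10(2)):

    #include <cmath>
    #include <cstdio>

    int main()
    {
        const double sizes[] = {1e1, 1e2, 1e3, 1e4, 1e5, 1e6};
        std::printf("%12s %12s %14s %12s %12s %14s\n",
                    "n", "log2(n)", "n*log2(n)", "n^2", "n^3", "2^n");
        for (double n : sizes) {
            // 2^n overflows, so report its magnitude: 2^n = 10^(n * log10(2)).
            std::printf("%12.0f %12.0f %14.0f %12.0e %12.0e   10^%.0f\n",
                        n, std::log2(n), n * std::log2(n),
                        n * n, n * n * n, n * std::log10(2.0));
        }
        return 0;
    }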

4. Big 'O' Notation

Calculation of efficiency is generally reduced to being proportional to one of the functions above for large values of n. For example, if the efficiency of an algorithm is calculated to be ...

f(n) = 5n^2 + 9n - 15

Then for large values of n, the 9n and -15 terms become insignificant (at n = 1,000, the 5n^2 term is 5,000,000 while 9n - 15 is only 8,985). Therefore, for large values of n this function is approximately equal to ...

f(n) = 5n^2

In such a case, f(n) is seen to be proportional to n^2. That is, it is on the order of n^2, or ...

O(n^2)

5. Worst-Case, Best-Case, and Average-Case

With some algorithms, the speed efficiency is a function of the amount of data but independent of the actual data. For example, finding the sum of the values of an array of integers.
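For instance (a minimal sketch; the function name is just an assumption for the example), summing an array walks every element exactly once, so the work depends only on n and never on the values stored:

    #include <cstddef>

    // Sum data[0..n-1]. The loop body runs exactly n times regardless of what
    // the values are, so the time is proportional to n in every case.
    long sumArray(const int data[], std::size_t n)
    {
        long total = 0;
        for (std::size_t i = 0; i < n; ++i) {
            total += data[i];
        }
        return total;
    }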

With other algorithms, the speed efficiency is not only affected by the amount of data, but also by the actual data itself. For example, searching an array of objects for an item with a particular value in some key field. What you are looking for may be the first item checked (requiring only one check), or it might not be found at all (requiring the maximum number of checks).

In these cases, it is important to consider:

    • the worst case - the maximum amount of work the algorithm could require,
    • the best case - the minimum amount of work the algorithm could require, and
    • the average case - the amount of work expected over all possible arrangements of the data.

In these situations, whether a particular algorithm should be applied is determined by which extreme is most likely for the particular application.


6. Examples: Sequential and Binary Searches of an Array of n Items

Sequential Search - Assume that the data in the array is in no particular order.
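A minimal sketch of such a search (assuming, for simplicity, an array of plain int keys rather than objects with a key field): each item is checked in turn, so a successful search may take as few as 1 comparison, while an unsuccessful search must examine all n items.

    #include <cstddef>

    // Sequential search of an unordered array. Best case: 1 comparison
    // (key in the first slot). Worst case: n comparisons (key in the last
    // slot or not present at all). Returns the index of the key, or -1.
    int sequentialSearch(const int data[], std::size_t n, int key)
    {
        for (std::size_t i = 0; i < n; ++i) {
            if (data[i] == key) {
                return static_cast<int>(i);
            }
        }
        return -1;
    }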

Ordered Sequential Search - Requires the precondition that the data in the array is ordered by the key field that will be used for the search. This additional overhead will affect the efficiency of the search, but for now will not be considered (efficiency of sorting algorithms is the topic of the other half of this chapter).
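A sketch of the ordered version (same int-key assumption, data sorted in ascending order): because the data is ordered, an unsuccessful search can stop as soon as it passes the position where the key would have been, instead of always examining all n items.

    #include <cstddef>

    // Sequential search of an array sorted in ascending order. The loop can
    // stop early once data[i] > key, since the key cannot appear any later.
    // Still requires up to n comparisons in the worst case.
    int orderedSequentialSearch(const int data[], std::size_t n, int key)
    {
        for (std::size_t i = 0; i < n && data[i] <= key; ++i) {
            if (data[i] == key) {
                return static_cast<int>(i);
            }
        }
        return -1;
    }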

Binary Search - Requires the precondition that the data in the array is ordered by the key field that will be used for the search. This additional overhead will affect the efficiency of the search, but for now will not be considered (efficiency of sorting algorithms is the topic of the other half of this chapter).

Determining the worst-case calculation for the binary search ...

The binary search begins by looking at the middle item in the array (at position n/2). This results in three possibilities:

    1. This is the item you are looking for, in which case you are now done. But this would not be the worst case, therefore we will ignore this possibility.
    2. The item you are looking for is less than (i.e. comes before) the middle item. Therefore, you can throw away the middle item and the second half of the list and search the first half of the list by the same process. This results in searching from among n/2 items.
    3. The item you are looking for is greater than (i.e. comes after) the middle item. Therefore, you can throw away the middle item and the first half of the list and search the second half of the list by the same process. This results in searching from among n/2 items.

So the question reduces to, "How many times can possibilities 2 and 3 occur before you run out of data to consider?"

    • The first comparison results in n/2 items left over.
    • The second comparison results in (n/2)/2 = n/2^2 items left over.
    • The third comparison results in ((n/2)/2)/2 = n/2^3 items left over.
    • The fourth comparison results in n/2^4 items left over.
    • etc ........
    • Eventually, the kth comparison results in n/2^k items left over. If this results in one item left, then only one more check needs to be made.

Therefore, approximately k comparisons are needed to search the array. After k comparisons, there would be only 1 item left over. Therefore ...

n/2^k = 1

2^k = n

k = log2(n)
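A minimal iterative sketch of the binary search just analyzed (again assuming an array of int keys sorted in ascending order): each pass compares the key against the middle item of the remaining range and discards half of that range, so at most about log2(n) + 1 comparisons are made.

    #include <cstddef>

    // Binary search of an array sorted in ascending order, searching the
    // half-open range [low, high). Each comparison discards half of the
    // remaining items. Returns the index of the key, or -1 if not found.
    int binarySearch(const int data[], std::size_t n, int key)
    {
        std::size_t low = 0;
        std::size_t high = n;
        while (low < high) {
            std::size_t mid = low + (high - low) / 2;
            if (data[mid] == key) {
                return static_cast<int>(mid);   // found it
            } else if (key < data[mid]) {
                high = mid;                     // keep the first half
            } else {
                low = mid + 1;                  // keep the second half
            }
        }
        return -1;                              // ran out of items to consider
    }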

Since log2(n) is less than n, we may conclude that a binary search is more efficient than a sequential search (in the worst possible case). Is it? Always? Sometimes? For n = 1,000,000 items, a binary search needs at most about 20 comparisons, while a sequential search may need 1,000,000. But don't forget the overhead of keeping the array in sorted order! Also, what about the implementation of the binary search? Should you use a recursive or an iterative implementation? How would this affect the results?
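For comparison with the iterative sketch above, a recursive version under the same assumptions: it performs the same comparisons, but every halving of the range costs an additional function call (and stack space), which is part of the trade-off the question above is asking about.

    #include <cstddef>

    // Recursive binary search over the half-open range [low, high) of an
    // ascending array. Same comparison count as the iterative version; the
    // difference is the overhead of the recursive calls.
    int binarySearchRec(const int data[], std::size_t low, std::size_t high, int key)
    {
        if (low >= high) {
            return -1;                                        // empty range: not found
        }
        std::size_t mid = low + (high - low) / 2;
        if (data[mid] == key) {
            return static_cast<int>(mid);
        } else if (key < data[mid]) {
            return binarySearchRec(data, low, mid, key);      // search the first half
        } else {
            return binarySearchRec(data, mid + 1, high, key); // search the second half
        }
    }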