| Related sites for http://atschool.eduweb.co.uk/mbaker/sorts.html |
| FindFile_ActiveX_Component An ActiveX component that facilitates searching a directory or a set of subdirectories for the oldest or most recent file based on its creation or modification date. | | B_&_I_Software Customization of generic sales force and point of sale software packages written for MS Access. Company offers client lists, phone and e-mail support and upgrading of purchased software. | | CHScanner IPv4 and IPv6 enabled scanner. | | JB_International_Training Customised and public training courses for developers and IT professionals in areas such as ASP.NET, C#, VB.NET, Java, J2EE, UML, XML, SQL Server, Unix and Oracle. | | GameProgrammer_org Contains tutorials about game development featuring OpenGL, Direct3D and Win32 SDK programming. | | WetStone_Technologies A developer of enabling Information Security Technologies. Specializing in the following domains: Secure Timestamping, Intrusion Detection, Cyber Forensics, and Data Integrity. | | nStor_Technologies Leading manufacturer for RAID arrays, SAN, fibre channel, SCSI, ultra SCSI and disk storage. | | MegaFix_Inc_ From the largest gigabyte to the smallest megabyte hard drive. Large selection of all major brands. | | STACS_2002 19th International Symposium on Theoretical Aspects of Computer Science. Antibes Juan-les-Pins, France; 14--16 March 2002. | | Pixar_-_Renderman Information on the toolkit, software packages, pricing and compatible products. | | RFC_2210 The Use of RSVP with IETF Integrated Services. J. Wroclawski. September 1997. | | RFC_0934 Proposed Standard for Message Encapsulation. M.T. Rose, E.A. Stefferud. January 1985. | | RFC_1825 Security Architecture for the Internet Protocol. R. Atkinson. August 1995. | | Personal_Response_Autoresponder Simple, inexpensive yet highly functional and easy to use autoresponder for multiple business applications. Easy database update feature, and autoremove. | | Audio_CD_Burner Create audio CDs from MP3 collection or WAV files. Product specifications, screenshots, FAQs, and downloads. | | Email_Address_Encryptor Software prevents spammer software from extracting email address on webpage. | | WinAgents_TFTP_Server Full featured high-performance TFTP server for Windows. Supports all TFTP options and offers IP-based access management. Works as native system service in background mode. | | DNT_Web_Design A design and integration company. | | Twitter All-in-one solution with web browsing capabilities. | | Akamarsoft Develops mobile Java applications and also offers Girtab a Rapid Midlet Development tool. |
|
Discussion of Sorting Algorithms
Discussion of Sorting Algorithms
Copyright © by Mark Baker
1996,1999
Copyright
waiver
Return
to MarkChrisSoft home page
Firstly the operation of the different algorithms is
examined, then their relative performance is discussed.
Introduction
I grew up with the bubble sort, in common, I am sure, with
many colleagues. Having learnt one sorting algorithm, there
seemed little point in learning any others, it was hardly an
exciting area of study. Mundane sorting may be, but it is also
central to many tasks carried out on computer. Prompted by its
inclusion in the AEB 'A' level syllabus, I looked at the process
again in detail. The efficiency with which sorting is carried out
will often have a significant impact on the overall efficiency of
a program. Consequently there has been much research and it is
interesting to see the range of alternative algorithms that have
been developed.
It is not always possible to say that one algorithm is better
than another, as relative performance can vary depending on the
type of data being sorted. In some situations, most of the data
are in the correct order, with only a few items needing to be
sorted. In other situations the data are completely mixed up in a
random order and in others the data will tend to be in reverse
order. Different algorithms will perform differently according to
the data being sorted. Four common algorithms are the exchange or
bubble sort, the selection sort, the insertion sort and the quick
sort.
The selection sort is a good one to
use with students. It is intuitive and very simple to program. It
offers quite good performance, its particular strength being the
small number of exchanges needed. For a given number of data
items, the selection sort always goes through a set number of
comparisons and exchanges, so its performance is predictable.
procedure SelectionSort ( d: DataArrayType; n:
integer) {n is the number of elements}
for k = 1 to n-1 do
begin
small = k
for j = k+1 to n do
if d[ j ] < d[small] then small = j
{Swap elements k and small}
Swap(d, k, small)
end
Exchange (Bubble) Sort
Element
1
2
3
4
5
6
7
8
Data
27
63
1
72
64
58
14
9
1st pass
27
1
63
64
58
14
9
72
2nd pass
1
27
63
58
14
9
64
72
3rd pass
1
27
58
14
9
63
64
72...
The first two data items (27 and 63) are compared and the
smaller one placed on the left hand side. The second and third
items (63 and 1) are then compared and the smaller one placed on
the left and so on. After all the data has been passed through
once, the largest data item (72) will have "bubbled"
through to the end of the list. At the end of the second pass,
the second largest data item (64) will be in the second last
position. For n data items, the process continues for n-1 passes,
or until no exchanges are made in a single pass.
Insertion Sort
Element
1
2
3
4
5
6
7
8
Data
27
63
1
72
64
58
14
9
1st pass
27
63
1
72
64
58
9
14
2nd pass
27
63
1
72
64
9
14
58
3rd pass
27
63
1
72
9
14
58
64...
The insertion sort starts with the last two elements and
creates a correctly sorted sub-list, which in the example
contains 9 and 14. It then looks at the next element (58) and
inserts it into the sub-list in its correct position. It takes
the next element (64) and does the same, continuing until the
sub-list contains all the data.
Selection Sort
Element
1
2
3
4
5
6
7
8
Data
27
63
1
72
64
58
14
9
1st pass
1
63
27
72
64
58
14
9
2nd pass
1
9
27
72
64
58
14
63
3rd pass
1
9
14
72
64
58
27
63...
The selection sort marks the first element (27). It then goes
through the remaining data to find the smallest number (1). It
swaps this with the first element and the smallest element is now
in its correct position. It then marks the second element (63)
and looks through the remaining data for the next smallest number
(9). These two numbers are then swapped. This process continues
until n-1 passes have been made.
Quick Sort
Element
1
2
3
4
5
6
7
8
Data
27
63
1
72
64
58
14
9
1st pass
1
9
63
72
64
58
14
27
2nd pass
1
9
14
27
64
58
72
63
3rd pass
1
9
14
27
58
63
72
64
4th pass
1
9
14
27
58
63
64
72
sorted!
The quick sort takes the last element (9)
and places it such that all the numbers in the left sub-list are
smaller and all the numbers in the right sub-list are bigger. It
then quick sorts the left sub-list ({1})
and then quick sorts the right sub-list ({63,72,64,58,14,27}).
This is a recursive algorithm, since it is defined in terms of
itself. This reduces the complexity of programming it, however it
is the least intuitive of the four algorithms.
Comparing the Algorithms
There are two important factors when measuring the performance
of a sorting algorithm. The algorithms have to compare the
magnitude of different elements and they have to move the
different elements around. So counting the number of comparisons
and the number of exchanges or moves made by an algorithm offer
useful performance measures. When sorting large record
structures, the number of exchanges made may be the principal
performance criterion, since exchanging two records will involve
a lot of work. When sorting a simple array of integers, then the
number of comparisons will be more important.
It has been said that the only thing going for the bubble
(exchange) sort is its catchy name. The logic of the algorithm is
simple to understand and it is fairly easy to program. It can
also be programmed to detect when it has finished sorting. The
selection sort, by comparison, always goes through the same
amount of work regardless of the data and the quick sort performs
particularly badly with ordered data. However, in general the
bubble sort is a very inefficient algorithm.
The insertion sort is a little better and whilst it cannot
detect that it has finished sorting, the logic of the algorithm
means that it comes to a rapid conclusion when dealing with
sorted data.
The selection sort is a good one to use with students. It is
intuitive and very simple to program. It offers quite good
performance, its particular strength being the small number of
exchanges needed. For a given number of data items, the selection
sort always goes through a set number of comparisons and
exchanges, so its performance is predictable.
The first three algorithms all offer O(n2) performance, that
is sorting times increase with the square of the number of
elements being sorted. That means that if you double the number
of elements being sorted, then there will be a four-fold increase
in the time taken. Ten times more elements increases the time
taken by a factor of 100! This is not a problem with small data
sets, but with hundreds or thousands of elements, this becomes
very significant. With most large data sets, the quick sort is a
vastly superior algorithm (although as you might expect, it is
much more complex), as the table below shows.
Random Data Set: Number of comparisons made
Sort/Elements
50
100
200
300
400
500
Selection Sort
1225
4950
19900
44850
79800
124750
Exchange Sort
1410
5335
20300
45650
79866
126585
Insertion Sort
1391
5399
20473
44449
78779
123715
Quick Sort
399
990
1954
3384
5066
6256
It should be pointed out that the methods above all belong to
one family, they are all internal sorting algorithms. This means
that they can only be used when the entire data structure to be
sorted can be held in the computer's main memory. There will be
situations where this is not possible, for example when sorting a
very large transaction file which is stored on, say, magnetic
tape or disc. Then an external sorting algorithm will be needed.
Sorting for Teachers
Sorting for Teachers is a DOS program that allows students to
step through the four algorithms mentioned above. It shows them
how different data sets would be sorted, by highlighting the
appropriate lines of the algorithm and displaying the variable
values, as they progress through it.
The program also allows students to analyse the performance of
the different algorithms under different conditions. They can
load different data sets or create their own. The number of
elements to be sorted can be set and the performance of each
algorithm, in terms of time taken, number of comparisons and
number of exchanges, is recorded in tabular form. A number of
tests can be run and students can graph the results.
Reference: Data Structures with Abstract Data Types
and Pascal by Stubbs and Webre, pub. Brooks/Cole
Return
to MarkChrisSoft home page
Software:
Sorting for Teachers
Author: Mark Baker, e-mail mbaker@rmplc.co.uk
Last revision: 27th June 1999
|
|