Sorting

By Notes Vandar

7.1 Introduction and Types of Sorting: Internal and External Sort

Sorting is a fundamental operation in computer science that involves arranging data in a specific order, typically ascending or descending. The purpose of sorting is to facilitate efficient data retrieval and analysis. In practical applications, sorting can significantly improve the performance of search algorithms, enhance data visualization, and streamline data processing tasks.

Types of Sorting

Sorting algorithms can be broadly classified into two categories based on the nature of the data and how they handle it:

1. Internal Sorting

Definition: Internal sorting algorithms operate on data that fits entirely within the main memory (RAM). Since all data can be accessed quickly, internal sorting methods typically provide better performance in terms of speed and efficiency.

Characteristics:

  • Speed: Generally faster due to direct memory access.
  • Data Size: Suitable for relatively smaller datasets that can fit into the available memory.
  • Complexity: Varies based on the algorithm used (e.g., O(n^2) for bubble sort, O(n log n) for quicksort and mergesort).

Examples:

  • Bubble Sort
  • Selection Sort
  • Insertion Sort
  • Merge Sort
  • Quick Sort
  • Heap Sort

Applications: Internal sorting is commonly used in scenarios where datasets are small enough to fit in memory, such as sorting arrays in programming and simple database operations.
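In practice, internal sorting is what the C standard library's `qsort` performs: the entire array lives in RAM and is handed to the sort in one call. A minimal sketch (the wrapper name `sortInMemory` is chosen here for illustration):

```c
#include <stdlib.h>

// Comparator for qsort: ascending order, written without subtraction
// so it cannot overflow for extreme int values.
int compareInts(const void *a, const void *b) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

// Internal sort: the whole array fits in main memory, so we can pass
// it directly to the standard library's qsort.
void sortInMemory(int arr[], int n) {
    qsort(arr, n, sizeof(int), compareInts);
}
```

Calling `sortInMemory` on `{42, 7, 19, 3, 25}` rearranges the array in place to `{3, 7, 19, 25, 42}`.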

2. External Sorting

Definition: External sorting algorithms are designed to handle large datasets that cannot fit entirely in the main memory. These algorithms efficiently sort data by utilizing external storage (like hard drives or SSDs) while minimizing the number of disk accesses.

Characteristics:

  • Speed: Generally slower than internal sorting due to disk access times, but optimized to reduce the number of accesses.
  • Data Size: Suitable for large datasets that exceed available memory.
  • Complexity: More complex, involving techniques to minimize data transfer between disk and memory.

Examples:

  • External Merge Sort: A variant of the merge sort that divides the data into smaller chunks, sorts them internally, and then merges them back together.
  • Replacement Selection: A run-generation technique that produces initial sorted runs longer than the available memory buffer (about twice the buffer size on average, for random input), which reduces the number of runs that must later be merged.

Applications: External sorting is used in database management systems, large-scale data processing frameworks (like Hadoop), and applications where data exceeds available memory, such as sorting log files or large datasets in scientific computing.
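The chunk-sort-merge idea behind external merge sort can be sketched in C. In this sketch the "disk" is simulated with `tmpfile()` streams and `runLen` plays the role of the memory budget; the merge phase scans the run heads linearly (a real implementation would use a min-heap over the runs, and would add error handling):

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_RUNS 64  // illustrative limit on the number of runs

static int cmpInt(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

// Phase 1: split the input into runs of at most runLen elements, sort
// each run "in memory" with qsort, and write it to its own temp file.
// Phase 2: repeatedly pick the smallest front element among all runs
// and append it to the output, refilling that run's head from its file.
void externalMergeSort(int arr[], int n, int runLen) {
    FILE *runs[MAX_RUNS];
    int heads[MAX_RUNS];   // current front element of each run
    int alive[MAX_RUNS];   // 1 while a run still has elements
    int numRuns = 0;

    for (int start = 0; start < n; start += runLen) {
        int len = (n - start < runLen) ? n - start : runLen;
        qsort(arr + start, len, sizeof(int), cmpInt);  // fits in "memory"
        runs[numRuns] = tmpfile();                     // simulated disk run
        fwrite(arr + start, sizeof(int), len, runs[numRuns]);
        rewind(runs[numRuns]);
        numRuns++;
    }

    for (int r = 0; r < numRuns; r++)
        alive[r] = (fread(&heads[r], sizeof(int), 1, runs[r]) == 1);

    for (int k = 0; k < n; k++) {
        int best = -1;
        for (int r = 0; r < numRuns; r++)  // smallest head among live runs
            if (alive[r] && (best < 0 || heads[r] < heads[best]))
                best = r;
        arr[k] = heads[best];
        alive[best] = (fread(&heads[best], sizeof(int), 1, runs[best]) == 1);
    }

    for (int r = 0; r < numRuns; r++)
        fclose(runs[r]);
}
```

With `runLen = 3` and ten elements, phase 1 produces four sorted runs that are then merged back into a fully sorted array.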

 

7.2 Comparison of Sorting Algorithms: Bubble Sort, Selection Sort, and Insertion Sort

Sorting algorithms can be compared based on various criteria, such as time complexity, space complexity, stability, and ease of implementation. Here’s a detailed comparison of Bubble Sort, Selection Sort, and Insertion Sort.

1. Bubble Sort

Description: Bubble sort is a simple comparison-based sorting algorithm. It repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order. The process is repeated until the list is sorted.

Time Complexity:

  • Best Case: O(n) (when the array is already sorted)
  • Average Case: O(n²)
  • Worst Case: O(n²)

Space Complexity: O(1) (in-place)

Stability: Stable (maintains the relative order of equal elements)

Implementation:

void bubbleSort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        for (int j = 0; j < n - i - 1; j++) {
            if (arr[j] > arr[j + 1]) {
                // Swap arr[j] and arr[j + 1]
                int temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
            }
        }
    }
}

2. Selection Sort

Description: Selection sort divides the input list into two parts: the sorted part and the unsorted part. It repeatedly selects the smallest (or largest) element from the unsorted part and moves it to the end of the sorted part.

Time Complexity:

  • Best Case: O(n²)
  • Average Case: O(n²)
  • Worst Case: O(n²)

Space Complexity: O(1) (in-place)

Stability: Unstable (can change the relative order of equal elements)

Implementation:

void selectionSort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int minIdx = i;
        for (int j = i + 1; j < n; j++) {
            if (arr[j] < arr[minIdx]) {
                minIdx = j;
            }
        }
        // Swap arr[i] and arr[minIdx]
        int temp = arr[minIdx];
        arr[minIdx] = arr[i];
        arr[i] = temp;
    }
}

3. Insertion Sort

Description: Insertion sort builds the final sorted array one item at a time. It takes each element from the input data and finds the appropriate position for it in the sorted part of the array.

Time Complexity:

  • Best Case: O(n) (when the array is already sorted)
  • Average Case: O(n²)
  • Worst Case: O(n²)

Space Complexity: O(1) (in-place)

Stability: Stable (maintains the relative order of equal elements)

Implementation:

void insertionSort(int arr[], int n) {
    for (int i = 1; i < n; i++) {
        int key = arr[i];
        int j = i - 1;
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = key;
    }
}

Comparison Summary

Criterion        | Bubble Sort                    | Selection Sort    | Insertion Sort
-----------------|--------------------------------|-------------------|-----------------------------------
Time Complexity  | O(n) (best), O(n²) (avg/worst) | O(n²) (all cases) | O(n) (best), O(n²) (avg/worst)
Space Complexity | O(1)                           | O(1)              | O(1)
Stability        | Stable                         | Unstable          | Stable
Best Use Case    | Small datasets                 | Small datasets    | Small or partially sorted datasets

 

7.3 Divide and Conquer Sorting: Merge Sort, Quick Sort, and Heap Sort

Divide and conquer is a powerful algorithmic paradigm used to solve complex problems by breaking them down into smaller subproblems, solving each subproblem independently, and combining the results to solve the original problem. This approach is particularly effective in sorting, leading to efficient and scalable solutions. Merge Sort and Quick Sort are classic divide and conquer algorithms; Heap Sort is not strictly divide and conquer (it works by repeatedly extracting the maximum from a heap rather than by recursive splitting), but it is grouped here because it delivers comparable O(n log n) performance.

1. Merge Sort

Description: Merge sort is a stable, comparison-based sorting algorithm that follows the divide and conquer approach. It divides the input array into two halves, recursively sorts each half, and then merges the sorted halves back together.

Time Complexity:

  • Best Case: O(n log n)
  • Average Case: O(n log n)
  • Worst Case: O(n log n)

Space Complexity: O(n) (due to the temporary arrays used for merging)

Stability: Stable

Implementation:

void merge(int arr[], int left, int mid, int right) {
    int n1 = mid - left + 1;
    int n2 = right - mid;
    int L[n1], R[n2];

    for (int i = 0; i < n1; i++)
        L[i] = arr[left + i];
    for (int j = 0; j < n2; j++)
        R[j] = arr[mid + 1 + j];

    int i = 0, j = 0, k = left;
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            arr[k++] = L[i++];
        } else {
            arr[k++] = R[j++];
        }
    }
    while (i < n1)
        arr[k++] = L[i++];
    while (j < n2)
        arr[k++] = R[j++];
}

void mergeSort(int arr[], int left, int right) {
    if (left < right) {
        int mid = left + (right - left) / 2;
        mergeSort(arr, left, mid);
        mergeSort(arr, mid + 1, right);
        merge(arr, left, mid, right);
    }
}


2. Quick Sort

Description: Quick sort is a highly efficient, comparison-based sorting algorithm that works by selecting a “pivot” element from the array, partitioning the other elements into two sub-arrays (elements less than the pivot and elements greater than the pivot), and recursively sorting the sub-arrays.

Time Complexity:

  • Best Case: O(n log n) (when the pivot is chosen optimally)
  • Average Case: O(n log n)
  • Worst Case: O(n²) (when the smallest or largest element is always chosen as the pivot)

Space Complexity: O(log n) on average for the recursion stack (the stack can grow to O(n) in the worst case of badly chosen pivots)

Stability: Unstable

Implementation:

int partition(int arr[], int low, int high) {
    int pivot = arr[high]; // Choosing the last element as the pivot
    int i = low - 1;       // Index of the smaller element
    for (int j = low; j < high; j++) {
        if (arr[j] < pivot) {
            i++;
            // Swap arr[i] and arr[j]
            int temp = arr[i];
            arr[i] = arr[j];
            arr[j] = temp;
        }
    }
    // Swap arr[i + 1] and arr[high] (the pivot)
    int temp = arr[i + 1];
    arr[i + 1] = arr[high];
    arr[high] = temp;
    return i + 1;
}

void quickSort(int arr[], int low, int high) {
    if (low < high) {
        int pi = partition(arr, low, high);
        quickSort(arr, low, pi - 1);
        quickSort(arr, pi + 1, high);
    }
}
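The O(n²) worst case arises when the pivot is repeatedly the smallest or largest element, e.g. an already sorted array with last-element pivots. A common mitigation is to swap a randomly chosen element into the pivot position before partitioning; a sketch (assuming `rand()` has been seeded with `srand()`, and repeating the Lomuto partition so the sketch is self-contained):

```c
#include <stdlib.h>

// Lomuto partition around arr[high], as in the main text.
static int partitionRange(int arr[], int low, int high) {
    int pivot = arr[high];
    int i = low - 1;
    for (int j = low; j < high; j++) {
        if (arr[j] < pivot) {
            i++;
            int temp = arr[i]; arr[i] = arr[j]; arr[j] = temp;
        }
    }
    int temp = arr[i + 1]; arr[i + 1] = arr[high]; arr[high] = temp;
    return i + 1;
}

// Move a random element into the pivot slot first, so no fixed input
// pattern (such as sorted data) can force the O(n²) worst case.
void randomizedQuickSort(int arr[], int low, int high) {
    if (low < high) {
        int r = low + rand() % (high - low + 1);
        int temp = arr[r]; arr[r] = arr[high]; arr[high] = temp;
        int pi = partitionRange(arr, low, high);
        randomizedQuickSort(arr, low, pi - 1);
        randomizedQuickSort(arr, pi + 1, high);
    }
}
```

The expected running time becomes O(n log n) for every input; only unlucky random draws, not any particular input, can trigger quadratic behavior.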


3. Heap Sort

Description: Heap sort is a comparison-based sorting algorithm that uses a binary heap data structure. It builds a max heap (or min heap) from the input data and then repeatedly extracts the maximum (or minimum) element from the heap to create a sorted array.

Time Complexity:

  • Best Case: O(n log n)
  • Average Case: O(n log n)
  • Worst Case: O(n log n)

Space Complexity: O(1) (in-place)

Stability: Unstable

Implementation:

void heapify(int arr[], int n, int i) {
    int largest = i;       // Initialize largest as root
    int left = 2 * i + 1;  // Left child
    int right = 2 * i + 2; // Right child

    if (left < n && arr[left] > arr[largest])
        largest = left;

    if (right < n && arr[right] > arr[largest])
        largest = right;

    if (largest != i) {
        // Swap arr[i] and arr[largest]
        int temp = arr[i];
        arr[i] = arr[largest];
        arr[largest] = temp;

        heapify(arr, n, largest);
    }
}

void heapSort(int arr[], int n) {
    // Build heap (rearrange array)
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(arr, n, i);

    // One by one extract elements from heap
    for (int i = n - 1; i > 0; i--) {
        // Move current root to end
        int temp = arr[0];
        arr[0] = arr[i];
        arr[i] = temp;

        // Call max heapify on the reduced heap
        heapify(arr, i, 0);
    }
}


Summary of Divide and Conquer Sorting Algorithms

Algorithm  | Time Complexity                 | Space Complexity | Stability | Best Use Cases
-----------|---------------------------------|------------------|-----------|---------------------------------------------------
Merge Sort | O(n log n) (all cases)          | O(n)             | Stable    | Large datasets, linked lists, and external sorting
Quick Sort | O(n log n) (avg), O(n²) (worst) | O(log n)         | Unstable  | Fast in-memory sorting in the average case
Heap Sort  | O(n log n) (all cases)          | O(1)             | Unstable  | Memory-constrained environments, when stability is not a concern

 

7.4 Efficiency of Sorting Algorithms

The efficiency of sorting algorithms can be evaluated based on several factors, including time complexity, space complexity, and their behavior in different scenarios (e.g., best case, average case, and worst case). Understanding these factors is crucial for selecting the right sorting algorithm for a given problem. Here’s a detailed overview of how sorting algorithms are evaluated in terms of their efficiency.

1. Time Complexity

Time complexity measures the amount of time an algorithm takes to complete as a function of the size of the input data. It is usually expressed in Big O notation. Here’s a breakdown of the time complexities of various sorting algorithms:

Sorting Algorithm | Best Case  | Average Case | Worst Case
------------------|------------|--------------|-----------
Bubble Sort       | O(n)       | O(n²)        | O(n²)
Selection Sort    | O(n²)      | O(n²)        | O(n²)
Insertion Sort    | O(n)       | O(n²)        | O(n²)
Merge Sort        | O(n log n) | O(n log n)   | O(n log n)
Quick Sort        | O(n log n) | O(n log n)   | O(n²)
Heap Sort         | O(n log n) | O(n log n)   | O(n log n)

  • Bubble Sort: Simple but inefficient for large datasets due to its O(n²) complexity.
  • Selection Sort: Also O(n²) for all cases, making it inefficient.
  • Insertion Sort: Efficient for small or nearly sorted datasets; O(n) in the best case.
  • Merge Sort: Consistent O(n log n) performance, suitable for large datasets.
  • Quick Sort: Generally fast with O(n log n) average case, but can degrade to O(n²) in the worst case.
  • Heap Sort: Offers O(n log n) performance consistently, but is not stable.

2. Space Complexity

Space complexity measures the amount of memory space required by an algorithm as a function of the input size. Here’s how various sorting algorithms compare:

Sorting Algorithm | Space Complexity
------------------|--------------------
Bubble Sort       | O(1)
Selection Sort    | O(1)
Insertion Sort    | O(1)
Merge Sort        | O(n)
Quick Sort        | O(log n) (average)
Heap Sort         | O(1)

  • Bubble, Selection, and Insertion Sorts: All are in-place algorithms, requiring minimal additional space.
  • Merge Sort: Requires additional space proportional to the size of the input array, making it less space-efficient.
  • Quick Sort: Uses O(log n) space on average for recursion (O(n) in the degenerate worst case), making it more space-efficient than Merge Sort.
  • Heap Sort: Also in-place, making it a memory-efficient option.

3. Stability

Stability refers to whether the algorithm maintains the relative order of equal elements. Stable sorting algorithms are crucial when the order of equal elements is significant. Here’s how various algorithms rank in terms of stability:

Sorting Algorithm | Stable
------------------|-------
Bubble Sort       | Yes
Selection Sort    | No
Insertion Sort    | Yes
Merge Sort        | Yes
Quick Sort        | No
Heap Sort         | No

  • Stable Algorithms: Bubble Sort, Insertion Sort, and Merge Sort are stable and retain the relative order of equal elements.
  • Unstable Algorithms: Selection Sort, Quick Sort, and Heap Sort do not guarantee stability.
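The difference is easiest to see on records with duplicate keys. The sketch below (the `Rec` type and `tag` field are illustrative additions) sorts records by key with insertion sort; because an element is only shifted past strictly greater keys, records with equal keys keep their original left-to-right order:

```c
typedef struct {
    int key;   // sort key (may repeat)
    char tag;  // marks the record's original position
} Rec;

// Insertion sort on Rec by key. The strict comparison arr[j].key > cur.key
// means a record is never moved past an equal key, which is exactly
// what makes the algorithm stable.
void insertionSortRec(Rec arr[], int n) {
    for (int i = 1; i < n; i++) {
        Rec cur = arr[i];
        int j = i - 1;
        while (j >= 0 && arr[j].key > cur.key) {
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = cur;
    }
}
```

Sorting the records (2,a), (1,b), (2,c), (1,d) by key yields (1,b), (1,d), (2,a), (2,c): within each key, b still precedes d and a still precedes c. A selection sort on the same input can swap equal-keyed records past each other.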

4. Performance on Different Types of Data

Different sorting algorithms perform differently depending on the nature of the input data:

  • Small Datasets: Insertion Sort can be very efficient for small or partially sorted datasets due to its low overhead.
  • Large Datasets: Merge Sort and Quick Sort are generally preferred due to their O(n log n) average case performance.
  • Nearly Sorted Data: Insertion Sort is efficient in this scenario, often running in O(n) time.
  • Data with Many Duplicates: Quick Sort may perform poorly if many duplicates are present unless a good pivot strategy is used. Merge Sort handles duplicates more gracefully.
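The "nearly sorted" claim can be made concrete by counting element shifts. This instrumented variant (the shift counter is an addition for illustration) returns the number of moves insertion sort performs: 0 for an already sorted array, and n(n-1)/2 for a fully reversed one:

```c
// Insertion sort that counts how many elements it shifts.
// On sorted input the inner while-loop never runs, so the total work
// is O(n); on reversed input every pair of elements is shifted once,
// giving n*(n-1)/2 shifts and O(n²) work.
long insertionSortCountingShifts(int arr[], int n) {
    long shifts = 0;
    for (int i = 1; i < n; i++) {
        int key = arr[i];
        int j = i - 1;
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];
            j--;
            shifts++;
        }
        arr[j + 1] = key;
    }
    return shifts;
}
```

For n = 5, sorted input produces 0 shifts while reversed input produces 10, matching the O(n) best case and O(n²) worst case in the table above.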

 

Write a program to implement:
a) Bubble sort b) Selection sort c) Insertion sort
d) Quick sort e) Merge sort f) Heap sort

Here’s a complete program in C that implements Bubble Sort, Selection Sort, Insertion Sort, Quick Sort, Merge Sort, and Heap Sort. Each sorting algorithm is implemented as a separate function, and the main function tests them with sample data.

Complete C Program

#include <stdio.h>

// Function to implement Bubble Sort
void bubbleSort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        for (int j = 0; j < n - i - 1; j++) {
            if (arr[j] > arr[j + 1]) {
                // Swap arr[j] and arr[j + 1]
                int temp = arr[j];
                arr[j] = arr[j + 1];
                arr[j + 1] = temp;
            }
        }
    }
}

// Function to implement Selection Sort
void selectionSort(int arr[], int n) {
    for (int i = 0; i < n - 1; i++) {
        int minIdx = i;
        for (int j = i + 1; j < n; j++) {
            if (arr[j] < arr[minIdx]) {
                minIdx = j;
            }
        }
        // Swap arr[i] and arr[minIdx]
        int temp = arr[minIdx];
        arr[minIdx] = arr[i];
        arr[i] = temp;
    }
}

// Function to implement Insertion Sort
void insertionSort(int arr[], int n) {
    for (int i = 1; i < n; i++) {
        int key = arr[i];
        int j = i - 1;
        while (j >= 0 && arr[j] > key) {
            arr[j + 1] = arr[j];
            j--;
        }
        arr[j + 1] = key;
    }
}

// Function to implement Quick Sort
int partition(int arr[], int low, int high) {
    int pivot = arr[high]; // Choosing the last element as the pivot
    int i = low - 1;       // Index of the smaller element
    for (int j = low; j < high; j++) {
        if (arr[j] < pivot) {
            i++;
            // Swap arr[i] and arr[j]
            int temp = arr[i];
            arr[i] = arr[j];
            arr[j] = temp;
        }
    }
    // Swap arr[i + 1] and arr[high] (the pivot)
    int temp = arr[i + 1];
    arr[i + 1] = arr[high];
    arr[high] = temp;
    return i + 1;
}

void quickSort(int arr[], int low, int high) {
    if (low < high) {
        int pi = partition(arr, low, high);
        quickSort(arr, low, pi - 1);
        quickSort(arr, pi + 1, high);
    }
}

// Function to implement Merge Sort
void merge(int arr[], int left, int mid, int right) {
    int n1 = mid - left + 1;
    int n2 = right - mid;
    int L[n1], R[n2];

    for (int i = 0; i < n1; i++)
        L[i] = arr[left + i];
    for (int j = 0; j < n2; j++)
        R[j] = arr[mid + 1 + j];

    int i = 0, j = 0, k = left;
    while (i < n1 && j < n2) {
        if (L[i] <= R[j]) {
            arr[k++] = L[i++];
        } else {
            arr[k++] = R[j++];
        }
    }
    while (i < n1)
        arr[k++] = L[i++];
    while (j < n2)
        arr[k++] = R[j++];
}

void mergeSort(int arr[], int left, int right) {
    if (left < right) {
        int mid = left + (right - left) / 2;
        mergeSort(arr, left, mid);
        mergeSort(arr, mid + 1, right);
        merge(arr, left, mid, right);
    }
}

// Function to implement Heap Sort
void heapify(int arr[], int n, int i) {
    int largest = i;       // Initialize largest as root
    int left = 2 * i + 1;  // Left child
    int right = 2 * i + 2; // Right child

    if (left < n && arr[left] > arr[largest])
        largest = left;

    if (right < n && arr[right] > arr[largest])
        largest = right;

    if (largest != i) {
        // Swap arr[i] and arr[largest]
        int temp = arr[i];
        arr[i] = arr[largest];
        arr[largest] = temp;

        heapify(arr, n, largest);
    }
}

void heapSort(int arr[], int n) {
    // Build heap (rearrange array)
    for (int i = n / 2 - 1; i >= 0; i--)
        heapify(arr, n, i);

    // One by one extract elements from heap
    for (int i = n - 1; i > 0; i--) {
        // Move current root to end
        int temp = arr[0];
        arr[0] = arr[i];
        arr[i] = temp;

        // Call max heapify on the reduced heap
        heapify(arr, i, 0);
    }
}

// Function to print an array
void printArray(int arr[], int size) {
    for (int i = 0; i < size; i++)
        printf("%d ", arr[i]);
    printf("\n");
}

// Main function to test the sorting algorithms
int main() {
    int arr1[] = {64, 34, 25, 12, 22, 11, 90};
    int n1 = sizeof(arr1) / sizeof(arr1[0]);

    printf("Original array (Bubble Sort): \n");
    printArray(arr1, n1);
    bubbleSort(arr1, n1);
    printf("Sorted array (Bubble Sort): \n");
    printArray(arr1, n1);

    int arr2[] = {64, 34, 25, 12, 22, 11, 90};
    int n2 = sizeof(arr2) / sizeof(arr2[0]);

    printf("\nOriginal array (Selection Sort): \n");
    printArray(arr2, n2);
    selectionSort(arr2, n2);
    printf("Sorted array (Selection Sort): \n");
    printArray(arr2, n2);

    int arr3[] = {64, 34, 25, 12, 22, 11, 90};
    int n3 = sizeof(arr3) / sizeof(arr3[0]);

    printf("\nOriginal array (Insertion Sort): \n");
    printArray(arr3, n3);
    insertionSort(arr3, n3);
    printf("Sorted array (Insertion Sort): \n");
    printArray(arr3, n3);

    int arr4[] = {64, 34, 25, 12, 22, 11, 90};
    int n4 = sizeof(arr4) / sizeof(arr4[0]);

    printf("\nOriginal array (Quick Sort): \n");
    printArray(arr4, n4);
    quickSort(arr4, 0, n4 - 1);
    printf("Sorted array (Quick Sort): \n");
    printArray(arr4, n4);

    int arr5[] = {64, 34, 25, 12, 22, 11, 90};
    int n5 = sizeof(arr5) / sizeof(arr5[0]);

    printf("\nOriginal array (Merge Sort): \n");
    printArray(arr5, n5);
    mergeSort(arr5, 0, n5 - 1);
    printf("Sorted array (Merge Sort): \n");
    printArray(arr5, n5);

    int arr6[] = {64, 34, 25, 12, 22, 11, 90};
    int n6 = sizeof(arr6) / sizeof(arr6[0]);

    printf("\nOriginal array (Heap Sort): \n");
    printArray(arr6, n6);
    heapSort(arr6, n6);
    printf("Sorted array (Heap Sort): \n");
    printArray(arr6, n6);

    return 0;
}

Explanation of the Code:

  1. Sorting Functions:
    • Bubble Sort: Repeatedly steps through the list, compares adjacent elements, and swaps them if they are in the wrong order.
    • Selection Sort: Divides the input list into two parts: a sorted part and an unsorted part. It repeatedly selects the smallest (or largest) element from the unsorted part and moves it to the end of the sorted part.
    • Insertion Sort: Builds a sorted array one element at a time by comparing each new element with the already sorted elements and inserting it in the correct position.
    • Quick Sort: Selects a pivot and partitions the array around the pivot, recursively applying the same logic to the sub-arrays.
    • Merge Sort: Divides the array into halves, recursively sorts each half, and then merges the sorted halves.
    • Heap Sort: Builds a max heap from the array and then extracts elements from the heap to produce a sorted array.
  2. Main Function:
    • Initializes a sample array.
    • Calls each sorting function and prints the original and sorted arrays for comparison.

How to Compile and Run

You can compile and run this program using any C compiler. Here’s a command for GCC:

gcc -o sorting_program sorting_program.c
./sorting_program

This will display the original and sorted arrays for each sorting algorithm.
