Searching

By Notes Vandar

Searching

Searching is a fundamental operation in computer science that involves locating a specific element or a group of elements within a data structure. There are various searching algorithms, and they can be broadly categorized into two types: sequential (linear) search and binary search. Each has its own characteristics, advantages, and use cases.

 

8.1 Introduction to searching

Searching is a fundamental concept in computer science and data structures, where the primary goal is to find a specific element or a set of elements within a data structure. Searching is crucial for various applications, including databases, information retrieval systems, and even within basic programming tasks.

Key Concepts

  1. Definition: Searching refers to the process of locating a particular item in a collection of items. This collection could be an array, a list, a database, or any structured dataset.
  2. Importance:
    • Data Retrieval: Searching algorithms allow efficient access to data, enabling systems to retrieve information quickly.
    • Algorithm Efficiency: The performance of searching algorithms can significantly impact the overall efficiency of applications and systems.
    • Application Scope: Searching is a core operation in various domains such as databases, search engines, AI, and more.
  3. Types of Searching:
    • Linear Search: A simple search algorithm that checks each element one by one until the desired element is found or the end of the list is reached. It is easy to implement but inefficient for large datasets.
    • Binary Search: An efficient search algorithm that works on sorted arrays. It divides the search space in half, checking if the target value is less than or greater than the middle element, thus reducing the search space exponentially.
    • Hashing: A method that converts an element into a hash code, allowing for quick access to the element in a hash table. This is commonly used in applications requiring constant time complexity for search operations.
  4. Performance Metrics:
    • Time Complexity: Evaluates the efficiency of a searching algorithm. It is often expressed using Big O notation (e.g., O(n) for linear search and O(log n) for binary search).
    • Space Complexity: Refers to the amount of memory an algorithm requires in relation to the size of the input data.
  5. Searching Applications:
    • Databases: Searching records efficiently.
    • Search Engines: Finding relevant information from vast datasets.
    • Artificial Intelligence: Pathfinding algorithms and decision-making processes.
  6. Choosing a Searching Algorithm:
    • Data Structure: The choice of searching algorithm depends on the data structure in use (e.g., arrays, linked lists, trees).
    • Data Organization: Sorted versus unsorted data determines whether algorithms like binary search can be used.
    • Performance Requirements: Considerations for time and space complexity based on application needs.

8.2 Search Algorithms: Sequential search, Binary search

1. Sequential Search (Linear Search)

Definition: Sequential search, also known as linear search, is the simplest searching algorithm. It checks each element of the list or array one by one until it finds the desired value or reaches the end of the data structure.

Algorithm Steps:

  1. Start from the first element of the list.
  2. Compare the current element with the target value.
  3. If a match is found, return the index of the element.
  4. If not, move to the next element and repeat the process.
  5. If the end of the list is reached without finding the target, return -1 (indicating the target is not present).

Time Complexity:

  • Best Case: O(1) (element found at the first position)
  • Average Case: O(n)
  • Worst Case: O(n)

Use Case: Effective for small or unsorted datasets.

Implementation of Sequential Search in C:

#include <stdio.h>

// Function for Sequential Search
int sequentialSearch(int arr[], int n, int key) {
for (int i = 0; i < n; i++) {
if (arr[i] == key) {
return i; // Return index if found
}
}
return -1; // Return -1 if not found
}

int main() {
int arr[] = {10, 20, 30, 40, 50};
int n = sizeof(arr) / sizeof(arr[0]);
int key = 30;

int result = sequentialSearch(arr, n, key);
if (result != -1) {
printf(“Element %d found at index %d.\n”, key, result);
} else {
printf(“Element %d not found.\n”, key);
}

return 0;
}

2. Binary Search

Definition: Binary search is a more efficient algorithm that works on sorted arrays. It divides the search space in half with each comparison, significantly reducing the number of elements to check.

Algorithm Steps:

  1. Set two pointers: one at the start (left) and one at the end (right) of the array.
  2. While left is less than or equal to right:
    • Calculate the middle index: mid = left + (right - left) / 2.
    • If the element at mid is equal to the target value, return mid.
    • If the target value is less than the element at mid, narrow the search to the left half by setting right = mid - 1.
    • If the target value is greater than the element at mid, narrow the search to the right half by setting left = mid + 1.
  3. If the target is not found, return -1.

Time Complexity:

  • Best Case: O(1) (element found at the middle position)
  • Average Case: O(log n)
  • Worst Case: O(log n)

Use Case: Ideal for large, sorted datasets.

Implementation of Binary Search in C:

#include <stdio.h>

// Function for Binary Search
int binarySearch(int arr[], int left, int right, int key) {
while (left <= right) {
int mid = left + (right – left) / 2; // Prevents overflow

if (arr[mid] == key) {
return mid; // Return index if found
}
if (arr[mid] < key) {
left = mid + 1; // Narrow search to right half
} else {
right = mid – 1; // Narrow search to left half
}
}
return -1; // Return -1 if not found
}

int main() {
int arr[] = {10, 20, 30, 40, 50}; // Sorted array
int n = sizeof(arr) / sizeof(arr[0]);
int key = 40;

int result = binarySearch(arr, 0, n – 1, key);
if (result != -1) {
printf(“Element %d found at index %d.\n”, key, result);
} else {
printf(“Element %d not found.\n”, key);
}

return 0;
}

8.3 Efficiency of Search Algorithms

The efficiency of search algorithms is a crucial consideration when choosing the appropriate algorithm for a specific application. The efficiency can be measured in terms of time complexity and space complexity.

1. Time Complexity

Time complexity refers to the computational time required by an algorithm to complete its task, expressed using Big O notation. Here’s a comparison of the time complexity for various search algorithms:

Search Algorithm Best Case Average Case Worst Case
Linear Search O(1) O(n) O(n)
Binary Search O(1) O(log n) O(log n)
Hashing O(1) O(1) O(n)
  • Linear Search:
    • In the best case, the target is found at the first position, leading to O(1) complexity.
    • In the worst case (target not found), it checks all elements, leading to O(n) complexity.
  • Binary Search:
    • Efficiently reduces the search space in half with each iteration.
    • In the best case, the target is found at the midpoint (O(1)), while in the worst case, the complexity is O(log n) as the search space shrinks exponentially.
  • Hashing:
    • Offers average-case complexity of O(1) for search operations due to direct access through hash codes.
    • However, in the worst case (e.g., many collisions), the complexity can degrade to O(n).

2. Space Complexity

Space complexity refers to the amount of memory an algorithm requires relative to the input size. The space complexity for the aforementioned algorithms is as follows:

Search Algorithm Space Complexity
Linear Search O(1)
Binary Search O(1)
Hashing O(n)
  • Linear Search: Requires a constant amount of space, as it uses only a few variables to keep track of the current index and the target value.
  • Binary Search: Also requires O(1) space if implemented iteratively, since it uses a constant number of variables for pointers and indices. A recursive implementation, however, would require O(log n) space due to the call stack.
  • Hashing: Requires additional space for the hash table that stores key-value pairs, leading to O(n) space complexity depending on the number of elements stored.

3. Other Factors Influencing Efficiency

  • Data Structure: The choice of data structure impacts the efficiency of search algorithms. For example, searching in a linked list is inherently slower than searching in an array due to access times.
  • Data Organization: Algorithms like binary search require sorted data, while linear search can be used on unsorted datasets. The time taken to sort data initially can affect overall performance.
  • Collision Handling in Hashing: Techniques like chaining or open addressing can affect the efficiency of hash-based searches.

8.4 Hashing : Hash function and hash tables, Collision resolution technique

Hashing is a powerful technique used in computer science for fast data retrieval. It involves mapping data to a fixed-size value (hash code) using a hash function, which allows for efficient data access via hash tables.

1. Hash Function

A hash function is an algorithm that transforms input data (keys) into a fixed-size numerical value (hash code). The output of a hash function is used as an index in a hash table to store the corresponding data.

Characteristics of a Good Hash Function:

  • Deterministic: The same input should always produce the same output.
  • Uniform Distribution: It should distribute keys uniformly across the hash table to minimize collisions.
  • Efficient: The computation of the hash code should be fast.
  • Minimize Collisions: While it’s impossible to completely avoid collisions (multiple keys producing the same hash code), a good hash function minimizes their occurrence.

Example of a Simple Hash Function:

int simpleHash(int key) {
return key % SIZE; // SIZE is the size of the hash table
}

2. Hash Table

A hash table is a data structure that implements an associative array, a structure that can map keys to values. It consists of an array of buckets or slots, where each bucket can hold one or more entries.

Structure of a Hash Table:

  • Array: A fixed-size array where the hash codes are used as indices.
  • Linked Lists (or another structure): Used to handle collisions, where multiple keys might hash to the same index.

Operations:

  • Insertion: Calculate the hash code for the key, then store the key-value pair in the appropriate bucket.
  • Search: Calculate the hash code for the key and look for it in the corresponding bucket.
  • Deletion: Calculate the hash code, locate the key, and remove it from the bucket.

3. Collision Resolution Techniques

Collisions occur when two different keys hash to the same index in a hash table. There are several techniques to handle collisions:

3.1 Chaining

In chaining, each slot in the hash table contains a linked list (or another dynamic structure) of all entries that hash to the same index. When a collision occurs, the new entry is added to the linked list at that index.

Advantages:

  • Simple to implement.
  • Allows the hash table to grow dynamically.

Disadvantages:

  • Requires additional memory for the linked lists.
  • Performance can degrade if many collisions occur.
3.2 Open Addressing

In open addressing, when a collision occurs, the algorithm searches for the next available slot within the array. This is done through various probing methods.

  • Linear Probing: If a collision occurs, the algorithm checks the next slot (i.e., index + 1).
  • Quadratic Probing: The algorithm checks slots at intervals of squares (i.e., index + 1, index + 4, index + 9, etc.).
  • Double Hashing: A second hash function determines the step size for probing.

Advantages:

  • More space-efficient as all elements are stored in the array itself.
  • No additional memory for linked lists.

Disadvantages:

  • Clustering can occur (primary clustering with linear probing).
  • Performance degrades significantly as the hash table becomes full.

Write a program to implement:
a) Sequential search
b) Binary search

Here are C programs that implement both sequential search (linear search) and binary search algorithms.

a) Sequential Search (Linear Search)

#include <stdio.h>

// Function for Sequential Search
int sequentialSearch(int arr[], int n, int key) {
for (int i = 0; i < n; i++) {
if (arr[i] == key) {
return i; // Return index if found
}
}
return -1; // Return -1 if not found
}

int main() {
int arr[] = {10, 20, 30, 40, 50}; // Sample array
int n = sizeof(arr) / sizeof(arr[0]); // Number of elements
int key = 30; // Element to search for

int result = sequentialSearch(arr, n, key);
if (result != -1) {
printf(“Element %d found at index %d.\n”, key, result);
} else {
printf(“Element %d not found.\n”, key);
}

return 0;
}

b) Binary Search

#include <stdio.h>

// Function for Binary Search
int binarySearch(int arr[], int left, int right, int key) {
while (left <= right) {
int mid = left + (right – left) / 2; // Calculate mid index

if (arr[mid] == key) {
return mid; // Return index if found
}
if (arr[mid] < key) {
left = mid + 1; // Narrow search to right half
} else {
right = mid – 1; // Narrow search to left half
}
}
return -1; // Return -1 if not found
}

int main() {
int arr[] = {10, 20, 30, 40, 50}; // Sample sorted array
int n = sizeof(arr) / sizeof(arr[0]); // Number of elements
int key = 40; // Element to search for

int result = binarySearch(arr, 0, n – 1, key);
if (result != -1) {
printf(“Element %d found at index %d.\n”, key, result);
} else {
printf(“Element %d not found.\n”, key);
}

return 0;
}

Explanation

  • Sequential Search:
    • The program iterates through each element in the array, checking if it matches the target key.
    • If the element is found, the index is returned; otherwise, -1 is returned.
  • Binary Search:
    • This program assumes that the input array is sorted.
    • It calculates the mid-point of the array, compares it with the target key, and narrows down the search space accordingly.
    • It returns the index if the element is found; otherwise, it returns -1.

How to Compile and Run

To compile and run these programs:

  1. Save each program in a file, e.g., sequential_search.c and binary_search.c.
  2. Open a terminal and navigate to the directory containing the files.
  3. Compile the program using the command:
    gcc -o sequential_search sequential_search.c
    gcc -o binary_search binary_search.c
  4. Run the program using the command:
    ./sequential_search
    ./binary_search

You should see the output indicating whether the elements were found and their respective indices.

Important Questions
Comments
Discussion
0 Comments
  Loading . . .