Data Engineers Ace Python Coding Interviews NOW!

8 min read · Apr 7, 2024

If I’m honest, I could have landed a job at a MAANG company by now if I had spent enough time practicing coding assessments for data engineering interviews. I’m frustrated with consistently acing SQL, data warehousing, and ETL questions, only to stumble on Python coding algorithms that seem irrelevant in real-world scenarios. In my 10-year career, I’ve never encountered a data engineering problem where I needed to implement Depth First Search or binary trees. Yet, these skills are regularly tested, not only at top companies but also at medium and small companies. We can complain, or we can take action. So, I’ve compiled the following tips and common algorithms to know:

General tips on answering coding questions

1. Read and Understand the Problem: Carefully read the problem statement and ensure you understand what the challenge is asking you to do. Identify the input and output requirements, as well as any constraints or special conditions.
2. Break Down the Problem: Look for ways to break down the problem into smaller, more manageable subproblems if necessary. Identify any repeating patterns or subtasks that can be solved independently and consider how you can combine the solutions to these subproblems to solve the overall problem.
3. Write Pseudocode: Before writing actual code, outline the steps of your algorithm using pseudocode. This will help you clarify your thoughts and ensure you have a clear plan before starting to code.
4. Consider Known Algorithms: If the problem resembles a classic algorithmic problem or problem pattern (e.g., sorting, searching, dynamic programming), consider known algorithms that may be applicable. Think about how you can adapt or modify these algorithms to fit the problem’s requirements.
5. Practice, Practice, Practice: Practice as many problems as you can. Block out 30 minutes each day to solve one LeetCode problem. This is the only way to build a mental map and start seeing patterns in the solutions. It will help you identify the algorithms in play and get to solutions a lot faster.

This article expands on tip 4 (Consider Known Algorithms). Python algorithms are the backbone of coding interviews, serving as the litmus test for a candidate’s problem-solving skills and algorithmic prowess. We’ll explore the top 10 most commonly tested Python algorithms in coding interviews, providing clear explanations and illustrative examples.

1. Two Pointer Algorithm

The two-pointer technique uses two indices to traverse a data structure or search for a solution in linear time. It’s commonly used in problems involving arrays, strings, or linked lists. The key phrase here is searching sorted arrays: any time a problem requires you to search sorted data, check whether you can apply this technique. You can implement it either recursively or with a loop. The data structures typically involved are lists and strings.

Here’s an example of finding the indices of a pair of numbers with a given sum in a sorted array:

def two_sum(nums, target):
    left, right = 0, len(nums) - 1
    
    while left < right:
        total = nums[left] + nums[right]
        if total == target:
            return [left, right]
        elif total < target:
            left += 1
        else:
            right -= 1
            
    return []

# Example usage:
nums = [-2, 1, 2, 4, 7, 11]
target = 13
indices = two_sum(nums, target)
print(indices)  # Output: [2, 5] (indices of numbers 2 and 11)

Another example is finding the index of a number given a target value in a sorted array (binary search):

def binary_search(arr, target):
    low, high = 0, len(arr) - 1
    
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
            
    return -1

# Example usage:
arr = [1, 2, 3, 4, 5, 6, 7, 8, 9]
target = 5
index = binary_search(arr, target)
print(index)  # Output: 4 (index of target element)
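
Since strings are also listed among the data structures for this technique, here is a minimal sketch of a two-pointer palindrome check. The function name `is_palindrome` is an illustrative choice, not something from a particular interview question, and the check is deliberately simple (case-sensitive, no filtering of spaces or punctuation).

```python
def is_palindrome(text):
    """Return True if text reads the same forwards and backwards."""
    left, right = 0, len(text) - 1

    # Move the two pointers toward each other, comparing as we go
    while left < right:
        if text[left] != text[right]:
            return False
        left += 1
        right -= 1

    return True

# Example usage:
print(is_palindrome("racecar"))  # Output: True
print(is_palindrome("python"))   # Output: False
```

Each character is visited at most once, so the check runs in linear time without building a reversed copy of the string.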

2. Hashing Algorithm

Hashing involves mapping data to a fixed-size hash value, which is used to index or retrieve items from a collection. It’s often used in data storage, retrieval, and comparison operations.

The keywords here are comparison, retrieval, indexing, and counting of unique occurrences.

The data structures most often used to implement hash-based solutions in Python are set and dictionary.

Here’s an example of using hashing to find duplicate elements in an array:

def find_duplicates(nums):
    seen = set()
    duplicates = set()
    
    for num in nums:
        if num in seen:
            duplicates.add(num)
        else:
            seen.add(num)
    
    return sorted(duplicates)  # sorted so the output order is deterministic

# Example usage:
nums = [1, 2, 3, 4, 2, 5, 6, 4]
result = find_duplicates(nums)
print(result)  # Output: [2, 4]

As you can see, we had to retrieve and compare each value against the set of elements already seen in order to identify the repeated occurrences in the input array.

Another example is the famous Two Sum problem: Given an array of integers nums and an integer target, return the indices of the two numbers that add up to the target.

def twoSum(nums, target):
    numMap = {}
    for i, num in enumerate(nums):
        complement = target - num
        if complement in numMap:
            return [numMap[complement], i]
        numMap[num] = i
    return []

# Example usage:
nums = [3, 3]
target = 6
result = twoSum(nums, target)
print(result)  # Output: [0, 1]

As you can see, this time the value we look up is the complement (the target value minus the current value), but we still had to store, retrieve, and compare values along with their indices from the input array.
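
The keywords above also mention counting occurrences, which neither example shows directly. The following is a minimal sketch of counting with a dictionary; `count_occurrences` is a hypothetical helper name, and `collections.Counter` from the standard library is shown alongside it as a shortcut for the same idea.

```python
from collections import Counter

def count_occurrences(items):
    """Map each distinct item to the number of times it appears."""
    counts = {}
    for item in items:
        # dict.get returns 0 the first time an item is seen
        counts[item] = counts.get(item, 0) + 1
    return counts

# Example usage:
nums = [1, 2, 3, 4, 2, 5, 6, 4, 2]
counts = count_occurrences(nums)
print(counts[2])  # Output: 3

# Counter builds the same mapping and can report the most frequent items
print(Counter(nums).most_common(1))  # Output: [(2, 3)]
```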

3. Recursion Algorithm

Recursion is a programming technique where a function calls itself in order to solve a problem. It involves breaking down a problem into smaller, more manageable subproblems, and then solving each subproblem recursively until a base case is reached.

def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n-1)

# Example usage
print(factorial(5))  # Output: 120

In this example, the factorial function calls itself recursively to calculate the factorial of a number.

Fibonacci Sequence

def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)

# Example usage
print(fibonacci(7))  # Output: 13

Applications of recursion

Tree traversal algorithms
Graph traversal algorithms
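
To make the tree traversal application concrete, here is a small sketch of a recursive in-order traversal of a binary tree. The `TreeNode` class is a hypothetical minimal node defined only for this illustration; interview platforms usually supply their own node definition.

```python
class TreeNode:
    """Hypothetical minimal binary tree node, for illustration only."""
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right

def inorder(node):
    """Return node values in left, root, right order."""
    if node is None:  # base case: an empty subtree contributes nothing
        return []
    return inorder(node.left) + [node.value] + inorder(node.right)

# Example usage:
#       4
#      / \
#     2   6
#    / \
#   1   3
root = TreeNode(4, TreeNode(2, TreeNode(1), TreeNode(3)), TreeNode(6))
print(inorder(root))  # Output: [1, 2, 3, 4, 6]
```

The structure mirrors the factorial example: a base case that stops the recursion, and a recursive step that combines the results of smaller subproblems.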

4. Dynamic programming

Dynamic programming is a technique used to solve problems by breaking them down into simpler subproblems and storing their results so that each one is solved only once. It’s particularly useful for optimization problems with overlapping subproblems. Consider the example of the Fibonacci sequence using dynamic programming:

def fibonacci(n):
    if n <= 1:
        return n
    
    memo = [0] * (n + 1)
    memo[1] = 1
    
    for i in range(2, n + 1):
        memo[i] = memo[i - 1] + memo[i - 2]
        
    return memo[n]

# Example usage:
n = 6
result = fibonacci(n)
print(result)  # Output: 8 (6th Fibonacci number)
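
The example above fills a table from the bottom up. The other common style is top-down memoization, where the recursive function is kept and its results are cached. A minimal sketch, assuming the standard library's `functools.lru_cache` is acceptable in the interview (the name `fibonacci_memo` is an illustrative choice):

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # cache every result so each n is computed once
def fibonacci_memo(n):
    if n <= 1:
        return n
    return fibonacci_memo(n - 1) + fibonacci_memo(n - 2)

# Example usage:
print(fibonacci_memo(6))   # Output: 8
print(fibonacci_memo(50))  # Output: 12586269025
```

Without the cache, the plain recursive version recomputes the same subproblems many times and becomes impractically slow for inputs like 50.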

5. Graph Algorithms

Graph algorithms like breadth-first search (BFS) and depth-first search (DFS) are essential for traversing and analyzing graph data structures. BFS is a graph traversal algorithm used to explore the structure of a graph or tree. It starts at a specific vertex (or node) and explores all of its neighbors at the current depth level before moving on to the next level. BFS is particularly useful for finding the shortest path between two nodes in an unweighted graph. Here’s an example of BFS:

from collections import deque

def bfs(graph, start):
    visited = set()
    queue = deque([start])

    while queue:
        vertex = queue.popleft()
        if vertex not in visited:
            print(vertex)
            visited.add(vertex)
            queue.extend(graph[vertex] - visited)

# Example graph represented as an adjacency list
graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F'},
    'D': {'B'},
    'E': {'B', 'F'},
    'F': {'C', 'E'}
}

# Start BFS from vertex 'A'
bfs(graph, 'A')
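
For comparison, here is a minimal sketch of depth-first search on the same graph, written recursively. Instead of exploring level by level, it follows one path as deep as possible before backing up. The neighbors are sorted only to make the visiting order deterministic for this example; that is an assumption of the sketch, not a requirement of DFS.

```python
def dfs(graph, start):
    """Return the vertices reachable from start in depth-first order."""
    visited = set()
    order = []

    def visit(vertex):
        visited.add(vertex)
        order.append(vertex)
        # Sorting makes the output reproducible; DFS itself allows any order
        for neighbor in sorted(graph[vertex]):
            if neighbor not in visited:
                visit(neighbor)

    visit(start)
    return order

# Same example graph as in the BFS example
graph = {
    'A': {'B', 'C'},
    'B': {'A', 'D', 'E'},
    'C': {'A', 'F'},
    'D': {'B'},
    'E': {'B', 'F'},
    'F': {'C', 'E'}
}

print(dfs(graph, 'A'))  # Output: ['A', 'B', 'D', 'E', 'F', 'C']
```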

6. Greedy Algorithms

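Greedy algorithms make locally optimal choices at each step with the hope of finding a global optimum. They’re often used in optimization problems where finding the best solution requires making a series of decisions. Let’s see an example of the coin change problem using a greedy approach. Note that the greedy approach is only guaranteed to return the minimum number of coins for certain coin systems, such as the one in this example: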

def coin_change(coins, amount):
    coins.sort(reverse=True)
    count = 0
    
    for coin in coins:
        while amount >= coin:
            amount -= coin
            count += 1
            
    return count if amount == 0 else -1

# Example usage:
coins = [1, 2, 5]
amount = 11
min_coins = coin_change(coins, amount)
print(min_coins)  # Output: 3 (2 * 5 + 1 * 1)
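
Greedy choices are not always globally optimal. With coins [1, 3, 4] and an amount of 6, always taking the largest coin first uses three coins (4 + 1 + 1), while the best answer uses two (3 + 3). The following is a minimal sketch of a dynamic programming alternative that handles such cases; `coin_change_dp` is an illustrative name, and the sketch assumes positive integer coin values.

```python
def coin_change_dp(coins, amount):
    """Return the fewest coins needed to make amount, or -1 if impossible."""
    INF = float("inf")
    # min_coins[v] holds the fewest coins needed to make value v
    min_coins = [0] + [INF] * amount

    for value in range(1, amount + 1):
        for coin in coins:
            if coin <= value and min_coins[value - coin] + 1 < min_coins[value]:
                min_coins[value] = min_coins[value - coin] + 1

    return min_coins[amount] if min_coins[amount] != INF else -1

# Example usage:
print(coin_change_dp([1, 3, 4], 6))   # Output: 2 (3 + 3)
print(coin_change_dp([1, 2, 5], 11))  # Output: 3 (5 + 5 + 1)
```

The trade-off is speed: the greedy version only loops over the coins, while this version does work proportional to the amount times the number of coins.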

7. Divide and Conquer

Divide and conquer is a problem-solving paradigm where a problem is divided into smaller subproblems, which are then solved recursively. It’s often used in problems involving sorting, searching, and optimization. An example is the merge sort algorithm, as shown below.

def merge_sort(arr):
    if len(arr) <= 1:
        return arr
    
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    
    return merge(left, right)

def merge(left, right):
    result = []
    i, j = 0, 0
    
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in their original order (stable)
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
            
    result.extend(left[i:])
    result.extend(right[j:])
    
    return result

# Example usage:
arr = [3, 1, 4, 1, 5, 9, 2, 6, 5]
sorted_arr = merge_sort(arr)
print(sorted_arr)  # Output: [1, 1, 2, 3, 4, 5, 5, 6, 9]

8. Backtracking

Backtracking is a technique used to find solutions incrementally by exploring all possible options and undoing a choice when it leads to a dead end. It’s commonly used in problems involving combinations, permutations, or decision trees. Here’s an example that generates all possible permutations of an array using backtracking:

from typing import List

class Solution:
    def permute(self, nums: List[int]) -> List[List[int]]:
        if len(nums) == 1:
            return [nums[:]]
        
        res = []

        for _ in range(len(nums)):
            n = nums.pop(0)
            perms = self.permute(nums)

            for p in perms:
                p.append(n)
            
            res.extend(perms)
            nums.append(n)
        
        return res
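
Since combinations are mentioned as well, here is a minimal sketch of the explicit choose, explore, undo pattern that many backtracking solutions follow. The function name `combinations` and its parameter `k` are illustrative choices for this sketch (the standard library's `itertools.combinations` does the same job in production code).

```python
def combinations(nums, k):
    """Return all k-element combinations of nums, in input order."""
    result = []
    current = []

    def backtrack(start):
        if len(current) == k:
            result.append(current[:])  # copy, because current keeps changing
            return
        for i in range(start, len(nums)):
            current.append(nums[i])  # choose
            backtrack(i + 1)         # explore
            current.pop()            # undo the choice (backtrack)

    backtrack(0)
    return result

# Example usage:
print(combinations([1, 2, 3, 4], 2))
# Output: [[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]
```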

9. Bit manipulation

Bit manipulation involves performing operations on individual bits of binary numbers. It’s commonly used in problems involving binary representation, bitwise operations, and optimizations. Here’s an example of setting and clearing bits:

def set_bit(num, pos):
    return num | (1 << pos)

def clear_bit(num, pos):
    return num & ~(1 << pos)

# Example usage:
num = 5  # Binary: 101
num = set_bit(num, 1)   # Set bit at position 1
num = clear_bit(num, 0)  # Clear bit at position 0
print(bin(num))  # Output: 0b110 (6 in decimal)
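
A few other bit operations come up alongside setting and clearing. The sketch below shows checking, toggling, and counting bits; the function names are illustrative, and `count_set_bits` assumes a non-negative integer.

```python
def is_bit_set(num, pos):
    """Return True if the bit at position pos is 1."""
    return ((num >> pos) & 1) == 1

def toggle_bit(num, pos):
    """Flip the bit at position pos."""
    return num ^ (1 << pos)

def count_set_bits(num):
    """Count the 1 bits in a non-negative integer."""
    count = 0
    while num:
        num &= num - 1  # clears the lowest set bit
        count += 1
    return count

# Example usage:
num = 5  # Binary: 101
print(is_bit_set(num, 0))       # Output: True
print(is_bit_set(num, 1))       # Output: False
print(bin(toggle_bit(num, 1)))  # Output: 0b111
print(count_set_bits(num))      # Output: 2
```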

10. Sorting Algorithms

Sorting algorithms play a crucial role in organizing data efficiently. Common sorting algorithms include bubble sort, insertion sort, merge sort, and quicksort, all of which can be implemented in Python. Let’s consider an example of implementing quick sort in Python:

def quicksort(arr):
    if len(arr) <= 1:
        return arr
    else:
        pivot = arr[len(arr) // 2]  # Choose the pivot element
        left = [x for x in arr if x < pivot]
        middle = [x for x in arr if x == pivot]
        right = [x for x in arr if x > pivot]
        return quicksort(left) + middle + quicksort(right)

# Example usage:
arr = [5, 2, 9, 1, 6]
sorted_arr = quicksort(arr)
print(sorted_arr)  # Output: [1, 2, 5, 6, 9]
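
Outside of questions that ask you to implement a sort by hand, Python's built-in `sorted()` function and `list.sort()` method are usually the practical choice. A minimal sketch follows; the `records` list is made-up sample data for illustration.

```python
# Hypothetical sample data: table names with row counts
records = [
    {"name": "orders", "rows": 1200},
    {"name": "customers", "rows": 300},
    {"name": "events", "rows": 98000},
]

# sorted() returns a new list; key picks the value to sort by
by_rows = sorted(records, key=lambda r: r["rows"], reverse=True)
print([r["name"] for r in by_rows])  # Output: ['events', 'orders', 'customers']

# list.sort() sorts in place and returns None
arr = [5, 2, 9, 1, 6]
arr.sort()
print(arr)  # Output: [1, 2, 5, 6, 9]
```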

Conclusion

I’ll be honest with you again. Writing this article has made me realize how little I know about these algorithms. I’m only familiar with a few, at most four. In the data engineering interviews I’ve had, I’ve found that some advanced algorithms like backtracking and greedy algorithms aren’t often tested. However, I’ve frequently been tested on BFS and DFS, even though I haven’t used them in the real world yet.

It’s important to note that this article isn’t comprehensive or exhaustive of all algorithms that can be tested in data engineering coding interviews. There’s still no substitute for practice, practice, practice. However, I do encourage having some structure and conceptual understanding of the algorithms while practicing problems. It will help you recognize patterns in these problems much faster, eventually making you very adept at answering these coding questions.

Good luck to us all, as I’m on this journey with you.

Hello, I am Nnaemezue Obi-eyisi, a Senior Azure Databricks Data Engineer at Capgemini and the founder of AfroInfoTech, an online coaching platform for Azure data engineers specializing in Databricks. My goal is to help more people break into a data engineering career. If interested, join my waitlist.

Written by Nnaemezue Obi-Eyisi