Graphs

Graph

A graph is a data structure which stores data in the form of edges and vertices. Unlike arrays, Graphs are non-linear meaning that data is not arranged sequentially. So accessing a specific data in a graph cannot be done in a constant time.

Why do we need graphs?

Graphs can be used to represent a variety of real world problems/structures. Some of which are:

  1. Connection between friends on a social media platform.
  2. A computer network where vertices are networking machines and edges are connections between the machines.
  3. A map of roles in an organization.
  4. An electrical circuit.
  5. Water pipeline network.

Types of graphs

  1. Weighted graph
    • Every edge in the graph is associated with a value referred as weight.
  2. Unweighted graph.
    • No values are associated with edges.
  3. Undirected graph
    • Connection between any two vertices are bi directional.
    • For an undirected graph G, if there exists a path from v1 to v2, then it is also true that there exists a path from v2 to v1.
    • Termimologies:
      • Degree - number of edges connected to a vertex
  4. Directed graph
    • Each edge is associated with a direction.
      • Indegree - number of edges coming into a vertex
      • outdegree - number of edges going out of a vertex

Ways of representing graph programmatically

There are many ways to represent a graph data structure where two of the most commonly used being adjacency matrix and adjacency list. Each has its pros and cons.

Adjacency matrix: An NxN matrix where N is the number of vertices. If there exists an edge between two vertices say v1 and v2, then value at i,j will be 1 (where i points to v1 row and j points to v2 column).

graph

#adjacency matrix for the above graph
adjMatrix = [[0,1,0,0,0],
             [1,0,1,1,0],
             [0,1,0,1,1],
             [0,1,1,0,0],
             [0,0,1,0,0]]

Adjacency list: One of the cons of using adjacency matrix is that its size is always N^2 regardless of the number of edges. An adjacency list a way of storing list of neighbors for each vertex. It is usually implemented with hashtable based data structures. It usually requires less space than the matrix.

#adjacency list for the above graph

adjList = {
            1: [2],
            2: [3,4],
            3: [2,4,5],
            4: [2,3],
            5: [3]
    }

Traversing a graph

Graphs can be traversed using breadh first search and depth first search.

Breath first search (BFS)

  • Algorithm:
    • Explore the graph level by level.
    • First visit vertices one step away and then two steps away and so on…
    • Also keep track of already visited vertices.
  • Time complexity: O(n^2) when using adjacency matrix and O(n+m) when using adjacency list where n is the vertices count & m is the number of edges.
from collections import deque

# Assume  
# adjList - adjacency list representing a graph

def bfs(i): # BFS starting from vertex i
  visited = [0] * n #n is the number of vertices
  Q = deque()
  
  #start from i
  visited[i] = 1
  Q.append(i)

  #explore each vertex in the Q
  while len(Q) > 0:
    j = Q.pop()
    for k in adjList.get(j):
      if visited[k] == 0:
        visited[k] = 1
        Q.append(k)
  • With BFS we can also do the following:
    • Trace back to the source from destination. This can be done by have a separate list which keeps track of parent or previous vertex when exploring a new vertex.
    • Trace the level at which a vertex is found from the source vertex. This can be done by having a list that keeps track of number of levels for every vertex that are being explored.
    • BFS can be used to find the shortest path if weight of all the edges are same.
    • Edges explored by BFS form a tree. Any non-tree edges will generate a cycle.

Depth first search(DFS)

  • Algorithm
    • Start from a vertex i and visit any of its neighbor say j.
    • Suspend the exploring i and start exploring j instead.
    • Repeat it until we reach a vertex with no unexplored neighbors.
    • Backtrack to the nearest suspended vertex that still has an unexplored neighbor.
    • Use stack data structure to store the suspended vertices.
    • DFS can be naturally implemented using recursion and hence no external stack datastructure is needed when using recursion.
# Assume  
# adjList - adjacency list representing a graph

visited = [0] * n #n is the number of vertices
parent = [-1] * n #to track parent of a vertex
def dfs(i):
  visited[i] = 1
  for j in adjList.get(i):
    if visited[j] == 0:
      parent[j] = i
      DFS(j)
  • Time complexity: O(n^2) when using adjacency matrix and O(n+m) when using adjacency list where n is the vertices count & m is the number of edges. Similar to BFS.

  • DFS uses:

    • Many useful information can be extracted from recording the order in which DFS visited vertices. This is called DFS numbering. Steps to implement DFS numbering is as follows.
      • Maintain a counter and increment it each time when entering and leaving.
      • When entering and leaving a vertex record the count.
    • Like BFS, any non-tree edge will generate a cycle.
    • DFS numbering can also be used to find strongly connected components in a directed graph. (Two vertices are strongly connected if there exists a path from v1 to v2 and v2 to v1)
#DFS numbering
# Assume  
# adjList - adjacency list representing a graph

visited = [0] * n #n is the number of vertices
parent = [-1] * n #to track parent of a vertex
count = 0
pre = [0] * n
post = [0] * n

def dfs(i):
  visited[i] = 1
  pre[i] = count
  count += 1
  for j in adjList.get(i):
    if visited[j] == 0:   
      parent[j] = i
      DFS(j)
      post[j] = count
      count += 1
  • Pre and post can be used to
    1. Find if a graph has a cycle.
    2. The cut vertex - removing this vertex would disconnect the graph.

References

NPTEL DAA playlist - youtube

Comments