Graph
A graph is a data structure which stores data in the form of edges and vertices. Unlike arrays, Graphs are non-linear meaning that data is not arranged sequentially. So accessing a specific data in a graph cannot be done in a constant time.
Why do we need graphs?
Graphs can be used to represent a variety of real world problems/structures. Some of which are:
- Connection between friends on a social media platform.
- A computer network where vertices are networking machines and edges are connections between the machines.
- A map of roles in an organization.
- An electrical circuit.
- Water pipeline network.
Types of graphs
- Weighted graph
- Every edge in the graph is associated with a value referred as weight.
- Unweighted graph.
- No values are associated with edges.
- Undirected graph
- Connection between any two vertices are bi directional.
- For an undirected graph G, if there exists a path from v1 to v2, then it is also true that there exists a path from v2 to v1.
- Termimologies:
- Degree - number of edges connected to a vertex
- Directed graph
- Each edge is associated with a direction.
- Indegree - number of edges coming into a vertex
- outdegree - number of edges going out of a vertex
- Each edge is associated with a direction.
Ways of representing graph programmatically
There are many ways to represent a graph data structure where two of the most commonly used being adjacency matrix and adjacency list. Each has its pros and cons.
Adjacency matrix: An NxN matrix where N is the number of vertices. If there exists an edge between two vertices say v1 and v2, then value at i,j will be 1 (where i points to v1 row and j points to v2 column).
#adjacency matrix for the above graph
adjMatrix = [[0,1,0,0,0],
[1,0,1,1,0],
[0,1,0,1,1],
[0,1,1,0,0],
[0,0,1,0,0]]
Adjacency list: One of the cons of using adjacency matrix is that its size is always N^2 regardless of the number of edges. An adjacency list a way of storing list of neighbors for each vertex. It is usually implemented with hashtable based data structures. It usually requires less space than the matrix.
#adjacency list for the above graph
adjList = {
1: [2],
2: [3,4],
3: [2,4,5],
4: [2,3],
5: [3]
}
Traversing a graph
Graphs can be traversed using breadh first search and depth first search.
Breath first search (BFS)
- Algorithm:
- Explore the graph level by level.
- First visit vertices one step away and then two steps away and so on…
- Also keep track of already visited vertices.
- Time complexity: O(n^2) when using adjacency matrix and O(n+m) when using adjacency list where n is the vertices count & m is the number of edges.
from collections import deque
# Assume
# adjList - adjacency list representing a graph
def bfs(i): # BFS starting from vertex i
visited = [0] * n #n is the number of vertices
Q = deque()
#start from i
visited[i] = 1
Q.append(i)
#explore each vertex in the Q
while len(Q) > 0:
j = Q.pop()
for k in adjList.get(j):
if visited[k] == 0:
visited[k] = 1
Q.append(k)
- With BFS we can also do the following:
- Trace back to the source from destination. This can be done by have a separate list which keeps track of parent or previous vertex when exploring a new vertex.
- Trace the level at which a vertex is found from the source vertex. This can be done by having a list that keeps track of number of levels for every vertex that are being explored.
- BFS can be used to find the shortest path if weight of all the edges are same.
- Edges explored by BFS form a tree. Any non-tree edges will generate a cycle.
Depth first search(DFS)
- Algorithm
- Start from a vertex i and visit any of its neighbor say j.
- Suspend the exploring i and start exploring j instead.
- Repeat it until we reach a vertex with no unexplored neighbors.
- Backtrack to the nearest suspended vertex that still has an unexplored neighbor.
- Use stack data structure to store the suspended vertices.
- DFS can be naturally implemented using recursion and hence no external stack datastructure is needed when using recursion.
# Assume
# adjList - adjacency list representing a graph
visited = [0] * n #n is the number of vertices
parent = [-1] * n #to track parent of a vertex
def dfs(i):
visited[i] = 1
for j in adjList.get(i):
if visited[j] == 0:
parent[j] = i
DFS(j)
-
Time complexity: O(n^2) when using adjacency matrix and O(n+m) when using adjacency list where n is the vertices count & m is the number of edges. Similar to BFS.
-
DFS uses:
- Many useful information can be extracted from recording the order in which DFS visited vertices. This is called DFS numbering. Steps to implement DFS numbering is as follows.
- Maintain a counter and increment it each time when entering and leaving.
- When entering and leaving a vertex record the count.
- Like BFS, any non-tree edge will generate a cycle.
- DFS numbering can also be used to find strongly connected components in a directed graph. (Two vertices are strongly connected if there exists a path from v1 to v2 and v2 to v1)
- Many useful information can be extracted from recording the order in which DFS visited vertices. This is called DFS numbering. Steps to implement DFS numbering is as follows.
#DFS numbering
# Assume
# adjList - adjacency list representing a graph
visited = [0] * n #n is the number of vertices
parent = [-1] * n #to track parent of a vertex
count = 0
pre = [0] * n
post = [0] * n
def dfs(i):
visited[i] = 1
pre[i] = count
count += 1
for j in adjList.get(i):
if visited[j] == 0:
parent[j] = i
DFS(j)
post[j] = count
count += 1
- Pre and post can be used to
- Find if a graph has a cycle.
- The cut vertex - removing this vertex would disconnect the graph.
Comments