ABSTRACT--

We propose an architecture for graph analysis to find out the single source shortest path to all other vertices is a common problem. The solution to this problem is Bellman- Ford’s algorithm that solves such a single source shortest path (SSSP) problem and better applies to be parallelized for many-core architectures. In this we get, the high degree of parallelism is guaranteed at the cost of low work efficiency which is compared to similar algorithms in literature (e.g. Dijkstra’s) involves much more redundant work and a consequent waste of power consumption. This architecture is a parallel implementation of the Bellman-Ford algorithm that explains the architectural characteristics of recent GPU architectures (i.e., NVIDIA Kepler, Maxwell) to improve performance as well as work efficiency. The architecture also presents different optimizations to the implementation, which are oriented both to the algorithm and to the architecture. The experiment will show that the proposed implementation provides an average speedup of 5x higher than the existing most efficient parallel implementations for SSSP, that it works on graphs where those implementations cannot work or are inefficient (e.g., graphs with negative weight edges, sparse graphs), and it reduces the redundant work caused by the parallelization process.

KEYWORDS: Parallel computation, CUDA architecture, GPU computing, shortest path

Number of problems that arise in real world networks requires the computation of the shortest path and its distances from a source to one or more destination points. Its examples are car navigation systems, traffic simulations, spatial databases, Internet route planners, and web searching. Algorithms to solve the shortest path problem are computationally costly, and in many cases commercial products implement heuristic approaches to generate approximate solutions instead. Although heuristics are usually faster and do not need much amount of data storage or pre computation, they do not guarantee the optimal path. The Single-Source Shortest Path (SSSP) problem is a classical problem of optimization. Given a graph G = (V,E), a function w(e) : e ∈ E that associates a weight to the edges of the graph, and a source node s, it consists on computing the shortest paths from s to every node v ∈ V . If the weights of the graph range only in positive values w(e) ≥ 0 : e ∈ E, we are facing the so called Non negative Single-source Shortest Path (NSSP) problem. The classical algorithm that solves the NSSP problem is Dijkstra’s algorithm. Being n = |V | and m = |E|, the complexity time of this algorithm is O (𝑛 2 ). Dijkstra’s algorithm is a greedy algorithm whose efficiency is based in the ordering of previously computed results. This feature makes parallelization a difficult task. However, there are certain situations where parts of this ordering can be permuted without leading to wrong results neither performance losses. Other algorithms for this problem, such as the Bellman-Ford algorithm, are more easily parallelizable. However, with an asymptotical complexity of O(m · n), this algorithm is not as efficient as Dijkstra’s algorithm solving this problem. An emerging way of parallel computation includes the use of hardware accelerators, such as graphic processing units (GPUs). Their powerful capability have triggered their massive use to speed up high level parallel computations.

We propose an architecture for graph analysis to find out the single source shortest path to all other vertices is a common problem. The solution to this problem is Bellman- Ford’s algorithm that solves such a single source shortest path (SSSP) problem and better applies to be parallelized for many-core architectures. In this we get, the high degree of parallelism is guaranteed at the cost of low work efficiency which is compared to similar algorithms in literature (e.g. Dijkstra’s) involves much more redundant work and a consequent waste of power consumption. This architecture is a parallel implementation of the Bellman-Ford algorithm that explains the architectural characteristics of recent GPU architectures (i.e., NVIDIA Kepler, Maxwell) to improve performance as well as work efficiency. The architecture also presents different optimizations to the implementation, which are oriented both to the algorithm and to the architecture. The experiment will show that the proposed implementation provides an average speedup of 5x higher than the existing most efficient parallel implementations for SSSP, that it works on graphs where those implementations cannot work or are inefficient (e.g., graphs with negative weight edges, sparse graphs), and it reduces the redundant work caused by the parallelization process.

KEYWORDS: Parallel computation, CUDA architecture, GPU computing, shortest path

Number of problems that arise in real world networks requires the computation of the shortest path and its distances from a source to one or more destination points. Its examples are car navigation systems, traffic simulations, spatial databases, Internet route planners, and web searching. Algorithms to solve the shortest path problem are computationally costly, and in many cases commercial products implement heuristic approaches to generate approximate solutions instead. Although heuristics are usually faster and do not need much amount of data storage or pre computation, they do not guarantee the optimal path. The Single-Source Shortest Path (SSSP) problem is a classical problem of optimization. Given a graph G = (V,E), a function w(e) : e ∈ E that associates a weight to the edges of the graph, and a source node s, it consists on computing the shortest paths from s to every node v ∈ V . If the weights of the graph range only in positive values w(e) ≥ 0 : e ∈ E, we are facing the so called Non negative Single-source Shortest Path (NSSP) problem. The classical algorithm that solves the NSSP problem is Dijkstra’s algorithm. Being n = |V | and m = |E|, the complexity time of this algorithm is O (𝑛 2 ). Dijkstra’s algorithm is a greedy algorithm whose efficiency is based in the ordering of previously computed results. This feature makes parallelization a difficult task. However, there are certain situations where parts of this ordering can be permuted without leading to wrong results neither performance losses. Other algorithms for this problem, such as the Bellman-Ford algorithm, are more easily parallelizable. However, with an asymptotical complexity of O(m · n), this algorithm is not as efficient as Dijkstra’s algorithm solving this problem. An emerging way of parallel computation includes the use of hardware accelerators, such as graphic processing units (GPUs). Their powerful capability have triggered their massive use to speed up high level parallel computations.

## No comments:

## Post a Comment