Info
A short paper that we presented at the Sparsity in Neural Networks 2022 workshop, containing some preliminary ideas that we are currently expanding into a larger work.
Disclaimer: After the workshop, we discovered that our proposed condensed sparse matrix representation had in fact already been described in NVIDIA’s 2021 paper introducing the Ampere GPU architecture, so we do not claim novelty.
Abstract
Unstructured sparsity achieves promising reductions in the theoretical compute required for neural network inference and training, but its practical application has been limited by the difficulty of accelerating sparse inference on existing hardware. Recent approaches achieve real-world computational speedups that make unstructured sparsity relevant to neural network inference, but they fail to fully exploit the sparsity of trained layers: by imposing dense structure constraints after training, they still perform computation on some zeroed weights. We propose a novel method for implementing matrix-vector multiplication that is particularly efficient for sparse weight matrices trained with a constant number of incoming connections per neuron. We show both empirically and theoretically that the constant fan-in constraint does not significantly impair the training and generalization performance of sparse neural networks trained with a dynamic sparse training method such as RigL.
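To make the constant fan-in idea concrete, here is a minimal NumPy sketch of a matrix-vector product over a condensed layout in which each output neuron stores exactly k nonzero incoming weights together with their input indices (an ELL-style format). The function name, array shapes, and toy data are illustrative assumptions for exposition, not the paper's actual implementation.

```python
import numpy as np

def constant_fanin_matvec(weights, indices, x):
    """Compute y = W @ x for a sparse W stored in a condensed
    (constant fan-in) layout.

    weights : (n_out, k) array of nonzero weight values
    indices : (n_out, k) integer array; indices[i, j] is the input
              index of the j-th nonzero weight of output neuron i
    x       : (n_in,) dense input vector
    """
    # Gather the k relevant inputs for every output neuron, multiply
    # elementwise by the stored weights, and reduce over the fan-in axis.
    return np.sum(weights * x[indices], axis=1)

# Toy usage (hypothetical data): 3 output neurons, fan-in k = 2, 4 inputs.
weights = np.array([[0.5, -1.0],
                    [2.0,  0.3],
                    [1.5,  0.7]])
indices = np.array([[0, 3],
                    [1, 2],
                    [0, 2]])
x = np.array([1.0, 2.0, 3.0, 4.0])
y = constant_fanin_matvec(weights, indices, x)  # equals the dense W @ x
```

Because every row holds the same number of nonzeros, the gather and reduction are fully regular, which is what makes this layout amenable to efficient vectorized or GPU execution.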
Poster
Here is the poster we presented at SNN 2022: