
structured-sparsity

Here are 7 public repositories matching this topic...

Custom CUDA kernels for accelerating 1.58-bit ternary LLM inference with 2:4 structured sparsity on consumer Ampere GPUs. Exploits both ternary arithmetic (no multiplies) and hardware sparse tensor cores to maximize throughput on an RTX 3060. Based on the Sparse-BitNet paper (Zhang et al., 2026).

  • Updated Mar 11, 2026
  • Cuda
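
The two ideas named in the description above can be illustrated with a small CPU-side sketch. It is not taken from the repository and does not reflect its actual kernels. The function names `prune_2_of_4` and `ternary_dot` are hypothetical, and magnitude-based selection with lowest-index tie-breaking is an assumption made for illustration; the Sparse-BitNet method may choose the surviving weights differently. The sketch shows only the 2:4 constraint (at most two nonzeros in every group of four consecutive weights) and a dot product over weights in {-1, 0, +1} that needs no multiplications.

```python
def prune_2_of_4(weights):
    """Zero all but the two largest-magnitude values in each group of four.

    Assumption: ties are broken in favor of the lower index.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Rank positions by descending magnitude, then ascending index.
        keep = sorted(range(4), key=lambda i: (-abs(group[i]), i))[:2]
        pruned.extend(w if i in keep else 0 for i, w in enumerate(group))
    return pruned


def ternary_dot(weights, activations):
    """Dot product with weights in {-1, 0, +1} using only add and subtract."""
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    acc = 0
    for w, x in zip(weights, activations):
        if w == 1:
            acc += x
        elif w == -1:
            acc -= x
        elif w != 0:
            raise ValueError("weights must be ternary")
        # w == 0 contributes nothing and is skipped.
    return acc


ternary_weights = [1, -1, 1, 0, 0, 0, -1, 1]
sparse_weights = prune_2_of_4(ternary_weights)
result = ternary_dot(sparse_weights, [3, 5, 7, 2, 4, 6, 8, 9])
```

On a GPU the same structure would typically be stored in compressed form (the kept values plus small per-group position metadata) so that sparse tensor cores can skip the zeroed entries; this sketch keeps the dense layout for readability.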
