Sparse + Quant LLMs [WIP]

SparseGPT + GPTQ compression of LLMs such as LLaMA, OPT, and Pythia.

Do not use unless you really know what you are doing; the code is broken in multiple places.

Based on SparseGPT, GPTQ, Optimal BERT Surgeon, and others.

Run opt.py to get the optimal sparsity for each layer:

python3 opt.py facebook/opt-X [--seed 0] [--nsamples 128] [--wbits 16] [--groupsize -1] [--save PATH_TO_SAVE] [--compression_type {quantizeonly,prunemaskonly,prunemaskreconstruction,prunemagnitudemask,quantizeprune,none}] [--amount_prune 0.5]
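For example, a 4-bit quantize-and-prune run on OPT-1.3B might look like the following (the save path is illustrative):

python3 opt.py facebook/opt-1.3b --wbits 4 --compression_type quantizeprune --amount_prune 0.5 --save opt-1.3b-compressed.pt

For intuition about the prunemagnitudemask option: magnitude pruning keeps the largest-magnitude entries of each weight matrix and zeroes the rest. A minimal PyTorch sketch (illustrative only, not this repository's implementation; the function name is hypothetical):

import torch

def magnitude_prune_mask(weight: torch.Tensor, amount: float) -> torch.Tensor:
    # amount is the fraction of weights to prune (cf. --amount_prune).
    k = int(weight.numel() * amount)
    if k == 0:
        return torch.ones_like(weight, dtype=torch.bool)
    # Value of the k-th smallest |w|; weights at or below it are dropped.
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight.abs() > threshold  # True = keep this weight

w = torch.randn(768, 768)
mask = magnitude_prune_mask(w, amount=0.5)  # ~50% sparsity
w_pruned = w * mask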

Requirements:

torch == 1.13.1
transformers == 4.21.2
sentencepiece == 0.1.97
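Assuming a standard Python environment, the pinned versions can be installed with pip:

pip install torch==1.13.1 transformers==4.21.2 sentencepiece==0.1.97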
