
InstructLLaMa.cpp

Fast inference of instruct-tuned LLaMA on your personal devices.

Discord: https://discord.gg/peBU7yWa

Inference of the LLaMA model with instruct fine-tuning, using LoRA fine-tunable adapter layers.

Dev notes: we are switching away from our own C++ implementation of LLaMA to the more recent llama.cpp by @ggerganov, which now offers nearly the same performance (and output quality) on MacBook, as well as support for Linux and Windows.

Supported platforms: macOS, Linux, Windows (via CMake)

License: MIT

If you use the LLaMA weights, they may only be used for non-commercial research purposes.

Description & Usage

Here is a typical run, using the adapter weights published by tloen/alpaca-lora-7b under the MIT license:

make -j && ./main -m ./models/7B/ggml-model-q4_0.bin --instruction "Write an email to your friend about your plans for the weekend." -t 8 -n 128
make -j && ./main -m ./models/7B/ggml-model-q4_0.bin --instruction "Calculate the area of the a circle given its radius." --input "radius = 3" -t 8 -n 128

These follow Stanford Alpaca's instruction-prompt format (https://github.com/tatsu-lab/stanford_alpaca#data-release).
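For reference, the Alpaca format wraps the instruction (and the optional input, when one is given) in a fixed prompt template along these lines before it reaches the model:

Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
{instruction}

### Input:
{input}

### Response: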

Setup

Here are the steps for the LLaMA-7B model (same as for llama.cpp); conversion defaults to the adapter weights published by tloen/alpaca-lora-7b under the MIT license:

# build this repo
git clone https://github.com/NolanoOrg/InstructLLaMa.cpp
cd InstructLLaMa.cpp
make

# obtain the original LLaMA model weights and place them in ./models
ls ./models
65B 30B 13B 7B tokenizer_checklist.chk tokenizer.model

# install Python dependencies
python3 -m pip install torch numpy sentencepiece transformers

# convert the 7B model to ggml FP16 format
python3 convert-pth-to-ggml.py models/7B/ 1

# quantize the model to 4-bits
./quantize.sh 7B

# run the inference
./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128 --instruction "<instruction>" --input "<input_to_instruction>"
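The quantize step packs each weight into 4 bits, keeping one floating-point scale per small block of values. A minimal Python sketch of block-wise 4-bit quantization, illustrative only and not the exact ggml q4_0 storage layout:

import numpy as np

def quantize_blockwise_4bit(w, block_size=32):
    # Illustrative block-wise 4-bit quantization (not the exact ggml q4_0 format).
    w = w.reshape(-1, block_size)
    # One scale per block, chosen so the largest-magnitude value maps into [-7, 7].
    amax = np.abs(w).max(axis=1, keepdims=True)
    scale = np.where(amax == 0, 1.0, amax / 7.0)  # avoid divide-by-zero on all-zero blocks
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate weights; the rounding error per value is at most scale / 2.
    return q.astype(np.float32) * scale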

How this differs from the original llama.cpp:

  • convert-pth-to-ggml.py has been updated to download the LoRA adapter weights and merge them into the base model (see the sketch below).
  • utils.h and utils.cpp have been modified to build input prompts in the style of Alpaca.
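For context, LoRA represents the fine-tuning update to a weight matrix as a low-rank product, so folding an adapter into the base weights is a single matrix multiply. A minimal sketch of that merge, with illustrative names rather than the script's actual variables:

import numpy as np

def merge_lora(base_weight, lora_A, lora_B, lora_alpha, rank):
    # Fold a LoRA adapter into a base weight matrix: W' = W + (alpha / r) * B @ A.
    #   base_weight: (out_features, in_features)
    #   lora_A:      (rank, in_features)
    #   lora_B:      (out_features, rank)
    scaling = lora_alpha / rank
    return base_weight + scaling * (lora_B @ lora_A)

If the adapters are merged once at conversion time like this, inference afterwards is identical to running a plain LLaMA checkpoint.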
