SkipV1Former

Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads

🧠 Overview

SkipV1Former introduces a simple yet effective architectural modification to Transformer models: deeper layers reuse the Value heads computed in the first layer. This improves model representation and reduces the cost of Value projections and the KV cache while preserving model capacity.
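To make the KV-cache claim concrete, here is a back-of-the-envelope sketch in Python. It is not taken from this repository: the function name `kv_cache_elements`, the parameter `reused_v_heads`, and the example model sizes are all hypothetical, and it assumes that every layer after the first swaps a fixed number of its own Value heads for first-layer Value heads, which are already cached once. See the paper for the exact configuration.

```python
def kv_cache_elements(num_layers, num_heads, head_dim, seq_len, reused_v_heads=0):
    """Count cached Key and Value elements for one sequence.

    reused_v_heads is a hypothetical knob: the number of Value heads that
    each layer after the first takes from layer 1 instead of computing
    and caching its own.
    """
    if not 0 <= reused_v_heads <= num_heads:
        raise ValueError("reused_v_heads must be between 0 and num_heads")

    per_head = head_dim * seq_len
    # Keys are cached in full for every layer.
    keys = num_layers * num_heads * per_head
    # The first layer caches all of its Value heads.
    first_layer_values = num_heads * per_head
    # Deeper layers cache only the Value heads they compute themselves.
    deeper_values = (num_layers - 1) * (num_heads - reused_v_heads) * per_head
    return keys + first_layer_values + deeper_values


# Illustrative sizes only (assumed, not from the paper).
baseline = kv_cache_elements(12, 12, 64, 1024)
with_skip = kv_cache_elements(12, 12, 64, 1024, reused_v_heads=6)
print(f"cache ratio: {with_skip / baseline:.3f}")  # about 0.771 for these assumed sizes
```

Under these assumptions, reusing half of the Value heads saves a little under a quarter of the cache, because Keys and the first layer's Values are still stored in full.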

This repository provides a reference implementation and reproduction code for the paper:

Improving Model Representation and Reducing KV Cache via Skip Connections with First Value Heads
Zhoutong Wu, Yuan Zhang et al., 2025

Two experiment suites are included:

GPT_experiments/: DenseFormer-based reproduction on GPT-style models.
LlaMA_experiments/: GaLore-based reproduction on LLaMA-style models.

🚀 Quick Start (5 minutes)

# 1️⃣ Clone this repo
git clone https://github.com/Zhoutong-Wu/SkipV1Former.git
cd SkipV1Former

# 2️⃣ Install PyTorch by platform (example: CUDA 12.1)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# 3️⃣ GPT-side experiment (DenseFormer-based)
pip install -r GPT_experiments/requirements.txt
python GPT_experiments/main.py --dataset owt2 --skipv1 --iterations 40000 --lr 1e-3

# 4️⃣ LLaMA-side experiment (GaLore-based)
pip install -r LlaMA_experiments/exp_requirements.txt
cd LlaMA_experiments/scripts/benchmark_c4
chmod +x *.sh
./skip_llama_1b.sh
