
February 25, 2025
Expert-Specialized Fine-Tuning (ESFT): A Guide to Efficient Customization of Large Language Models
Expert-Specialized Fine-Tuning (ESFT) is a method for efficiently customizing Large Language Models (LLMs) built on a Mixture-of-Experts (MoE) architecture. Unlike traditional full-parameter fine-tuning, ESFT updates only the model components relevant to the target task, which reduces compute and storage requirements while improving efficiency, performance, and flexibility. This document covers setting up and using ESFT to customize such models.
Quick Start
Installation and Setup
Start ESFT by cloning the official repository:
git clone https://github.com/deepseek-ai/ESFT.git
cd ESFT
Next, install the required dependencies:
pip install transformers torch safetensors accelerate
Download the necessary adapters by running the following:
bash scripts/download_adapters.sh
After setup, you can begin ESFT fine-tuning and model evaluation.
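As an optional sanity check, you can load the base model referenced throughout this guide with the transformers library. This is a minimal sketch, assuming you have enough memory to load deepseek-ai/ESFT-vanilla-lite locally:
# Optional sanity check (sketch): load the base MoE model and tokenizer.
# trust_remote_code=True is typically required for DeepSeek MoE checkpoints.
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "deepseek-ai/ESFT-vanilla-lite"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_name, trust_remote_code=True)
print(model.config)  # inspect the MoE configuration (e.g., number of routed experts)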
Key Scripts
1. eval_multigpu.py: Multi-GPU Evaluation
This script evaluates the performance of the model on various datasets.
Usage:
python eval_multigpu.py \
--eval_dataset=translation \
--base_model_path=deepseek-ai/ESFT-vanilla-lite \
--adapter_dir=all_models/adapters/token/translation \
--output_path=results/completions/token/translation.jsonl \
--openai_api_key=YOUR_OPENAI_API_KEY
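The script writes model completions to the file given by --output_path. Below is a minimal sketch for inspecting that output, assuming it is a JSON Lines file with one record per example (the exact field names depend on eval_multigpu.py's output format):
# Sketch: inspect evaluation output, assuming one JSON record per line.
import json
with open("results/completions/token/translation.jsonl") as f:
    record = json.loads(f.readline())
    print(record.keys())  # field names depend on the script's output format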
2. get_expert_scores.py: Compute Expert Scores
This script computes relevance scores for each expert on an evaluation dataset; these scores guide the choice of which experts contribute most to a given task.
Usage:
python scripts/expert/get_expert_scores.py \
--eval_dataset=translation \
--base_model_path=deepseek-ai/ESFT-vanilla-lite \
--output_dir=results/expert_scores/translation \
--n_sample_tokens=131072 \
--world_size=4 \
--gpus_per_rank=2
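Conceptually, an expert's score reflects how often and how strongly the router selects it for tokens drawn from the task data. The sketch below illustrates one such score (the average gate weight per expert); the input format is an assumption for illustration, not the script's actual data structures:
# Illustrative sketch: average router gate weight per expert over sampled tokens.
# routing_records is assumed to hold one top-k routing decision per token,
# each a list of (expert_id, gate_weight) pairs for one MoE layer.
from collections import defaultdict
def expert_scores(routing_records, n_experts):
    totals = defaultdict(float)
    for topk in routing_records:
        for expert_id, gate_weight in topk:
            totals[expert_id] += gate_weight
    n_tokens = max(len(routing_records), 1)
    return [totals[e] / n_tokens for e in range(n_experts)]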
3. generate_expert_config.py: Generate MoE Expert Configuration
This script uses the computed expert scores to select, for each task, only the most relevant MoE experts to be trained, which concentrates fine-tuning on task-specialized parameters.
Usage:
python scripts/expert/generate_expert_config.py \
--eval_datasets=intent,summary,law,translation \
--expert_scores_dir=results/expert_scores \
--output_dir=results/expert_configs \
--score_function=token \
--top_p=0.2
Note: top_p controls how many experts are selected: experts are ranked by score, and the highest-scoring experts are kept until their cumulative normalized score reaches the top_p threshold, as sketched below.
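A minimal sketch of this cumulative-score selection rule (the function and example values are illustrative, not the script's actual API):
# Sketch: keep the highest-scoring experts until their cumulative normalized score reaches top_p.
def select_experts(scores, top_p=0.2):
    total = sum(scores)
    ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
    selected, cumulative = [], 0.0
    for expert_id in ranked:
        selected.append(expert_id)
        cumulative += scores[expert_id] / total
        if cumulative >= top_p:
            break
    return selected
# Example: with scores [0.05, 0.40, 0.10, 0.45] and top_p=0.2,
# expert 3 alone covers 45% of the total score, so only expert 3 is selected.
print(select_experts([0.05, 0.40, 0.10, 0.45]))  # -> [3]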
4. train.py and train_ep.py: Fine-Tuning with Expert Configuration
These scripts fine-tune the LLM using the generated expert configurations. train.py runs on a single GPU, while train_ep.py is designed for multi-GPU training with expert parallelism.
Single-GPU Training:
python train.py \
--base_model_path=deepseek-ai/ESFT-vanilla-lite \
--expert_config=results/expert_configs/intent.json \
--train_dataset=intent \
--train_config=configs/base.yaml \
--output_dir=results/checkpoints/intent
Multi-GPU Training:
torchrun --nproc-per-node=8 train_ep.py \
--base_model_path=deepseek-ai/ESFT-vanilla-lite \
--expert_config=results/expert_configs/translation.json \
--train_dataset=translation \
--train_config=configs/base.yaml \
--output_dir=results/checkpoints/translation
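Conceptually, ESFT training keeps the shared model weights frozen and updates only the experts listed in the expert configuration. The sketch below illustrates this idea; the parameter naming pattern is an assumption for illustration, not the repository's actual module layout:
# Sketch: freeze everything except the selected experts.
# Assumes expert weights appear in parameter names as "experts.<expert_id>." (illustrative).
def freeze_for_esft(model, selected_experts):
    for name, param in model.named_parameters():
        param.requires_grad = any(f"experts.{e}." in name for e in selected_experts)
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"Trainable parameters: {trainable:,} / {total:,}")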
Conclusion
ESFT simplifies and improves LLM customization by fine-tuning only task-relevant experts. By working with the model's MoE design, it lowers computational and storage costs while maintaining strong task performance, and its modular scripts for evaluation, expert scoring, and fine-tuning make it a practical choice for adapting LLMs to new tasks.