
February 25, 2025

Expert-Specialized Fine-Tuning (ESFT): A Guide to Efficient Customization of Large Language Models


 

Expert-Specialized Fine-Tuning (ESFT) is a novel method for efficiently customizing Large Language Models (LLMs) built on a Mixture-of-Experts (MoE) architecture. Unlike traditional fine-tuning, ESFT updates only the task-relevant model components, improving efficiency, performance, and flexibility while reducing compute and storage demands. This guide covers setting up and using ESFT to customize MoE LLMs.

 

Quick Start

Installation and Setup

To get started, clone the official ESFT repository:

git clone https://github.com/deepseek-ai/ESFT.git
cd ESFT

 

Next, install the required dependencies:

pip install transformers torch safetensors accelerate

 

Download the necessary adapters by running the following:

bash scripts/download_adapters.sh

 

Once setup is complete, you can move on to ESFT fine-tuning and model evaluation.
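Before running the full pipeline, you can optionally verify that the base model loads and generates text. The snippet below is a minimal sanity check using the Hugging Face transformers API; it assumes the deepseek-ai/ESFT-vanilla-lite checkpoint used throughout the examples and a GPU with enough memory to hold it.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/ESFT-vanilla-lite"

# The DeepSeek MoE architecture is a custom model class, so
# trust_remote_code=True is likely required to load it.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

# Generate a short completion to confirm the setup works end to end.
inputs = tokenizer("Translate to French: Hello, world.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))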

 

Key Scripts

 

1. eval_multigpu.py: Multi-GPU Evaluation

This script evaluates the model's performance on the evaluation datasets (e.g., translation), distributing the work across multiple GPUs.

Usage:

python eval_multigpu.py \
    --eval_dataset=translation \
    --base_model_path=deepseek-ai/ESFT-vanilla-lite \
    --adapter_dir=all_models/adapters/token/translation \
    --output_path=results/completions/token/translation.jsonl \
    --openai_api_key=YOUR_OPENAI_API_KEY
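
Once evaluation completes, each line of the output .jsonl file holds one completion record that you can inspect directly. The keys used below ("prompt" and "prediction") are assumptions for illustration; check one line of your own output and adjust the field names to match.

import json

# Print the first few records from the completions file.
# NOTE: the field names "prompt" and "prediction" are assumptions;
# inspect one line of your own file and adjust them if needed.
with open("results/completions/token/translation.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f):
        record = json.loads(line)
        print(f"--- example {i} ---")
        print("prompt:    ", str(record.get("prompt", ""))[:120])
        print("prediction:", str(record.get("prediction", ""))[:120])
        if i >= 2:
            break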

 

2. get_expert_scores.py: Compute Expert Scores

This script computes a relevance score for each expert on the evaluation data, which guides the selection of the experts that contribute most to a given task.

Usage:

python scripts/expert/get_expert_scores.py \
    --eval_dataset=translation \
    --base_model_path=deepseek-ai/ESFT-vanilla-lite \
    --output_dir=results/expert_scores/translation \
    --n_sample_tokens=131072 \
    --world_size=4 \
    --gpus_per_rank=2
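
Conceptually, an expert's score reflects how often, and how strongly, the MoE router selects that expert on the task's tokens. The sketch below is not the repository's implementation; it is a simplified illustration of aggregating token-level gate probabilities for a single MoE layer, assuming those probabilities have already been collected.

import torch

def expert_relevance_scores(gate_probs: torch.Tensor, top_k: int = 6) -> torch.Tensor:
    """Aggregate router outputs into one relevance score per expert.

    gate_probs: [num_tokens, num_experts] softmax router probabilities
    for one MoE layer (assumed to be collected elsewhere). Returns a
    [num_experts] tensor of average gate weights, a proxy for how much
    each expert contributes to the task.
    """
    # Keep only the top-k routed experts per token and zero the rest,
    # mirroring how sparse MoE routing dispatches tokens.
    topk_vals, topk_idx = gate_probs.topk(top_k, dim=-1)
    sparse = torch.zeros_like(gate_probs).scatter_(-1, topk_idx, topk_vals)
    # Average over tokens: experts that are routed to often, with high
    # gate weight, end up with the highest scores.
    return sparse.mean(dim=0)

# Toy example: 1,000 tokens routed across 64 experts.
scores = expert_relevance_scores(torch.softmax(torch.randn(1000, 64), dim=-1))
print(scores.topk(5))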

 

3. generate_expert_config.py: Generate MoE Expert Configuration

This script uses the expert scores to generate a configuration that trains only the most relevant MoE experts, increasing task specialization.

Usage:

python scripts/expert/generate_expert_config.py \
    --eval_datasets=intent,summary,law,translation \
    --expert_scores_dir=results/expert_scores \
    --output_dir=results/expert_configs \
    --score_function=token \
    --top_p=0.2

 

Note: The top_p value controls which experts are selected: based on their scores, experts are added in descending order until their cumulative share of the total score reaches top_p.
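
To make the selection rule concrete, the sketch below implements a cumulative top-p selection over per-expert scores. This is an illustrative reimplementation under that assumption, not the repository's generate_expert_config.py code.

import torch

def select_experts_top_p(scores: torch.Tensor, top_p: float = 0.2) -> list:
    """Select the smallest set of experts whose cumulative share of the
    total score reaches top_p (illustrative version of the selection rule)."""
    normalized = scores / scores.sum()
    order = torch.argsort(normalized, descending=True)
    selected, cumulative = [], 0.0
    for idx in order.tolist():
        selected.append(idx)
        cumulative += normalized[idx].item()
        if cumulative >= top_p:
            break
    return selected

# Toy example with 8 experts: a few experts dominate the score mass.
scores = torch.tensor([0.30, 0.05, 0.02, 0.25, 0.10, 0.08, 0.15, 0.05])
print(select_experts_top_p(scores, top_p=0.2))  # [0]: expert 0 alone covers 20% of the mass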

 

4. train.py and train_ep.py: Fine-Tuning with Expert Configuration 

These scripts fine-tune the LLM using the generated expert configuration. train.py runs on a single GPU, while train_ep.py is designed for multi-GPU training with expert parallelism.

 

Single-GPU Training:

python train.py \
    --base_model_path=deepseek-ai/ESFT-vanilla-lite \
    --expert_config=results/expert_configs/intent.json \
    --train_dataset=intent \
    --train_config=configs/base.yaml \
    --output_dir=results/checkpoints/intent

 

Multi-GPU Training:

torchrun --nproc-per-node=8 train_ep.py \
    --base_model_path=deepseek-ai/ESFT-vanilla-lite \
    --expert_config=results/expert_configs/translation.json \
    --train_dataset=translation \
    --train_config=configs/base.yaml \
    --output_dir=results/checkpoints/translation
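
After training finishes, the checkpoint written to output_dir can be loaded for inference. The snippet below assumes the directory contains a full model in Hugging Face format; if the scripts instead save only the tuned expert weights, load the base model first and apply the adapter as documented in the repository.

from transformers import AutoModelForCausalLM, AutoTokenizer

# output_dir from train.py / train_ep.py (the translation run above).
ckpt = "results/checkpoints/translation"

# Assumes the checkpoint directory holds a full model in Hugging Face format.
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    ckpt, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

prompt = "Translate to Chinese: The weather is nice today."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))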

 

Conclusion

ESFT simplifies and improves LLM customization by fine-tuning only the experts that are relevant to a given task. By exploiting the MoE architecture, it lowers computational cost while enhancing model performance, and its modular scripts for evaluation, expert scoring, and fine-tuning make it a practical choice for customizing MoE-based LLMs.

