
February 25, 2025

Expert-Specialized Fine-Tuning (ESFT): A Guide to Efficient Customization of Large Language Models


 

Expert-Specialized Fine-Tuning (ESFT) is a novel method for efficiently customizing Large Language Models (LLMs) built on a Mixture-of-Experts (MoE) architecture. Unlike traditional fine-tuning, ESFT updates only the task-relevant model components, improving efficiency, performance, and flexibility while reducing compute and storage demands. This guide covers setting up and using ESFT to customize MoE LLMs.

 

Quick Start

Installation and Setup

To get started, clone the official ESFT repository:

git clone https://github.com/deepseek-ai/ESFT.git
cd ESFT

 

Next, install the required dependencies:

pip install transformers torch safetensors accelerate

 

Download the necessary adapters by running the following:

bash scripts/download_adapters.sh

 

Once setup is complete, you can move on to ESFT fine-tuning and model evaluation.
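Before running the full pipeline, you can optionally verify that the base model loads and generates text. The snippet below is a minimal sanity check using the Hugging Face transformers API; it assumes the deepseek-ai/ESFT-vanilla-lite checkpoint used throughout the examples and a GPU with enough memory to hold it.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "deepseek-ai/ESFT-vanilla-lite"

# The DeepSeek MoE architecture is a custom model class, so
# trust_remote_code=True is likely required to load it.
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)

# Generate a short completion to confirm the setup works end to end.
inputs = tokenizer("Translate to French: Hello, world.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))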

 

Key Scripts

 

1. eval_multigpu.py: Multi-GPU Evaluation

This script evaluates the model's performance on the evaluation datasets (e.g., translation), distributing the work across multiple GPUs.

Usage:

python eval_multigpu.py \
    --eval_dataset=translation \
    --base_model_path=deepseek-ai/ESFT-vanilla-lite \
    --adapter_dir=all_models/adapters/token/translation \
    --output_path=results/completions/token/translation.jsonl \
    --openai_api_key=YOUR_OPENAI_API_KEY
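
Once evaluation completes, each line of the output .jsonl file holds one completion record that you can inspect directly. The keys used below ("prompt" and "prediction") are assumptions for illustration; check one line of your own output and adjust the field names to match.

import json

# Print the first few records from the completions file.
# NOTE: the field names "prompt" and "prediction" are assumptions;
# inspect one line of your own file and adjust them if needed.
with open("results/completions/token/translation.jsonl", encoding="utf-8") as f:
    for i, line in enumerate(f):
        record = json.loads(line)
        print(f"--- example {i} ---")
        print("prompt:    ", str(record.get("prompt", ""))[:120])
        print("prediction:", str(record.get("prediction", ""))[:120])
        if i >= 2:
            break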

 

2. get_expert_scores.py: Compute Expert Scores

This script computes a relevance score for each expert on the evaluation data, which guides the selection of the experts that contribute most to a given task.

Usage:

python scripts/expert/get_expert_scores.py \
    --eval_dataset=translation \
    --base_model_path=deepseek-ai/ESFT-vanilla-lite \
    --output_dir=results/expert_scores/translation \
    --n_sample_tokens=131072 \
    --world_size=4 \
    --gpus_per_rank=2
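
Conceptually, an expert's score reflects how often, and how strongly, the MoE router selects that expert on the task's tokens. The sketch below is not the repository's implementation; it is a simplified illustration of aggregating token-level gate probabilities for a single MoE layer, assuming those probabilities have already been collected.

import torch

def expert_relevance_scores(gate_probs: torch.Tensor, top_k: int = 6) -> torch.Tensor:
    """Aggregate router outputs into one relevance score per expert.

    gate_probs: [num_tokens, num_experts] softmax router probabilities
    for one MoE layer (assumed to be collected elsewhere). Returns a
    [num_experts] tensor of average gate weights, a proxy for how much
    each expert contributes to the task.
    """
    # Keep only the top-k routed experts per token and zero the rest,
    # mirroring how sparse MoE routing dispatches tokens.
    topk_vals, topk_idx = gate_probs.topk(top_k, dim=-1)
    sparse = torch.zeros_like(gate_probs).scatter_(-1, topk_idx, topk_vals)
    # Average over tokens: experts that are routed to often, with high
    # gate weight, end up with the highest scores.
    return sparse.mean(dim=0)

# Toy example: 1,000 tokens routed across 64 experts.
scores = expert_relevance_scores(torch.softmax(torch.randn(1000, 64), dim=-1))
print(scores.topk(5))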

 

3. generate_expert_config.py: Generate MoE Expert Configuration

This script uses the expert scores to generate a configuration that trains only the most relevant MoE experts, increasing task specialization.

Usage:

python scripts/expert/generate_expert_config.py \
    --eval_datasets=intent,summary,law,translation \
    --expert_scores_dir=results/expert_scores \
    --output_dir=results/expert_configs \
    --score_function=token \
    --top_p=0.2

 

Note: The top_p value controls which experts are selected: based on their scores, experts are added in descending order until their cumulative share of the total score reaches top_p.
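
To make the selection rule concrete, the sketch below implements a cumulative top-p selection over per-expert scores. This is an illustrative reimplementation under that assumption, not the repository's generate_expert_config.py code.

import torch

def select_experts_top_p(scores: torch.Tensor, top_p: float = 0.2) -> list:
    """Select the smallest set of experts whose cumulative share of the
    total score reaches top_p (illustrative version of the selection rule)."""
    normalized = scores / scores.sum()
    order = torch.argsort(normalized, descending=True)
    selected, cumulative = [], 0.0
    for idx in order.tolist():
        selected.append(idx)
        cumulative += normalized[idx].item()
        if cumulative >= top_p:
            break
    return selected

# Toy example with 8 experts: a few experts dominate the score mass.
scores = torch.tensor([0.30, 0.05, 0.02, 0.25, 0.10, 0.08, 0.15, 0.05])
print(select_experts_top_p(scores, top_p=0.2))  # [0]: expert 0 alone covers 20% of the mass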

 

4. train.py and train_ep.py: Fine-Tuning with Expert Configuration 

These scripts fine-tune the LLM using the generated expert configuration. train.py runs on a single GPU, while train_ep.py is designed for multi-GPU training with expert parallelism.

 

Single-GPU Training:

python train.py \
    --base_model_path=deepseek-ai/ESFT-vanilla-lite \
    --expert_config=results/expert_configs/intent.json \
    --train_dataset=intent \
    --train_config=configs/base.yaml \
    --output_dir=results/checkpoints/intent

 

Multi-GPU Training:

torchrun --nproc-per-node=8 train_ep.py \
    --base_model_path=deepseek-ai/ESFT-vanilla-lite \
    --expert_config=results/expert_configs/translation.json \
    --train_dataset=translation \
    --train_config=configs/base.yaml \
    --output_dir=results/checkpoints/translation
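
After training finishes, the checkpoint written to output_dir can be loaded for inference. The snippet below assumes the directory contains a full model in Hugging Face format; if the scripts instead save only the tuned expert weights, load the base model first and apply the adapter as documented in the repository.

from transformers import AutoModelForCausalLM, AutoTokenizer

# output_dir from train.py / train_ep.py (the translation run above).
ckpt = "results/checkpoints/translation"

# Assumes the checkpoint directory holds a full model in Hugging Face format.
tokenizer = AutoTokenizer.from_pretrained(ckpt, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    ckpt, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)

prompt = "Translate to Chinese: The weather is nice today."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))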

 

Conclusion

ESFT simplifies and improves LLM customization by fine-tuning only the experts that are relevant to a given task. By exploiting the MoE architecture, it lowers computational cost while enhancing model performance, and its modular scripts for evaluation, expert scoring, and fine-tuning make it a practical choice for customizing MoE-based LLMs.

