Knowledge Distillation of Black-Box Large Language Models
- #Large Language Models
- #Knowledge Distillation
- #Proxy Model
- Proprietary large language models (LLMs) such as GPT-4 are black boxes: their internal states are inaccessible, which limits how much knowledge can be transferred to smaller models through distillation.
- Proxy-KD is a knowledge distillation (KD) method that uses a proxy model to facilitate the transfer of knowledge from black-box LLMs to smaller student models.
- Experiments show that Proxy-KD not only improves KD from black-box teachers but also surpasses traditional white-box KD techniques, opening a new avenue for distillation from advanced LLMs.
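Why inaccessible internal states matter can be illustrated with a minimal sketch of generic token-level distillation. It assumes a toy four-token vocabulary and made-up logits. With a black-box teacher, only the generated token is visible, so the student can only be trained with cross-entropy on that hard label. A white-box model, standing in here for a proxy, exposes its full next-token distribution, which the student can match with a KL-divergence loss. This is not the Proxy-KD training procedure itself. The function names and values are hypothetical illustrations.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over a vocabulary of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def black_box_loss(student_logits: Sequence[float], teacher_token: int) -> float:
    """Black-box teacher (hypothetical helper): only the generated token id is
    visible, so the student gets cross-entropy on that single hard label."""
    probs = softmax(student_logits)
    return -math.log(probs[teacher_token])


def white_box_loss(
    student_logits: Sequence[float],
    proxy_logits: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """White-box proxy (hypothetical helper): the full next-token distribution
    is visible, so the student minimizes forward KL(p_proxy || p_student)."""
    p = softmax(proxy_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)


if __name__ == "__main__":
    # Made-up logits over a 4-token vocabulary, for illustration only.
    student = [1.0, 0.5, 0.2, -1.0]
    proxy = [2.0, 1.0, 0.1, -2.0]
    print("black-box CE on token 0:", black_box_loss(student, teacher_token=0))
    print("white-box KL to proxy:  ", white_box_loss(student, proxy))
```

The hard-label loss only rewards the probability assigned to one token. The KL loss also carries the teacher-side information about how plausible every other token is, which is the kind of signal a black-box teacher withholds.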