Optimizations
Quantization
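Quantization reduces model size and speeds up inference by storing weights (and optionally activations) in lower-precision formats such as int8. As a minimal sketch, assuming PyTorch is available, post-training dynamic quantization of a toy model (the model itself is hypothetical, for illustration only) might look like this:

```python
import torch
import torch.nn as nn

# Hypothetical toy model used only for illustration.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))

# Post-training dynamic quantization: weights of nn.Linear layers are
# converted to int8 ahead of time; activations are quantized on the fly
# at inference, so no calibration data is needed.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 64)
out = quantized(x)
print(out.shape)  # torch.Size([1, 10])
```

Quantization-aware training, by contrast, simulates quantization during training so the model can adapt to the reduced precision.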
Knowledge Distillation
Transfer the knowledge of a larger (teacher) model to a smaller (student) one.
Soft targets: train the student on the teacher's logits rather than only on hard labels, so the student learns the teacher's full output distribution. Synthetic data: generate additional training data by labeling inputs with the teacher's predictions.
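The soft-targets idea can be sketched as a distillation loss: the KL divergence between the teacher's and student's temperature-softened distributions. The function name and temperature `T` are assumptions for illustration, not part of any specific library API:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # Higher temperature T softens both distributions, exposing the
    # teacher's relative preferences among non-top classes.
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # The T*T factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)

student_logits = torch.randn(8, 10)
teacher_logits = torch.randn(8, 10)
loss = distillation_loss(student_logits, teacher_logits)

# Identical logits give zero divergence.
zero_loss = distillation_loss(student_logits, student_logits.clone())
```

In practice this term is usually combined with the ordinary cross-entropy loss on the true labels, weighted by a mixing coefficient.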
Sparsity
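Sparsity removes (zeroes out) weights that contribute little to the output, shrinking the model and enabling sparse-kernel speedups. A minimal sketch using PyTorch's pruning utilities, with magnitude-based unstructured pruning at 50% (the layer and amount are illustrative choices):

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(32, 32)

# Zero out the 50% of weights with the smallest absolute value (L1 magnitude,
# unstructured: individual weights are pruned, not whole rows or channels).
prune.l1_unstructured(layer, name="weight", amount=0.5)

# Make the pruning permanent: drop the mask reparameterization, keep the zeros.
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.0%}")
```

Structured pruning (removing whole rows, channels, or blocks) trades some accuracy flexibility for patterns that hardware can actually exploit.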
Torch compiler
References
GPU MODE IRL 2024 Keynotes
Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training
Knowledge Distillation
This post is licensed under CC BY 4.0 by the author.