Optimizations

Quantization
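The core idea of quantization is to map floating-point values to low-precision integers (e.g. int8) via an affine transform, real ≈ scale * (q - zero_point). A minimal pure-Python sketch of the math (an illustration only, not the PyTorch quantization API):

```python
# Sketch of affine (asymmetric) int8 quantization: real ~= scale * (q - zero_point).
# Illustration of the math only; PyTorch's quantization APIs handle this internally.

def quantize(xs, qmin=-128, qmax=127):
    lo, hi = min(min(xs), 0.0), max(max(xs), 0.0)  # range must include 0.0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    q = [max(qmin, min(qmax, round(x / scale + zero_point))) for x in xs]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [scale * (v - zero_point) for v in q]

xs = [-1.0, -0.5, 0.0, 0.3, 1.2]
q, scale, zp = quantize(xs)
xhat = dequantize(q, scale, zp)
# Round-trip error is bounded by the quantization step size (the scale).
max_err = max(abs(a - b) for a, b in zip(xs, xhat))
```

Post-training quantization (PTQ) applies such a mapping to an already-trained model, while quantization-aware training (QAT) simulates it during training so the model can adapt to the precision loss.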

Knowledge Distillation

Transfer the knowledge of a larger (teacher) model to a smaller (student) one.

Soft targets: the teacher's logits (softened probabilities) are used as training targets for the student.

Synthetic data: generate additional training data for the student from the teacher's predictions.
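The soft-target idea can be sketched as a cross-entropy between temperature-softened teacher and student distributions. A stdlib-only illustration (the function names and the temperature value are my own, not from the talk):

```python
import math

def softmax(logits, temperature=1.0):
    zs = [z / temperature for z in logits]
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy between softened teacher targets and student predictions."""
    p = softmax(teacher_logits, temperature)  # soft targets (no gradient in practice)
    q = softmax(student_logits, temperature)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return -temperature ** 2 * sum(pi * math.log(qi) for pi, qi in zip(p, q))
```

In practice this term is combined with the ordinary hard-label loss, and a higher temperature exposes more of the teacher's relative preferences among incorrect classes.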

Sparsity
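A common way to introduce sparsity is unstructured magnitude pruning: zero out the smallest-magnitude weights so the model can be stored or executed more cheaply. A minimal sketch (my own helper, not a PyTorch API):

```python
def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning).

    Ties at the threshold may prune slightly more than the requested fraction.
    """
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

Structured variants (pruning whole channels, or 2:4 semi-structured patterns supported by recent NVIDIA GPUs) trade some flexibility for hardware-friendly speedups.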

Torch compiler

References

GPU MODE IRL 2024 Keynotes

Quantization explained with PyTorch - Post-Training Quantization, Quantization-Aware Training

Knowledge Distillation

This post is licensed under CC BY 4.0 by the author.