Fast Inference from Transformers via Speculative Decoding
arxiv.org