Fast Inference from Transformers via Speculative Decoding - arxiv.org
