arxiv:2111.09645

Dynamic-TinyBERT: Boost TinyBERT's Inference Efficiency by Dynamic Sequence Length

Published on Nov 18, 2021

Abstract

Dynamic-TinyBERT, a TinyBERT model trained only once, uses sequence-length reduction and hyperparameter optimization to achieve a superior accuracy-speedup trade-off across a range of computational budgets.

Limited computational budgets often prevent transformers from being used in production, so their high accuracy goes unused. TinyBERT addresses computational efficiency by self-distilling BERT into a smaller transformer representation with fewer layers and a smaller internal embedding. However, on advanced NLP tasks such as span question answering, TinyBERT's performance drops when the number of layers is reduced by 50%, and drops even more abruptly when it is reduced by 75%. Additionally, a separate model must be trained for each inference scenario, each with its own computational budget. In this work we present Dynamic-TinyBERT, a TinyBERT model that uses sequence-length reduction and hyperparameter optimization to improve inference efficiency for any computational budget. Dynamic-TinyBERT is trained only once, performs on par with BERT, and achieves an accuracy-speedup trade-off superior to that of other efficient approaches (up to a 3.3x speedup with an accuracy drop of less than 1%). Upon publication, the code to reproduce our work will be open-sourced.
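
To make the link between sequence length and computational budget concrete, the following is a minimal sketch, not the paper's method or code. It estimates multiply-accumulate operations (MACs) for a transformer encoder when each layer processes a different number of tokens. The function names, the hidden size of 768, the feed-forward size of 3072, the maximum length of 384, and the per-layer length configuration are all assumptions chosen for illustration. The estimate is theoretical and ignores embeddings, layer normalization, and hardware effects, so it should not be read as a measured speedup.

```python
# Rough MAC estimate for a transformer encoder with per-layer sequence lengths.
# All sizes and the length configuration below are illustrative assumptions.

def layer_macs(seq_len, hidden=768, ffn=3072):
    """Approximate MACs for one encoder layer processing seq_len tokens."""
    projections = 4 * seq_len * hidden * hidden   # Q, K, V and output projections
    attention = 2 * seq_len * seq_len * hidden    # Q.K^T scores and weighted sum over V
    feed_forward = 2 * seq_len * hidden * ffn     # two dense layers of the FFN
    return projections + attention + feed_forward


def model_macs(lengths, hidden=768, ffn=3072):
    """Sum layer costs; lengths[i] is the token count seen by layer i."""
    return sum(layer_macs(n, hidden, ffn) for n in lengths)


def estimated_speedup(lengths, full_len, hidden=768, ffn=3072):
    """Ratio of full-length cost to reduced-length cost (theoretical only)."""
    baseline = model_macs([full_len] * len(lengths), hidden, ffn)
    return baseline / model_macs(lengths, hidden, ffn)


if __name__ == "__main__":
    full = [384] * 6                          # every layer sees all tokens
    reduced = [384, 256, 192, 128, 96, 64]    # hypothetical length configuration
    print(f"full-length config:    {estimated_speedup(full, 384):.2f}x")
    print(f"reduced-length config: {estimated_speedup(reduced, 384):.2f}x")
```

A search over such length configurations, each scored by its estimated cost and by task accuracy, is one way to picture how a single trained model could be matched to different budgets; the search procedure the authors actually use is not described on this page.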
