Pretraining and Finetuning GPT Models with Math Symbols
GPT is known to be effective for causal language modeling. In this work, we present results from training GPT on mathematical symbols, using two types of datasets (digits and modulo), each in two configurations.
We trained GPT models of three sizes:
Version | Layers | Heads | $d_{model}$ |
---|---|---|---|
2 | 1 | 8 | 128 |
4 | 2 | 8 | 128 |
6 | 8 | 8 | 512 |
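The three model configurations above can be captured as plain Python dicts. This is a hypothetical sketch for reference; only the layer, head, and $d_{model}$ values come from the table, and the parameter-count heuristic is an assumption:

```python
# Model configurations from the table above. The version keys and
# hyperparameters are from the table; everything else is assumed.
GPT_CONFIGS = {
    2: {"n_layers": 1, "n_heads": 8, "d_model": 128},
    4: {"n_layers": 2, "n_heads": 8, "d_model": 128},
    6: {"n_layers": 8, "n_heads": 8, "d_model": 512},
}

def param_estimate(cfg):
    """Rough transformer parameter count: 12 * d_model^2 per layer
    (attention + MLP), ignoring embeddings and layer norms."""
    return 12 * cfg["n_layers"] * cfg["d_model"] ** 2

for version, cfg in GPT_CONFIGS.items():
    print(version, param_estimate(cfg))
```

By this rough estimate, version 6 is over a hundred times larger than version 2, driven mostly by the jump in $d_{model}$ from 128 to 512.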
Each dataset version specifies the digit and modulus ranges:

Version | Digits (0~N) | Moduli (2~N) |
---|---|---|
1 | 10 | 5 |
3 | 20 | 5 |
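The source does not specify the exact sample format, but a modulo dataset along these lines could be generated as follows. The string format `a % m = r`, the function name, and the sampling ranges are all assumptions for illustration:

```python
import random

def make_modulo_samples(n_samples, max_digit, max_modulus, seed=0):
    """Generate training strings of the form 'a % m = r', where a is
    drawn from 0..max_digit and m from 2..max_modulus. The format and
    inclusive ranges are assumptions, not taken from the source."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        a = rng.randint(0, max_digit)       # digits 0..N
        m = rng.randint(2, max_modulus)     # moduli 2..N
        samples.append(f"{a} % {m} = {a % m}")
    return samples

# Dataset version 1: digits up to 10, moduli up to 5
print(make_modulo_samples(3, 10, 5))
```

Each sample is self-verifying, so a tokenizer over digits and the symbols `%` and `=` is enough to train and evaluate a small causal language model on this data.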