Pretraining and Finetuning GPT Models with Math Symbols
GPT is known to be effective for causal language modeling. In this work, we present results from training GPT on mathematical symbols, using two types of datasets (digits and modulo), each in two configurations.
We trained GPT models of three sizes:
Version | Layers | Heads | $d_{model}$ |
---|---|---|---|
2 | 1 | 8 | 128 |
4 | 2 | 8 | 128 |
6 | 8 | 8 | 512 |
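The three model configurations above can be captured as plain Python dicts. This is a hypothetical sketch for reference; only the layer, head, and $d_{model}$ values come from the table, and the parameter-count heuristic is an assumption:

```python
# Model configurations from the table above. The version keys and
# hyperparameters are from the table; everything else is assumed.
GPT_CONFIGS = {
    2: {"n_layers": 1, "n_heads": 8, "d_model": 128},
    4: {"n_layers": 2, "n_heads": 8, "d_model": 128},
    6: {"n_layers": 8, "n_heads": 8, "d_model": 512},
}

def param_estimate(cfg):
    """Rough transformer parameter count: 12 * d_model^2 per layer
    (attention + MLP), ignoring embeddings and layer norms."""
    return 12 * cfg["n_layers"] * cfg["d_model"] ** 2

for version, cfg in GPT_CONFIGS.items():
    print(version, param_estimate(cfg))
```

By this rough estimate, version 6 is over a hundred times larger than version 2, driven mostly by the jump in $d_{model}$ from 128 to 512.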
Each dataset version specifies the digit and modulus ranges:

Version | Digits (0~N) | Moduli (2~N) |
---|---|---|
1 | 10 | 5 |
3 | 20 | 5 |
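The source does not specify the exact sample format, but a modulo dataset along these lines could be generated as follows. The string format `a % m = r`, the function name, and the sampling ranges are all assumptions for illustration:

```python
import random

def make_modulo_samples(n_samples, max_digit, max_modulus, seed=0):
    """Generate training strings of the form 'a % m = r', where a is
    drawn from 0..max_digit and m from 2..max_modulus. The format and
    inclusive ranges are assumptions, not taken from the source."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        a = rng.randint(0, max_digit)       # digits 0..N
        m = rng.randint(2, max_modulus)     # moduli 2..N
        samples.append(f"{a} % {m} = {a % m}")
    return samples

# Dataset version 1: digits up to 10, moduli up to 5
print(make_modulo_samples(3, 10, 5))
```

Each sample is self-verifying, so a tokenizer over digits and the symbols `%` and `=` is enough to train and evaluate a small causal language model on this data.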