Memory as the Computational Resource

Study note

Introduction

๋ฉ”๋ชจ๋ฆฌ๋Š” ์ ๊ณ , ๋ชจ๋ธ์€ ๋งŽ๊ณ  ๋ฐ์ดํ„ฐ๋Š” ๋” ๋งŽ๋‹ค. ํ•™์Šต์„ ์œ„ํ•ด์„œ๋Š” ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ํ•„์š”ํ•˜๋‹ค. ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ์—ฌ๋Ÿฌ๋ฒˆ learning process๋ฅผ ์ง„ํ–‰ํ•˜๋ฉฐ, ์ •๋ณด๋ฅผ ์ €์žฅํ•˜๊ธฐ ์œ„ํ•ด์„œ ์ ์ ˆํ•œ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ๊ฐ€์ง€๊ณ  ์žˆ์–ด์•ผ ํ•œ๋‹ค. ๋ฐ์ดํ„ฐ์™€ ๋ชจ๋ธ์ด ๊ธฐํ•˜๊ธ‰์ˆ˜์ ์œผ๋กœ ๋Š˜์–ด๋‚œ ๊ฒƒ๊ณผ ๋Œ€๋น„ํ•˜์—ฌ ๋ฉ”๋ชจ๋ฆฌ ์ˆ˜๋Š” ๊ต‰์žฅํžˆ ์ฒœ์ฒœํžˆ ์ฆ๊ฐ€ํ•˜์˜€๋‹ค.

ํ•™์Šต๊ณผ ์ตœ์ ํ™” ์ž…์žฅ์—์„œ ๋ฉ”๋ชจ๋ฆฌ์˜ ์—ญํ• ์€ ๋ฌด์—‡์ธ๊ฐ€? ๋ฉ”๋ชจ๋ฆฌ์™€ required information ์˜ ์–‘์€ ํŠน์ •ํ•œ ๊ด€๊ณ„๊ฐ€ ์žˆ๋Š”๊ฐ€? (# gradient queries, # data points)

Memory Dichotomy Hypothesis: It is not possible to significantly improve on the convergence rate of known memory efficient techniques without using significantly more memory.


A canonical optimization problem finding $min F(x)$.

Known for $d$ dimensional vector.

์œ„์˜ ๊ฒฝ์šฐ query ์ž์ฒด๋Š” ์—๋Ÿฌ์— ๋Œ€ํ•ด์„œ ์ œ๊ณฑ์œผ๋กœ ๋Š˜์–ด๋‚˜๊ฒŒ ๋œ๋‹ค. ๋”ฐ๋ผ์„œ ์—๋Ÿฌ๋ฅผ ์ถฉ๋ถ„ํžˆ ๋‚ฎ์ถ”๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋ฌด์ˆ˜ํžˆ ๋งŽ์€ ์ฟผ๋ฆฌ๊ฐ€ ํ•„์š”ํ•˜๋‹ค.

Other techniques such as binary search

๋‹ค๋ฅธ ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ query์— ๋Œ€ํ•ด์„œ ๋งŽ์€ ์—ฐ์‚ฐ๊ณผ ๋ฉ”๋ชจ๋ฆฌ๊ฐ€ ํ•„์š”ํ•˜์ง€๋งŒ, ์—๋Ÿฌ์— ๋Œ€ํ•ด์„œ ๋” ์ ์€ ์ฟผ๋ฆฌ๊ฐ€ ํ•„์š”ํ•˜๋‹ค. ๋”ฐ๋ผ์„œ ์ฟผ๋ฆฌ ์ˆ˜๋ฅผ ์ค„์ด๊ธฐ ์œ„ํ•œ ์ ์ ˆํ•œ ๋ฐฉ๋ฒ•์ด๋‹ค.

์•Œ๊ณ ๋ฆฌ์ฆ˜ ์ž์ฒด๋Š” ํšจ์œจ์ ์œผ๋กœ ๊ตฌ์„ฑ๋  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ํŠน์ •ํ•œ ๋ชฉ์ ์„ ์ง€๋‹ˆ๊ณ  ์žˆ๋‹ค. ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ๋” ๋งŽ์ด ์“ฐ๋ฉด์„œ ๋” ํšจ์œจ์ ์ธ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ๊ตฌ์„ฑํ•  ์ˆ˜ ์žˆ๋‹ค.


Orthogonal Vector Game

Random Matrix $A \sim Unif({\pm 1}^{\frac{d}{2} \times d})$ To win, find $y_1, y_2, \cdots, y_k$ which are roughly orthogonal to $A$.

$M$-bit ๋ฉ”์‹œ์ง€๋ฅผ ๋ณด๋‚ด์„œ oracle ์€ row number of largest inner product ๋ฅผ ์ค€๋‹ค. ๋‘ ๋ฒˆ์งธ ์ฟผ๋ฆฌ๋ฅผ ๋ณด๋‚ด๋ฉด์„œ ๊ณ„์† ๊ฐ€์žฅ ์ž‘์€ row ๋ฅผ ๋ฐ˜ํ™˜ํ•ด์ค€๋‹ค. ์ด ์ž‘์—…์„ $m$๋ฒˆ ์ง„ํ–‰ํ•ด์„œ $y_1, y_2, \cdots, y_k$ ๊ฐœ๋ฅผ ๊ตฌํ•ด์•ผ ํ•œ๋‹ค.

Communication Game

Transformer ๋Š” ์ฟผ๋“œ๋ผํ‹ฑ ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. ๋”ฐ๋ผ์„œ ๋” ์ ์€ ์ƒ˜ํ”Œ์ด ํ•„์š”ํ•˜๋‹ค. LSTM์€ linear ๋ฉ”๋ชจ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค. ๋” ๋งŽ์€ ์ƒ˜ํ”Œ์ด ํ•„์š”ํ•˜๋‹ค. (์ฟผ๋ฆฌ ์ˆ˜๊ฐ€ ๋งŽ์•„์•ผ ํ•œ๋‹ค.)

Transmission of center and move and scatter them again. Transmitting information directly and safely.