Scaling Laws

In AI, scaling laws help us understand and predict how changes in variables such as the amount of computation and data used affect a model's performance.

Review Questions

State whether each of the following statements is true or false. If false, explain why or provide a counterexample.

Power laws are mathematical equations that model how a particular quantity varies as a power of another.
The equation $A = \pi r^2$ is a power law.
The equation $y = a\log(x) + \log(b)$ is a power law.
When graphed on a standard (linear scale) plot, a power law relationship will appear as a straight line.
Scaling laws are a particular kind of power law that describes how deep learning models scale.
Scaling laws in deep learning predict loss based on model size and dataset size.
All ML models follow scaling laws.
Intricate, expert-designed AI systems generally perform better than deep learning models with many parameters trained on large amounts of data.

Answer:

True.
True.
False. In this equation, the dependent variable y varies with the logarithm of x, not with a power of x, so the relationship is logarithmic rather than a power law.
False. While power law relationships appear as straight lines on plots with logarithmic scales, when graphed on standard (linear) plots with constant scale, power law relationships may take many different forms. Consider the equations $y = x^2$ and $y = x^3$. Both are power laws, but when graphed on a standard plot, they will be quadratic and cubic curves, respectively.
True.
True.
False. Not all ML models follow scaling laws. While generative models tend to follow regular scaling laws, discriminative models do not exhibit behavior that suggests clear scaling laws.
False. The bitter lesson suggests that scaling data and computational power tends to be more effective than human-designed approaches.
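
To make the statement "scaling laws in deep learning predict loss based on model size and dataset size" concrete, here is a minimal sketch in Python. It assumes one commonly used functional form, an irreducible loss plus one power-law term in the parameter count N and one in the number of training tokens D. The function name and every constant below are hypothetical and chosen only for illustration; they are not fitted values from any real model family.

```python
def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Illustrative scaling law: L(N, D) = E + A / N**alpha + B / D**beta.

    All constants are made up for this sketch (an assumption, not fitted data).
    """
    irreducible = 1.7          # E: loss floor that scaling cannot remove
    a_coef, alpha = 400.0, 0.3  # power-law term for model size N
    b_coef, beta = 400.0, 0.3   # power-law term for dataset size D
    return irreducible + a_coef / n_params**alpha + b_coef / n_tokens**beta


# Same dataset, ten times more parameters: predicted loss goes down.
small_model = predicted_loss(1e9, 1e11)
large_model = predicted_loss(1e10, 1e11)

# Same model, ten times more data: predicted loss also goes down.
more_data = predicted_loss(1e9, 1e12)
```

Under this assumed form, each power-law term shrinks as its variable grows, so the predicted loss falls toward the irreducible term but never below it.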

Write an example of a power law equation with one independent variable and draw a rough sketch of what it would look like on a log-log plot. Label the y-intercept and slope. What part of the equation do each of these values represent?

Answer:

Exact equations will vary, but should be of the form $y = bx^a$; graphs should display a straight line with the y-intercept and slope labeled. The slope corresponds to the exponent $a$, and the y-intercept corresponds to the coefficient $b$ (on a log-log plot it equals $\log b$). For example, the power law $y = 5x^3$ will have slope 3 and y-intercept $\log(5)$.
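
The reason a power law becomes a straight line on a log-log plot is that taking logarithms of both sides of y = b·x^a gives log y = log b + a·log x, which is linear in log x with slope a and intercept log b. The sketch below checks this numerically for y = 5x^3 using an ordinary least-squares line fit on base-10 logarithms. The helper name `fit_power_law` and the sample points are assumptions made for illustration.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares line through (log10 x, log10 y); returns (slope, intercept)."""
    log_x = [math.log10(x) for x in xs]
    log_y = [math.log10(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Sample points taken exactly from y = 5 * x**3 (hypothetical inputs).
xs = [1, 2, 5, 10, 20, 50, 100]
ys = [5 * x**3 for x in xs]

slope, intercept = fit_power_law(xs, ys)
coefficient = 10**intercept  # undo the log to recover b

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, b = {coefficient:.3f}")
```

The fitted slope recovers the exponent and the fitted intercept recovers the base-10 logarithm of the coefficient, which is why both can be read directly off a log-log sketch.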

How is compute (the computation resources used in training) related to scaling? Describe which factors it influences and how.

Answer:

Compute is a vital part of scaling; scaling is only possible through increasing compute. Training models with more parameters or on larger datasets requires more computing power. Accordingly, the available compute limits how large both the model and the dataset can be.
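
One way to see how compute ties model size and dataset size together is a rough back-of-the-envelope estimate. The sketch below assumes the widely quoted rule of thumb that training a dense transformer costs about 6 floating-point operations per parameter per training token, so C ≈ 6·N·D. This approximation is an assumption brought in for illustration, not something stated in the text above, and the function names are hypothetical.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute, assuming the C ~ 6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens


def tokens_within_budget(flop_budget: float, n_params: float) -> float:
    """Tokens affordable for a given model size under the same assumed rule."""
    return flop_budget / (6.0 * n_params)


# Hypothetical run: 1e9 parameters trained on 2e10 tokens.
budget = training_flops(1e9, 2e10)

# With the same budget, doubling the parameter count halves the affordable tokens.
tokens_for_larger_model = tokens_within_budget(budget, 2e9)
```

Under this assumption a fixed compute budget forces a trade-off: a larger model leaves fewer training tokens, and growing both at once requires more compute.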
