Sum of Squares: Calculation, Types, and Examples

Sum of Squares

Investopedia / Michela Buttignol

What Is the Sum of Squares?

The term sum of squares refers to a statistical technique used in regression analysis to determine the dispersion of data points. The sum of squares can be used to find the function that best fits by varying the least from the data. In a regression analysis, the goal is to determine how well a data series can be fitted to a function that might help to explain how the data series was generated. The sum of squares can be used in the financial world to determine the variance in asset values.

Key Takeaways

  • The sum of squares measures the deviation of data points away from the mean value.
  • A higher sum of squares indicates higher variability while a lower result indicates low variability from the mean.
  • To calculate the sum of squares, subtract the data points from the mean, square the differences, and add them together.
  • There are three types of sum of squares: total, residual, and regressive.
  • Investors can use the sum of squares to help make better decisions about their investments.

Sum of Squares Formula

The following is the formula for the total sum of squares.

For a set  X  of  n  items: Sum of squares = i = 0 n ( X i ? X ) 2 where: X i = The  i t h  item in the set X = The mean of all items in the set ( X i ? X ) = The deviation of each item from the mean \begin{aligned} &\text{For a set } X \text{ of } n \text{ items:}\\ &\text{Sum of squares}=\sum_{i=0}^{n}\left(X_i-\overline{X}\right)^2\\ &\textbf{where:}\\ &X_i=\text{The } i^{th} \text{ item in the set}\\ &\overline{X}=\text{The mean of all items in the set}\\ &\left(X_i-\overline{X}\right) = \text{The deviation of each item from the mean}\\ \end{aligned} ?For a set X of n items:Sum of squares=i=0n?(Xi??X)2where:Xi?=The ith item in the setX=The mean of all items in the set(Xi??X)=The deviation of each item from the mean?

Understanding the Sum of Squares

The sum of squares is a statistical measure of deviation from the mean. It is also known as variation. It is calculated by adding together the squared differences of each data point. To determine the sum of squares, square the distance between each data point and the line of best fit, then add them together. The line of best fit will minimize this value.

A low sum of squares indicates little variation between data sets while a higher one indicates more variation. Variation refers to the difference of each data set from the mean. You can visualize this in a chart. If the line doesn't pass through all the data points, then there is some unexplained variability. We go into a little more detail about this in the next section below.

Analysts and investors can use the sum of squares to make better decisions about their investments. Keep in mind, though that using it means you're making assumptions about using past performance. For instance, this measure can help you determine the level of volatility in a stock's price or how the share prices of two companies compare.

Let's say an analyst who wants to know whether Microsoft (MSFT) share prices move in tandem with those of Apple (AAPL) can list out the daily prices for both stocks for a certain period (say one, two, or 10 years) and create a linear model or a chart. If the relationship between both variables (i.e., the price of AAPL and MSFT) is not a straight line, then there are variations in the data set that must be scrutinized.

Variation is a statistical measure that is calculated or measured by using squared differences.

How to Calculate the Sum of Squares

You can see why the measurement is called the sum of squared deviations, or the sum of squares for short. You can use the following steps to calculate the sum of squares:

  1. Gather all the data points.
  2. Determine the mean/average
  3. Subtract the mean/average from each individual data point.
  4. Square each total from Step 3.
  5. Add up the figures from Step 4.

In statistics, it is the average of a set of numbers, which is calculated by adding the values in the data set together and dividing by the number of values. But knowing the mean may not be enough to determine the sum of squares. As such, it helps to know the variation in a set of measurements. How far individual values are from the mean may provide insight into how fit the observations or values are to the regression model that is created.

Types of Sum of Squares

The formula we highlighted earlier is used to calculate the total sum of squares. The total sum of squares is used to arrive at other types. The following are the other types of sum of squares.

Residual Sum of Squares

As noted above, if the line in the linear model created does not pass through all the measurements of value, then some of the variability that has been observed in the share prices is unexplained. The sum of squares is used to calculate whether a linear relationship exists between two variables, and any unexplained variability is referred to as the residual sum of squares.

The RSS allows you to determine the amount of error left between a regression function and the data set after the model has been run. You can interpret a smaller RSS figure as a regression function that is well-fit to the data while the opposite is true of a larger RSS figure.

Here is the formula for calculating the residual sum of squares:

SSE = i = 1 n ( y i ? y ^ i ) 2 where: y i = Observed value y ^ i = Value estimated by regression line \begin{aligned}&\text{SSE} = \sum_{i = 1}^{n} (y_i - \hat{y}_i)^2 \\&\textbf{where:} \\&y_i = \text{Observed value} \\&\hat{y}_i = \text{Value estimated by regression line} \\\end{aligned} ?SSE=i=1n?(yi??y^?i?)2where:yi?=Observed valuey^?i?=Value estimated by regression line?

Regression Sum of Squares

The regression sum of squares is used to denote the relationship between the modeled data and a regression model. A regression model establishes whether there is a relationship between one or multiple variables. Having a low regression sum of squares indicates a better fit with the data. A higher regression sum of squares, though, means the model and the data aren't a good fit together.

Here is the formula for calculating the regression sum of squares:
SSR = i = 1 n ( y ^ i ? y ˉ ) 2 where: y ^ i = Value estimated by regression line y ˉ = Mean value of a sample \begin{aligned}&\text{SSR} = \sum_{i = 1}^{n} (\hat{y}_i - \bar{y})^2 \\&\textbf{where:} \\&\hat{y}_i = \text{Value estimated by regression line} \\&\bar{y} = \text{Mean value of a sample} \\\end{aligned} ?SSR=i=1n?(y^?i??yˉ?)2where:y^?i?=Value estimated by regression lineyˉ?=Mean value of a sample?

Adding the sum of the deviations alone without squaring will result in a number equal to or close to zero since the negative deviations will almost perfectly offset the positive deviations. To get a more realistic number, the sum of deviations must be squared. The sum of squares will always be a positive number because the square of any number, whether positive or negative, is always positive.

Limitations of Using the Sum of Squares

Making an investment decision on what stock to purchase requires many more observations than the ones listed here. An analyst may have to work with years of data to know with a higher certainty how high or low the variability of an asset is. As more data points are added to the set, the sum of squares becomes larger as the values will be more spread out.

The most widely used measurements of variation are the standard deviation and variance. However, to calculate either of the two metrics, the sum of squares must first be calculated. The variance is the average of the sum of squares (i.e., the sum of squares divided by the number of observations). The standard deviation is the square root of the variance.

There are two methods of regression analysis that use the sum of squares: the linear least squares method and the non-linear least squares method. The least squares method refers to the fact that the regression function minimizes the sum of the squares of the variance from the actual data points. In this way, it is possible to draw a function, which statistically provides the best fit for the data. Note that a regression function can either be linear (a straight line) or non-linear (a curving line).

Example of Sum of Squares

Let's use Microsoft as an example to show how you can arrive at the sum of squares.

Using the steps listed above, we gather the data. So if we're looking at the company's performance over a five-year period, we'll need the closing prices for that time frame:

  • $74.01
  • $74.77
  • $73.94
  • $73.61
  • $73.40

Now let's figure out the average price. The sum of the total prices is $369.73 and the mean or average price is $369.73 ÷?5 = $73.95.

Then, figure out the sum of squares, we find the difference of each price from the average, square the differences, and add them together:

  • SS = ($74.01 - $73.95)2 + ($74.77 - $73.95)2 + ($73.94 - $73.95)2 + ($73.61 - $73.95)2 + ($73.40 - $73.95)2
  • SS = (0.06)2 + (0.82)2 + (-0.01)2 + (-0.34)2 + (-0.55)2
  • SS = 1.0942

In the example above, 1.0942 shows that the variability in the stock price of MSFT over five days is very low and investors looking to invest in stocks characterized by price stability and low volatility may opt for MSFT.

How Do You Define the Sum of Squares?

The sum of squares is a form of regression analysis to determine the variance from data points from the mean. If there is a low sum of squares, it means there's low variation. A higher sum of squares indicates higher variance. This can be used to help make more informed decisions by determining investment volatility or to compare groups of investments with one another.

How Do You Calculate the Sum of Squares?

In order to calculate the sum of squares, gather all your data points. Then determine the mean or average by adding them all together and dividing that figure by the total number of data points. Next, figure out the differences between each data point and the mean. Then square those differences and add them together to give you the sum of squares.

How Does the Sum of Squares Help in Finance?

Investors and analysts can use the sum of squares to make comparisons between different investments or make decisions about how to invest. For instance, you can use the sum of squares to determine stock volatility. A low sum generally indicates low volatility while higher volatility is derived from a higher sum of squares.

The Bottom Line

As an investor, you want to make informed decisions about where to put your money. While you can certainly do so using your gut instinct, there are tools at your disposal that can help you. The sum of squares takes historical data to give you an indication of implied volatility. Use it to see whether a stock is a good fit for you or to determine an investment if you're on the fence between two different assets. Keep in mind, though, that the sum of squares uses past performance as an indicator and doesn't guarantee future performance.