Calculating the standard error of the mean

Problem

How do I calculate the standard error of the mean in MATLAB?

Solution

MATLAB's std function returns the standard deviation of a sample, so we just have to divide this by the square root of the sample size to get the standard error of the mean (SEM).

data=randn(1,30);
sem=std(data)/sqrt(length(data)) % standard error of the mean
sem =
    0.1813

The standard deviation describes the spread of a sample distribution. The SEM describes the certainty with which we know the mean of the underlying population based upon our sample of it. More specifically, the SEM is the estimated standard deviation of the sample mean when it is used as an estimate of the population mean: it tells us how much the sample mean would vary if we repeated the sampling many times.

To make the SEM more informative we can convert it to a confidence interval. With a confidence interval, we can say that (assuming normality) we are X% confident that the underlying population mean falls within certain limits. Strictly speaking, X% of the intervals constructed this way from repeated samples would contain the population mean. We can calculate the limits for whatever confidence level we like. Calculating them is easy: it's simply a matter of scaling the SEM by the appropriate quantile from the normal distribution. For example, 95% of a normal distribution falls within 1.96 standard deviations of its mean. So the 95% confidence limits lie at the sample mean plus or minus:

data=randn(1,30);
sem=std(data)/sqrt(length(data)); % standard error of the mean
ci95 = sem * 1.96 % half-width of the 95% confidence interval
ci95 = 
    0.3553
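The idea that the SEM is the standard deviation of the sample mean can be checked with a small simulation. The following is a minimal sketch, not part of the original recipe: it draws many samples of size 30 from a standard normal population (so the true standard deviation is 1) and compares the spread of their means with 1/sqrt(30). The variable names and the number of repetitions are arbitrary choices made for illustration.

```matlab
n = 30;          % sample size, as in the examples above
nRep = 10000;    % number of repeated samples (arbitrary choice)

samples = randn(nRep, n);         % each row is one sample of size n
sampleMeans = mean(samples, 2);   % one mean per sample

empiricalSEM = std(sampleMeans)   % spread of the sample means
theoreticalSEM = 1/sqrt(n)        % population sd (1) divided by sqrt(n)
```

The two values should agree closely, and the agreement improves as nRep grows.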

If you know the appropriate quantile from the normal distribution then you can calculate any confidence interval you like. You can either look it up in a table or, better yet, use MATLAB's norminv command. The SEM_calc.m function does this for you. Note, however, that norminv is part of the Statistics Toolbox.
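As a rough sketch of how the quantile can be obtained for any confidence level (this is not the code in SEM_calc.m, and the variable names are illustrative), the example below avoids the toolbox by using erfinv, which ships with base MATLAB. For a two-sided interval the result matches norminv(1 - alpha/2), which is shown in a comment for those who have the Statistics Toolbox.

```matlab
data = randn(1,30);
conf = 0.95;                   % desired confidence level
alpha = 1 - conf;

% Two-sided normal quantile using base MATLAB only.
% With the Statistics Toolbox: z = norminv(1 - alpha/2);
z = sqrt(2) * erfinv(conf);    % approximately 1.96 for conf = 0.95

sem = std(data) / sqrt(numel(data));
ciLimits = mean(data) + [-1 1] * z * sem   % lower and upper confidence limits
```

Changing conf to, say, 0.99 gives the corresponding 99% limits without any lookup table.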

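Finally, MATLAB's Statistics Toolbox also offers other distributions, such as the t-distribution, on which the t-test is based. With small samples the t-distribution gives slightly wider, more appropriate intervals than the normal distribution, because the standard deviation is itself estimated from the data. The tInterval_Calc.m function computes the t-interval for a distribution. Both the t-interval and SEM functions linked to here contain extra error checking code. They ignore NaNs, for example.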
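A t-interval can be sketched along the following lines. This is an illustration under stated assumptions rather than the contents of tInterval_Calc.m: it uses tinv from the Statistics Toolbox, strips NaNs by hand, and the variable names are made up for the example.

```matlab
data = [randn(1,30) NaN];       % example data with one missing value
x = data(~isnan(data));         % drop NaNs before computing anything

n = numel(x);
sem = std(x) / sqrt(n);         % standard error of the mean

conf = 0.95;
tcrit = tinv(1 - (1 - conf)/2, n - 1);   % two-sided t quantile, n-1 degrees of freedom

tLimits = mean(x) + [-1 1] * tcrit * sem  % lower and upper t-interval limits
```

For 30 observations the critical value is a little above the 1.96 used for the normal interval, so the t-interval is slightly wider.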

Discussion

We've talked about how to calculate the SEM, but what can we do with it? A common reason people calculate the SEM is to create error bars for bar charts. Usually we plot the error bars at one SEM, but this isn't terribly useful. Remember what the SEM is: it's a way of illustrating the certainty with which you can estimate the population mean based upon your sample. A one-SEM confidence interval corresponds to a confidence level of about 68% that the true mean falls within those bounds. That's nice, but who cares? Other than one SEM being a commonly used standard, it's often not very useful for anything.

More useful is a confidence interval that relates to the chosen significance level in your study. For instance, if you're treating p-values smaller than 0.05 as "significant" then use a 95% confidence interval of the mean. This will have immediate visual appeal. Say, for instance, that you're interested in whether a particular set of values is significantly different from zero. You have plotted them as a bar chart with an error bar. If you're using the 95% confidence interval and it doesn't include zero then this in itself is a statistical test indicating that the mean is significantly different from zero. Now that's a useful confidence interval!
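To make the error-bar idea concrete, here is a minimal sketch that plots two made-up groups with 95% confidence intervals instead of one-SEM bars. The data, the group offsets, and the variable names are invented for illustration, and the normal quantile 1.96 is used for simplicity.

```matlab
% Two hypothetical groups: one with a true mean of 1, one with a true mean of 0
groups = {randn(1,30) + 1, randn(1,30)};

means = cellfun(@mean, groups);
sems  = cellfun(@(x) std(x)/sqrt(numel(x)), groups);
ci95  = 1.96 * sems;            % half-width of each 95% confidence interval

xpos = 1:numel(groups);
bar(xpos, means);
hold on
errorbar(xpos, means, ci95, 'k', 'LineStyle', 'none');  % error bars only, no connecting line
hold off

% true where the 95% interval does not include zero
excludesZero = abs(means) > ci95
```

A bar whose interval excludes zero corresponds to a mean that differs from zero at roughly the 0.05 level.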
