
Donald Ian Rankin

(13,598 posts)
Sat May 30, 2015, 08:05 PM May 2015

Maths 101: the mean vs the median

Last edited Sun May 31, 2015, 07:05 AM - Edit history (1)

Looking at threads like http://www.democraticunderground.com/10026750894 , it's obvious that a good number of DUers don't understand the difference between a mean and a median (and that a lot of those DUers think that they do, and are attacking the OP on the basis of their error).


The mean average of a group of numbers is the sum of the numbers in the group, divided by the number of things in the group. So, for example, the mean of 2,3,5,9,11 is (2+3+5+9+11)/5 = 6.

The median average of a group of numbers is the one in the middle, when they are ordered. So the median of 2,3,5,9,11 is 5, because there are an equal number of numbers smaller than it and larger than it.


The advantage of the mean is that it captures information about all the numbers. If I add 1 to any of the numbers in my set, the mean will go up by 1/n, where n is the number of things I have.
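For anyone who would rather see the two definitions as code, here is a minimal Python sketch. The function names are just illustrative, and the rule for an even count (average the two middle values) is the usual convention rather than something stated in the post.

```python
def mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value once sorted; for an even count, average the two middle values."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


numbers = [2, 3, 5, 9, 11]
print(mean(numbers))    # 6.0
print(median(numbers))  # 5

# Adding 1 to a single element raises the mean by 1/n (here 1/5)...
bumped = [2, 3, 5, 9, 12]
print(mean(bumped) - mean(numbers))
# ...but leaves the median alone, because the middle element has not changed.
print(median(bumped))   # 5
```

The difference printed for the mean will be 0.2 up to floating-point rounding.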

The advantage of the median is that it ignores outliers, which is often a useful thing when looking at sets of data in the real world. In particular, the median income is *not* - whatever some of the people in that thread think - skewed by the income of the very rich (or the very poor) - all it will measure is the income of the middle member of the middle class.

The mean of the set $10k, $10k, $10k, $20k, $10000k is $2010k - the presence of a single multimillionaire massively distorts the mean. But the median is $10k, which gives a much clearer picture of how the average person is living.


(For added credit, the mode average of a set of numbers is the number that occurs the most times - so $10k in the above example. It's generally not very useful. But understanding the difference and the different uses of the mean and median is vital if you don't want to make a fool of yourself).
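The income example can be checked with Python's standard `statistics` module. This is only a sketch: the variable name is made up, and the figures are the five incomes from the example, in thousands of dollars.

```python
import statistics

# The five incomes from the example, in thousands of dollars.
incomes_k = [10, 10, 10, 20, 10_000]

print(statistics.mean(incomes_k))    # 2010
print(statistics.median(incomes_k))  # 10
print(statistics.mode(incomes_k))    # 10

# Drop the single multimillionaire and the mean collapses,
# while the median does not move at all.
without_outlier = incomes_k[:-1]
print(statistics.mean(without_outlier))    # 12.5
print(statistics.median(without_outlier))  # 10.0
```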

Maths 101: the mean vs the median (Original Post) Donald Ian Rankin May 2015 OP
Good reminder... cyberswede May 2015 #1
Reading the link reminds me of just how sorry our education system is, or how some folks just Hoyt May 2015 #2
So when Walmart brags that the Average (mean) worker TexasBushwhacker May 2015 #3
That might actually be a situation where the mode is worth looking at. N.T. Donald Ian Rankin May 2015 #4
Indeed TexasBushwhacker May 2015 #6
It depends on the data set Trekologer May 2015 #35
A LOT of people seem to not understand these concepts. MoonRiver May 2015 #5
I'd settle to have many stopping using "loose" in place of "lose." Nuclear Unicorn May 2015 #32
Well that's another discussion. MoonRiver May 2015 #34
Thank you for throwing the mode in at the end Warpy May 2015 #7
Mode doesn't really work for income Recursion May 2015 #15
Think you have an exasperation related typo Moliere May 2015 #8
Thank you. Fixed. Donald Ian Rankin May 2015 #21
One of my pet hates is "XXXXX is growing exponentially". Nye Bevan May 2015 #9
The weakness with that thread isn't the difference between mean vs median. lumberjack_jeff May 2015 #10
Which is why corporations and their fluffers loved NAFTA and are slobbering over the TPP. Ikonoklast May 2015 #43
One of the few things that stuck with me from my Ed. Research class is 'the mean follows the tail' Gidney N Cloyd May 2015 #11
Right, and looking at the median alone can obscure changes in the tail. Gormy Cuss May 2015 #16
The variance is a good clue there Recursion May 2015 #19
It's acceptable so long as that curve is maintained. Gormy Cuss May 2015 #20
I like values at quintiles. Donald Ian Rankin May 2015 #25
True. In principle those are just "expanded medians", if you will Recursion May 2015 #29
But it can still skew the results, depending on what the middle number is. Exilednight May 2015 #12
One of the reasons 1939 May 2015 #13
No, that's not quite right. Donald Ian Rankin May 2015 #22
Most of the skewed distributions 1939 May 2015 #41
For all practical purposes, sure. Donald Ian Rankin May 2015 #42
For all practical purposes 1939 May 2015 #44
One other thing to add, in something as large and complex as the U.S. economy, it's Exilednight May 2015 #14
Depends on what you mean by income. 1939 May 2015 #45
Several problems arise. Exilednight May 2015 #46
So 1939 May 2015 #47
It's befuddled economists for a very long time. It's not that Exilednight May 2015 #48
lies, damn lies, and statistics. KG May 2015 #17
Bill Gates is volunteering one afternoon at the soup kitchen central scrutinizer May 2015 #18
You've still got typos muriel_volestrangler May 2015 #23
Fixed, I think. Donald Ian Rankin May 2015 #24
That was all that I could see (nt) muriel_volestrangler May 2015 #26
Thank you! I saw that thread myself. I've had the discussion about raccoon May 2015 #27
Plenty of GD posts are mean. joshcryer May 2015 #28
It seems to be the mode of operation here Recursion May 2015 #30
Have you no sense of decency, sir? At long last, have you left no sense of decency? Donald Ian Rankin May 2015 #31
So, what you're saying is -- on average, numbers tend to be mean. Nuclear Unicorn May 2015 #33
They could also be irrational Trekologer May 2015 #36
Well, of course they're irrational. You have to count all just to get sum. Nuclear Unicorn May 2015 #39
So to be useful in comparing what is happening economically hootinholler May 2015 #37
The skew isn't quite that, but it's pretty close. Donald Ian Rankin May 2015 #38
And though you did not explicitly say it (but did imply it) rock May 2015 #40

cyberswede

(26,117 posts)
1. Good reminder...
Sat May 30, 2015, 08:09 PM
May 2015

My 6th grader learned this stuff earlier this year, so I got a nice refresher when I reviewed her homework. Until then, I'd pretty much forgotten what the specific differences between them were.

 

Hoyt

(54,770 posts)
2. Reading the link reminds me of just how sorry our education system is, or how some folks just
Sat May 30, 2015, 08:37 PM
May 2015

didn't take advantage of the education opportunities available. I am really concerned about our country.

TexasBushwhacker

(21,204 posts)
3. So when Walmart brags that the Average (mean) worker
Sat May 30, 2015, 08:39 PM
May 2015

makes almost $12 an hour, it doesn't mean much when their executives are making $10 million plus. The median worker pay is closer to $9 an hour. Of course, that means half the workers make less, and it doesn't account for the number of hours they can work. A single person might be able to squeak by on $9 an hour if they could work 40 hours every week, but most don't.

TexasBushwhacker

(21,204 posts)
6. Indeed
Sat May 30, 2015, 08:51 PM
May 2015

Especially considering that Walmart employees get an estimated $6 billion in public assistance every year. That's around $5K per employee down there in modeville. Walmart the welfare queen!

Trekologer

(1,078 posts)
35. It depends on the data set
Sun May 31, 2015, 10:08 AM
May 2015

If Walmart includes managers and executives then yes, the average can be misleading. If they only include hourly workers, average might be more meaningful.

Warpy

(114,615 posts)
7. Thank you for throwing the mode in at the end
Sat May 30, 2015, 08:58 PM
May 2015

because that's the number that gives the truest picture of all when you're talking about things like average national income because it's what the greatest number of people are living on.

Unfortunately, people who haven't bothered with a statistics course (and a few who have but use statistics to create new lies) usually interchange the words "mean" and "median" to pump up wage statistics. That's about run its course, though, since so few of us have ever managed to live up to those statistics and it's starting to dawn on us that we're not failures because no one else we know has managed, either.

Recursion

(56,582 posts)
15. Mode doesn't really work for income
Sat May 30, 2015, 09:57 PM
May 2015

Last edited Sat May 30, 2015, 11:03 PM - Edit history (1)

It's very uncommon for two different households to make exactly the same income in a given year. You can do a "mode with bins", say, rounding everything to the nearest dollar, but even then there won't be enough grouping to necessarily make mode meaningful.
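A rough Python sketch of what "mode with bins" could look like. The incomes are invented purely for illustration and `binned_mode` is a hypothetical helper, not a library function.

```python
from collections import Counter

# Hypothetical household incomes in dollars, invented purely for illustration.
incomes = [18_250, 21_900, 23_400, 24_100, 27_800,
           31_050, 36_600, 52_300, 88_000, 410_000]


def binned_mode(values, bin_width):
    """Return (bin_start, count) for the fullest bin of the given width."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    return bins.most_common(1)[0]


# With $1 bins every income is unique, so the "mode" is just whichever came first.
print(binned_mode(incomes, 1))
# With $10,000 bins a real peak appears in the $20,000-$29,999 range.
print(binned_mode(incomes, 10_000))
```

How wide to make the bins is a judgment call, and the answer can shift with that choice.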

Moliere

(285 posts)
8. Think you have an exasperation related typo
Sat May 30, 2015, 08:59 PM
May 2015

I think your frustration led you to type angry. Third paragraph: you meant median no?

So the mean of 2,3,5,7,9 is 5, because...


Nye Bevan

(25,406 posts)
9. One of my pet hates is "XXXXX is growing exponentially".
Sat May 30, 2015, 09:10 PM
May 2015

"Exponentially" does not mean "very rapidly".

 

lumberjack_jeff

(33,224 posts)
10. The weakness with that thread isn't the difference between mean vs median.
Sat May 30, 2015, 09:11 PM
May 2015

The weakness is the difference between income and wages.

It is unarguable that plenty more wealth has been created, but it's gone disproportionately to capital and not labor.

Ikonoklast

(23,973 posts)
43. Which is why corporations and their fluffers loved NAFTA and are slobbering over the TPP.
Sun May 31, 2015, 12:41 PM
May 2015

It will give them an even larger pie they can then not share with anyone else, even though Labor produced that pie in the first place.

Gidney N Cloyd

(19,847 posts)
11. One of the few things that stuck with me from my Ed. Research class is 'the mean follows the tail'
Sat May 30, 2015, 09:15 PM
May 2015

In other words, the outliers drag the mean average toward them.

Think of median like the median between opposing highway lanes-- doesn't matter how many cars are heading east versus west, the median is still smack dab between them.

Gormy Cuss

(30,884 posts)
16. Right, and looking at the median alone can obscure changes in the tail.
Sat May 30, 2015, 10:07 PM
May 2015

Last edited Sun May 31, 2015, 04:10 PM - Edit history (1)

For example, set one: 1,1,1,1,5,100,200,300,1000. Median is 5.
set two: 1,1,1,1,5,100,220,320,1100. Median is 5.
set three: 1,1,1,1,5,200,1000,5000,9000. Median is still 5.

Now say the above sets represent income distribution in constant dollars for three consecutive decades. Looking at the raw observations it's clear that income inequality is increasing, but the median doesn't reflect that.
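A quick way to see this in Python, using the three sets above. The dictionary name and the choice of the population standard deviation as the spread measure are just for this sketch.

```python
import statistics

# The three hypothetical decades from this post.
decades = {
    "set one": [1, 1, 1, 1, 5, 100, 200, 300, 1000],
    "set two": [1, 1, 1, 1, 5, 100, 220, 320, 1100],
    "set three": [1, 1, 1, 1, 5, 200, 1000, 5000, 9000],
}

for name, data in decades.items():
    print(name,
          "median:", statistics.median(data),
          "mean:", round(statistics.mean(data), 1),
          "std dev:", round(statistics.pstdev(data), 1))
```

The median stays at 5 in all three, while the mean climbs from about 179 to about 1,690 and the spread grows with it.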


Recursion

(56,582 posts)
19. The variance is a good clue there
Sat May 30, 2015, 11:05 PM
May 2015

OTOH while our income distribution is not Gaussian, it is "a big hump in the middle with two smaller tails", so the median remains a decent statistic to use.

Gormy Cuss

(30,884 posts)
20. It's acceptable so long as that curve is maintained.
Sat May 30, 2015, 11:26 PM
May 2015

Not so good if the distribution becomes highly asymmetric or if only one tail is long-- and yes, the U.S. income distribution is different from the hypothetical one in my post.

Donald Ian Rankin

(13,598 posts)
25. I like values at quintiles.
Sun May 31, 2015, 07:08 AM
May 2015

Or possibly at 0.1, 0.25, 0.5, 0.75, 0.9, which captures more information about the shapes of the tails.

But obviously that needs 5 numbers rather than just 1.
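Something like this would pull those five numbers out in Python. `cut_points` is a hypothetical helper built on `statistics.quantiles`, and the "inclusive" interpolation method is one reasonable choice among several.

```python
import statistics


def cut_points(values, probs=(0.10, 0.25, 0.50, 0.75, 0.90)):
    """Return the value at each requested quantile (linear interpolation)."""
    percentiles = statistics.quantiles(values, n=100, method="inclusive")
    # percentiles[k - 1] is the k-th percentile, so 0.25 maps to index 24.
    return {p: percentiles[round(p * 100) - 1] for p in probs}


# Gormy Cuss's hypothetical "set three" from post 16.
data = [1, 1, 1, 1, 5, 200, 1000, 5000, 9000]
for p, value in cut_points(data).items():
    print(f"{p:.2f}: {value}")
```

The 0.5 entry is the median again; the 0.75 and 0.9 entries are where the long upper tail shows up.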

Recursion

(56,582 posts)
29. True. In principle those are just "expanded medians", if you will
Sun May 31, 2015, 07:35 AM
May 2015

That is, same math as a median with different parameters.

I mean, ideally we'd individually track the income of each and every household, but quantum computing isn't quite there yet. (But it might be in a decade or so...)

1939

(1,683 posts)
13. One of the reasons
Sat May 30, 2015, 09:25 PM
May 2015

Most people get a smattering of statistics in school where they are taught using the Normal/Bell Curve. In the Normal Curve, the mean, median, and mode are all the same. In a skewed distribution like the Log Normal, the mode is always less than the median and the median is always less than the mean.

1. Calculate natural logarithm of each data element (use grouped data for very large data sets).

2. Calculate the mean (mu) and standard deviation (sigma) of the logarithms of the data.

3. Calculate the mean, median, and mode of the log normal curve.

Mean = exp (mu + sigma-squared/2)

Median = exp (mu)

Mode = exp (mu - sigma-squared)
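The three steps above can be sketched in Python as follows. The repair times are made up for illustration, `lognormal_summary` is a hypothetical name, and using the population standard deviation (rather than the sample one) is an assumption of this sketch. All data must be strictly positive for the logarithm to exist.

```python
import math
import statistics


def lognormal_summary(data):
    """Fit a log-normal by the three steps above; return (mean, median, mode)."""
    logs = [math.log(x) for x in data]      # step 1: natural log of each element
    mu = statistics.mean(logs)              # step 2: mean of the logs...
    sigma = statistics.pstdev(logs)         # ...and their standard deviation
    mean = math.exp(mu + sigma ** 2 / 2)    # step 3: the three formulas
    median = math.exp(mu)
    mode = math.exp(mu - sigma ** 2)
    return mean, median, mode


# Hypothetical repair times in hours, made up for illustration only.
repair_times = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 3.5, 6.0, 12.0]
mean, median, mode = lognormal_summary(repair_times)
print(f"mean={mean:.2f} median={median:.2f} mode={mode:.2f}")
```

Whenever sigma is greater than zero, the output comes in the order mode < median < mean.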

I used to work in reliability and maintainability engineering and log normal was the best fit for repair times, parts order times, or anything that was bounded by zero at the left end but could go to infinity on the right end. Income would appear to fit that distribution.

Donald Ian Rankin

(13,598 posts)
22. No, that's not quite right.
Sun May 31, 2015, 06:39 AM
May 2015

It's correct that for a log normal distribution the skew is positive and the median is always less than the mean.

But firstly, skew can be negative as well as positive.

And secondly, while in general you'd expect a distribution with positive skew to have a median lower than its mean, and vice versa for negative skew, you can construct pathological distributions where this is not the case (for example, where one tail is long and the other is heavy).

http://en.wikipedia.org/wiki/Skewness

1939

(1,683 posts)
41. Most of the skewed distributions
Sun May 31, 2015, 12:04 PM
May 2015

Most of the skewed distributions (of which the log normal is only one) have the long tail on the right because, in theory, it runs to infinity, while to the left the distribution is bounded by zero or by some other location parameter.

Donald Ian Rankin

(13,598 posts)
42. For all practical purposes, sure.
Sun May 31, 2015, 12:18 PM
May 2015

Humans have an innate preference for positive numbers, because it's hard to visualise "if I have -5 apples, and I give you -2 apples", how many sucking apple-shaped voids do I now have?

1939

(1,683 posts)
44. For all practical purposes
Sun May 31, 2015, 02:31 PM
May 2015

per capita, household, or tax return income runs from zero to something less than infinity and has a peak at the lower (left) end and approaches zero at the top (right) end.

The spread between the mean and the median of the log normal curve gives a pretty good idea of income equality.

When the mean and median are equal, the curve narrows to a point which says that everyone in the population has the same income. The larger the ratio, the more spread between the middle and the rich. As the mean/median ratio approaches infinity, it would mean one person has all the income and everyone else's income approaches zero.
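For a log normal this ratio has a simple closed form. Dividing the mean formula exp(mu + sigma-squared/2) by the median formula exp(mu) leaves exp(sigma-squared/2), so the ratio depends only on sigma. A small Python sketch, with a hypothetical function name:

```python
import math


def mean_to_median_ratio(sigma):
    """Log-normal: mean / median = exp(mu + sigma^2/2) / exp(mu) = exp(sigma^2/2)."""
    return math.exp(sigma ** 2 / 2)


# sigma = 0 means everyone has the same income, so the ratio is exactly 1.
for sigma in (0.0, 0.5, 1.0, 2.0):
    print(sigma, round(mean_to_median_ratio(sigma), 3))
```

The ratio grows without bound as sigma increases, which matches the limiting case described above.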

Exilednight

(9,359 posts)
14. One other thing to add, in something as large and complex as the U.S. economy, it's
Sat May 30, 2015, 09:44 PM
May 2015

impossible to determine the median income.

1939

(1,683 posts)
45. Depends on what you mean by income.
Sun May 31, 2015, 02:33 PM
May 2015

If you gave me all of the tax return data for a given year, I could readily calculate the mean, median, and mode of the gross income or taxable income reflected on that population of tax returns.

Exilednight

(9,359 posts)
46. Several problems arise.
Sun May 31, 2015, 04:00 PM
May 2015

1. An estimated 2 million returns that should be filed are not filed. You'd start with incomplete data.

2. Some returns are filed jointly, some as individuals. You may have a household with 2 people filing separate returns and another with 2 working adults filing a joint return.

3. What you make is not what you are taxed on. Your pay stub will show a total gross vs a total Federal taxable gross. Only your taxable gross is reported.

4. Jobs that are tip based often go unreported, or are estimated, but still taxable income.

Too many variables and too much missing data to actually produce a usable number.

1939

(1,683 posts)
47. So
Sun May 31, 2015, 04:05 PM
May 2015

From your assessment, any discussion of income and income inequality is moot because all of the evidence is anecdotal.

Exilednight

(9,359 posts)
48. It's befuddled economists for a very long time. It's not that
Sun May 31, 2015, 04:13 PM
May 2015

it is not worth studying, but take it with a grain of salt.

KG

(28,795 posts)
17. lies, damn lies, and statistics.
Sat May 30, 2015, 10:14 PM
May 2015

let's stop the damn lies about NAFTA's effect on the american economic landscape.

central scrutinizer

(12,654 posts)
18. Bill Gates is volunteering one afternoon at the soup kitchen
Sat May 30, 2015, 10:35 PM
May 2015

But why are these people hungry? The average (mean) net worth of the people in the room is over one billion dollars!

muriel_volestrangler

(106,212 posts)
23. You've still got typos
Sun May 31, 2015, 07:00 AM
May 2015

" the mean of 2,3,5,7,9 is (2+3+5+7+11)/5 = 6"

Are you using 9 or 11 as the highest number?

The mean of the first set is 5.2; of the second 5.6. And I don't think you want to depend on rounding up answers if you're trying to make a clear example.

Donald Ian Rankin

(13,598 posts)
24. Fixed, I think.
Sun May 31, 2015, 07:06 AM
May 2015

2+3+5+9+11 = 30

30/5 = 6

Thanks for pointing this out. Are there any more?

Lesson: do not post at 1:30am while trying to reset my sleep cycle.

raccoon

(32,390 posts)
27. Thank you! I saw that thread myself. I've had the discussion about
Sun May 31, 2015, 07:28 AM
May 2015

mean and median with a math professor not too long ago.

hootinholler

(26,451 posts)
37. So to be useful in comparing what is happening economically
Sun May 31, 2015, 10:40 AM
May 2015

You have to compare both together over time?

If the median income doesn't move much year to year whilst the mean income increases, that shows that the gains in income have gone to the upper quartiles.

I think it would be useful to compare the ratio of change of the median to the ratio of change of the mean. I bet there's some statistical or economics name for those numbers.


Donald Ian Rankin

(13,598 posts)
38. The skew isn't quite that, but it's pretty close.
Sun May 31, 2015, 11:00 AM
May 2015

The "first moment" of an income distribution is the mean. It measures how rich people are on average.

The "second moment" (strictly, the second moment about the mean) is the variance. It measures how spread out the data is. So if everyone has about the same amount of money, the variance is low, while if half the people are rich and half are poor then the mean can be the same, but the variance is much higher.

The "third moment" (about the mean, and scaled by the spread) is called the skew. It's a bit harder to describe, but essentially a society with a lot of slightly poor people and a few very rich people, and a society with a few very poor people and a lot of slightly rich people, can have the same mean and variance, but the skew in the first case will be positive and the skew in the second case will be negative.

It's not invariably the case, but in general if the skew is positive then the mean is higher than the median, and if the skew is negative then the mean is lower than the median.

An income distribution is almost certain to exhibit strong positive skew, and to have a mean much higher than the median.
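Here is a small Python sketch of the three moments, using two made-up five-person societies that are mirror images of each other around 50. The function name and the numbers are purely illustrative, and the population (divide by n) formulas are used throughout.

```python
import math


def moments(values):
    """Return (mean, variance, skewness) using population formulas."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    third = sum((v - mean) ** 3 for v in values) / n
    skewness = third / math.sqrt(variance) ** 3
    return mean, variance, skewness


# Many slightly poor people and one very rich one...
many_slightly_poor = [40, 40, 40, 40, 90]
# ...versus many slightly rich people and one very poor one.
many_slightly_rich = [60, 60, 60, 60, 10]

print(moments(many_slightly_poor))
print(moments(many_slightly_rich))
```

Both societies come out with mean 50 and variance 400, but the skew is +1.5 for the first and -1.5 for the second. The medians are 40 and 60, so the mean sits above the median in the positively skewed case and below it in the negatively skewed one.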




(There are also higher moments, but they get progressively harder to relate to properties of the data that can be described in terms of the real world).

rock

(13,218 posts)
40. And though you did not explicitly say it (but did imply it)
Sun May 31, 2015, 11:55 AM
May 2015

"Average" is a rather general term. So it does not mean much in math to say, "Compute the average of this set of numbers." As you pointed out, there are at least three meanings that could be applied.
