General Discussion
Maths 101: the mean vs the median
Last edited Sun May 31, 2015, 07:05 AM
Looking at threads like http://www.democraticunderground.com/10026750894 , it's obvious that a good number of DUers don't understand the difference between a mean and a median (and that a lot of those DUers think that they do, and are attacking the OP on the basis of their error).
The mean average of a group of numbers is the sum of the numbers in the group, divided by the number of things in the group. So, for example, the mean of 2,3,5,9,11 is (2+3+5+9+11)/5 = 6.
The median average of a group of numbers is the one in the middle, when they are ordered. So the median of 2,3,5,9,11 is 5, because there are an equal number of numbers smaller than it and larger than it.
The advantage of the mean is that it captures information about all the numbers. If I add 1 to any of the numbers in my set, the mean will go up by 1/n, where n is the number of things I have.
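The two definitions above are easy to check in code. This is a minimal Python sketch, assuming only the standard library's `statistics` module; the variable names are illustrative and not part of the original post.

```python
from statistics import mean, median

numbers = [2, 3, 5, 9, 11]  # the example set used above

print(mean(numbers))    # 6: (2 + 3 + 5 + 9 + 11) / 5
print(median(numbers))  # 5: the middle value once sorted

# Adding 1 to any one number raises the mean by 1/n.
n = len(numbers)
bumped = [numbers[0] + 1] + numbers[1:]
increase = mean(bumped) - mean(numbers)
print(increase, 1 / n)  # both are 0.2, up to floating-point rounding
```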
The advantage of the median is that it ignores outliers, which is often a useful thing when looking at sets of data in the real world. In particular, the median income is *not* - whatever some of the people in that thread think - skewed by the income of the very rich (or the very poor) - all it will measure is the income of the middle member of the middle class.
The mean of the set $10k, $10k, $10k, $20k, $10000k is $2010k - the presence of a single multimillionaire massively distorts the mean. But the median is $10k, which gives a much clearer picture of how the average person is living.
(For added credit, the mode average of a set of numbers is the number that occurs the most times - so $10k in the above example. It's generally not very useful. But understanding the difference and the different uses of the mean and median is vital if you don't want to make a fool of yourself).
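The income example can be reproduced the same way. A small sketch, again assuming Python's standard `statistics` module, with incomes written in thousands of dollars (the name `incomes_k` is made up for illustration):

```python
from statistics import mean, median, mode

incomes_k = [10, 10, 10, 20, 10_000]  # incomes in $k, from the example above

print(mean(incomes_k))    # 2010: dragged up by the single multimillionaire
print(median(incomes_k))  # 10: the middle earner
print(mode(incomes_k))    # 10: the most common value
```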
cyberswede
(26,117 posts)My 6th grader learned this stuff earlier this year, so I got a nice refresher when I reviewed her homework. Until then, I'd pretty much forgotten what the specific differences between them were.
Hoyt
(54,770 posts)didn't take advantage of the education opportunities available. I am really concerned about our country.
TexasBushwhacker
(21,204 posts)makes almost $12 an hour, it doesn't mean much when their executives are making $10 Million plus. The median worker pay is closer to $9 an hour. Of course, that means half the workers make less, and it doesn't account for the number of hours they can work. A single person might be able to squeak by on $9 an hour if they could work 40 hours every week, but most don't.
Donald Ian Rankin
(13,598 posts)
TexasBushwhacker
(21,204 posts)Especially considering that Walmart employees get an estimated $6 billion in public assistance every year. That's around $5K per employee down there in modeville. Walmart the welfare queen!
Trekologer
(1,078 posts)If Walmart includes managers and executives then yes, the average can be misleading. If they only include hourly workers, average might be more meaningful.
MoonRiver
(36,975 posts)Hoping your post helps.
Nuclear Unicorn
(19,497 posts)
MoonRiver
(36,975 posts)Sigh
Warpy
(114,615 posts)because that's the number that gives the truest picture of all when you're talking about things like average national income because it's what the greatest number of people are living on.
Unfortunately, people who haven't bothered with a statistics course (and a few who have but use statistics to create new lies) usually interchange the words "mean" and "median" to pump up wage statistics. That's about run its course, though, since so few of us have ever managed to live up to those statistics and it's starting to dawn on us that we're not failures because no one else we know has managed, either.
Recursion
(56,582 posts)Last edited Sat May 30, 2015, 11:03 PM - Edit history (1)
It's very uncommon for two different households to make exactly the same income in a given year. You can do a "mode with bins", say, rounding everything to the nearest dollar, but even then there won't be enough grouping to necessarily make mode meaningful.
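A rough sketch of the "mode with bins" idea, in Python. The incomes are invented for illustration, the helper name `binned_mode` is hypothetical, and the bins here are formed by rounding down to a multiple of the bin width rather than rounding to the nearest value.

```python
from collections import Counter

# Hypothetical household incomes in dollars (made up for illustration).
incomes = [23_150, 24_900, 31_200, 38_750, 41_020,
           44_300, 47_980, 52_400, 95_000, 410_000]

def binned_mode(values, bin_width):
    """Return (lower edge of the fullest bin, number of values in it)."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return counts.most_common(1)[0]

# With $1 bins every income sits alone, so the "mode" is meaningless.
print(binned_mode(incomes, 1))
# With $10,000 bins a real peak appears.
print(binned_mode(incomes, 10_000))  # (40000, 3)
```

The answer depends heavily on the chosen bin width, which is one reason the mode is awkward for income data.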
Moliere
(285 posts)I think your frustration led you to type angry. Third paragraph: you meant median no?
So the mean of 2,3,5,7,9 is 5, because...
Donald Ian Rankin
(13,598 posts)
Nye Bevan
(25,406 posts)"Exponentially" does not mean "very rapidly".
lumberjack_jeff
(33,224 posts)The weakness is the difference between income and wages.
It is unarguable that plenty more wealth has been created, but it's gone disproportionately to capital and not labor.
Ikonoklast
(23,973 posts)It will give them an even larger pie they can then not share with anyone else, even though Labor produced that pie in the first place.
Gidney N Cloyd
(19,847 posts)In other words, the outliers drag the mean average toward them.
Think of median like the median between opposing highway lanes-- doesn't matter how many cars are heading east versus west, the median is still smack dab between them.
Gormy Cuss
(30,884 posts)Last edited Sun May 31, 2015, 04:10 PM - Edit history (1)
For example, set one: 1,1,1,1,5,100,200,300,1000. Median is 5.
set two: 1,1,1,1,5,100,220,320,1100. Median is 5.
set three: 1,1,1,1,5,200,1000,5000,9000. Median is still 5.
Now say the above sets represent income distribution in constant dollars for three consecutive decades. Looking at the raw observations, it's clear that income inequality is increasing, but the median doesn't reflect that.
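The three sets can be checked directly. A short Python sketch using the standard `statistics` module; the dictionary name `decades` is only a label for this example.

```python
from statistics import mean, median

decades = {
    "set one":   [1, 1, 1, 1, 5, 100, 200, 300, 1000],
    "set two":   [1, 1, 1, 1, 5, 100, 220, 320, 1100],
    "set three": [1, 1, 1, 1, 5, 200, 1000, 5000, 9000],
}

medians = {name: median(values) for name, values in decades.items()}
means = {name: mean(values) for name, values in decades.items()}

for name in decades:
    print(f"{name}: median = {medians[name]}, mean = {means[name]:.1f}")
```

The median stays at 5 in every set while the mean climbs from roughly 179 to roughly 1690, so here it is the mean, not the median, that registers the change at the top.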
Recursion
(56,582 posts)OTOH while our income distribution is not Gaussian, it is "a big hump in the middle with two smaller tails", so the median remains a decent statistic to use.
Gormy Cuss
(30,884 posts)Not so good if the distribution becomes highly asymmetric or if only one tail is long-- and yes, the U.S. income distribution is different from the hypothetical one in my post.
Donald Ian Rankin
(13,598 posts)Or possibly at 0.1, 0.25, 0.5,0.75, 0.9, which captures more information about the shapes of the tails.
But obviously that needs 5 numbers rather than just 1.
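One way to get those five numbers in Python is `statistics.quantiles`, sketched below. The incomes are invented, and the default "exclusive" interpolation method is just one of several reasonable conventions for quantiles.

```python
from statistics import median, quantiles

# Hypothetical incomes in $k (made up for illustration).
incomes_k = [8, 12, 15, 18, 22, 27, 33, 41, 56, 90, 400]

# quantiles(..., n=20) returns 19 cut points at 5%, 10%, ..., 95%.
cuts = quantiles(incomes_k, n=20)

levels = [0.10, 0.25, 0.50, 0.75, 0.90]
summary = {p: cuts[round(p * 20) - 1] for p in levels}

for p, value in summary.items():
    print(f"{p:.2f} quantile: {value}")
```

The 0.5 entry is the ordinary median; the other four describe how far the tails stretch on either side of it.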
Recursion
(56,582 posts)That is, same math as a median with different parameters.
I mean, ideally we'd individually track the income of each and every household, but quantum computing isn't quite there yet. (But it might be in a decade or so...)
Exilednight
(9,359 posts)
1939
(1,683 posts)Most people get a smattering of statistics in school where they are taught using the Normal/Bell Curve. In the Normal Curve, the mean, median, and mode are all the same. In a skewed distribution like the Log Normal, the mode is always less than the median and the median is always less than the mean.
1. Calculate the natural logarithm of each data element (use grouped data for very large data sets).
2. Calculate the mean (mu) and standard deviation (sigma) of the logarithms of the data.
3. Calculate the mean, median, and mode of the log normal curve.
Mean = exp (mu + sigma-squared/2)
Median = exp (mu)
Mode = exp (mu - sigma-squared)
I used to work in reliability and maintainability engineering and log normal was the best fit for repair times, parts order times, or anything that was bounded by zero at the left end but could go to infinity on the right end. Income would appear to fit that distribution.
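The three steps above can be sketched in Python as follows. The function name and the repair times are hypothetical, and using the sample standard deviation (`statistics.stdev`) rather than the population one is an assumption, since the post does not say which is meant.

```python
import math
from statistics import fmean, stdev

def lognormal_summary(data):
    """Fit a log normal curve and return its mean, median and mode.

    Assumes every value in data is strictly positive.
    """
    logs = [math.log(x) for x in data]   # step 1: natural logs
    mu = fmean(logs)                     # step 2: mean of the logs
    sigma = stdev(logs)                  # step 2: sample std dev (an assumption)
    return {                             # step 3: formulas from the post
        "mean": math.exp(mu + sigma**2 / 2),
        "median": math.exp(mu),
        "mode": math.exp(mu - sigma**2),
    }

# Hypothetical repair times in hours (made up for illustration).
repair_times = [0.5, 0.8, 1.0, 1.2, 1.5, 2.0, 2.5, 4.0, 7.0, 15.0]
summary = lognormal_summary(repair_times)
print(summary)
```

For any data with some spread, the fitted values come out ordered as mode, then median, then mean, matching the ordering described for the log normal.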
Donald Ian Rankin
(13,598 posts)It's correct that for a log normal distribution the skew is positive and the median is always less than the mean.
But firstly, skew can be negative as well as positive.
And secondly, while in general you'd expect a distribution with positive skew to have a median lower than its mean, and vice versa for negative skew, you can construct pathological distributions where this is not the case (for example, where one tail is long and the other is heavy).
http://en.wikipedia.org/wiki/Skewness
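A small made-up data set shows the pathological case: a heavy lump on the left and a single far-out value on the right. This is only an illustrative Python sketch; the `skewness` helper is written here by hand (third standardized moment, population form) rather than taken from a library.

```python
from statistics import mean, median

def skewness(data):
    """Third standardized moment, population form."""
    m = mean(data)
    n = len(data)
    variance = sum((x - m) ** 2 for x in data) / n
    third = sum((x - m) ** 3 for x in data) / n
    return third / variance ** 1.5

# Made-up data: heavy left side, one long-tail value on the right.
data = [-2, -2, -2, -2, 1, 1, 1, 1, 1, 6]

print(mean(data))      # 0.3
print(median(data))    # 1.0
print(skewness(data))  # positive, even though mean < median
```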
1939
(1,683 posts)Most of the skewed distributions (of which the log normal is only one) have the long tail to the right because in theory, it runs to infinity while to the left, the distribution is bounded by zero or by some other location parameter.
Donald Ian Rankin
(13,598 posts)Humans have an innate preference for positive numbers, because it's hard to visualise "if I have -5 apples, and I give you -2 apples", how many sucking apple-shaped voids do I now have?
1939
(1,683 posts)per capita, household, or tax return income runs from zero to something less than infinity and has a peak at the lower (left end) and approaches zero at the top (right end).
The spread between the mean and the median of the log normal curve gives a pretty good idea of income equality.
When the mean and median are equal, the curve narrows to a point which says that everyone in the population has the same income. The larger the ratio, the more spread between the middle and the rich. As the mean/median ratio approaches infinity, it would mean one person has all the income and everyone else's income approaches zero.
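The ratio described here follows from the log normal formulas for the mean and median given earlier in the thread. A short derivation, assuming incomes really are log normal with parameters mu and sigma:

```latex
\frac{\text{Mean}}{\text{Median}}
  = \frac{\exp\!\left(\mu + \sigma^{2}/2\right)}{\exp(\mu)}
  = \exp\!\left(\frac{\sigma^{2}}{2}\right)
```

So the ratio equals 1 exactly when sigma is 0 (everyone has the same income) and grows without bound as sigma increases.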
Exilednight
(9,359 posts)impossible to determine the median income.
1939
(1,683 posts)If you gave me all of the tax return data for a given year, I could readily calculate the mean, median, and mode of the gross income or taxable income reflected on that population of tax returns.
Exilednight
(9,359 posts)1. An estimated 2 million returns that should be filed are not filed. You'd start with incomplete data.
2. Some returns are filed jointly, some as individuals. You may have a household with 2 people filing separate returns and another with 2 working adults filing a joint return.
3. What you make is not what you are taxed on. Your pay stub will show a total gross vs a total Federal taxable gross. Only your taxable gross is reported.
4. Income from tip-based jobs often goes unreported or is only estimated, even though it is still taxable.
Too many variables and too much missing data to actually produce a usable number.
From your assessment, any discussion of income and income inequality is moot because all of the evidence is anecdotal.
Exilednight
(9,359 posts)It is not worth studying, but take with a grain of salt.
KG
(28,795 posts)let's stop the damn lies about NAFTA's effect on the american economic landscape.
central scrutinizer
(12,654 posts)But why are these people hungry? The average (mean) net worth of the people in the room is over one billion dollars!
muriel_volestrangler
(106,212 posts)" the mean of 2,3,5,7,9 is (2+3+5+7+11)/5 = 6"
Are you using 9 or 11 as the highest number?
The mean of the first set is 5.2; of the second 5.6. And I don't think you want to depend on rounding up answers if you're trying to make a clear example.
Donald Ian Rankin
(13,598 posts)2+3+5+9+11 = 30
30/5 = 6
Thanks for pointing this out. Are there any more?
Lesson: do not post at 1:30am while trying to reset my sleep cycle.
muriel_volestrangler
(106,212 posts)
raccoon
(32,390 posts)mean and median with a math professor not too long ago.
joshcryer
(62,536 posts)
Recursion
(56,582 posts)Ba dum bum.
Donald Ian Rankin
(13,598 posts)
Nuclear Unicorn
(19,497 posts)
Trekologer
(1,078 posts)Or even imaginary!
Nuclear Unicorn
(19,497 posts)
hootinholler
(26,451 posts)You have to compare both together over time?
If the median income doesn't move much year to year whilst the mean income increases, that shows that the gains in income have gone to the upper quartiles.
I think it would be useful to compare the ratio of change of the median to the ratio of change of the mean. I bet there's some statistical or economics name for those numbers.
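A rough Python sketch of that comparison, reusing the three toy income sets posted upthread as if they were three consecutive periods. They are not real data, and `growth_factors` is a hypothetical helper name.

```python
from statistics import mean, median

# Toy income samples for three consecutive periods (not real data).
periods = [
    [1, 1, 1, 1, 5, 100, 200, 300, 1000],
    [1, 1, 1, 1, 5, 100, 220, 320, 1100],
    [1, 1, 1, 1, 5, 200, 1000, 5000, 9000],
]

def growth_factors(values):
    """Ratio of each value to the one before it."""
    return [later / earlier for earlier, later in zip(values, values[1:])]

mean_growth = growth_factors([mean(p) for p in periods])
median_growth = growth_factors([median(p) for p in periods])

print("mean growth:  ", mean_growth)
print("median growth:", median_growth)  # [1.0, 1.0]
```

A mean that grows while the median stays flat is the pattern described above: the gains are landing above the middle of the distribution.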
Donald Ian Rankin
(13,598 posts)The "first moment" of an income distribution is the mean. It measures how rich people are on average.
The "second moment" is the variance. It measures how spread out the data is. So if everyone has about the same amount of money, the variance is low, while if half the people are rich and half are poor then the mean is the same, but the variance is much higher.
The "third moment" is called the skew. It's a bit harder to describe, but essentially a society with a lot of slightly poor people and a few very rich people, and a society with a few very poor people and a lot of slightly rich people, can have the same mean and variance, but the skew in the first case will be positive and the skew in the second case will be negative.
It's not invariably the case, but in general if the skew is positive then the mean is higher than the median, and if the skew is negative then the mean is lower than the median.
An income distribution is almost certain to exhibit strong positive skew, and to have a mean much higher than the median.
(There are also higher moments, but they get progressively harder to relate to properties of the data that can be described in terms of the real world).
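The two societies described above can be mocked up with a few made-up numbers. This Python sketch computes the variance and skew as central moments about the mean (with the skew divided by the standard deviation cubed), which is the usual convention; the helper names are illustrative.

```python
def central_moment(data, k):
    """Average of (x - mean) ** k over the data."""
    m = sum(data) / len(data)
    return sum((x - m) ** k for x in data) / len(data)

def summarize(data):
    """Return (mean, variance, skew) for the data."""
    avg = sum(data) / len(data)
    variance = central_moment(data, 2)
    skew = central_moment(data, 3) / variance ** 1.5
    return avg, variance, skew

# Made-up incomes: many slightly poor people plus one very rich person...
mostly_poor = [90] * 9 + [190]
# ...and the mirror image: many slightly rich people plus one very poor person.
mostly_rich = [110] * 9 + [10]

print(summarize(mostly_poor))  # mean 100, variance 900, positive skew
print(summarize(mostly_rich))  # mean 100, variance 900, negative skew
```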
rock
(13,218 posts)"Average" is a rather general term. So it does not mean much in math to say, "Compute the average of this set of numbers." As you pointed out, there are at least three meanings that could be applied.