
Ms. Toad

(38,421 posts)
37. You are still missing the difference between size of sample,
Mon Jul 5, 2021, 11:44 PM

and representation.

You keep focusing on quantity - but a large number of tests does not make the tested group representative of the general population.

Let's make it concrete. Because the data isn't available, we'll need to make some reasonable assumptions:

Let's say:

half of the testing is done because people are symptomatic (40%) or were exposed (10%),
30% of the testing is due to travel (I think that's high - but just to put a concrete number on the reason you suggested people are testing), and
20% are people who have just randomly decided they needed a COVID-19 test for the heck of it.

Further - of these groups:
those who have symptoms are most likely to be positive (let's say 5.9% - within the range of % positive tests I have been able to find for people who are testing because they are symptomatic),
and let's say roughly half as many positive tests (3%) for those who were exposed, but have no symptoms - they are likely to have a lower positive rate because they are not symptomatic - but higher than the general population because they were exposed.
Those testing due to travel have probably been taking precautions - they are likely to be pretty low - let's say .42%, and those randomly testing probably haven't been taking precautions - so they are slightly higher - let's say .65%.

If you calculate it out using the numbers from your first post, that mix gives you the overall 2.5% positivity rate you calculated.

Now - moving to the population as a whole:

Most people who are symptomatic are going to get tested. Let's say 80% got tested. That means that there are an additional 20% in the general population who have symptoms, but didn't bother to get tested. They will be positive at roughly the same rate as those who had symptoms and were tested.

Probably considerably fewer of those who were merely exposed got tested - so let's say that's 50% got tested - the other 50% didn't bother (but will be positive at roughly the same rate as the tested group).

If you need to be tested for travel, you would have been tested, so there are no additional predicted positives there.

Which leaves everyone else (62,409,072 or so) represented by the 20% of the tested population who just like to take tests - and who will presumably be positive at about the same rate as the randomly tested population.

That gives us:

211,330 positive tests among those who had symptoms and either got tested (2,865,492) or should have but didn't (716,373) @ a 5.9% rate

17,193 positive tests among those who were exposed and either got tested (286,549) or should have but didn't (286,549) @ a 3% rate

361 positive tests among those who were tested for travel - @ .42% positive

405,659 positive tests among those who were randomly tested (17,193) or weren't (62,409,072) @ .65% positive

That's a total of 634,543 positives or predicted positives (not 1,657,270). The two numbers differ because the representation in the sample did not match the distribution in the population, and because the sample was self-selected in a way that generates an artificially high positivity rate.
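For anyone who wants to check the arithmetic, here is a rough sketch that reproduces the tally from the group sizes and rates stated in this post. The variable names are mine. The travel group is entered as its stated 361 positives because its size isn't given above, and the "everyone else" figure is computed from the 62,409,072 alone, which is what the 405,659 corresponds to. These are still illustrative assumptions, not actual data.

```python
# Stratified tally using the figures stated in this post.
# These are illustrative assumptions from the post, not measured data.
# Each entry: (tested, untested but in the same group, assumed positivity rate).
groups = {
    "symptomatic": (2_865_492, 716_373, 0.059),
    "exposed": (286_549, 286_549, 0.03),
    # Assumption: the post's 405,659 matches the untested figure on its own.
    "everyone_else": (0, 62_409_072, 0.0065),
}

# The post gives the travel positives directly, without a group size.
TRAVEL_POSITIVES = 361

# The figure obtained by applying the overall positivity rate to everyone.
NAIVE_ESTIMATE = 1_657_270


def predicted_positives(tested, untested, rate):
    """Expected positives if the whole group shares the tested group's rate."""
    return round((tested + untested) * rate)


by_group = {name: predicted_positives(*vals) for name, vals in groups.items()}
by_group["travel"] = TRAVEL_POSITIVES

total = sum(by_group.values())
ratio = total / NAIVE_ESTIMATE

if __name__ == "__main__":
    for name, count in by_group.items():
        print(f"{name:>14}: {count:,}")
    print(f"{'total':>14}: {total:,}")
    print(f"share of naive estimate: {ratio:.0%}")
```

Changing any one rate or group size shifts the total, which is the point: the answer depends on the mix, not on how many tests were run.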



I have tried to make these numbers as realistic as possible, but I am not asserting they are the actual numbers. The mix I have created has the same positivity as you calculated - BUT - when extended to the population as a whole - generates a number of presumed positive individuals about 40% of the number you suggested. The point of this is to demonstrate a concept you seem unable to grasp.

The positivity rates in the 4 groups are within the ranges I have been able to find. And I'm pretty sure my assumptions about the split in the general population relative to the sample tested are realistic - largely that most of the untested population will be represented by a small portion of those tested who were not being tested for any particular reason - and are thus far more likely to be negative. Obviously there may be more reasons to be tested - each of which would have its own characteristic positivity rate.

But the big point is that the mix matters. If your sample - regardless of how large it is - does not match the mix in the actual population, you cannot, with any statistical validity, calculate an overall percentage for the tested group and extend it to the population as a whole.

Because you are asserting the extension is valid, it is your obligation to demonstrate that (1) positivity rate is the same regardless of the reason for testing or (2) the mix in the tested group mirrors the population as a whole.
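Those two conditions can be checked with a toy example. This is only a sketch with made-up numbers (two hypothetical groups, hypothetical rates and shares, a hypothetical population of one million). It shows that applying the sample's overall rate to the population matches the group-by-group estimate only when the rates are the same in every group, or when the sample mix equals the population mix.

```python
import math


def naive_positives(sample_shares, rates, population):
    """Apply the sample's overall positivity rate to the whole population."""
    sample_rate = sum(sample_shares[g] * rates[g] for g in rates)
    return sample_rate * population


def stratified_positives(population_shares, rates, population):
    """Weight each group's rate by that group's share of the population."""
    overall_rate = sum(population_shares[g] * rates[g] for g in rates)
    return overall_rate * population


# Hypothetical illustration only; none of these numbers come from real data.
POPULATION = 1_000_000
rates = {"tested_for_cause": 0.05, "no_particular_reason": 0.005}
sample_shares = {"tested_for_cause": 0.5, "no_particular_reason": 0.5}
population_shares = {"tested_for_cause": 0.05, "no_particular_reason": 0.95}

# Mismatched mix and unequal rates: the naive figure overshoots.
naive = naive_positives(sample_shares, rates, POPULATION)
stratified = stratified_positives(population_shares, rates, POPULATION)

# Condition (1): identical rates in every group, so the mix no longer matters.
uniform_rates = {g: 0.02 for g in rates}
naive_uniform = naive_positives(sample_shares, uniform_rates, POPULATION)
stratified_uniform = stratified_positives(population_shares, uniform_rates, POPULATION)

# Condition (2): the sample mix mirrors the population mix.
naive_matched = naive_positives(population_shares, rates, POPULATION)

if __name__ == "__main__":
    print(f"naive: {naive:,.0f}  stratified: {stratified:,.0f}")
    print("equal rates agree:", math.isclose(naive_uniform, stratified_uniform))
    print("matched mix agrees:", math.isclose(naive_matched, stratified))
```

Unless one of those two conditions holds, the two estimates come apart, no matter how many tests are in the sample.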

As to small samples being accurate for Biden's approval rating - pollsters pay people big bucks to ensure that their small sample is representative, which is the step you are completely ignoring when you calculate an overall positivity rate and try to extend it to the population as a whole.



