In reply to the discussion: The UK currently has 1,657,270 positive cases of COVID-19
Ms. Toad (38,421 posts)
and representation.
You keep focusing on quantity - but a large number of tests does not make the tested group representative of the general population.
Let's make it concrete. Because the data isn't available, we'll need to make some reasonable assumptions:
Let's say:
half of the testing is done because people are symptomatic (40%) or were exposed (10%),
30% of the testing is due to travel (I think that's high, but it puts a concrete number on the reason you suggested people are testing), and
20% are people who just randomly decided they needed a COVID-19 test for the heck of it.
Further - of these groups:
those who have symptoms are most likely to be positive (let's say 5.9% - within the range of % positive tests I have been able to find for people who are testing because they are symptomatic),
and let's say roughly half that rate (3%) for those who were exposed but have no symptoms - they are likely to have a lower positive rate because they are not symptomatic, but a higher one than the general population because they were exposed.
Those testing due to travel have probably been taking precautions, so their rate is likely to be pretty low - let's say 0.42%. Those randomly testing probably haven't been taking precautions, so they are slightly higher - let's say 0.65%.
If you calculate it out using the numbers from your first post, that mix gives you the overall 2.5% positivity rate you calculated.
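To make the blended rate concrete, here is a minimal Python sketch of the weighted average. It uses only the shares and rates assumed above (the variable names are mine), not the actual test counts from your first post, so it lands near 2.9% rather than exactly on 2.5%.

```python
# Share of all tests by reason, and the assumed positivity for each reason.
# These are the illustrative assumptions from this post, not measured data.
groups = {
    "symptomatic": (0.40, 0.059),
    "exposed":     (0.10, 0.030),
    "travel":      (0.30, 0.0042),
    "random":      (0.20, 0.0065),
}

def blended_positivity(groups):
    """Weighted average positivity: sum of share * rate over all groups."""
    return sum(share * rate for share, rate in groups.values())

overall = blended_positivity(groups)
print(f"Blended positivity of the tested mix: {overall:.2%}")  # about 2.92%
```

Most of that overall rate comes from the symptomatic group, which is exactly the group that is over-represented among people who get tested.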
Now - moving to the population as a whole:
Most people who are symptomatic are going to get tested. Let's say 80% got tested. That means the other 20% of symptomatic people in the general population didn't bother to get tested. They will be positive at roughly the same rate as those who were symptomatic and tested.
Probably considerably fewer of those who were merely exposed got tested - so let's say 50% got tested and the other 50% didn't bother (but will be positive at roughly the same rate as the tested group).
If you need to be tested for travel, you would have been tested, so there are no additional predicted positives there.
Which leaves everyone else (62,409,072 or so) represented by the 20% of the tested population who just like to take tests - and who will presumably be positive at about the same rate as the randomly tested population.
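As a rough sketch of how the untested head counts follow from those assumptions (the function name is mine, and the 80% and 50% figures are my guesses, not data):

```python
def untested(tested, share_tested):
    """People in a group who were NOT tested, given the fraction that was."""
    total_in_group = tested / share_tested
    return round(total_in_group - tested)

# Tested counts used in this post; the tested shares are assumptions.
symptomatic_untested = untested(2_865_492, 0.80)
exposed_untested = untested(286_549, 0.50)

print(symptomatic_untested)  # 716373
print(exposed_untested)      # 286549
```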
That gives us:
211,330 positive tests among those who had symptoms and either got tested (2,865,492) or should have (716,373) @ a 5.9% rate
17,193 positive tests among those who were exposed and either got tested (286,549) or should have but didn't (286,549) @ a 3% rate
361 positive tests among those who were tested for travel @ a 0.42% rate
405,659 positive tests among those who were randomly tested (17,193) or weren't (62,409,072) @ a 0.65% rate
That's a total of 634,543 positives - or predicted positives (not 1,657,270) because the representation in the sample did not match the distribution in the population and because the sample was self-selected in a way that generates an artificially high positivity rate.
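If you want to check the arithmetic, a short Python sketch like this reproduces the total. The counts and rates are the ones from this post; the labels and variable names are mine, and the travel figure is simply taken as given.

```python
# (label, people in the group, assumed positivity rate)
groups = [
    ("symptomatic, tested or not", 2_865_492 + 716_373, 0.059),
    ("exposed, tested or not", 286_549 + 286_549, 0.03),
    # The 405,659 figure matches 62,409,072 people at a 0.65% rate.
    ("everyone else", 62_409_072, 0.0065),
]

predicted = {label: round(people * rate) for label, people, rate in groups}
predicted["travel"] = 361  # taken directly from the list above

total = sum(predicted.values())
claimed = 1_657_270

for label, positives in predicted.items():
    print(f"{label}: {positives:,}")
print(f"total: {total:,} ({total / claimed:.0%} of {claimed:,})")
```

The last line prints a total of 634,543, which is about 38% of 1,657,270.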

I have tried to make these numbers as realistic as possible, but I am not asserting they are the actual numbers. The mix I have created has the same positivity as you calculated - BUT - when extended to the population as a whole - generates a number of presumed positive individuals about 40% of the number you suggested. The point of this is to demonstrate a concept you seem unable to grasp.
The positivity rates in the 4 groups are within the ranges I have been able to find. And I'm pretty sure my assumptions about the split in the general population relative to the sample tested are realistic - largely that most of the untested population will be represented by a small portion of those tested who were not being tested for any particular reason - and are thus far more likely to be negative. Obviously there may be more reasons to be tested - each of which would have its own characteristic positivity rate.
But the big point is that the mix matters. If your sample - regardless of how large it is - does not match the mix in the actual population, you cannot, with any statistical validity, calculate an overall percentage for the tested group and extend it to the population as a whole.
Because you are asserting the extension is valid, it is your obligation to demonstrate either that (1) the positivity rate is the same regardless of the reason for testing or that (2) the mix in the tested group mirrors the mix in the population as a whole.
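Here is a small Python sketch of why those two conditions are the ones that matter. The rates and the sample mix are the assumptions from this post; the population mix is purely hypothetical, picked only to show the effect.

```python
def blended(mix, rates):
    """Overall positivity for a mix of group shares (shares sum to 1)."""
    return sum(mix[g] * rates[g] for g in mix)

rates = {"symptomatic": 0.059, "exposed": 0.03, "travel": 0.0042, "random": 0.0065}
sample_mix = {"symptomatic": 0.40, "exposed": 0.10, "travel": 0.30, "random": 0.20}
# Hypothetical population mix (an assumption for illustration only).
population_mix = {"symptomatic": 0.05, "exposed": 0.01, "travel": 0.01, "random": 0.93}

# Different rates and different mixes: the sample overstates the population.
sample_rate = blended(sample_mix, rates)
population_rate = blended(population_mix, rates)
print(f"sample {sample_rate:.2%} vs population {population_rate:.2%}")

# Condition (1): the same rate in every group, so the mix stops mattering.
flat_rates = {g: 0.025 for g in rates}
flat_sample = blended(sample_mix, flat_rates)
flat_population = blended(population_mix, flat_rates)
print(f"flat rates: sample {flat_sample:.2%} vs population {flat_population:.2%}")
```

Condition (2) is the trivial case: if the population mix equals the sample mix, the same function returns the same number for both.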
As for small samples being accurate for Biden's approval rating - pollsters pay people big bucks to ensure that their small sample is representative, which is the step you are completely ignoring when you calculate an overall positivity rate and try to extend it to the population as a whole.
