My school's results went down in the last two years.
The reason had nothing to do with student achievement. Zip. NCLB's effect on education was utterly irrelevant; class size differences and ed spending were meaningless.
The rules on how to tabulate the state test results changed. We had too many special ed kids taking modified tests, so the state moved the excess kids into the "regular" category. Suddenly these "regular" kids had failed to take the regular state test and received a score of "0" (as far as the school's report card was concerned). Pitch a bunch of 0s into the average and the aggregate scores go down. But the numbers were also reported for each kid: total them up using the previous year's tabulation algorithm, and scores had gone up.
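A toy illustration of the arithmetic. Everything in it is hypothetical: the scores, the prior-year average of 65, and the two functions `old_rule` and `new_rule` are made up, and they only approximate the tabulation rules as described above. The point is that the same kids can show a gain or a drop depending on which rule you tabulate with.

```python
from statistics import mean

# All numbers below are hypothetical, chosen only to illustrate the arithmetic.
PRIOR_YEAR_MEAN = 65.0

# Scores of kids who were always in the "regular" category.
regular_scores = [72, 68, 81, 75, 90, 64, 77, 70]
# Scores the "excess" special ed kids earned on the modified test.
excess_modified_scores = [55, 60, 58]


def old_rule(regular, excess_modified):
    """Previous year's tabulation (assumed): each kid's reported score counts."""
    return mean(regular + excess_modified)


def new_rule(regular, excess_modified):
    """New tabulation (assumed): the excess kids are reclassified as "regular"
    and, since they never took the regular test, each one counts as a 0."""
    return mean(regular + [0] * len(excess_modified))


if __name__ == "__main__":
    for name, rule in [("old rule", old_rule), ("new rule", new_rule)]:
        avg = rule(regular_scores, excess_modified_scores)
        direction = "up" if avg > PRIOR_YEAR_MEAN else "down"
        print(f"{name}: {avg:.1f} ({direction} from {PRIOR_YEAR_MEAN})")
```

With these made-up numbers the old rule reports 70.0 (up) and the new rule about 54.3 (down), from identical test results.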
Up or down? Depends on what you want to say. Some people immediately said that increased class size lowered the test scores. Others, that the increased numbers of minority students lowered the test scores. Yet another group asked how bad it would have been if not for Obama, while others blamed Obama. All these groups had their heads up their asses. Perhaps up each other's asses. Hard to know.
However, even that increase happened because the school spent several weeks doing nothing but standardized test review just before the test was administered. That makes it hard to compare the results in any meaningful way with previous years, when they *didn't* spend that many weeks on test review.
In other words, state tests are meaningless as a long-term measure. They change how they score the results. They change the questions. They change the tests or renorm them. And the observer's paradox holds: by imposing the test, states alter instruction, and schools really do teach what is on the test. The more likely a school is to fail AYP (Adequate Yearly Progress), the more intense this gets.
The test you want to understand is the NAEP (National Assessment of Educational Progress). It's national, given in reading and math, with a sample large enough to be useful for every state, urban and otherwise. Nothing at stake for a school (graduation, federal funding, state funding) depends on NAEP results, so at most it has a trivial effect on instruction. The test's methodology varies little over time; the only secular problem is that the characteristics of the student population change. But if you want to know how "students" do, either as a group or broken down by race, it's the gold standard. (The race breakdown is rather crude, but they keep the methodology as it was in the early '70s.) http://nces.ed.gov/nationsreportcard/about/
The result: an overall long-term trend of rising scores in both math and reading. They mostly flatlined in the '90s and increased in the '00s. In some test years they go up or down slightly, but the key word is "slightly." Over 40 years the change is decent; in any given 4-year period, it's slight. (They revised the methodology slightly mid-Bush II, so there's a discontinuity.)
Averages lie easily. If 10% of students gain 10 points and another 10% lose 10 points while everyone else holds steady, the average looks flat. If the bottom 10% improve by 10% and everyone else improves only slightly, the average rises a little and the big gains of the lowest 10% are masked. So it pays to dig in a bit and look at the other measures you need to interpret a simple average, such as measures of dispersion. Or you can break down the stats by group and see each cohort's achievement.
The post at http://blog.ednewscolorado.org/2010/03/30/naep-score-trends-not-so-flat-after-all does a little of this.
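A minimal sketch of the first case, where the mean hides what's going on. The scores for ten students are made up (not real NAEP data). It uses the population standard deviation (`statistics.pstdev`) as the dispersion measure, which is just one choice among several, and a hypothetical helper, `change_by_group`, for the cohort breakdown. The mean stays at 65 in both years while the spread shrinks and the groups move in opposite directions.

```python
from statistics import mean, pstdev

# Hypothetical scores for ten students, sorted low to high; not real data.
year1 = [40, 55, 60, 60, 65, 65, 70, 70, 75, 90]
# Bottom student (10%) gains 10 points, top student (10%) loses 10, rest unchanged.
year2 = [50, 55, 60, 60, 65, 65, 70, 70, 75, 80]

# Hypothetical cohorts, given as indices into the score lists.
GROUPS = {
    "bottom 10%": [0],
    "middle 80%": list(range(1, 9)),
    "top 10%": [9],
}


def summarize(scores):
    """Mean plus one measure of dispersion (population standard deviation)."""
    return {"mean": mean(scores), "pstdev": pstdev(scores)}


def change_by_group(before, after, groups):
    """Average score change for each named group of student indices."""
    return {
        name: mean(after[i] - before[i] for i in idx)
        for name, idx in groups.items()
    }


if __name__ == "__main__":
    for label, scores in [("year 1", year1), ("year 2", year2)]:
        s = summarize(scores)
        print(f"{label}: mean={s['mean']:.1f} pstdev={s['pstdev']:.2f}")
    print(change_by_group(year1, year2, GROUPS))
```

The averages alone say "flat." The standard deviation shows the distribution tightened, and the group breakdown shows the bottom gained 10 points while the top lost 10.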