College Board · Chief Reader

AP Statistics Chief Reader Reports: What Examiners Actually Want

The post exam reports in which the Chief Reader describes how students performed on every free response question, plus a multi year synthesis of the language, reasoning, and procedural patterns that separate Essentially Correct from Partially Correct.

AP Statistics Chief Reader Report archive (2022 to 2025)

2024
  • 2024 AP Statistics Chief Reader Report (Chief Reader Report · official archive)

2023
  • 2023 AP Statistics Chief Reader Report (Chief Reader Report · official archive)

2022
  • 2022 AP Statistics Chief Reader Report (Chief Reader Report · official archive)

2019
  • 2019 AP Statistics Chief Reader Report (Chief Reader Report · official archive)

2018 and earlier
  • Pre 2019 AP Statistics Chief Reader Reports (legacy archive) (Chief Reader Report · official archive)

What it is: Post exam analysis of student FRQ and Investigative Task responses

Written by: The AP Statistics Chief Reader

Published: Late summer after the May exam

Covers: Every FRQ and the Investigative Task (what earned points, what did not)

Best use: Most candid public guide to where inference points are lost

Synthesized here: 2019, 2022, 2023, and 2024 reports

What do AP Statistics Chief Reader Reports reveal?

The precise language and reasoning distinctions that separate Essentially Correct from Partially Correct, question by question across every administration.

After every May exam, the AP Statistics Chief Reader publishes a report that analyzes student performance on each free response question, including the Investigative Task. The report describes, from the perspective of Readers who have collectively scored hundreds of thousands of actual responses, which phrasings earned points, which fell one clause short of full credit, and which procedural patterns cost students points across every question type. AP Statistics is as much a language as a body of content, with precise distinctions between a parameter and a statistic, between stating a condition and checking it, and between a conclusion framed in context and one that is not. As a result, the Chief Reader Report is unusually specific. It does not describe generic study habits. It describes the exact sentence constructions that cross the Essentially Correct threshold and the exact omissions that land responses in the Partially Correct category. For a student preparing for the exam, reading recent Chief Reader Reports alongside that year's free response booklet and scoring guideline provides the clearest available picture of how examiners think.

Multi year synthesis: the persistent themes

Across the 2019, 2022, 2023, and 2024 AP Statistics Chief Reader Reports, five themes recur with striking consistency, and none of them is primarily about missing statistical content.

First, the precision of statistical language at the conclusion step is the single most persistent differentiator between full credit and partial credit across all four years. The 2022 report notes that students who wrote conclusions using probability language for fixed parameters, saying "there is a 95% probability" rather than "we are 95% confident," or who concluded "we know" rather than "there is sufficient evidence to conclude," were scored as Partially Correct on the conclusion component even when the inference procedure itself was executed flawlessly. The same distinction appears in the 2023 and 2024 reports, framed almost identically.

Second, conditions checking is documented in the 2019, 2022, and 2023 reports as an area where students state a condition without demonstrating that the specific study satisfies it. Readers consistently note that writing "the sample is random" is treated as stating the condition, while writing "the problem states that a simple random sample was taken, so the random condition is met" is treated as checking it, and only the latter earns full credit on the conditions component.

Third, the Investigative Task is flagged in the 2022, 2023, and 2024 reports as the question that most clearly separates students who build a statistical argument across parts from those who treat each sub part in isolation. The Chief Reader for the 2023 administration specifically observed that the strongest Investigative Task responses used findings from earlier parts to constrain or inform conclusions in later parts, while the weakest responses showed no connection between parts even when the individual analyses within each part were correct.

Fourth, notation precision, specifically the use of the sample statistic symbol where the population parameter symbol is required, is a recurring finding. Writing the null hypothesis with p hat rather than p, or stating the conclusion for x bar rather than mu, is flagged in the 2019 and 2022 reports as a notation error that costs students communication credit and, in some rubric structures, the procedure point as well.

Fifth, in questions involving simulation and probability, the 2022 and 2024 reports document that students describe the simulation setup informally, for example "roll a die many times," without specifying the number of trials, the outcome that constitutes a success, or how the simulated results address the question being answered. The Chief Reader notes that this informality is exactly why such responses earn partial rather than full credit on simulation sub parts.

Top student errors documented in recent reports

  1. Probability language used for fixed population parameters

    Across the 2022, 2023, and 2024 administrations, Readers consistently observe that students write "there is a 95% probability that the true proportion falls in this interval" rather than "we are 95% confident." From the examiner's perspective, this is a conceptual error, not a wording preference: a confidence interval describes what a method produces across repeated samples, not the probability that a single fixed parameter lies in a particular range. Responses using probability language for confidence intervals are scored as Partially Correct on the conclusion component regardless of how correctly the interval was computed. The same finding appears in the 2019 report, making this a multi year stable pattern rather than a quirk of one year's questions.

    Source: AP Statistics Chief Reader Reports 2019, 2022, 2023, 2024

  2. Conditions stated but not checked against the specific study context

    Readers across the 2019, 2022, and 2023 reports distinguish between stating that the random condition must be met and demonstrating that the specific study described in the problem satisfies it. From the examiner's perspective, writing "the sample is random" is an assertion; writing "the problem states that a simple random sample was selected, so the random condition is satisfied" is a verification. The rubric credits the verification, not the assertion. The 2023 report notes this as the most common source of lost points on conditions components across all inference question types.

    Source: AP Statistics Chief Reader Reports 2019, 2022, 2023

  3. Investigative Task parts treated as isolated rather than cumulative

    The 2022, 2023, and 2024 Chief Reader Reports each identify the ability to connect findings across Investigative Task parts as the primary differentiator between a score of 3 and a score of 4 on that question. Readers observe that students who perform each sub part correctly in isolation but fail to carry conclusions from earlier parts into the reasoning of later parts earn lower holistic scores than students whose later parts explicitly reference and build on earlier findings. The examiner perspective is that statistical argumentation is cumulative: the Investigative Task is structured to reward integrated reasoning, not five independent mini answers.

    Source: AP Statistics Chief Reader Reports 2022, 2023, 2024

  4. Hypothesis statements written for sample statistics rather than population parameters

    The 2019 and 2022 reports both document that null and alternative hypotheses written with sample statistic notation, using p hat instead of p, or x bar instead of mu, are treated as notation errors that signal conceptual confusion about what inference procedures are designed to do. From the examiner's perspective, a hypothesis test makes a claim about the population, and writing the hypothesis for the sample statistic reflects a misunderstanding of the inference framework itself, not merely a symbol substitution. This error appears on both one sample and two sample procedures.

    Source: AP Statistics Chief Reader Reports 2019, 2022

  5. Chi square test type selected by calculation rather than study design

    Readers across the 2022 and 2023 administrations observe that students who correctly compute a chi square statistic and correctly identify it as significant frequently select the wrong chi square test type because they focus on the calculation rather than the study design. The examiner's view is that the test for homogeneity applies when two or more independent groups are each sampled and compared on one categorical variable, while the test for independence applies when one sample is observed on two categorical variables simultaneously. Students who cannot identify which design was used before calculating are flagged in the report as having a structural gap in their understanding, even when their arithmetic is correct.

    Source: AP Statistics Chief Reader Reports 2022, 2023

  6. Simulation procedures described without specifying the number of trials, the success outcome, or the connection to the question

    The 2022 and 2024 reports document that simulation sub parts frequently earn partial rather than full credit because students describe the simulation mechanism informally without the three components Readers require: how many trials will be run, what outcome in a single trial constitutes a success, and how the simulated results answer the stated question. Readers observe that phrases such as "repeat many times" or "do this a lot of times" appear frequently and are consistently treated as incomplete. The Chief Reader notes that a complete simulation description must be specific enough that someone else could implement it and get the same result.

    Source: AP Statistics Chief Reader Reports 2022, 2024
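The chi square item above may be easier to see with a small sketch. The counts below are hypothetical, and the function name `chi_square_statistic` is an illustrative choice, not something taken from the reports. The sketch shows that the expected counts and the test statistic for a two way table are computed the same way whichever test is named, which is one way to see why the study design, and not the arithmetic, has to decide between homogeneity and independence.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a two way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count = (row total * column total) / grand total
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts. If the rows were two separately sampled groups, this
# would be a test for homogeneity; if one sample were classified on two
# variables, it would be a test for independence. The arithmetic is identical.
counts = [[30, 20],
          [20, 30]]
stat, df = chi_square_statistic(counts)
print(round(stat, 2), df)
```

Because the number that comes out is the same under either design, a response has to name the design in words before the calculation; the calculation itself cannot reveal which test was appropriate.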

What do AP Statistics Readers consistently reward?

Statistical conclusions written in full context, with precise parameter language, linked to the specific study design described in the prompt.

The Chief Reader Reports from 2019 through 2024 describe high scoring responses in terms that are remarkably consistent across administrations. On inference questions, the responses that earn Essentially Correct on the conclusion component write "the true mean score for this specific population," not just "mu." They write "there is sufficient evidence to conclude" rather than "we know." On conditions, they write "the problem states a simple random sample was selected, so the random condition is met," not simply "the random condition is met." On the Investigative Task, they build an argument: the conclusion in part (c) refers back to the evidence established in parts (a) and (b). The reports also describe what Readers reward on probability and simulation questions: a complete specification of the simulation setup that a second person could replicate, an explicit definition of what constitutes a success in a single trial, and a clear statement connecting the simulation outcome to the probability being estimated. These patterns appear across question types and across years. They reflect Skill 4, Statistical Argumentation, which the AP Statistics Course and Exam Description identifies as the skill assessed on virtually every free response question.
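As a rough illustration of what a fully specified simulation looks like, here is a minimal sketch. The question it answers (the probability of at least two sixes in ten rolls of a fair die), the trial count, the seed, and the function name `estimate_probability` are all assumptions made for the example, not details from any report. Each of the three components Readers look for appears as an explicit line.

```python
import random


def estimate_probability(num_trials, rolls_per_trial=10, seed=1):
    """Estimate P(at least two sixes in ten rolls of a fair die) by simulation."""
    rng = random.Random(seed)  # fixed seed so a second person gets the same result
    successes = 0
    for _ in range(num_trials):  # component 1: the number of trials is stated
        sixes = sum(1 for _ in range(rolls_per_trial) if rng.randint(1, 6) == 6)
        if sixes >= 2:  # component 2: what counts as a success in one trial
            successes += 1
    # Component 3: the proportion of successful trials estimates the probability
    return successes / num_trials


estimate = estimate_probability(10_000)
print(estimate)
```

A written response would state the same three things in sentences: run 10,000 trials of ten rolls each, count a trial as a success if it contains at least two sixes, and use the proportion of successful trials as the estimate of the probability asked for.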

How should AP Statistics students use the Chief Reader Reports?

Read three recent reports back to back, extract the findings that appear in all three, and build those findings into a checklist you apply to every practice response before scoring.

The Chief Reader Reports for AP Statistics are shorter and more formulaic than those for laboratory sciences, which makes them unusually efficient to read. A motivated student can read the reports for 2022, 2023, and 2024 in under two hours. The key move is to read all three before taking notes, then identify the findings that appear in all three. Those stable, multi year findings represent the highest leverage things to address in practice, because they are structural habits of response writing, not reactions to a single year's unusual questions. The checklist in the following section synthesizes those stable findings into eight actionable items. The most direct use of the reports is to write a complete practice response to a released question, then score it using both the official scoring guideline and the checklist derived from the Chief Reader's recurring observations. That two pass scoring reveals, in one practice session, both whether the procedure was correct and whether the language and reasoning would earn full credit from a Reader.

The Chief Reader checklist

  1. Write every confidence interval conclusion using the phrase "we are X% confident that the true [parameter] for [population] is between [lower] and [upper]." Never use probability language for a fixed population parameter.

  2. Write every significance test conclusion using "there is sufficient evidence to conclude that [specific directional claim about the population parameter] at the [significance level] level." Follow with the specific context from the problem.

  3. For every condition, write two sentences: one stating the condition and one demonstrating that the specific study satisfies it, with a direct reference to the problem description. Stating the condition name alone does not earn the conditions point.

  4. Write all hypotheses for population parameters, using mu, p, mu1 minus mu2, or p1 minus p2, and define each parameter in a sentence that names the population. Sample statistic notation in a hypothesis is treated as a notation error.

  5. On the Investigative Task, read all parts before writing any of them. Use the findings from earlier parts explicitly when answering later parts. Refer back to a numerical result from part (a) when justifying a conclusion in part (c).

  6. For simulation questions, write three components explicitly: the number of trials, what outcome in a single trial counts as a success, and how the proportion of successes from the simulated trials answers the probability question asked.

  7. When choosing a chi square test, identify the study design first: one sample observed on two categorical variables calls for the test for independence; separate samples from different groups compared on one categorical variable calls for the test for homogeneity. Make this identification explicit in your response.

  8. Show calculator setup before reporting calculator output. For t procedures and chi square tests, write the test statistic formula with substituted values before presenting the result from the calculator. Calculator output alone is often scored as missing the setup component.
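The last checklist item, showing the formula with substituted values, can be sketched for a one sample t statistic. The summary values below (sample mean 52.3, hypothesized mean 50, sample standard deviation 6.0, sample size 36) are hypothetical, and `one_sample_t_setup` is an illustrative name. The point is only that the substituted formula is written out before the final number.

```python
import math


def one_sample_t_setup(x_bar, mu_0, s, n):
    """Return the t statistic and a string showing the formula with values substituted."""
    standard_error = s / math.sqrt(n)   # s / sqrt(n)
    t = (x_bar - mu_0) / standard_error
    setup = f"t = ({x_bar} - {mu_0}) / ({s} / sqrt({n})) = {t:.2f}, df = {n - 1}"
    return t, setup


# Hypothetical summary statistics for illustration only
t, setup = one_sample_t_setup(x_bar=52.3, mu_0=50, s=6.0, n=36)
print(setup)  # t = (52.3 - 50) / (6.0 / sqrt(36)) = 2.30, df = 35
```

On paper, the equivalent is writing the substituted fraction and the degrees of freedom first, then reporting the calculator's test statistic and p value beneath it.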

AP Statistics Chief Reader Report FAQ

What is the AP Statistics Chief Reader Report?

After each May AP Statistics exam, the Chief Reader publishes a report analyzing how students performed on every free response question, including the Investigative Task. The report describes what successful responses included, the reasoning errors that cost points, and the language patterns that separated Essentially Correct from Partially Correct on each component. It is the most candid public account of how Readers actually apply the scoring rubric.

Where can I find AP Statistics Chief Reader Reports?

This page links to the College Board official past exam archive at apcentral.collegeboard.org, which hosts AP Statistics Chief Reader Reports alongside the corresponding free response booklets and scoring guidelines. Reports in the current format are available for 2019, 2022, 2023, and 2024, with earlier years accessible through the same archive hub.

What is the most common error documented in AP Statistics Chief Reader Reports?

Using probability language for a fixed population parameter in a confidence interval conclusion. Across the 2019, 2022, 2023, and 2024 reports, Readers consistently note that writing "there is a 95% probability that the true proportion falls in this interval" is treated as Partially Correct because a confidence interval describes a method applied over repeated samples, not the probability that a single fixed parameter lies in a specific range. The correct phrasing is "we are 95% confident."
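The phrase "a method applied over repeated samples" can be made concrete with a small simulation. This is a sketch under stated assumptions: the true proportion of 0.40, the sample size of 200, the 2,000 repetitions, the seed, and the name `coverage_rate` are all hypothetical, and the interval used is the standard one proportion z interval with a critical value of 1.96.

```python
import math
import random


def coverage_rate(true_p=0.40, n=200, num_intervals=2000, z=1.96, seed=7):
    """Fraction of simulated 95% intervals that capture the fixed true proportion."""
    rng = random.Random(seed)
    captured = 0
    for _ in range(num_intervals):
        # Draw one random sample of size n and compute its sample proportion
        successes = sum(1 for _ in range(n) if rng.random() < true_p)
        p_hat = successes / n
        margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
        # The parameter never moves; only the interval changes from sample to sample
        if p_hat - margin <= true_p <= p_hat + margin:
            captured += 1
    return captured / num_intervals


rate = coverage_rate()
print(rate)
```

The printed rate should land near 0.95. It describes how often the method succeeds across many samples, while any single computed interval either contains the fixed parameter or does not, which is the distinction the "95% confident" wording is meant to preserve.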

What do AP Statistics examiners consistently reward?

Statistical conclusions written in full context with precise parameter language, conditions checked against the specific study design described in the problem, Investigative Task parts that build on each other rather than standing independently, and simulation descriptions specific enough that a second person could implement them and get the same result. These patterns appear in every recent Chief Reader Report as the markers of high scoring responses.

How does the Investigative Task differ from standard FRQs in the Chief Reader Reports?

The Chief Reader Reports for 2022, 2023, and 2024 each treat the Investigative Task separately and flag a specific skill that standard FRQs do not require: building a statistical argument across parts. Readers award higher scores to responses that carry findings from early parts into the reasoning of later parts, not merely completing each sub part correctly in isolation. This cumulative reasoning is the primary differentiator between a score of 3 and a score of 4 on the Investigative Task.

Why does the Chief Reader Report matter more for AP Statistics than for other AP exams?

AP Statistics is unusual in that a large share of points on inference questions depends on precise language choices rather than on calculation accuracy. A student can execute a t test perfectly and still earn Partially Correct on the conclusion by writing "we know" instead of "there is sufficient evidence to conclude," or by omitting the population and context from the conclusion sentence. The Chief Reader Report is the only public document that describes, at the level of individual clauses, which phrasings earn points and which do not.

Do AP Statistics Chief Reader Reports document performance trends over time?

Yes. The reports reference score distributions and, in aggregate, show that the mean score rose modestly from approximately 2.80 in 2022 to 2.87 in 2024 per College Board score distribution data, while the pass rate moved from about 60% to 61.6%. The Chief Readers across this period note that the core language and reasoning errors remain present despite the overall improvement, which indicates that scores are rising at the margins rather than through elimination of the structural habits the reports flag.

How should I use the AP Statistics Chief Reader Report in practice?

Read the reports for 2022, 2023, and 2024 consecutively before taking notes. Identify the findings that appear in all three, because those are structural habits rather than reactions to a single year's questions. Convert those stable findings into a checklist. After writing a practice response, score it twice: once against the official scoring guideline for that year, and once against the checklist. The two pass scoring reveals both procedural accuracy and language precision in a single practice session.

What do AP Statistics Chief Reader Reports say about checking conditions for inference?

The 2019, 2022, and 2023 reports each distinguish between stating a condition and checking it. Writing "the random condition must be met" is a statement; writing "the problem specifies that subjects were randomly assigned to treatments, so the random condition is satisfied" is a verification. Readers score the verification, not the statement. The reports note this as the most common source of lost points on conditions components across all inference question types.

How do AP Statistics Chief Reader Reports relate to the scoring guidelines?

The scoring guideline specifies the rubric: the exact components each part can earn and what a response must include to earn each one. The Chief Reader Report explains how students actually performed against that rubric across the full population of responses and why specific response patterns earned partial rather than full credit. Reading the scoring guideline shows what is required; reading the Chief Reader Report shows why responses fell short of it. Both should be read together with the free response booklet for the same year.
