Over the last few years, the Education Endowment Foundation has funded a number of unusually high-quality studies of educational programs. I am interested in the possibility of predicting the outcomes of these studies – if the outcomes can be predicted accurately, then it may be possible to save a lot of the money currently spent studying unpromising programs (more context).
I chose to focus on reading programs – defined by me as “any program evaluated using a test that included reading comprehension” – in order to keep the task to a manageable size.
I identified 29 such programs with completed evaluations. You can find a few details about them with links to their evaluations here.
My findings were:
- To a reasonable approximation, a program that incorporates many strategies from the reading teaching literature is more effective than one that incorporates few.
- Despite the indication that maximising the number of reading teaching strategies is the way to go, most programs tended to focus either on decoding strategies or on comprehension strategies. Only one program included both, and it reported very strong results.
- With the exception of oral language instruction, programs that aim to improve reading skills by means not directly related to teaching reading (e.g. parental engagement or instruction in unrelated curricula) do not do so.
Features of Effective Reading Programs
I picked a list of “evidence based” strategies for reading teaching by
- Spending about 5 minutes thinking of everything I remembered reading about
- Asking two of my coworkers to do the same
I tried to focus on things specifically mentioned as helping to teach reading. This included many reading-specific items, but also a few items that are considered effective for teaching in general (“meta-cognitive strategies”, 1 on 1 or small group interaction, holiday programs). The full list of elements I came up with was as follows (see the encoding sketch after the list):
- Phonics instruction (sound-letter correspondences)
- Sound blending
- Segmenting words into sounds
- Phonological awareness assessment and instruction
- Aiming to improve reading fluency
- Instruction in summarising texts
- Instruction in predicting events in texts
- Instruction in questioning strategy
- Instruction in clarifying meaning
- Activating prior knowledge
- Instruction in inference strategy
- Teacher-modelled self-talk and/or dialogue
- Building students' capacity for self-monitoring
- Students set and reflect on goals
- Instruction in strategy recall and choice
- Holiday program
- 1 on 1 or small group instruction
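To make this concrete, here is a minimal sketch of how each program can be encoded as a binary vector over this list. The snake_case strategy names and the example annotation below are illustrative placeholders, not entries from my spreadsheet.

```python
# Hypothetical labels for the 17 strategies listed above.
FEATURES = [
    "phonics", "blending", "segmenting", "phonological_awareness",
    "fluency", "summarising", "predicting", "questioning", "clarifying",
    "prior_knowledge", "inference", "modelled_self_talk", "self_monitoring",
    "goal_setting", "strategy_choice", "holiday_program", "small_group",
]

def encode(strategies: set[str]) -> list[int]:
    """Turn the set of strategies a program includes into a 0/1 vector."""
    return [int(f in strategies) for f in FEATURES]

# A made-up phonics-heavy program annotated with five strategies:
x = encode({"phonics", "blending", "segmenting", "fluency", "small_group"})
print(sum(x))  # number of evidence-based strategies present -> 5
```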
A significant oversight was the omission of vocabulary building. I didn’t correct this error as it would involve re-annotating every program.
I did not include “giving feedback”, even though it is generally considered a helpful teaching practice. My reasoning was that almost every program probably included something that could be considered “feedback”, and it wasn’t clear to me that those that advertised feedback were in fact giving any more or any better feedback than the others. In hindsight, I think I should have included it as well.
The most significant problem with the list is that I came up with it after reading many of the evaluations I set out to investigate. I did not aim to create a list of features of the successful programs as such, but I am sure that my knowledge of the results influenced its creation. The final results are therefore going to be biased in favour of predictability; I think the result is strong enough to stand in spite of this.
I have included my judgements of program features in the spreadsheet linked above. These judgements typically came from reading the project report and any other program information available on the web, but I did not link each judgement back to its source. I imagine that if I had instead relied on interviews with the creators of each program, I would have assigned many more evidence-based practices to each one. I’m not sure that would have been a good thing overall. I have also produced a sheet of predictions for forthcoming trials.
Exploring the Programs
When we look at which elements were included in each program, we find that there were five kinds of programs investigated. First, there were the “indirect” programs, which included no reading instruction strategies and attempted to improve reading comprehension via some other means. Of the programs with a strong focus on literacy, there were “phonics-heavy” and “talk-heavy” programs – the first stressed systematically teaching the process of turning text into words, and the second stressed talking about texts. Finally, there was one comprehensive program that included many elements of both phonics-heavy and talk-heavy instruction, and a number of “bits and pieces” programs that had a few elements of phonics and/or talk instruction.
We can see already that programs that included more strategies (more black squares) achieved larger effects (bluer squares):

The programs were divided into five groups on the basis of the observations described above:
Comprehensive
One program included most of the features identified – REACH + comprehension. It mainly lacked the general-purpose metacognitive strategies.
This program achieved the highest effect size of all tested.
Talk + comprehension
This cluster tended to focus on establishing rich dialogue and building skills through talking. They also featured the application of “comprehension strategies” such as clarifying and questioning. Programs in this category often made a point of discussing the importance of teachers modelling dialogue and self-talk, with students imitating or reciprocating and gradually building their ability to do this independently. This is a very common teaching strategy in a broad sense, but I felt it was notable that dialogue-centred programs made more of a point of encouraging it than other kinds of programs did.
Programs in this cluster usually had positive effect sizes that were somewhat smaller than those of phonics-focused programs.
Phonics/phonics + comprehension skills
This cluster focused on building students’ skills for decoding and understanding texts. Phonics-focused programs distinguished themselves from programs that merely included phonics by featuring a larger number of phonics-related strategies – they included not only phonics (letter-sound correspondences) but also blending and segmenting strategies that enable students to put their phonics knowledge to use. This more complete approach to phonics instruction may correspond to “synthetic” phonics, as opposed to “analytic” phonics, which may include fewer strategies.
Some of these programs also featured instruction in comprehension skills, though usually fewer of them than the ‘talk’ cluster.
Phonics programs usually had positive effect sizes that were somewhat larger than those of talk programs. I will note that, in my view, talk-focused programs have a better chance of improving students’ skills beyond those measured by tests of reading comprehension.
Bits and pieces
These programs generally did not include broad selections of phonics, dialogue or comprehension strategies, but featured some.
Their effect sizes were quite mixed. This may represent information that isn’t being effectively captured in my breakdown.
Indirect
These programs aimed to improve reading via some channel other than explicit instruction in reading skills – for example, “Rhythm for Reading” focused on teaching rhythm production and music reading skills.
These programs nearly all had null effects; the one exception was a holiday program which encouraged students to read but didn’t appear to feature any strategy instruction.
Predictive Model
My aim is to create a model that predicts the effectiveness of a reading program. There were three approaches to modelling program effectiveness that I considered. First, to construct a linear model in which the effectiveness of a program depended on each feature present in the program:

$$E_j = \beta_0 + \sum_{i} \beta_i x_{ij}$$

Here $E_j$ is the effectiveness of program $j$, $\beta_i$ is the amount we expect a program to improve if feature $i$ is present, and $x_{ij}$ is an indicator for whether feature $i$ is present in program $j$.
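As a sketch of what fitting this looks like in practice – scikit-learn is one option, and the data below are placeholders standing in for my actual annotations:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder data: 29 programs x 17 binary feature annotations, plus
# noisy effect sizes. The real values live in the spreadsheet linked above.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(29, 17)).astype(float)
y = 0.02 * X.sum(axis=1) + rng.normal(0.0, 0.05, size=29)

# 17 feature coefficients plus an intercept: the 18-parameter model.
full_model = LinearRegression().fit(X, y)
print(full_model.intercept_, full_model.coef_)
```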
Given that there were only 29 data points, the model above, with 18 parameters (17 feature coefficients plus an intercept), is likely to be too complex. A simpler model assumed that program effectiveness depended only on the number of evidence-based strategies present:

$$E_j = \beta_0 + \beta_1 \sum_{i} x_{ij}$$
This model featured only 2 parameters. A final model of intermediate complexity was produced by grouping the features as ‘decoding’ (phonics, blending, segmenting, fluency and phonological awareness), ‘comprehension strategies’ (summarising, predicting, questioning, clarifying, inference), ‘metacognition strategies’ (goal setting, self-monitoring, strategy selection) and ‘everything else’. This model had five parameters.
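I read this grouped model as giving every feature in a group the same coefficient, which is equivalent to regressing on the number of strategies present in each group. Here is a sketch of building that design matrix, with column indices following the hypothetical FEATURES ordering from the encoding sketch earlier:

```python
import numpy as np

# Column indices follow the hypothetical FEATURES ordering used earlier.
GROUPS = {
    "decoding":        [0, 1, 2, 3, 4],   # phonics, blending, segmenting, phon. awareness, fluency
    "comprehension":   [5, 6, 7, 8, 10],  # summarising, predicting, questioning, clarifying, inference
    "metacognition":   [12, 13, 14],      # self-monitoring, goal setting, strategy choice
    "everything_else": [9, 11, 15, 16],   # prior knowledge, self-talk, holiday, small group
}

def group_counts(X: np.ndarray) -> np.ndarray:
    """Collapse the 17 feature columns into 4 per-group strategy counts."""
    return np.column_stack([X[:, idx].sum(axis=1) for idx in GROUPS.values()])

# 4 group coefficients plus an intercept: the 5-parameter model.
```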
In leave-one-out cross-validation, the 2-parameter model accounted for 40% of the variance in program outcomes, while the 18-parameter model accounted for only 23%. The 5-parameter model accounted for 53% of the variance, and a variant that dropped the three features least correlated with effectiveness – self-monitoring, strategy selection and goal setting, i.e. the whole metacognition group – also accounted for 53% of the variance in the trial outcomes.
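For reference, a minimal sketch of the cross-validation loop; I am using scikit-learn's LeaveOneOut here as one way to run it, and treating “variance accounted for” as 1 − SS_res/SS_tot computed on the held-out predictions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

def loo_variance_explained(X: np.ndarray, y: np.ndarray) -> float:
    """1 - SS_res/SS_tot, each prediction made from the other 28 trials."""
    preds = np.empty_like(y)
    for train, test in LeaveOneOut().split(X):
        model = LinearRegression().fit(X[train], y[train])
        preds[test] = model.predict(X[test])
    ss_res = np.sum((y - preds) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# e.g., with X and y from the sketches above:
# loo_variance_explained(X.sum(axis=1, keepdims=True), y)  # 2-parameter model
# loo_variance_explained(group_counts(X), y)               # 5-parameter model
```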
The results of the different models are plotted here for comparison. For each point, the x-value is the predicted effect from a model trained on the rest of the data, and the y-value is the actual effect size observed.
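A minimal plotting sketch for that comparison (preds stands for the held-out predictions produced by the loop above; the values here are placeholders so the snippet runs standalone):

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder held-out predictions and observed effect sizes.
preds = np.array([0.05, 0.10, 0.00, 0.20])
y = np.array([0.04, 0.15, -0.02, 0.25])

plt.scatter(preds, y)
plt.axline((0.0, 0.0), slope=1.0, linestyle="--")  # perfect-prediction line
plt.xlabel("Predicted effect size (leave-one-out)")
plt.ylabel("Observed effect size")
plt.show()
```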

Discussion
By limiting myself to EEF-funded studies, I am dealing with a much smaller sample of studies than most existing work, but these studies are also significantly less likely to be biased than the education literature at large. Producing the list of important features after reading a number of the evaluations probably means that effectiveness is less predictable than indicated here; even so, the results are strong enough that I think effectiveness is nonetheless predictable.
The findings here also suggest that there is a blind spot in reading education – namely, combined instruction in oral language and phonics. We saw that almost all programs that did anything at all focussed on one or the other.
This blind spot is also present in significant parts of the literature on reading instruction. A literature review by the National Reading Panel in the US, which apparently covered 100 000 studies (gosh!), gives the following summary of the important elements of reading teaching:
- Explicit instruction in phonemic awareness
- Systematic phonics instruction
- Methods to improve fluency
- Ways to enhance comprehension
The NSW Centre for Education Statistics and Evaluation lists both fluency and vocabulary as important elements of reading instruction, and omits oral language.
This tendency is not universal. The EEF itself produced recommendations most closely in line with mine – that it is important to teach oral language, phonics and comprehension strategies.
I think the areas of disagreement are meaningful. If the importance of oral language is not just an artifact of the EEF’s dataset – and it could be – then it would seem to be a significant oversight by reviews that do not mention it.
I also think that a novel implication of this work is that the “checklist” approach to designing or selecting a literacy instruction program could be very productive. That is, listing evidence-based literacy teaching strategies and designing or selecting a program that includes as many of them as possible is a strategy that, as far as I am aware, is not often adopted.
If you want to investigate the model mentioned in this post, it’s implemented here. Have fun!
