21 Jun 2019
How can we test a method
Recently, a colleague from work introduced me to an online discussion about whether we can really prove that a method works, or that it works better than another one. This discussion was very well presented in Luiz Otavio Barros’ blog. To answer the question, we have to turn to empirical analysis: how it is done and the data it produces. Collecting data takes too much hard work for the results to be ignored. We trust empirical data in far more decisive matters, even with our lives when we take medicine, so why should we not trust empirical evidence that supports methods?
Testing the actual effect of anything on human beings is complicated by nature: there are so many individual variations that any test is prone to failure. These variations come in many kinds. They can concern previous conditions (in the case of ELT, previous linguistic knowledge), attitude towards what is being tested, attitude towards the testing conditions (which include, in our case, the teacher), and physical and psychological conditions on the day of the test, among others. In this scenario, how can we test a method and trust that it works?
There are many procedures that maximize the reliability of experimental data, such as the control of variables, testing conditions, testing materials, documentation, pre- and post-tests, and statistical analysis. These procedures are worth a whole course, and each of them addresses one complicating factor in the topic studied or in the experimentation process itself. They are all equally valuable, but I will focus on two: pre- and post-tests, and statistics.
Every teacher has taught a lesson that ended with some students producing the target language beautifully while others seemed as if they had barely been there. Of course, this can happen because some students grasped the content more easily than others, for whatever reason, but it can also be due to different starting points: some students already knew something that the others did not. How can researchers account for this while experimenting? It is quite simple. They do not analyze only how students performed after the lesson. They analyze the difference between students’ performance before the lesson and after the lesson. As a result, how much they already knew beforehand matters much less.
Imagine you have two students to teach the present perfect to and you want to know which helps you more, the inductive or the deductive approach. One student is taught using the inductive approach and scores 7 on a test after the lesson. The other one is taught using the deductive approach and scores 9 on the same test. This seems to mean that the deductive approach worked better. However, that is not necessarily the case. If the first student had scored 3 on a test taken before the lesson and the second student had scored 8 on the same test, it would mean that the inductive lesson helped the first student improve by 4 marks, while the deductive lesson helped the other student gain only 1 mark. Measured this way, the development triggered by the lesson shows a striking difference in favor of the inductive approach.
This developmental criterion is usually what is taken into account when carrying out teaching experiments. Pre-tests and post-tests are responsible for measuring students’ development and are found in most quantitative research on teaching (qualitative methods are beyond the scope of this text).
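For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of the comparison, using the scores from the two-student example. The names `students` and `gain` are my own, chosen only for illustration.

```python
# Scores from the two-student example in the text.
students = [
    {"approach": "inductive", "pre": 3, "post": 7},
    {"approach": "deductive", "pre": 8, "post": 9},
]


def gain(pre, post):
    """Improvement between the pre-test and the post-test."""
    return post - pre


# Gain score for each approach.
gains = {s["approach"]: gain(s["pre"], s["post"]) for s in students}

# Which approach "wins" depends on what we compare.
best_by_post = max(students, key=lambda s: s["post"])["approach"]
best_by_gain = max(gains, key=gains.get)

print(gains)         # {'inductive': 4, 'deductive': 1}
print(best_by_post)  # deductive
print(best_by_gain)  # inductive
```

Looking only at the post-test favors the deductive lesson, while looking at the gain favors the inductive one, which is exactly why the pre-test is needed.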
Of course, this oversimplified example uses experimental and control groups of one person each and, in that situation, many individual features could contribute to the result. That is why experimental groups are frequently made up of dozens of participants. All the personal variations, such as relations with peers and the teacher, attitude towards the topic and approach, and unpredictable factors such as students’ level of focus on that particular day, are diluted across the larger number of participants.
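The dilution effect can be illustrated with a small simulation. This is only a sketch under assumptions of my own: every student’s gain is a fixed "true" effect of the lesson (2 marks) plus random individual noise, and both numbers, like the function names, are invented for the illustration.

```python
import random
import statistics


def group_mean_gain(n, rng, true_gain=2.0, noise_sd=3.0):
    """Average gain of a group of n simulated students.

    Each gain is the assumed true effect plus individual random noise.
    """
    return statistics.mean(rng.gauss(true_gain, noise_sd) for _ in range(n))


def spread_of_group_means(n, repetitions=2000, seed=0):
    """Repeat the experiment many times and measure how much the
    group average varies from one repetition to the next."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means = [group_mean_gain(n, rng) for _ in range(repetitions)]
    return statistics.stdev(means)


for n in (1, 10, 30):
    print(n, round(spread_of_group_means(n), 2))
```

With one student per group, the result swings widely from one repetition to the next. With 30 students, the group average stays much closer to the assumed true effect, because individual ups and downs tend to cancel out.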
Another factor that is related to the number of participants, and the most decisive one when it comes to relying on the results of experiments, is statistics. Just like the testing of medicines, the testing of methods and other teaching-related topics makes use of sophisticated statistical analyses. These allow researchers not only to describe the results of a test but also to judge whether those results are likely to hold when the same scenario is reproduced, rather than being due to chance. In fact, many statistical concepts and techniques used in health studies are also used in teaching studies, such as ANOVA and p-values.
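To give a feel for what a p-value expresses, here is a rough sketch in Python. It does not use ANOVA. Instead it uses a simpler technique, an exact permutation test, and the gain scores for the two groups of five students are invented purely for illustration. The idea: if the approach made no difference, the labels "inductive" and "deductive" could be shuffled freely, so we count how often a reshuffled split produces a difference at least as large as the one observed.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in mean gains."""
    pooled = group_a + group_b
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))

    extreme = 0
    count = 0
    # Try every possible way of assigning n_a of the students to group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
        count += 1
    return extreme / count


# Hypothetical gain scores (post-test minus pre-test), not real data.
inductive_gains = [4, 5, 3, 6, 4]
deductive_gains = [1, 2, 0, 3, 2]

p = permutation_p_value(inductive_gains, deductive_gains)
print(round(p, 3))  # 0.016
```

A small p-value like this one means that a difference this large would rarely appear by chance alone if the two approaches were equally effective. Real studies use larger groups and more elaborate analyses, but the underlying logic is similar.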
Obviously, no study is reliable if it is not analyzed by a body of experts who take a critical stance towards the tests carried out. That is why it is important that we teachers are able to understand the principles of sound research and to bring its results critically into our daily practice, considering how our students resemble and differ from the participants and context of the piece of research. Understanding and trusting research does not mean believing in one-size-fits-all teaching.
Even if we accept that experimental studies on methods and teaching in general are trustworthy, at the end of the day the question is how we can know for sure that a method will work with our students. My starting point would be what has been tested and proven, but just as some people might not respond well to a given medicine, some people might not respond well to a given teaching method. If this is the case, it is the teacher’s job to diagnose it and prescribe a different approach to help the student.