Lessons from Science on How to Test your Code

Graham Jenson
Published in Maori Geek, Mar 30, 2014

The motto of the Royal Society is Nullius in verba, which translates to “Take nobody’s word for it”. It comes from a scientific culture of scepticism: no matter how renowned, popular, or respected someone is, you should always demand evidence for any claim they put forward.

I have been thinking about how scientific ideas like this have analogues in software engineering, particularly in writing tests. These days it is common to write tests to demonstrate that your code works, no matter how good a programmer you claim to be. I think this is the software community rediscovering that you should take nobody’s word for it, not even your own.

I think there are more lessons from the world of science that could be used when testing software. So, in this post I am going to briefly explore the similarities and differences between science and testing software, and maybe provide a different way to look at your test suite.

Isaac Newton was an Awesome Tester

I am going to start this post with a story of Isaac Newton, the tester.

When Newton started writing tests for the universe in the 1670s, it was believed that a feature of the universe was that white light is colourless. An implication of this is that when white light is passed through a prism, the colors that appear are created by the prism, not by the light itself. Newton could see this when he performed the test:

describe :prism do
  it 'will create many colors with white light' do
    white_light = Light.new(:white)
    # White light through a prism should show the whole range of colors
    Prism.new(white_light).should include(:red, :green, :blue)
  end
end

This test passes, and Newton probably thought everything was fine with the universe.

Newton realized that if white light were colorless and the prism created the colors, then passing just red light through a prism should separate it into more colors again. So he wrote a test that takes the red light produced by one prism and passes it through a second prism, expecting to see all the colors:

describe :prism do
  it 'will create many colors with red light' do
    white_light = Light.new(:white)
    # Take only the red light coming out of the first prism
    red_light = Prism.new(white_light).get(:red)
    Prism.new(red_light).should include(:red, :green, :blue)
  end
end

This test fails: the red light stays red. Newton probably thought, “Oh crap, the universe doesn’t implement this feature. Then what feature does it have? Maybe the universe has the feature where the colors are inside the white light. This would mean that if we focus the colors with a lens we could get the white light back.” So Newton went about writing a test for that:

describe :prism do
  it 'will create white light with many colors' do
    white_light = Light.new(:white)
    colored_light = Prism.new(white_light)
    # Recombining the separated colors should give white light back
    Prism.new(colored_light).should eq white_light
  end
end

This test passed, and Newton had discovered something about the system he was testing. He then published his test and its implications, so others could see for themselves the feature of our universe that had just been discovered.
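To make the three experiments concrete, here is a minimal sketch of a toy universe in which they can actually be run. The Light and Prism classes, the seven-color SPECTRUM, and the include?, get, and recombine methods are all assumptions invented for illustration; they are not a real library and only loosely mirror the specs above.

```ruby
# Hypothetical model of the universe under test (illustrative names only).
class Light
  SPECTRUM = %i[red orange yellow green blue indigo violet].freeze

  attr_reader :colors

  def initialize(*colors)
    # :white is treated as shorthand for every color of the spectrum combined.
    @colors = colors.flat_map { |c| c == :white ? SPECTRUM : [c] }.uniq.sort
  end

  def ==(other)
    other.is_a?(Light) && colors == other.colors
  end
end

class Prism
  def initialize(light)
    @light = light
  end

  # A prism only separates colors that are already in the incoming light.
  def colors
    @light.colors
  end

  def include?(*wanted)
    wanted.all? { |color| colors.include?(color) }
  end

  def get(color)
    raise ArgumentError, "no #{color} in this light" unless colors.include?(color)
    Light.new(color)
  end

  # Bring the separated colors back together, as a lens would.
  def recombine
    Light.new(*colors)
  end
end

white_light = Light.new(:white)
red_light   = Prism.new(white_light).get(:red)

white_splits = Prism.new(white_light).include?(:red, :green, :blue) # experiment 1
red_splits   = Prism.new(red_light).include?(:red, :green, :blue)   # experiment 2
white_again  = Prism.new(white_light).recombine == white_light      # experiment 3
```

In this toy model the first and third experiments hold and the second does not, which matches the story: the colors come from the light, not from the prism.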

Testing is Science

Richard Feynman once wrote that:

“Experiment is the sole judge of scientific truth.” (The Feynman Lectures on Physics, Introduction, Richard Feynman, 1961)

An analogous idea for software testing is:

Repeatable tests are the sole judge of your system’s functionality.

Documentation, intuition, and people (even developers) can be wrong. However, your tests show and describe the truth about your system. If you have no tests, you have no evidence that your system is working the way it should. What evidence is demanded? It is the same in both domains: repeatable tests.
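As one illustration of what “repeatable” can mean in practice, here is a small sketch of a test whose outcome does not change between runs, even though the code under test uses randomness. The roll_dice function and the seed value are hypothetical; the only real API used is Ruby’s built-in Random class.

```ruby
# Hypothetical function under test: it rolls dice using whatever random
# source it is handed, so a test can control the randomness.
def roll_dice(count, rng: Random.new)
  Array.new(count) { rng.rand(1..6) }
end

# Same seed, same experiment: the result is reproducible on every run.
first_run  = roll_dice(5, rng: Random.new(1670))
second_run = roll_dice(5, rng: Random.new(1670))

raise "experiment is not repeatable" unless first_run == second_run
```

Without the injected seed, two runs could disagree, and a failure would tell you nothing about whether the system had changed.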

But Software Systems Change

“We are like sailors who on the open sea must reconstruct their ship but are never able to start afresh from the bottom.” (Willard Van Orman Quine, Word and Object, 1960)

The above quote describes the Neurathian bootstrap (named after Otto Neurath), which describes the nature of scientific verification and identity. To me, it sounds more like my day job, where I maintain and develop systems that have to stay in production and keep working. I am sailing a ship and rebuilding it at the same time.

The biggest difference between software development and science is that science is trying to discover a static set of knowledge about the Universe, whereas developers are trying to alter their systems while ensuring all previous knowledge remains true. For example, an experiment by Newton in the 17th century will still return the same result today, but the tests that I wrote last week broke with the latest feature.

Because the universe does not change, scientists do not need to run their experiments again and again to make sure it still works. Developers, on the other hand, run their tests continuously because we are always breaking things. Since an experiment may only be run a handful of times, scientists put much more weight on experimental accuracy than developers do.

Scientists Started with Integration Tests

Imagine showing up to a job where you had to maintain a mission-critical system that has no tests and no documentation, and everyone who knew about it has left (probably because of the lack of tests and documentation). Where would you start? I would start by writing integration tests, then move downward towards unit tests.

This is the same problem people had when we showed up here. We got this world without a manual or specification. We started with big integration tests, things that tested behaviours of the system like how to start a fire. Slowly we have moved downwards towards more and more specific unit tests, like how friction is caused.
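Starting at the top and working downward could look something like the following sketch. The Legacy module, its invoice format, and the check helper are all hypothetical, made up to stand in for an undocumented system; plain Ruby is used instead of a test framework to keep the example self-contained.

```ruby
# Hypothetical legacy system: three small steps behind one entry point.
module Legacy
  def self.parse(line)
    name, qty, price = line.split(",")
    { name: name.strip, qty: Integer(qty), price: Float(price) }
  end

  def self.subtotal(item)
    item[:qty] * item[:price]
  end

  def self.invoice_total(lines)
    lines.map { |line| subtotal(parse(line)) }.sum.round(2)
  end
end

# Tiny assertion helper so the sketch needs no test framework.
def check(description)
  raise "FAILED: #{description}" unless yield
  puts "ok: #{description}"
end

# Integration-style check first: pin down what the whole system does today.
check("invoice total for a known order") do
  Legacy.invoice_total(["tea, 2, 1.50", "cake, 1, 4.25"]) == 7.25
end

# Then move downward to unit-style checks on the individual pieces.
check("parse reads name, quantity and price") do
  Legacy.parse("tea, 2, 1.50") == { name: "tea", qty: 2, price: 1.5 }
end

check("subtotal multiplies quantity by price") do
  Legacy.subtotal({ qty: 3, price: 2.0 }) == 6.0
end
```

The first check says nothing about how the total is computed, only that it is; the later checks begin to explain why.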

Which tests are more profound? It is always the unit tests! Gravity, special relativity, quantum mechanics all fundamentally altered the way in which we view our world. The things that impact our understanding of the whole system are the tests that are fundamental to many things. Where is the unit test for your system that explains how all the models move relative to one another, you know, the test for gravity?

Implications of the Hypothesis “Testing code is Science”

“What I cannot create, I do not understand” (Richard Feynman, 1988)

What are the consequences of thinking about testing using this scientific metaphor? Will you write your tests with the rigour of scientists, and if they fail, will you see your universe as broken?

If you ignore your tests with excuses like “Oh, that test always fails”, you are ignoring the only good evidence that your system works. It is like those people who accept only some of the scientific evidence and derive conclusions from that, while ignoring the evidence that conflicts with their view (I am looking at you, young earth creationists).

If you say “this JavaScript doesn’t need tests, it is too abstract and my unit tests are still working”, I see this as like a biologist saying “the fundamental chemical reactions are well tested by chemists, so I don’t need to experiment”. No matter how far abstracted you are from the core, not having tests means you don’t know if your code is doing what you think it should.

Conclusions

When I program, I sometimes think I am the master of this small, simple universe that I created. Someone may someday come across my universe, and if that happens I would like to leave some evidence of how it works: a little bit of the science about my universe, written as tests.

I am not saying this analogy holds up under microscopic inspection; I think it breaks down somewhere around TDD and peer review. I am just saying that it is fun to imagine your tests as tiny scientists experimenting on your system, making sure everything is right with their universe.

Read More

Surely You’re Joking, Mr. Feynman! — Richard P. Feynman

Cosmos: Carl Sagan

Applying the scientific method to software testing
