Free Code Coverage
TL;DR
I wrote a new GitHub Action (SamuelCabralCruz/free-code-coverage) that reproduces the behaviour of popular code coverage utilities such as codecov.io, coveralls.io, and others, but free of charge (except for the cost of owning an S3 bucket). At the moment, it is only a simple Proof of Concept (PoC) standardizing the ideas brought up in this article. Keep in mind that the code quality is not at its best, because I didn’t want to invest too much time and energy in something only I would use. Depending on the traction, I might rewrite a more stable and failproof version.
How It All Begins
One too many times, I fell into an argument with some of my colleagues about the right way to use code coverage metrics inside our CI/CD pipelines.
At my workplace, the motto is to enforce 100% coverage. No matter how questionable that is, there are still good reasons why they ended up doing so. Since most of the projects are written in TypeScript with Jest, and we don’t want to waste the client’s money on unnecessary tools, we don’t have a lot of options for enforcing code coverage. Enforcing 100% was the simple and secure go-to. The only other options we have, given those constraints, are:
- Rely on people’s good intentions
- Decide on another threshold value (ex: 90%)
Downsides of Enforcing 100% Code Coverage
In my opinion, enforcing 100% code coverage is not a good practice. It will usually do more harm than good.
Istanbul Ignore Comments
Whether you are using Istanbul or any other test coverage tool, these tools will usually allow you to surgically ignore files, branches, and/or statements that are not covered by your tests and still artificially produce a 100% code coverage metric. However, those comments are test concerns polluting production code, and they could one day become debt if you have to move from one coverage tool to another. The whole picture would be a little less dramatic if at least there were a standard way to tell any coverage tool to ignore a given part of the code.
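This is the kind of comment in question — Istanbul’s syntax shown here on a hypothetical defensive branch, hiding it from the metric while leaving the hint in production code:

```typescript
// Hypothetical example: the ignore hint lives in production code,
// tied to Istanbul's specific comment syntax.
export function getPort(env: Record<string, string | undefined>): number {
  const raw = env.PORT;
  /* istanbul ignore next */ // defensive branch hidden from the coverage metric
  if (raw === undefined) {
    return 3000;
  }
  return parseInt(raw, 10);
}
```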
Useful Information and Design Choices Loss
The moment we start ignoring parts of the code with comments, we have no way to compute our real test coverage and no quick feedback loop on which parts of our code would benefit from better testing. To me, consciously deciding whether or not to test a part of the code is as much a design choice as creating an abstraction or delegating responsibility to another component. Moreover, which tests should we include in the computation of this code coverage metric?
False Confidence Feeling
Over time, we end up forgetting which parts of the code are covered and which are not. We fall into the trap of thinking that everything has been thoroughly tested and that we can be confident about the code we are shipping. But are we really? Yes, of course! Otherwise our CI pipeline would have caught that code coverage is not 100%… A single test can generate 100% coverage of all code branches without making assertions for any of these branches.
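A contrived illustration of that last point: the hypothetical “test” below executes both branches of the function, so coverage reports 100%, yet it asserts nothing and would keep passing no matter what the function returned.

```typescript
// Both branches are executed, so coverage reports 100% —
// but nothing checks the results.
export function formatPrice(amount: number): string {
  return amount < 0 ? `-$${Math.abs(amount)}` : `$${amount}`;
}

// A coverage-only "test": executes everything, asserts nothing.
formatPrice(5);
formatPrice(-5);
```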
Is This Really Different Than Relying on People’s Good Will?
Put into the wrong hands, the famous ignore comments end up being used to speed things up temporarily, with a promise to come back once the whole thing is delivered. But more often than not, once something reaches the trunk, it rapidly falls into the abyss until bugs are opened, and we then act surprised that these pieces of code made it through the review process without test coverage. This is not necessarily due to simple laziness or lack of skill; the reality is that things move fast, and we rapidly forget about those floating personal or collective reminders in favour of the new and juicy tasks we have to do. Also, if you are an adept of TDD, you could challenge the real value of these tests, since they did not help shape our design anyway. In the end, this boils down to relying on people’s good will, but with the added costs of polluting our codebase with comments and losing useful information about our codebase’s health.
Coverage Tests Instead of Behavioural Tests
Last but not least, depending on how the code coverage metric is computed, it can promote bad practices such as writing tests at the wrong level of abstraction or scope. More often than not, people will use the coverage provided by their unit tests to compute the metric, because larger tests would not be restrictive enough. However, if having 100% code coverage is a bad idea, having 100% code coverage with unit tests is a recipe for disaster. These tests, in addition to being hard to write, are also hard to maintain, because they have to change at the same rate as the code they relate to. Before going any further, let’s define what I consider a coverage test versus what I call a behavioural test.
First and foremost, to me, tests are design and documentation tools. This is why they have to be written before the production code to reach their full potential. They also help me make sure that everything is working the way it should, at least for the tested cases. The fact that they are automated greatly helps with dependency updates and refactoring. A test should never be changed unless the behaviour it is assessing changes or is delegated to another component. A test that has to change because a component’s implementation changed is a really bad smell that should be taken seriously.
By coverage tests, I refer to tests written only to generate code coverage. These tests are often written unconsciously, just because we want our goddamn CI pipeline to pass and our feature shipped to production. Of course, an ignore comment could also have done the trick, but it might generate questions during the review process, so better play it safe and write those tests. More often than not, these tests need to mock a lot of dependencies to guide the flow properly and are, by definition, fragile. This fragility gives rise to high maintenance costs without bringing much value to the system. Before writing any test, we should always describe the behaviour we are validating from a user’s standpoint and ask ourselves if we are writing our tests at the right level of abstraction with the proper scope.
In short, writing tests in reaction to the need to reach a given percentage of coverage is a no-no. We should always keep in mind that test code is still code we will have to maintain over time, and as many know, when it comes to LoC (Lines of Code), less is more. Thus, we should aim to have just enough tests to ensure the application is working properly. Usually, if tests are written at the right scope and the design has been guided by tests, they should be easy to write and maintain while being short and expressive. By focusing on behaviours, we should have exactly one test for each behaviour and no test duplication.
Right Code Coverage Threshold and How to Compute It
So, what code coverage threshold should be enforced, if we don’t enforce 100%?
I can’t stress enough the point that all projects are unique and there is no such thing as the “Right Code Coverage Threshold”.
The code coverage metric is far from perfect and should only be used as a fact instead of a goal. Some applications can be fully covered by a single test without asserting that all the covered behaviours work as expected.
In my opinion, assessing code coverage is a task that requires as much thinking as deciding on an architectural pattern. It is something that should be discussed during the review process. We should also consider the tests’ architecture whenever we talk about the architecture of our system.
The only acceptable application of including a code coverage metric in the CI pipeline I have seen so far is verifying that a given branch won’t cause a decrease in code coverage. Even here, it should be easy to discard this check, because there might be good reasons to accept such a decrease. Assuming that every project initially starts at 100% code coverage, this percentage will gradually decrease and stabilize at a value which is representative of the application’s true code coverage and which suits the needs of the project.
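The gate itself is trivial once the base branch’s coverage is persisted somewhere. A sketch of the decision logic, with illustrative names (the actual action implements this in Bash):

```typescript
// Sketch of a "no coverage decrease" gate. Names are illustrative:
// baseCoverage is the persisted value for the target branch,
// branchCoverage is computed for the pull request,
// bypassLabel models an escape hatch such as a pull request label.
function coverageGate(
  baseCoverage: number,
  branchCoverage: number,
  bypassLabel: boolean,
): boolean {
  if (bypassLabel) return true; // a decrease was consciously accepted
  return branchCoverage >= baseCoverage;
}
```

All the real work lives around this comparison: persisting the base value between runs and updating it when a pull request merges.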
The biggest downside of this approach is that it usually requires paying for expensive subscriptions to tools such as codecov.io… Or does it?
New Era
Since GitHub Actions has been getting so much traction lately and its community is so prolific, I thought someone would already have created a plug-and-play action that would solve all my problems. Surprisingly, I only stumbled upon this article, which describes how this could be achieved using GitHub Actions. Since I really enjoyed the idea of such an action, I decided to make a draft version of it. Here it is: SamuelCabralCruz/free-code-coverage.
It already contains a couple of variants, but the main idea remains the same. I wanted an action which would persist code coverage data between runs and across branches of my repository, to make sure a given pull request would not cause a decrease in coverage. I wanted it to be cheap (“free”) and simple to use. Another important feature was the ability to easily bypass the check using pull request labels. Finally, I also wanted the gravy: README Markdown badges to show off my code coverage.
Due to some limitations, I ended up splitting the action in two parts, which I lovingly called Upload and Update. Upload is in charge of persisting code coverage data, generating badges, and performing checks, while Update cleans up the persisted data and overrides the base branch data with the merged pull request’s head data.
I assumed that users would use my action within a workflow which uses pull requests and enforces branches to be up to date with the trunk before merging.
Here is a simple usage example, but I strongly recommend checking the repository for more details.
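As an illustration only — the input names and version tag below are assumptions, and the repository’s README is authoritative — the Upload part might be wired into a pull request workflow roughly like this:

```yaml
# Illustrative sketch only: input names and the version tag are guesses.
# Refer to SamuelCabralCruz/free-code-coverage for the real interface.
name: ci
on: pull_request
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: npm ci && npm test -- --coverage
      - uses: SamuelCabralCruz/free-code-coverage/upload@v1  # hypothetical ref
        with:
          coverage-file: coverage/coverage-summary.json       # hypothetical input
```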
As of now, I have created the following variants of my action:
embedded
- Best suited for mono-repositories
- Reports its check by failing or passing the workflow it is embedded in
- Has the disadvantage of losing reactivity on pull request labeling
github-repo
- Totally free option
- Uses a public GitHub repository instead of an S3 bucket for persisting code coverage data between runs
- Has the disadvantage of making all code coverage data publicly available (not only the badges)
- Needs a Personal Access Token (PAT) with read/write access to both repositories
Next Steps
Of course, my action is far from perfect. Among the roadmap, I could definitely see the following upgrades:
- Using a subfolder within the same repository for persisting code coverage data. This would make using GitHub repositories a lot more interesting by enabling the use of a private repository and avoiding the need to generate a PAT.
- Persisting the action configuration to avoid having to provide it twice (once for each part: Upload and Update). For the moment, the configuration duplication is still really low, but it could become worse over time.
Conclusion
I really hope this article and my action will help some of you. I am looking forward to receiving feature requests and comments. Overall, I learned a couple of things about Bash, the GitHub API, and GitHub Actions distributed through a mono-repository, which makes it a win-win situation.