Writing tests is one of the most important ways to improve code quality, but it isn’t glamorous. It’s easier to notice when things go terribly wrong than to put in the effort to prevent those failures in the first place. As a result, developers rarely get instant gratification from writing tests.
But if you are trying to build a testing culture in your organization, you should identify and recognize good testers. In this article, we’ll map out a few of the types of testing champions in your organization. We’ll then show you how to pull that data using the Codecov API. Finally, we’ll present a script that you can use on your repositories to see who has been leading the charge to better coding practices.
All code snippets provided in this article are subject to the MIT license.
Types of testing champions
Let’s talk about the five kinds of testing champions you might have in your organization. These are the people who are dedicated to ensuring good testing practices. If you have a smaller team or company, there might be someone who encompasses multiple types of testing champion.
- The Powerhouse
The powerhouse is the engineer in your organization who opens the most PRs that increase coverage. They are responsible for a lot of the code in the codebase and are disciplined enough to test that code. They are likely finding and fixing many of the bugs that pop up in production.
- The Gardener
The gardener differs from the powerhouse in that they add the largest number of covered lines, rather than the largest number of pull requests. They are likely an engineer who writes larger features or infrastructure and makes sure that work is tested.
- The Guardian
The guardian is doing the non-glamorous job of *increasing the total coverage of a codebase*. Unlike the other champions, they aren’t necessarily adding a lot of code. Instead, they are likely adding the most tests.
- The Surgeon
The surgeon isn’t worried about adding lines of code. They are the developer who removes the most uncovered lines of code. Having a surgeon on your team helps keep your codebase clean.
- The Sentry
Sentries are focused on ensuring that their code is of the highest quality. As such, they achieve the highest average patch coverage, meaning that the lines they change in each pull request are consistently covered by tests.
Having these champions on your team, and recognizing them, will greatly improve your testing culture. In the next sections, we’ll show you how to identify these developers.
Retrieving pull request information from Codecov
If you haven’t already, take a look at this article to get started with the Codecov API. In this section, we’ll focus on two endpoints: the pulls endpoint, which lists a repository’s pull requests, and the compare endpoint, which returns coverage comparison data for a single pull request.
To retrieve the necessary information from Codecov, we need to find all of the pull requests merged in a certain time period and then retrieve the comparison information for each PR.
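The "certain time period" check can be sketched on its own before any API calls are involved. The snippet below is a minimal illustration, assuming timestamps arrive as UTC strings ending in `Z`, with or without fractional seconds; the helper names `parse_utc` and `within_window` are hypothetical and are not part of the Codecov API.

```python
from datetime import datetime, timedelta, timezone

# Assumed timestamp layouts: seconds only, or with fractional seconds.
FORMATS = ('%Y-%m-%dT%H:%M:%SZ', '%Y-%m-%dT%H:%M:%S.%fZ')

def parse_utc(stamp):
    """Parse a 'Z'-suffixed timestamp into a timezone-aware UTC datetime."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError('could not parse date: {!r}'.format(stamp))

def within_window(stamp, now, days=30):
    """Return True if the timestamp falls within the last `days` days of `now`."""
    return parse_utc(stamp) >= now - timedelta(days=days)

# A fixed "now" keeps the example deterministic.
now = datetime(2022, 6, 30, 12, 0, 0, tzinfo=timezone.utc)
print(within_window('2022-06-15T08:30:00Z', now))         # True
print(within_window('2022-05-01T08:30:00.123456Z', now))  # False
```

Comparing timezone-aware values on both sides avoids an off-by-some-hours window when the script runs on a machine whose local time is not UTC.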
Getting all merged pull requests
To pull all merged PRs from Codecov in the past 30 days, you can run:
from datetime import datetime, timedelta, timezone
import json
import requests

CODECOV_ENDPOINT = "https://codecov.io/api/v2/{}/{}/{}"
# Placeholder: replace this string with your Codecov API token.
TOKEN_NAME = "CODECOV_API_TOKEN"
CODECOV_HEADERS = {
    'Authorization': 'bearer {}'.format(TOKEN_NAME)
}
# Timestamps may come back with or without fractional seconds.
DATE_FORMAT_S = '%Y-%m-%dT%H:%M:%SZ'
DATE_FORMAT_MS = '%Y-%m-%dT%H:%M:%S.%fZ'

def _format_date(datestring):
    # Try each known format and return the first successful parse.
    for fmt in (DATE_FORMAT_S, DATE_FORMAT_MS):
        try:
            return datetime.strptime(datestring, fmt)
        except ValueError:
            pass
    raise ValueError('could not parse date')

def get_pulls(service, owner, repo):
    # Walk through every page of merged PRs until there is no next page.
    page = 1
    pulls, has_next = _get_pulls(service, owner, repo, page)
    while has_next is not None:
        print('Retrieved page {} of PRs'.format(page))
        page += 1
        next_pulls, has_next = _get_pulls(service, owner, repo, page)
        pulls.extend(next_pulls)

    # The timestamps are in UTC (the 'Z' suffix), so compare against a naive UTC "now".
    now = datetime.now(timezone.utc).replace(tzinfo=None)
    # Keep PRs updated in the past 30 days that have coverage totals on both sides.
    pulls = [pull for pull in pulls if (
        pull['updatestamp'] is not None
        and _format_date(pull['updatestamp']) >= now - timedelta(days=30)
        and pull['head_totals'] is not None
        and pull['base_totals'] is not None
    )]
    print('Retrieved {} merged PRs'.format(len(pulls)), end='\n\n')
    return pulls

def _get_pulls(service, owner, repo, page, batch=100):
    # Fetch a single page of merged PRs.
    endpoint = CODECOV_ENDPOINT.format(
        service,
        owner,
        "repos/{}/pulls?state=merged&page_size={}&page={}&ordering=updatestamp".format(
            repo,
            batch,
            page,
        ),
    )
    response = requests.get(
        endpoint,
        headers=CODECOV_HEADERS,
    )
    content = json.loads(response.content)
    return content['results'], content['next']
The important function here is get_pulls(service, owner, repo), which returns all of the necessary information in a list. Be sure to replace CODECOV_API_TOKEN with your API token.
You can run the above code by calling get_pulls with the appropriate arguments, like so:
get_pulls("github", "codecov", "uploader")
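The pagination pattern used by get_pulls can be exercised without touching the network. The following is a minimal sketch that assumes each page returns a list of results plus a `next` value that is `None` on the last page; `collect_all`, `fake_fetch`, and `FAKE_PAGES` are hypothetical stand-ins for illustration only.

```python
def collect_all(fetch_page):
    """Follow a paginated source until its 'next' value is None."""
    page = 1
    results, next_url = fetch_page(page)
    while next_url is not None:
        page += 1
        more, next_url = fetch_page(page)
        results.extend(more)
    return results

# Hypothetical in-memory pages standing in for API responses.
FAKE_PAGES = {
    1: ([{'pullid': 1}, {'pullid': 2}], 'page=2'),
    2: ([{'pullid': 3}], None),
}

def fake_fetch(page):
    results, next_url = FAKE_PAGES[page]
    return list(results), next_url

all_pulls = collect_all(fake_fetch)
print([pull['pullid'] for pull in all_pulls])  # [1, 2, 3]
```

Passing the page fetcher in as an argument makes the loop easy to test with canned data before pointing it at a real repository.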
Getting coverage information on each pull request
Now, with a list of pull requests, we can call the Codecov API on each pull to see what the coverage totals are. First, we need to update the get_pulls function to iterate through each PR.
def get_pulls(service, owner, repo):
    …
    # print('Retrieved {} merged PRs'.format(len(pulls)), end='\n\n')
    for i, pull in enumerate(pulls):
        print('Pulling patch data: {} of {}'.format(i+1, len(pulls)))
        pulls[i]['patch'] = _get_patch_from_pullid(service, owner, repo, pull['pullid'])
    # return pulls
    …
Then, we need to add the _get_patch_from_pullid function to acquire the coverage totals:
def _get_patch_from_pullid(service, owner, repo, pullid):
    # Ask the compare endpoint for the coverage totals of a single PR.
    endpoint = CODECOV_ENDPOINT.format(
        service,
        owner,
        "repos/{}/compare?pullid={}".format(
            repo,
            pullid,
        ),
    )
    response = requests.get(
        endpoint,
        headers=CODECOV_HEADERS,
    )
    content = json.loads(response.content)
    # Fall back to an empty dict when the response has no totals.
    return content.get('totals', {})
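Because _get_patch_from_pullid falls back to an empty dict, code that later reads `head` or `base` from the result may hit a missing key. One way to guard against that is sketched below; it assumes the totals have the `head`/`base` shape used elsewhere in this article, and the helper name `totals_delta` is hypothetical.

```python
def totals_delta(patch, field):
    """Return head[field] - base[field], or None when either side is missing."""
    head = (patch or {}).get('head') or {}
    base = (patch or {}).get('base') or {}
    if head.get(field) is None or base.get(field) is None:
        return None
    return head[field] - base[field]

# Sample totals with made-up numbers, for illustration only.
sample = {'head': {'hits': 10, 'misses': 2}, 'base': {'hits': 4, 'misses': 5}}
print(totals_delta(sample, 'hits'))    # 6
print(totals_delta(sample, 'misses'))  # -3
print(totals_delta({}, 'hits'))        # None
```

A caller can then skip any pull request whose delta is `None` instead of failing partway through a long run.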
At this point, we should have all the necessary information to uncover your testing champions.
Using the Codecov data to identify champions
For the following sections, we will provide functions to find the top developers of each champion type. You can change the number of authors returned by adjusting the top argument. The pulls argument is supplied by the get_pulls function described above.
The function below can be used to print out the metrics and leaders for each of the following champions.
def get_leaderboards(pulls, fn, top=10):
    # fn returns a metric title and a list of leader rows.
    metric, leaders = fn(pulls, top)
    print(metric)
    for leader in leaders:
        print(','.join([str(i) for i in leader]))
The Powerhouse
def _powerhouses(pulls, top):
    # Count PRs per author where head coverage exceeds base coverage.
    author_freq = {}
    for pull in pulls:
        if pull['head_totals']['coverage'] > pull['base_totals']['coverage']:
            name = pull['author']['username']
            if name not in author_freq:
                author_freq[name] = 0
            author_freq[name] += 1
    return 'Most PRs with increasing coverage by author', sorted(([v,k] for k,v in author_freq.items() if v > 0), reverse=True)[:top]
The Gardener
def _gardeners(pulls, top):
    # Sum the covered lines (hits) each author added across their PRs.
    author_freq = {}
    for pull in pulls:
        hits = pull['patch']['head']['hits'] - pull['patch']['base']['hits']
        if hits > 0:
            name = pull['author']['username']
            if name not in author_freq:
                author_freq[name] = 0
            author_freq[name] += hits
    return 'Most added covered lines by author', sorted(([v,k] for k,v in author_freq.items() if v > 0), reverse=True)[:top]
The Guardian
def _guardians(pulls, top):
    # Sum the change in total coverage (head minus base) per author.
    author_freq = {}
    for pull in pulls:
        coverage = pull['patch']['head']['coverage'] - pull['patch']['base']['coverage']
        name = pull['author']['username']
        if name not in author_freq:
            author_freq[name] = 0
        author_freq[name] += coverage
    sorted_freq = sorted(([round(v, 5),k] for k,v in author_freq.items() if v > 0), reverse=True)[:top]
    return 'Most increase to coverage', [['{}%'.format(freq[0]), freq[1]] for freq in sorted_freq]
The Surgeon
def _surgeons(pulls, top):
    # Sum the uncovered lines (misses) each author removed.
    author_freq = {}
    for pull in pulls:
        removed = pull['patch']['base']['misses'] - pull['patch']['head']['misses']
        if removed > 0:
            name = pull['author']['username']
            if name not in author_freq:
                author_freq[name] = 0
            author_freq[name] += removed
    return 'Most removed uncovered lines by author', sorted(([v,k] for k,v in author_freq.items() if v > 0), reverse=True)[:top]
The Sentry
def _sentries(pulls, top):
    # Collect the patch coverage of every PR per author, then average it.
    author_freq = {}
    for pull in pulls:
        if pull['patch']['patch'] is None:
            continue
        patch_coverage = pull['patch']['patch']['coverage']
        name = pull['author']['username']
        if name not in author_freq:
            author_freq[name] = []
        author_freq[name].append(patch_coverage)
    sorted_freq = sorted(([
        round(sum(v)/len(v), 2),
        '{} PRs'.format(len(v)),
        k,
    ] for k,v in author_freq.items() if sum(v) > 0), reverse=True)[:top]
    return 'Highest average patch coverage by author', [['{}%'.format(freq[0]), freq[1], freq[2]] for freq in sorted_freq]
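The averaging step behind the Sentry metric can be checked against made-up data. The sketch below is one possible variation rather than the article's function: it assumes patch coverage values are plain numbers, and `min_prs` is a hypothetical parameter that lets you ignore authors with only a handful of pull requests.

```python
from collections import defaultdict
from statistics import mean

def average_patch_coverage(pulls, top=5, min_prs=1):
    """Rank authors by mean patch coverage, skipping PRs without patch data."""
    by_author = defaultdict(list)
    for pull in pulls:
        patch = (pull.get('patch') or {}).get('patch')
        if patch is None or patch.get('coverage') is None:
            continue
        by_author[pull['author']['username']].append(patch['coverage'])
    rows = [
        (round(mean(values), 2), len(values), name)
        for name, values in by_author.items()
        if len(values) >= min_prs  # hypothetical threshold, not in the original script
    ]
    return sorted(rows, reverse=True)[:top]

# Made-up sample data for illustration only.
sample = [
    {'author': {'username': 'alice'}, 'patch': {'patch': {'coverage': 100.0}}},
    {'author': {'username': 'bob'}, 'patch': {'patch': {'coverage': 90.0}}},
    {'author': {'username': 'bob'}, 'patch': {'patch': {'coverage': 80.0}}},
    {'author': {'username': 'carol'}, 'patch': {'patch': None}},
]

print(average_patch_coverage(sample))
print(average_patch_coverage(sample, min_prs=2))
```

With the default threshold, a single fully covered pull request is enough to top the board, so raising `min_prs` may give a fairer picture on busy repositories.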
Creating a coverage leaderboard
Before we provide the entire script, let’s see what we might expect looking at a repository that uses Codecov.
github/scrapy/scrapy
Most PRs with increasing coverage by author
4,wRAR
Most added covered lines by author
171,gliptak
Most increase to coverage
0.52%,wRAR
Most removed uncovered lines by author
57,wRAR
Highest average patch coverage by author
100.0%,1 PRs,gabrielztk
100.0%,1 PRs,elacuesta
100.0%,1 PRs,dschaller
If this is the type of data that you are looking for, you can use the script below. Remember to replace CODECOV_API_TOKEN with your API token.
from datetime import datetime, timedelta, timezone
import json
import requests
import sys

CODECOV_ENDPOINT = "https://codecov.io/api/v2/{}/{}/{}"
# Placeholder: replace this string with your Codecov API token.
TOKEN_NAME = "CODECOV_API_TOKEN"
CODECOV_HEADERS = {
    'Authorization': 'bearer {}'.format(TOKEN_NAME)
}
# Timestamps may come back with or without fractional seconds.
DATE_FORMAT_S = '%Y-%m-%dT%H:%M:%SZ'
DATE_FORMAT_MS = '%Y-%m-%dT%H:%M:%S.%fZ'

def _format_date(datestring):
    # Try each known format and return the first successful parse.
    for fmt in (DATE_FORMAT_S, DATE_FORMAT_MS):
        try:
            return datetime.strptime(datestring, fmt)
        except ValueError:
            pass
    raise ValueError('could not parse date')

def get_pulls(service, owner, repo):
    # Walk through every page of merged PRs until there is no next page.
    page = 1
    pulls, has_next = _get_pulls(service, owner, repo, page)
    while has_next is not None:
        print('Retrieved page {} of PRs'.format(page))
        page += 1
        next_pulls, has_next = _get_pulls(service, owner, repo, page)
        pulls.extend(next_pulls)

    # The timestamps are in UTC (the 'Z' suffix), so compare against a naive UTC "now".
    now = datetime.now(timezone.utc).replace(tzinfo=None)
    # Keep PRs updated in the past 30 days that have coverage totals on both sides.
    pulls = [pull for pull in pulls if (
        pull['updatestamp'] is not None
        and _format_date(pull['updatestamp']) >= now - timedelta(days=30)
        and pull['head_totals'] is not None
        and pull['base_totals'] is not None
    )]
    print('Retrieved {} merged PRs'.format(len(pulls)), end='\n\n')

    # Attach the compare totals to each PR.
    for i, pull in enumerate(pulls):
        print('Pulling patch data: {} of {}'.format(i+1, len(pulls)))
        pulls[i]['patch'] = _get_patch_from_pullid(service, owner, repo, pull['pullid'])
    return pulls

def _get_pulls(service, owner, repo, page, batch=100):
    # Fetch a single page of merged PRs.
    endpoint = CODECOV_ENDPOINT.format(
        service,
        owner,
        "repos/{}/pulls?state=merged&page_size={}&page={}&ordering=updatestamp".format(
            repo,
            batch,
            page,
        ),
    )
    response = requests.get(
        endpoint,
        headers=CODECOV_HEADERS,
    )
    content = json.loads(response.content)
    return content['results'], content['next']

def _get_patch_from_pullid(service, owner, repo, pullid):
    # Ask the compare endpoint for the coverage totals of a single PR.
    endpoint = CODECOV_ENDPOINT.format(
        service,
        owner,
        "repos/{}/compare?pullid={}".format(
            repo,
            pullid,
        ),
    )
    response = requests.get(
        endpoint,
        headers=CODECOV_HEADERS,
    )
    content = json.loads(response.content)
    # Fall back to an empty dict when the response has no totals.
    return content.get('totals', {})

def get_leaderboards(pulls, fns, top=10):
    # Print one leaderboard per metric function.
    for fn in fns:
        metric, leaders = fn(pulls, top)
        print()
        print(metric)
        for leader in leaders:
            print(','.join([str(i) for i in leader]))

def _powerhouses(pulls, top):
    # Count PRs per author where head coverage exceeds base coverage.
    author_freq = {}
    for pull in pulls:
        if pull['head_totals']['coverage'] > pull['base_totals']['coverage']:
            name = pull['author']['username']
            if name not in author_freq:
                author_freq[name] = 0
            author_freq[name] += 1
    return 'Most PRs with increasing coverage by author', sorted(([v,k] for k,v in author_freq.items() if v > 0), reverse=True)[:top]

def _gardeners(pulls, top):
    # Sum the covered lines (hits) each author added across their PRs.
    author_freq = {}
    for pull in pulls:
        hits = pull['patch']['head']['hits'] - pull['patch']['base']['hits']
        if hits > 0:
            name = pull['author']['username']
            if name not in author_freq:
                author_freq[name] = 0
            author_freq[name] += hits
    return 'Most added covered lines by author', sorted(([v,k] for k,v in author_freq.items() if v > 0), reverse=True)[:top]

def _guardians(pulls, top):
    # Sum the change in total coverage (head minus base) per author.
    author_freq = {}
    for pull in pulls:
        coverage = pull['patch']['head']['coverage'] - pull['patch']['base']['coverage']
        name = pull['author']['username']
        if name not in author_freq:
            author_freq[name] = 0
        author_freq[name] += coverage
    sorted_freq = sorted(([round(v, 5),k] for k,v in author_freq.items() if v > 0), reverse=True)[:top]
    return 'Most increase to coverage', [['{}%'.format(freq[0]), freq[1]] for freq in sorted_freq]

def _surgeons(pulls, top):
    # Sum the uncovered lines (misses) each author removed.
    author_freq = {}
    for pull in pulls:
        removed = pull['patch']['base']['misses'] - pull['patch']['head']['misses']
        if removed > 0:
            name = pull['author']['username']
            if name not in author_freq:
                author_freq[name] = 0
            author_freq[name] += removed
    return 'Most removed uncovered lines by author', sorted(([v,k] for k,v in author_freq.items() if v > 0), reverse=True)[:top]

def _sentries(pulls, top):
    # Collect the patch coverage of every PR per author, then average it.
    author_freq = {}
    for pull in pulls:
        if pull['patch']['patch'] is None:
            continue
        patch_coverage = pull['patch']['patch']['coverage']
        name = pull['author']['username']
        if name not in author_freq:
            author_freq[name] = []
        author_freq[name].append(patch_coverage)
    sorted_freq = sorted(([
        round(sum(v)/len(v), 2),
        '{} PRs'.format(len(v)),
        k,
    ] for k,v in author_freq.items() if sum(v) > 0), reverse=True)[:top]
    return 'Highest average patch coverage by author', [['{}%'.format(freq[0]), freq[1], freq[2]] for freq in sorted_freq]

if __name__=="__main__":
    # Usage: python pull_coverage_leaders.py <service> <owner> <repo>
    service = sys.argv[1]
    owner = sys.argv[2]
    repo = sys.argv[3]
    pulls = get_pulls(service, owner, repo)
    get_leaderboards(pulls, [
        _powerhouses,
        _gardeners,
        _guardians,
        _surgeons,
        _sentries,
    ], top=5)
You can save the above script as pull_coverage_leaders.py and run:
python pull_coverage_leaders.py {{ servicename }} {{ owner }} {{ repo }}
# as an example
# python pull_coverage_leaders.py github codecov uploader
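If you would rather not hard-code the token in the script, one option is to read it from the environment. The sketch below assumes an environment variable named CODECOV_API_TOKEN, which is our choice of name rather than something Codecov requires, and the helper `build_headers` is hypothetical.

```python
import os

def build_headers(env_var='CODECOV_API_TOKEN'):
    """Build the Authorization header from an environment variable."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError('Set the {} environment variable first'.format(env_var))
    return {'Authorization': 'bearer {}'.format(token)}

# Example: export CODECOV_API_TOKEN=<your token> in your shell,
# then pass build_headers() as the headers argument to requests.get().
```

This keeps the token out of version control and lets the same script run unchanged on a laptop or in CI.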
Publishing your coverage leaderboard
We are always looking to help our users and improve our API. If you use the code above to produce a leaderboard or have any questions or comments, please send us an email at devrel@codecov.io or reach out to us on Twitter.