
Uncover Your Testing Champions with a Coverage Leaderboard

November 14, 2022 Tom Hu

Writing tests is one of the most important things you can do to improve code quality, but it isn't glamorous. It's easy to notice when things go terribly wrong, and much harder to see the effort that kept them from going wrong in the first place. As a result, writing tests rarely brings developers instant gratification.

But if you are trying to build a testing culture in your organization, you should identify and recognize good testers. In this article, we’ll map out a few of the types of testing champions in your organization. We’ll then show you how to pull that data using the Codecov API. Finally, we’ll present a script that you can use on your repositories to see who has been leading the charge to better coding practices.

All code snippets provided in this article are subject to the MIT license.

 

Types of testing champions

Let’s look at five kinds of testing champions you might have in your organization: the people dedicated to good testing practices. On a smaller team or at a smaller company, one person may fill several of these roles.

  • The Powerhouse

    The powerhouse is the engineer in your organization who writes the most PRs that also increase coverage. They are responsible for much of the code in the codebase and are disciplined enough to test it. They are likely finding and fixing many of the bugs that pop up in production.

  • The Gardener

    The gardener differs from the powerhouse in that they focus on adding the largest number of covered lines rather than opening the most pull requests. They are likely an engineer who writes larger features or infrastructure and makes sure that code is tested.

  • The Guardian

    The guardian does the unglamorous job of *increasing the total coverage of a codebase*. Unlike the other champions, they aren’t necessarily adding a lot of code. Instead, they are likely adding the most tests.

  • The Surgeon

    The surgeon isn’t concerned with adding lines of code. They are the developer who removes the most uncovered lines. Having a surgeon on your team helps keep your codebase clean.

  • The Sentry

    Sentries focus on the quality of their own changes. As a result, they achieve the highest average patch coverage, meaning the lines they change are consistently covered by tests.

Having and recognizing these champions will go a long way toward improving your team’s testing culture. In the next sections, we’ll show you how to identify them.

 

Retrieving pull request information from Codecov

If you haven’t already, take a look at this article to get started with the Codecov API. In this section, we’ll focus on two endpoints:

  1. Pulls list
  2. Commit comparison

To retrieve the necessary information from Codecov, we need to find all of the pull requests merged in a given time period and then retrieve the comparison information for each one.
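To see how the two endpoints fit together before any network calls are made, here is a small sketch that only builds the request URLs. It assumes the same base URL used throughout this article (https://codecov.io/api/v2/) and the same query parameters; the helpers pulls_url and compare_url are hypothetical names. Building the query with urllib.parse.urlencode keeps it well-formed, with a single & between parameters.

```python
from urllib.parse import urlencode

# Base URL used throughout this article: service / owner / path
CODECOV_ENDPOINT = "https://codecov.io/api/v2/{}/{}/{}"

def pulls_url(service, owner, repo, page=1, page_size=100):
    # Step 1 (hypothetical helper): one page of merged PRs for the repository
    query = urlencode({
        "state": "merged",
        "page_size": page_size,
        "page": page,
        "ordering": "updatestamp",
    })
    return CODECOV_ENDPOINT.format(service, owner, "repos/{}/pulls?{}".format(repo, query))

def compare_url(service, owner, repo, pullid):
    # Step 2 (hypothetical helper): the commit comparison for a single PR
    query = urlencode({"pullid": pullid})
    return CODECOV_ENDPOINT.format(service, owner, "repos/{}/compare?{}".format(repo, query))

print(pulls_url("github", "codecov", "uploader"))
print(compare_url("github", "codecov", "uploader", 42))
```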

Getting all merged pull requests

To pull all merged PRs that were last updated in the past 30 days from Codecov, you can run:

from datetime import datetime, timedelta, timezone
import json
import requests

CODECOV_ENDPOINT = "https://codecov.io/api/v2/{}/{}/{}"
TOKEN_NAME = "CODECOV_API_TOKEN"  # replace with your Codecov API token
CODECOV_HEADERS = {
    'Authorization': 'bearer {}'.format(TOKEN_NAME)
}
DATE_FORMAT_S = '%Y-%m-%dT%H:%M:%SZ'
DATE_FORMAT_MS = '%Y-%m-%dT%H:%M:%S.%fZ'

def _format_date(datestring):
    # Parse Codecov's UTC timestamps, with or without fractional seconds
    for fmt in (DATE_FORMAT_S, DATE_FORMAT_MS):
        try:
            return datetime.strptime(datestring, fmt)
        except ValueError:
            pass
    raise ValueError('could not parse date')

def get_pulls(service, owner, repo):
    # Walk through every page of merged PRs until there is no next page
    page = 1
    pulls, has_next = _get_pulls(service, owner, repo, page)
    while has_next is not None:
        print('Retrieved page {} of PRs'.format(page))
        page += 1
        next_pulls, has_next = _get_pulls(service, owner, repo, page)
        pulls.extend(next_pulls)

    # Keep PRs updated in the last 30 days that have head and base coverage.
    # updatestamp is UTC, so compare against the current UTC time (naive datetime).
    now = datetime.now(timezone.utc).replace(tzinfo=None)
    pulls = [pull for pull in pulls if (
        pull['updatestamp'] is not None and _format_date(pull['updatestamp']) >= now - timedelta(days=30) and pull['head_totals'] is not None and pull['base_totals'] is not None
    )]

    print('Retrieved {} merged PRs'.format(len(pulls)), end='\n\n')

    return pulls

def _get_pulls(service, owner, repo, page, batch=100):
    # Request one page of merged PRs, ordered by updatestamp
    endpoint = CODECOV_ENDPOINT.format(
        service,
        owner,
        "repos/{}/pulls?state=merged&page_size={}&page={}&ordering=updatestamp".format(
            repo,
            batch,
            page,
        ),
    )
    response = requests.get(
        endpoint,
        headers=CODECOV_HEADERS,
    )
    response.raise_for_status()  # fail fast on a bad token or repository name
    content = json.loads(response.content)
    # 'next' is None on the last page
    return content['results'], content['next']

The main entry point is get_pulls(service, owner, repo), which returns a list of the merged PRs updated in the last 30 days that have both head and base coverage totals. Be sure to replace CODECOV_API_TOKEN with your own API token.

You can run the code by calling get_pulls with the appropriate arguments, like so:

get_pulls("github", "codecov", "uploader")
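get_pulls pages through every merged PR in the repository and only afterwards filters down to the last 30 days. If the endpoint accepts a descending sort such as ordering=-updatestamp (an assumption: a leading minus is a common convention for ordering parameters, but it is not verified here), a loop could stop as soon as it reaches PRs older than the cutoff. The sketch below shows that early-stopping loop with the page fetcher passed in as a function, so it runs without network access; collect_recent, parse_utc, and fetch_page are hypothetical names, and the sample pages are made up.

```python
from datetime import datetime, timedelta

DATE_FORMAT_S = '%Y-%m-%dT%H:%M:%SZ'
DATE_FORMAT_MS = '%Y-%m-%dT%H:%M:%S.%fZ'

def parse_utc(datestring):
    # Same two timestamp formats that _format_date handles in this article
    for fmt in (DATE_FORMAT_MS, DATE_FORMAT_S):
        try:
            return datetime.strptime(datestring, fmt)
        except ValueError:
            pass
    raise ValueError('could not parse date: {}'.format(datestring))

def collect_recent(fetch_page, cutoff):
    """Gather PRs updated at or after `cutoff` (a naive UTC datetime).

    `fetch_page(page)` must return (results, next) like _get_pulls, with
    results sorted newest first (assumed ordering=-updatestamp).
    """
    recent = []
    page = 1
    while True:
        results, has_next = fetch_page(page)
        for pull in results:
            stamp = pull.get('updatestamp')
            if stamp is None:
                continue
            if parse_utc(stamp) < cutoff:
                # Newest-first order means every remaining PR is older: stop paging
                return recent
            recent.append(pull)
        if has_next is None:
            return recent
        page += 1

# Hypothetical pages standing in for API responses, newest first
pages = {
    1: ([{'pullid': 3, 'updatestamp': '2022-11-10T12:00:00Z'},
         {'pullid': 2, 'updatestamp': '2022-11-01T08:30:00.250000Z'}], 'page-2'),
    2: ([{'pullid': 1, 'updatestamp': '2022-09-01T00:00:00Z'}], None),
}
cutoff = datetime(2022, 11, 14) - timedelta(days=30)
recent = collect_recent(lambda page: pages[page], cutoff)
print([pull['pullid'] for pull in recent])
```

In a real script, fetch_page would wrap a request like the one in _get_pulls with the descending ordering.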

Getting coverage information on each pull request

Now that we have a list of pull requests, we can call the Codecov API on each one to get its coverage totals. First, update the get_pulls function to iterate through each PR:

def get_pulls(service, owner, repo):
    …
    # print('Retrieved {} merged PRs'.format(len(pulls)), end='\n\n')
    for i, pull in enumerate(pulls):
        print('Pulling patch data: {} of {}'.format(i+1, len(pulls)))
        pulls[i]['patch'] = _get_patch_from_pullid(service, owner, repo, pull['pullid'])
    # return pulls
    …

Then, add the _get_patch_from_pullid function, which calls the commit comparison endpoint and returns its totals (the base, head, and patch coverage for the PR):

def _get_patch_from_pullid(service, owner, repo, pullid):
    endpoint = CODECOV_ENDPOINT.format(
        service,
        owner,
        "repos/{}/compare?pullid={}".format(
            repo,
            pullid,
        ),
    )
    response = requests.get(
        endpoint,
        headers=CODECOV_HEADERS,
    )
    content = json.loads(response.content)
    return content.get('totals', {})

At this point, we have all the information needed to uncover your testing champions.
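Before ranking anyone, it helps to see what each champion metric reads from a single PR. The record below is a hypothetical, trimmed-down example of what get_pulls returns after the patch step: head_totals and base_totals from the pulls endpoint, plus patch holding the comparison's base, head, and patch totals. Only the fields the article's code reads are included, and all values are made up; pr_metrics is a hypothetical helper.

```python
def pr_metrics(pull):
    # Per-PR quantities behind each champion type (hypothetical helper)
    totals = pull['patch']
    patch = totals.get('patch')
    return {
        # Powerhouse: did this PR raise project coverage?
        'increased_coverage': pull['head_totals']['coverage'] > pull['base_totals']['coverage'],
        # Gardener: covered lines added
        'hits_added': totals['head']['hits'] - totals['base']['hits'],
        # Guardian: change in project coverage, in percentage points
        'coverage_delta': round(totals['head']['coverage'] - totals['base']['coverage'], 5),
        # Surgeon: uncovered lines removed (negative if misses grew)
        'misses_removed': totals['base']['misses'] - totals['head']['misses'],
        # Sentry: coverage of the lines changed in this PR (None if no patch data)
        'patch_coverage': patch['coverage'] if patch is not None else None,
    }

# Hypothetical PR: 1,000 lines at 80.0% grows to 1,250 lines at 80.8%
example_pull = {
    'pullid': 101,
    'author': {'username': 'octocat'},
    'base_totals': {'coverage': 80.0},
    'head_totals': {'coverage': 80.8},
    'patch': {
        'base': {'hits': 800, 'misses': 200, 'coverage': 80.0},
        'head': {'hits': 1010, 'misses': 240, 'coverage': 80.8},
        'patch': {'coverage': 84.0},
    },
}

print(pr_metrics(example_pull))
```

This PR would count toward the powerhouse, gardener, guardian, and sentry boards, but not the surgeon board, because it added 40 uncovered lines.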

 

Using the Codecov data to identify champions

In the following sections, we provide functions to find the top developers of each champion type. You can change the number of authors returned by adjusting the top argument. The pulls argument is the list returned by the get_pulls function described above.

The function below prints the metric and leaders for any of the champion functions that follow.

def get_leaderboards(pulls, fn, top=10):
    metric, leaders = fn(pulls, top)
    print(metric)
    for leader in leaders:
        print(','.join([str(i) for i in leader]))
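get_leaderboards prints comma-separated rows. If you want to keep the results, for example to publish them, a variant can write the same rows through Python's csv module to any open text stream. This is a sketch: write_leaderboards is a hypothetical name, and _pr_counts is a stand-in champion function with the same (pulls, top) -> (metric, leaders) shape used in this article.

```python
import csv
import io

def write_leaderboards(pulls, fns, out, top=10):
    # Write each leaderboard as CSV: a metric-name row, the leader rows, then a blank row
    writer = csv.writer(out)
    for fn in fns:
        metric, leaders = fn(pulls, top)
        writer.writerow([metric])
        for leader in leaders:
            writer.writerow(leader)
        writer.writerow([])

def _pr_counts(pulls, top):
    # Stand-in champion function: number of PRs per author
    counts = {}
    for pull in pulls:
        name = pull['author']['username']
        counts[name] = counts.get(name, 0) + 1
    return 'PRs by author', sorted(([v, k] for k, v in counts.items()), reverse=True)[:top]

sample = [
    {'author': {'username': 'alice'}},
    {'author': {'username': 'bob'}},
    {'author': {'username': 'alice'}},
]
buffer = io.StringIO()
write_leaderboards(sample, [_pr_counts], buffer, top=5)
print(buffer.getvalue())
```

To save to disk, open the file with newline='' as the csv documentation recommends, for example: with open('leaderboard.csv', 'w', newline='') as f: write_leaderboards(pulls, [_powerhouses, _gardeners], f, top=5).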

The Powerhouse

def _powerhouses(pulls, top):
    author_freq = {}
    for pull in pulls:
        if pull['head_totals']['coverage'] > pull['base_totals']['coverage']:
            name = pull['author']['username']
            if name not in author_freq:
                author_freq[name] = 0
            author_freq[name] += 1
    return 'Most PRs with increasing coverage by author', sorted(([v,k] for k,v in author_freq.items() if v > 0), reverse=True)[:top]
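The same tally can be written with collections.Counter, which removes the missing-key bookkeeping and ranks authors with most_common. This is an alternative sketch, not the article's version, and ties are ordered differently: most_common keeps first-seen order for equal counts, while sorted(..., reverse=True) breaks ties by reverse username. The pulls below are hypothetical and contain only the fields the function reads.

```python
from collections import Counter

def powerhouses_counter(pulls, top):
    # Count PRs per author where head coverage is higher than base coverage
    counts = Counter(
        pull['author']['username']
        for pull in pulls
        if pull['head_totals']['coverage'] > pull['base_totals']['coverage']
    )
    leaders = [[count, name] for name, count in counts.most_common(top)]
    return 'Most PRs with increasing coverage by author', leaders

# Hypothetical PRs: alice raises coverage twice, bob once
sample_pulls = [
    {'author': {'username': 'alice'}, 'base_totals': {'coverage': 80.0}, 'head_totals': {'coverage': 80.4}},
    {'author': {'username': 'bob'}, 'base_totals': {'coverage': 80.4}, 'head_totals': {'coverage': 80.1}},
    {'author': {'username': 'alice'}, 'base_totals': {'coverage': 80.1}, 'head_totals': {'coverage': 80.2}},
    {'author': {'username': 'bob'}, 'base_totals': {'coverage': 80.2}, 'head_totals': {'coverage': 80.3}},
]

print(powerhouses_counter(sample_pulls, 10))
```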

The Gardener

def _gardeners(pulls, top):
    author_freq = {}
    for pull in pulls:
        hits = pull['patch']['head']['hits'] - pull['patch']['base']['hits']
        if hits > 0:
            name = pull['author']['username']
            if name not in author_freq:
                author_freq[name] = 0
            author_freq[name] += hits
    return 'Most added covered lines by author', sorted(([v,k] for k,v in author_freq.items() if v > 0), reverse=True)[:top]

The Guardian

def _guardians(pulls, top):
    author_freq = {}
    for pull in pulls:
        coverage = pull['patch']['head']['coverage'] - pull['patch']['base']['coverage']
        name = pull['author']['username']
        if name not in author_freq:
            author_freq[name] = 0
        author_freq[name] += coverage
    sorted_freq = sorted(([round(v, 5),k] for k,v in author_freq.items() if v > 0), reverse=True)[:top]
    return 'Most increase to coverage', [['{}%'.format(freq[0]), freq[1]] for freq in sorted_freq]
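Written out, the guardian score that _guardians computes for an author a sums, over that author's merged PRs P_a in the window, the change in project coverage each PR produced, where C_head and C_base are the coverage values read from pull['patch']['head'] and pull['patch']['base']:

```latex
G(a) = \sum_{p \in P_a} \bigl( C_{\mathrm{head}}(p) - C_{\mathrm{base}}(p) \bigr)
```

For example, an author with two PRs that moved coverage from 80.0% to 80.3% and then from 80.3% to 80.5% scores (80.3 - 80.0) + (80.5 - 80.3) = 0.5. The leaderboard labels this value with a %, but it is a change in percentage points. Because the sum includes negative deltas, a PR that lowers coverage counts against its author, and only authors with a positive total appear on the board.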

The Surgeon

def _surgeons(pulls, top):
    author_freq = {}
    for pull in pulls:
        removed = pull['patch']['base']['misses'] - pull['patch']['head']['misses']
        if removed > 0:
            name = pull['author']['username']
            if name not in author_freq:
                author_freq[name] = 0
            author_freq[name] += removed
    return 'Most removed uncovered lines by author', sorted(([v,k] for k,v in author_freq.items() if v > 0), reverse=True)[:top]

The Sentry

def _sentries(pulls, top):
    author_freq = {}
    for pull in pulls:
        if pull['patch']['patch'] is None:
            continue
        patch_coverage = pull['patch']['patch']['coverage']
        name = pull['author']['username']
        if name not in author_freq:
            author_freq[name] = []
        author_freq[name].append(patch_coverage)
    sorted_freq = sorted(([
        round(sum(v)/len(v), 2),
        '{} PRs'.format(len(v)),
        k,
    ] for k,v in author_freq.items() if sum(v) > 0), reverse=True)[:top]
    return 'Highest average patch coverage by author', [['{}%'.format(freq[0]), freq[1], freq[2]] for freq in sorted_freq]
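Because the sentry metric is an unweighted average, an author with a single fully covered PR ties for first place, as the scrapy sample output below shows. If you want the sentry board to reward consistency, one option is to require a minimum number of PRs before an author qualifies. The min_prs parameter is a hypothetical addition, not part of the article's script; apart from it, sentries_min_prs mirrors _sentries, except that it sorts on the numeric PR count so ties go to the author with more PRs. The sample PRs are made up.

```python
def sentries_min_prs(pulls, top, min_prs=3):
    # Collect each author's patch coverage values, skipping PRs with no patch data
    author_cov = {}
    for pull in pulls:
        patch = pull['patch'].get('patch')
        if patch is None or patch.get('coverage') is None:
            continue
        author_cov.setdefault(pull['author']['username'], []).append(patch['coverage'])
    # Only authors with at least min_prs PRs qualify; rank by average patch coverage
    ranked = sorted(
        ([round(sum(v) / len(v), 2), len(v), k]
         for k, v in author_cov.items() if len(v) >= min_prs and sum(v) > 0),
        reverse=True,
    )[:top]
    metric = 'Highest average patch coverage by author (min {} PRs)'.format(min_prs)
    return metric, [['{}%'.format(avg), '{} PRs'.format(n), name] for avg, n, name in ranked]

def _pr(name, coverage):
    # Hypothetical PR record with only the fields sentries_min_prs reads
    return {'author': {'username': name}, 'patch': {'patch': {'coverage': coverage}}}

sample_pulls = [
    _pr('alice', 100.0),
    _pr('bob', 90.0), _pr('bob', 95.0), _pr('bob', 100.0),
    _pr('carol', 80.0), _pr('carol', 85.0), _pr('carol', 90.0),
]
print(sentries_min_prs(sample_pulls, top=5))
```

With the default min_prs=3, alice's single PR no longer qualifies; passing min_prs=1 restores the original ranking behavior.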

 

Creating a coverage leaderboard

Before presenting the entire script, here is the kind of output it produces for a repository that uses Codecov:

github/scrapy/scrapy
Most PRs with increasing coverage by author
4,wRAR

Most added covered lines by author
171,gliptak

Most increase to coverage
0.52%,wRAR

Most removed uncovered lines by author
57,wRAR

Highest average patch coverage by author
100.0%,1 PRs,gabrielztk
100.0%,1 PRs,elacuesta
100.0%,1 PRs,dschaller

If this is the kind of data you are looking for, you can use the script below. Remember to replace CODECOV_API_TOKEN with your own API token.

from datetime import datetime, timedelta, timezone
import json
import requests
import sys

CODECOV_ENDPOINT = "https://codecov.io/api/v2/{}/{}/{}"
TOKEN_NAME = "CODECOV_API_TOKEN"  # replace with your Codecov API token
CODECOV_HEADERS = {
    'Authorization': 'bearer {}'.format(TOKEN_NAME)
}
DATE_FORMAT_S = '%Y-%m-%dT%H:%M:%SZ'
DATE_FORMAT_MS = '%Y-%m-%dT%H:%M:%S.%fZ'

def _format_date(datestring):
    # Parse Codecov's UTC timestamps, with or without fractional seconds
    for fmt in (DATE_FORMAT_S, DATE_FORMAT_MS):
        try:
            return datetime.strptime(datestring, fmt)
        except ValueError:
            pass
    raise ValueError('could not parse date')

def get_pulls(service, owner, repo):
    page = 1
    pulls, has_next = _get_pulls(service, owner, repo, page)
    while has_next is not None:
        print('Retrieved page {} of PRs'.format(page))
        page += 1
        next_pulls, has_next = _get_pulls(service, owner, repo, page)
        pulls.extend(next_pulls)

    # updatestamp is UTC, so compare against the current UTC time (naive datetime)
    now = datetime.now(timezone.utc).replace(tzinfo=None)
    pulls = [pull for pull in pulls if (
        pull['updatestamp'] is not None and _format_date(pull['updatestamp']) >= now - timedelta(days=30) and pull['head_totals'] is not None and pull['base_totals'] is not None
    )]

    print('Retrieved {} merged PRs'.format(len(pulls)), end='\n\n')
    for i, pull in enumerate(pulls):
        print('Pulling patch data: {} of {}'.format(i+1, len(pulls)))
        pulls[i]['patch'] = _get_patch_from_pullid(service, owner, repo, pull['pullid'])

    return pulls

def _get_pulls(service, owner, repo, page, batch=100):
    endpoint = CODECOV_ENDPOINT.format(
        service,
        owner,
        "repos/{}/pulls?state=merged&page_size={}&page={}&ordering=updatestamp".format(
            repo,
            batch,
            page,
        ),
    )
    response = requests.get(
        endpoint,
        headers=CODECOV_HEADERS,
    )
    response.raise_for_status()  # fail fast on a bad token or repository name
    content = json.loads(response.content)
    return content['results'], content['next']

def _get_patch_from_pullid(service, owner, repo, pullid):
    endpoint = CODECOV_ENDPOINT.format(
        service,
        owner,
        "repos/{}/compare?pullid={}".format(
            repo,
            pullid,
        ),
    )
    response = requests.get(
        endpoint,
        headers=CODECOV_HEADERS,
    )
    content = json.loads(response.content)
    return content.get('totals', {})

def get_leaderboards(pulls, fns, top=10):
    for fn in fns:
        metric, leaders = fn(pulls, top)
        print()
        print(metric)
        for leader in leaders:
            print(','.join([str(i) for i in leader]))

def _powerhouses(pulls, top):
    author_freq = {}
    for pull in pulls:
        if pull['head_totals']['coverage'] > pull['base_totals']['coverage']:
            name = pull['author']['username']
            if name not in author_freq:
                author_freq[name] = 0
            author_freq[name] += 1
    return 'Most PRs with increasing coverage by author', sorted(([v,k] for k,v in author_freq.items() if v > 0), reverse=True)[:top]

def _gardeners(pulls, top):
    author_freq = {}
    for pull in pulls:
        hits = pull['patch']['head']['hits'] - pull['patch']['base']['hits']
        if hits > 0:
            name = pull['author']['username']
            if name not in author_freq:
                author_freq[name] = 0
            author_freq[name] += hits
    return 'Most added covered lines by author', sorted(([v,k] for k,v in author_freq.items() if v > 0), reverse=True)[:top]

def _guardians(pulls, top):
    author_freq = {}
    for pull in pulls:
        coverage = pull['patch']['head']['coverage'] - pull['patch']['base']['coverage']
        name = pull['author']['username']
        if name not in author_freq:
            author_freq[name] = 0
        author_freq[name] += coverage
    sorted_freq = sorted(([round(v, 5),k] for k,v in author_freq.items() if v > 0), reverse=True)[:top]
    return 'Most increase to coverage', [['{}%'.format(freq[0]), freq[1]] for freq in sorted_freq]

def _surgeons(pulls, top):
    author_freq = {}
    for pull in pulls:
        removed = pull['patch']['base']['misses'] - pull['patch']['head']['misses']
        if removed > 0:
            name = pull['author']['username']
            if name not in author_freq:
                author_freq[name] = 0
            author_freq[name] += removed
    return 'Most removed uncovered lines by author', sorted(([v,k] for k,v in author_freq.items() if v > 0), reverse=True)[:top]

def _sentries(pulls, top):
    author_freq = {}
    for pull in pulls:
        if pull['patch']['patch'] is None:
            continue
        patch_coverage = pull['patch']['patch']['coverage']
        name = pull['author']['username']
        if name not in author_freq:
            author_freq[name] = []
        author_freq[name].append(patch_coverage)
    sorted_freq = sorted(([
        round(sum(v)/len(v), 2),
        '{} PRs'.format(len(v)),
        k,
    ] for k,v in author_freq.items() if sum(v) > 0), reverse=True)[:top]
    return 'Highest average patch coverage by author', [['{}%'.format(freq[0]), freq[1], freq[2]] for freq in sorted_freq]

if __name__=="__main__":
    if len(sys.argv) != 4:
        sys.exit('usage: python pull_coverage_leaders.py <service> <owner> <repo>')
    service = sys.argv[1]
    owner = sys.argv[2]
    repo = sys.argv[3]
    pulls = get_pulls(service, owner, repo)
    get_leaderboards(pulls, [
        _powerhouses,
        _gardeners,
        _guardians,
        _surgeons,
        _sentries,
    ], top=5)

Save the script above as pull_coverage_leaders.py and run:

python pull_coverage_leaders.py {{ servicename }} {{ owner }} {{ repo }}
# as an example
# python pull_coverage_leaders.py github codecov uploader
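If you would rather not paste your token into the script, one option is to read it from an environment variable at startup. The sketch below assumes a variable named CODECOV_API_TOKEN (our choice, not something Codecov requires); codecov_headers is a hypothetical helper that builds the same Authorization header the script uses.

```python
import os
import sys

def codecov_headers(env_var='CODECOV_API_TOKEN'):
    # Read the API token from the environment instead of hard-coding it
    token = os.environ.get(env_var)
    if not token:
        sys.exit('Set the {} environment variable to your Codecov API token'.format(env_var))
    return {'Authorization': 'bearer {}'.format(token)}
```

You would then set CODECOV_HEADERS = codecov_headers() in place of the hard-coded token and run, in a POSIX shell, CODECOV_API_TOKEN=your-token python pull_coverage_leaders.py github codecov uploader.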

 

Publishing your coverage leaderboard

We are always looking to help our users and improve our API. If you use the code above to produce a leaderboard or have any questions or comments, please send us an email at devrel@codecov.io or reach out to us on Twitter.
