Learning a Codebase Using Tests

January 24, 2023 Tom Hu

Large codebases are complex. They can have many interacting components, with dozens to hundreds of developers having a hand in the product. Joining a team with such a codebase can be daunting. And though it’s understood that developers need time to fully understand the code, that doesn’t make learning the codebase any less stressful.

That said, how can we make learning a codebase faster? One often overlooked part of the codebase is the tests. In this article, we’ll explore the benefits of reading tests as internal documentation to help pick up a new project.

The Challenges of Learning a New Codebase

Let’s first understand why learning a codebase is hard. This will help us see how tests can overcome these issues. The challenges fall into these buckets:

  • Outdated documentation
  • Unfamiliar patterns, frameworks, or languages
  • Poor code readability
  • Product/industry inexperience
  • Quantity and shape of code
  • Old/deprecated code paths

Outdated Documentation
One of the first places to look for information about a codebase is the documentation and README file. However, these may not be up-to-date with the latest product changes, or they may describe an older tech stack. As a result, relying solely on documentation can cause confusion once you start investigating the code itself.

Unfamiliar Patterns, Frameworks, or Languages
Developers nowadays are not necessarily hired for their expertise in a specific framework or language, but for their ability to code and to transfer their existing skillsets. However, there is still a learning curve when being introduced to a new tech stack, and it takes time to internalize.

Poor Code Readability
When a codebase suffers from poor development practices, new developers struggle to get onboarded quickly. This can result from code that has poor design patterns or low readability. Valuable time is wasted when a function’s use cannot be quickly ascertained.

Product/Industry Inexperience
A developer is hired for many reasons, but having a deep working knowledge of the industry is not often at the top of the list. As a result, a new codebase full of industry acronyms and jargon can take some time to digest, and model names need to be understood.

Quantity and Shape of Code
A large codebase with hundreds of subdirectories and files can intimidate new hires. Unless the directory structure is similar enough to previous work, the developer will need time, or a guide, to walk through it and understand where pieces of code are kept.

Old or Deprecated Code Paths
Similar to outdated documentation, old and deprecated code paths can also cause headaches for new developers who lack the proper context. Studying a microservice that is no longer used wastes a developer’s time.

Why Tests are a Good First Place to Look

Any steps we take to make a codebase quicker to learn should address at least some of the six challenges described above. In this section, we’ll see how reading and understanding tests achieves this goal.

Outdated Documentation
Tests that are run in a CI environment represent actual code paths that are run. Thus, if a feature is removed, its tests either no longer exist or no longer run. One way to work around outdated documentation is to rely on integration and end-to-end tests when considering what features still exist.

Unfamiliar Patterns, Frameworks, or Languages
One of the struggles of learning a new framework/language is that it can be difficult to know how models or functions are used. Since tests are used to run pieces of the codebase in an environment, they serve as examples of how to create objects and how they are used. They can also help to identify common patterns based on the import structure.
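
As a rough illustration of using the import structure, here is a minimal Python sketch that tallies which modules the test files import most often. The function names, the `tests` directory, and the `test_*.py` naming pattern are assumptions for this example, not something every project follows.

```python
import ast
from collections import Counter
from pathlib import Path


def imported_modules(source):
    """Return the module names imported by one Python source string."""
    modules = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # Relative imports such as "from . import x" have no module name.
            modules.append(node.module)
    return modules


def tally_test_imports(test_root):
    """Count how often each module is imported across test_*.py files."""
    counts = Counter()
    for path in Path(test_root).rglob("test_*.py"):
        try:
            counts.update(imported_modules(path.read_text(encoding="utf-8")))
        except SyntaxError:
            continue  # skip files this Python version cannot parse
    return counts


if __name__ == "__main__":
    # "tests" is an assumed directory name; point this at your own test folder.
    for module, count in tally_test_imports("tests").most_common(10):
        print(count, module)
```

The modules at the top of the list are usually the models and helpers that the rest of the codebase leans on, which makes them a reasonable place to start reading.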

Poor Code Readability
Although not a direct fix, a testing culture is a good signal that code quality matters in a particular organization. In addition, compact functions make it much easier to write simple unit tests. These tests can help to illuminate the inputs and outputs of a particular method and how it is used in the system.
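
To make this concrete, here is a small Python sketch of how a unit test documents inputs and outputs. The `apply_discount` function is hypothetical and is defined inline only so the example runs on its own. In a real codebase you would read the test first and the implementation second.

```python
import unittest


def apply_discount(price_cents, percent):
    """Hypothetical production function, included so the example is runnable."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price_cents - (price_cents * percent) // 100


class ApplyDiscountTest(unittest.TestCase):
    # Each test name and assertion tells a new reader something about the
    # contract: integer cents in, integer cents out, percent limited to 0-100.
    def test_takes_cents_and_percent_and_returns_cents(self):
        self.assertEqual(apply_discount(1000, 25), 750)

    def test_zero_percent_leaves_price_unchanged(self):
        self.assertEqual(apply_discount(1000, 0), 1000)

    def test_rejects_percent_above_100(self):
        with self.assertRaises(ValueError):
            apply_discount(1000, 101)
```

Saved to a file, this can be run with `python -m unittest`. Without opening the implementation, a reader learns the units, the argument order, and the failure mode.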

Picking up a New Codebase with Tests

So if we’re sure that tests are a good way to learn a new codebase, how do we go about doing this in practice? Since there’s no clear-cut moment when a codebase is learned, it can be ambiguous how long to spend reading the tests.

However, answering these questions can help determine when you have gotten a good understanding of the codebase from the tests.

  • Do you know what models are used for a user task?
  • Do you know what inputs and outputs are expected from a function?
  • Do you know if there are dependencies between different services?
  • If there’s a frontend, do you know how data is passed between the backend and frontend?

The crux of these questions is knowing how data is passed around. By using tests to identify these answers, we can help decrease the learning curve.

The 4 Steps of Learning a Codebase

In the previous sections, we discussed how using tests can help you learn a codebase faster. These are the typical steps I take when presented with a new codebase.

  1. Read the docs
  2. Overview the product
    1. Get a local development setup running
    2. Play with the product
    3. Identify main user stories
  3. Read the code
    1. Review the code structure
    2. Locate corresponding integration/end-to-end (E2E) tests
    3. Isolate models/factories used in testing
    4. Read the models and their public functions
  4. Make changes
    1. Make a change locally and review the impact
    2. Read and play with the unfamiliar frameworks
    3. Fix a bug and deploy the code change
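
The idea behind isolating the models and factories used in testing can be sketched in Python as follows. Everything here (`User`, `Order`, `make_user`, and the ordering rule) is a made-up example rather than code from a real project. The point is only that a test built on a factory shows which models take part in a user story and how data moves between them.

```python
from dataclasses import dataclass, field


@dataclass
class User:  # hypothetical model
    email: str
    is_active: bool = True


@dataclass
class Order:  # hypothetical model that depends on User
    user: User
    item_skus: list = field(default_factory=list)

    def add_item(self, sku):
        if not self.user.is_active:
            raise PermissionError("inactive users cannot order")
        self.item_skus.append(sku)


def make_user(**overrides):
    """Test factory: builds a valid User, with defaults a test can override."""
    defaults = {"email": "dev@example.com", "is_active": True}
    defaults.update(overrides)
    return User(**defaults)


def test_active_user_can_add_item_to_order():
    order = Order(user=make_user())
    order.add_item("SKU-1")
    assert order.item_skus == ["SKU-1"]


def test_inactive_user_cannot_order():
    order = Order(user=make_user(is_active=False))
    try:
        order.add_item("SKU-1")
    except PermissionError:
        pass  # expected: the rule lives on Order but reads state from User
    else:
        raise AssertionError("expected PermissionError")
```

Reading just the factory and the two tests tells you that an `Order` needs a `User`, what a valid `User` looks like, and which field controls whether ordering is allowed.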

You’ll notice that although reading the tests is an important part of learning a codebase, it’s only a piece of the puzzle. If you were to remove the test-related steps, you could still learn the codebase. But you would lose out on speeding up the learning process.

Do you pick up code using a different method? Let us know at devrel@codecov.io or on Twitter.
