Although a developer can go their whole career without writing bash, bash scripting is one of the most useful skills an engineer can learn. It is the standard way to interact with a UNIX environment and helps with system administration. More importantly, it can automate a significant amount of work in your systems.
In a recent project (detailed at the end), I used bash to help solve one of the thornier problems in our system. Here are the best tips and tricks I learned along the way.
1. Printing stderr to stdout and saving it to a variable
Solution
exec 3>&1
VAR=$(<command> 2>&1 | tee >(cat - >&3) )
Replace <command> and VAR with the applicable command and variable name.
Example
--> exec 3>&1
--> ERROR=$(python -c '1/0' 2>&1 | tee >(cat - >&3) )
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ZeroDivisionError: division by zero
--> echo "$ERROR"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ZeroDivisionError: division by zero
Explanation
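Python writes its tracebacks to stderr. We use 2>&1 to redirect stderr into stdout, because $( ) only captures stdout.
tee reads stdin and writes it both to stdout and to another location. That other location must be a file, which is why cat - >&3 is wrapped in >(...): process substitution runs the command and hands tee a file path connected to its input. Earlier, exec 3>&1 opened file descriptor 3 as a copy of the original stdout (normally your terminal), so cat - >&3 prints that copy to the screen while tee's own stdout is captured. The code above therefore prints the traceback and also saves it to ERROR.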
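To try the technique without Python, here is a minimal sketch. The noisy_command function is a hypothetical stand-in for any command that writes to stderr, and the final exec 3>&- closes the extra file descriptor once it is no longer needed.

```bash
#!/usr/bin/env bash
# Save a copy of the current stdout (e.g. the terminal) as file descriptor 3.
exec 3>&1

# Hypothetical stand-in for a real command: writes a message to stderr only.
noisy_command() {
  echo "something went wrong" >&2
}

# 2>&1 sends stderr into the pipe. tee writes one copy to the process
# substitution (which forwards it to fd 3, the saved stdout) and one copy
# to its own stdout, which the command substitution captures into ERROR.
ERROR=$(noisy_command 2>&1 | tee >(cat - >&3))

# Close fd 3 now that it is no longer needed.
exec 3>&-

echo "Saved: $ERROR"
```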
2. Non-greedy sed regex
Solution
sed 's/<prefix>[^<char>]*<char>//'
Replace <prefix> with the text where the match should start and <char> with the character at which the match should stop.
Example
# Non-greedy
--> TESTS='{"A": ["test_new"], "B": [], "C": ["test_add", "test_divide"], "D": []}'
--> TESTS=$(echo "$TESTS" | sed 's/"C[^]]*\]//')
--> echo "$TESTS"
{"A": ["test_new"], "B": [], , "D": []}
# Greedy
--> TESTS='{"A": ["test_new"], "B": [], "C": ["test_add", "test_divide"], "D": []}'
--> TESTS=$(echo "$TESTS" | sed 's/"C.*\]//')
--> echo "$TESTS"
{"A": ["test_new"], "B": [], }
Explanation
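The POSIX regular expressions that sed uses have no lazy quantifier (such as .*? in Perl or Python), and .* always matches as much as it can. The [^<char>]* construct works around this: it matches as many characters as possible that are not <char>, so the match cannot run past the first <char>. It is not a perfect substitute, because it can only stop at a character (or a set of characters), not at a multi-character pattern.
In the example, we want to remove "C": ["test_add", "test_divide"] without removing the "D": [] block. The first ] after "C is the one that closes the C values, so the match ends there and the string is returned without the C section. In the greedy version, .* runs to the last ] on the line, so sed also eliminates the "D": [] block.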
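The same negated bracket expression also works for extracting a value instead of deleting it. A minimal sketch, reusing the TESTS string from the example: the \( \) capture group holds everything after "C": [ up to, but not including, the first ].

```bash
#!/usr/bin/env bash
TESTS='{"A": ["test_new"], "B": [], "C": ["test_add", "test_divide"], "D": []}'

# .*"C": \[    skips ahead to the opening bracket of the C list
# \([^]]*\)    captures every character up to the first ]
# ].*          consumes that ] and the rest of the line
# \1           replaces the whole line with only the captured part
C_TESTS=$(echo "$TESTS" | sed 's/.*"C": \[\([^]]*\)].*/\1/')

echo "$C_TESTS"   # "test_add", "test_divide"
```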
3. Print the length of an array
Solution
--> echo ${#<array>[@]}
Example
--> ARRAY=("one" "two" "three")
--> echo "${ARRAY[@]}"
one two three
--> echo ${#ARRAY[@]}
3
Explanation
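${#ARRAY[@]} expands to the number of elements in ARRAY, here 3. The [@] subscript refers to every element of the array, each as a separate item; we will see later why this is important. Note that $ARRAY on its own expands only to the first element, and ${#ARRAY} gives the length of that first element rather than the element count.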
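To see the difference between the element count, the length of a single element, and the effect of quoting [@], here is a small sketch with illustrative data (one element deliberately contains a space):

```bash
#!/usr/bin/env bash
ARRAY=("a" "bb" "ccc d")

COUNT=${#ARRAY[@]}      # number of elements: 3
FIRST_LEN=${#ARRAY}     # length of the first element only ("a"): 1
LAST_LEN=${#ARRAY[2]}   # length of "ccc d": 5

# Quoted "${ARRAY[@]}" keeps each element as one word...
QUOTED=0
for item in "${ARRAY[@]}"; do QUOTED=$((QUOTED + 1)); done

# ...while the unquoted form splits "ccc d" into two words.
UNQUOTED=0
for item in ${ARRAY[@]}; do UNQUOTED=$((UNQUOTED + 1)); done

echo "$COUNT $FIRST_LEN $LAST_LEN $QUOTED $UNQUOTED"   # 3 1 5 3 4
```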
4. Transform a space-separated list into an array
Solution
--> IFS=" " # split on spaces only; the default IFS is space, tab, and newline
--> <array>=($(echo $<list>))
Example
--> IFS=" "
--> LIST="one two three"
--> ARRAY=($(echo $LIST))
--> echo "${ARRAY[@]}"
one two three
# Get the length of the array
--> echo ${#ARRAY[@]}
3
Explanation
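Setting IFS is not always necessary, since the default value already splits on spaces; if you do change it, save the old value first and restore it afterwards. Wrapping the unquoted expansion in ( ) splits the string on IFS and turns each piece into an array element. Because the expansion is unquoted, an item containing a glob character such as * is also expanded against file names.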
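If you would rather not touch the global IFS or risk pathname expansion, bash's read builtin can do the same split. A minimal sketch (the LIST and CSV values are illustrative): with -a, read stores the fields in an array, and an IFS assignment placed in front of read applies to that one command only.

```bash
#!/usr/bin/env bash
LIST="one * three"

# -r keeps backslashes literal; -a stores each field as an array element.
# read does not perform pathname expansion, so "*" stays a literal asterisk.
read -r -a ARRAY <<< "$LIST"
echo "${#ARRAY[@]}: ${ARRAY[1]}"   # 3: *

# Setting IFS in front of read changes it for this one command only.
CSV="red,green,blue"
IFS=',' read -r -a COLORS <<< "$CSV"
echo "${COLORS[1]}"                # green
```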
5. Remove duplicates from an array
Solution
<uniques>=($(echo "${<array>[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '))
Example
--> ARRAY=("one" "one" "two" "two" "two" "one" "three")
--> UNIQUES=($(echo "${ARRAY[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '))
--> echo "${UNIQUES[@]}"
one three two
Explanation
The important part of this command is sort -u, which sorts the items (after tr has put each one on its own line) and removes any duplicates. As the example shows, this loses the original order of the items. It also assumes no item contains a space, since tr splits on every space.
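If you need to keep the original order, or items may contain spaces, one alternative is to track seen items in an associative array. A minimal sketch; note that associative arrays require bash 4 or newer (macOS still ships bash 3.2 as /bin/bash), and bash rejects an empty string as an associative array key.

```bash
#!/usr/bin/env bash
ARRAY=("one" "one" "two" "two" "two" "one" "three")

declare -A SEEN=()   # items already added to UNIQUES (assumes no empty items)
UNIQUES=()
for item in "${ARRAY[@]}"; do
  # ${SEEN[$item]+x} expands to "x" only if the key already exists.
  if [[ -z ${SEEN[$item]+x} ]]; then
    SEEN[$item]=1
    UNIQUES+=("$item")
  fi
done

echo "${UNIQUES[@]}"   # one two three
```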
6. Join an array with spaces
Solution
--> echo "${<array>[@]}"
# OR
--> echo "${<array>[*]}"
Example
--> COMMANDS=("print('hello');" "print('world')")
--> python -c "${COMMANDS[@]}"
hello
--> python -c "${COMMANDS[*]}"
hello
world
Explanation
[@] and [*] behave differently when the expansion is quoted, and which one you need depends on how you want to pass the items along.
"${ARRAY[@]}" expands each item into its own double-quoted word. In the example, python -c receives print('hello'); as the code to run and print('world') as an extra argument to that code (sys.argv[1]), so print('world') is never run.
"${ARRAY[*]}" combines the items into a single double-quoted word, separated by the first character of IFS (a space by default). In this case, we run python -c "print('hello'); print('world')", so we see both hello and world.
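Because "${ARRAY[*]}" joins with the first character of IFS, you can pick a different separator by changing IFS locally. A minimal sketch with a hypothetical join_by helper: declaring IFS with local keeps the change inside the function, "$*" follows the same joining rule for the function's arguments, and only the first character of the delimiter is used.

```bash
#!/usr/bin/env bash
# Hypothetical helper: join_by <delimiter-char> <items...>
join_by() {
  local IFS="$1"   # "$*" joins the arguments with the first character of IFS
  shift
  echo "$*"
}

LABELS=("test_add" "test_divide")
JOINED=$(join_by , "${LABELS[@]}")
echo "$JOINED"     # test_add,test_divide
```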
7. [Bonus] Saving a variable in GitHub Actions to be used in a future step
Solution
echo "<env var>=<var to save>" >> "$GITHUB_ENV"
Example
- run: |
    TESTS="testone testtwo"
    echo "TESTLABELS=$TESTS" >> "$GITHUB_ENV"
- run: echo ${{ env.TESTLABELS }}
# testone testtwo
Explanation
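Sometimes it’s valuable to use variables across GitHub Actions steps within a job. Each run step executes in its own shell, so a variable set in one step is gone in the next. Appending a NAME=value line to the file at $GITHUB_ENV makes the runner set that environment variable for every subsequent step in the job.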
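To check what a step writes before pushing a workflow, the snippet below can also run locally. It is a sketch that falls back to a temporary file when GITHUB_ENV is unset (that is, outside Actions), and it also shows GitHub's NAME<<DELIMITER form for multiline values; TESTLIST is an illustrative name.

```bash
#!/usr/bin/env bash
# Outside GitHub Actions, GITHUB_ENV is unset, so fall back to a temp file.
GITHUB_ENV="${GITHUB_ENV:-$(mktemp)}"

TESTS="testone testtwo"
# Single-line value: NAME=value
echo "TESTLABELS=$TESTS" >> "$GITHUB_ENV"

# Multiline value: NAME<<DELIMITER, the value's lines, then the delimiter alone.
# The delimiter ("EOF" here) must not appear as a line inside the value.
{
  echo "TESTLIST<<EOF"
  printf '%s\n' testone testtwo
  echo "EOF"
} >> "$GITHUB_ENV"

cat "$GITHUB_ENV"
```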
How did we use these tips at Codecov?
At Codecov, we’ve been working hard to add new features to our new CLI. One of these features, Automated Test Selection, helps developers run a subset of their test suite based on their code changes. This can greatly decrease the time and resources your Continuous Integration/Continuous Deployment (CI/CD) platform spends running tests.
To fit the way developers already run their tests, we looked into exporting the selected tests to the CI/CD platform, so that those test labels could be used in later steps before coverage is uploaded to Codecov.
Problem
The label-analysis command for the Codecov CLI currently outputs:
...
info - 2023-09-08 17:32:26,698 -- Not executing tests because '--dry-run' is on. List of labels selected for running below.
info - 2023-09-08 17:32:26,698 -- Label groups:
info - 2023-09-08 17:32:26,698 -- - absent_labels: Set of new labels found in HEAD that are not present in BASE
info - 2023-09-08 17:32:26,698 -- - present_diff_labels: Set of labels affected by the git diff
info - 2023-09-08 17:32:26,698 -- - global_level_labels: Set of labels that possibly touch global code
info - 2023-09-08 17:32:26,698 -- - present_report_labels: Set of labels previously uploaded
info - 2023-09-08 17:32:26,698 -- {"absent_labels": ["app/test_calculator.py::test_new"], "global_level_labels": [], "present_diff_labels": [], "present_report_labels": ["app/test_calculator.py::test_add", "app/test_calculator.py::test_divide", "app/test_calculator.py::test_multiply", "app/test_calculator.py::test_subtract"]}
The final line of the output contains all of the test names that are available to run. However, only a subset of them (absent_labels, present_diff_labels, and global_level_labels) makes up the set of tests that should be run for a code change.
The task is to take that output and run pytest against that subset of tests.
Solution
You can view the entire code used in the Label Analysis step.
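As a rough illustration of how the tips above can fit together, here is a minimal sketch; it is not Codecov's actual code. It assumes the JSON is on the last line of the output and that labels contain no spaces, quotes, or square brackets (parametrized pytest IDs such as test_x[1] would break it), and it only prints the pytest command instead of running it. OUTPUT is a hypothetical variable holding the label-analysis output, here just the final line copied from above.

```bash
#!/usr/bin/env bash
# Hypothetical: OUTPUT holds the label-analysis output (only its last line here).
OUTPUT='info - 2023-09-08 17:32:26,698 -- {"absent_labels": ["app/test_calculator.py::test_new"], "global_level_labels": [], "present_diff_labels": [], "present_report_labels": ["app/test_calculator.py::test_add", "app/test_calculator.py::test_divide", "app/test_calculator.py::test_multiply", "app/test_calculator.py::test_subtract"]}'

# Keep the last line and drop the log prefix before the first "{".
JSON=$(echo "$OUTPUT" | tail -n 1 | sed 's/^[^{]*//')

# Non-greedy removal of present_report_labels (tip 2): stop at the first ].
SELECTED=$(echo "$JSON" | sed 's/"present_report_labels": \[[^]]*]//')

# Grab each [...] list, then each quoted label inside it, strip the quotes,
# and drop duplicates (tips 4 and 5).
LABELS=($(echo "$SELECTED" | grep -o '\[[^]]*]' | grep -o '"[^"]*"' | tr -d '"' | sort -u))

echo "Selected ${#LABELS[@]} label(s)"   # tip 3
PYTEST_CMD=(pytest "${LABELS[@]}")
echo "Would run: ${PYTEST_CMD[*]}"       # tip 6

# In GitHub Actions, hand the labels to later steps (tip 7).
if [ -n "${GITHUB_ENV:-}" ]; then
  echo "TESTLABELS=${LABELS[*]}" >> "$GITHUB_ENV"
fi
```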
Try it out yourself
If you want to learn how to reduce the number of tests you run in your CI/CD, look at Automated Test Selection or send us a message on our feedback repository.