3 Traits of Good Test Suites

As evidenced by my previous posts on LLM-Assisted Fuzzing, I've been dedicating a lot of my mental bandwidth to maintaining a low false-positive rate while we improve Harper's rule coverage. Part of that comes through fuzzing and dogfooding, some through statistics, but the first lines of defense will continue to be unit and integration testing. This past week in particular, I've been reading up on how other big linting programs approach this problem.

1. Test Features, Not Code

I often ask myself: am I spending more time thinking or talking about the thing, or am I spending more time doing the thing? I've personally seen how projects fall into decline because their leaders are more interested in planning than doing.

In the context of software testing, this mantra becomes "test features, not code." To my eye, good code is flexible and self-explanatory. Tests that hook deeply into application or library internals make code less flexible and harder to read.

I especially like Alex Kladov's heuristic for this: the neural network test.

"Can you re-use the test suite if your entire software is replaced with an opaque neural network?" - Alex Kladov

It's not a question of whether a neural network would pass the test suite, only whether the test suite could work for it. If the answer is no, the tests are likely testing code, not features.
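As a contrived illustration, consider two hypothetical tests. Both names below are made up for this sketch: harper is the single entry point described in section 3, and internal::tokenize stands in for some imagined internal helper.

// A feature-level test: it only exercises the public entry point, so it
// would still make sense if the internals were replaced wholesale
// (by a neural network or anything else).
#[test]
fn flags_less_then() {
    let lints = harper("I eat less then you.".to_string());
    assert!(!lints.is_empty());
}

// An implementation-level test: it reaches into internals, so it breaks
// whenever the tokenizer changes, even if users never notice a difference.
#[test]
fn tokenizer_splits_on_whitespace() {
    let tokens = internal::tokenize("I eat less then you.");
    assert_eq!(tokens.len(), 5);
}

The first test could still work for the neural network; the second could not.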

2. Performance

The speed at which you can build and run tests (unit, static, integration, etc.) is a force-multiplier for everything else. You can validate ideas sooner, run CI faster, and get contributors onboarded in less time.

Our goal to be fast at runtime dovetails nicely with this, so it's something Harper already does quite well. Moving forward, we need to make sure that we don't rely on any kind of IO in our tests, since IO continues to be the slowest part of most Harper integrations.
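One cheap way to enforce that is to bake test fixtures into the binary at compile time instead of reading them from disk. A minimal sketch, assuming a hypothetical fixtures/sample.md file and the harper entry point from section 3:

#[test]
fn known_good_fixture_produces_no_lints() {
    // Reading the fixture with std::fs::read_to_string would hit the
    // filesystem on every run; include_str! embeds the file's contents
    // into the test binary at compile time, so the test performs no IO.
    let text = include_str!("fixtures/sample.md");

    let lints = harper(text.to_string());
    assert!(lints.is_empty(), "expected no lints in the known-good fixture");
}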

3. Good Assertions

We can simplify programs like Harper down into a single function which consumes text and returns a list of observed problems.

/// Consume a document and return every problem observed in it.
fn harper(text: String) -> Vec<Lint> {
    // Implementation details...
}

Most of the testing we're interested in can be done with assertion functions that declare what qualities the output should have for a specific input.

For example, we have a function called assert_suggestion_result, which runs a grammatically incorrect string through Harper, applies the first resulting suggestion, and checks whether the edited string matches a given value.

/// An example of a test that uses assert_suggestion_result
#[test]
fn catches_less_then() {
    assert_suggestion_result(
        "I eat less then you.",
        ThenThan::default(),
        "I eat less than you.",
    );
}

It's also vital that these assertions produce good, readable error messages when they fail. Each time I've improved their logs, I've gotten unprompted positive feedback from contributors.
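To make that concrete, here is a rough sketch of the shape such an assertion could take, including a failure message that shows the relevant context. The Lint, Suggestion, and Linter definitions below are simplified stand-ins for this sketch, not Harper's real types:

// Simplified stand-ins for illustration; Harper's real types differ.
struct Suggestion {
    replace_range: std::ops::Range<usize>,
    replacement: String,
}

struct Lint {
    suggestions: Vec<Suggestion>,
}

trait Linter {
    fn lint(&mut self, text: &str) -> Vec<Lint>;
}

fn assert_suggestion_result(before: &str, mut linter: impl Linter, expected: &str) {
    let lints = linter.lint(before);
    let first = lints
        .first()
        .unwrap_or_else(|| panic!("no lints were produced for {before:?}"));
    let suggestion = first
        .suggestions
        .first()
        .unwrap_or_else(|| panic!("the first lint for {before:?} had no suggestions"));

    // Apply the suggestion by splicing its replacement into the original text.
    let mut actual = before.to_string();
    actual.replace_range(suggestion.replace_range.clone(), &suggestion.replacement);

    // A readable failure message: show the input alongside the expected and
    // actual outputs, so contributors can see at a glance what went wrong.
    assert_eq!(
        actual, expected,
        "applying the first suggestion to {before:?} gave {actual:?}, expected {expected:?}"
    );
}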

Moving forward, I'd like to create a more diverse array of assertions like this, as well as better document their use. A lot of the current back-and-forth on rule contributions is related to this.

Wait! I Disagree

I hope someone does. Good test suites are something I'm continuing to learn how to build. I understand that a lot of what I've said here doesn't apply to other kinds of applications or codebases. If there's nuance I'm not covering here, let me know!