Putting Harper in Your Browser

When our users install Harper, they should expect it to work anywhere they do. Whether they’re writing up a blog post in WordPress, leaving a comment on Facebook, or messaging a loved one on WhatsApp, Harper should be there. Harper’s core is extremely portable, and it can run pretty much anywhere Rust can, so what’s the big deal?

Why can’t we just run Harper in the browser through a web extension?

Running Locally

There’s a single complaint that I hear over and over again from people who use Grammarly or LanguageTool: they are both slow as molasses. The process of writing has slowly become more complex than it needs to be. With these tools, writing looks like this:

  1. Write a sentence.
  2. Wait for the grammar checker to run (which can take as long as four seconds).
  3. Fix the mistakes you made.
  4. Go back to step one.

The whole process reminds me of the Copilot pause. This is part of why Harper is better than these other tools: it doesn’t stop you from writing at the speed of thought. Our most ardent users tell us this all the time: it feels great to just write, error-free.

How do we deliver grammar checking so quickly?

Instead of hosting huge Java codebases in the cloud, we ship our software straight to the user’s device. Since there’s no network request involved, we’re able to put pixels on the screen faster than anyone else. That’s not to mention the privacy benefits: the user’s text never leaves their device.

Running Harper’s engine locally in the browser presented some technical challenges.

I’m quite proud that our JavaScript library can be installed as simply as npm install harper.js. For that to work as well as it does, I needed to develop a system for:

  1. Compiling our engine to WebAssembly.
  2. Shipping that engine to the browser.
  3. Instantiating the WebAssembly code.
  4. Building out the boilerplate necessary to make it feel native.

Steps one and four were easy. I just slapped #[wasm_bindgen] attributes on a Rust library and put on a pot of coffee. Steaming coffee is vital for writing tedious JavaScript.

Steps two and three, however, were a little more difficult. The latest iteration of Google’s extension platform, Manifest V3, places some heavy-handed restrictions on how executable code can be loaded. I won’t bore you with the details here. Just know that I spent many hours in JavaScript bundler hell.
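Setting the bundler configuration aside, step three itself comes down to the standard WebAssembly JavaScript API. The sketch below is a minimal illustration, not harper.js’s actual loader: the tiny hand-assembled module exporting add is a hypothetical stand-in for the compiled engine, and the names ADD_MODULE_BYTES, Engine, and instantiateEngine are made up for this example. In a real extension the bytes would more likely be fetched from a .wasm file packaged with the extension, for example via fetch(chrome.runtime.getURL(...)).

```typescript
// A minimal sketch of step three: compiling and instantiating WebAssembly
// bytes with the standard WebAssembly JavaScript API.

// Hypothetical stand-in for the engine binary: a hand-assembled module that
// exports a single function, add(i32, i32) -> i32.
const ADD_MODULE_BYTES = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic, version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type section: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: function 0 has type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export section: "add" -> function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section: one body, no locals
  0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // local.get 0; local.get 1; i32.add; end
]);

// The idiomatic wrapper that step four would flesh out.
interface Engine {
  add(a: number, b: number): number;
}

async function instantiateEngine(bytes: BufferSource): Promise<Engine> {
  // Compile and instantiate in one asynchronous step, so a large module
  // does not block the page while it compiles.
  const { instance } = await WebAssembly.instantiate(bytes, {});
  // Exports arrive as loosely typed values; narrow the one we need.
  const add = instance.exports.add as (a: number, b: number) => number;
  return { add };
}
```

Once instantiated, every export is an ordinary JavaScript value; the tedious part that remains is wrapping those exports in an API that feels native.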

Running Everywhere

Harper, nascent as it is, has the greatest market opportunity in the browser. Over 3.5 billion people use Chrome on a weekly basis. The plurality of knowledge workers spend most of their waking moments (as crushing as it sounds) in a web browser. Half of the time spent at desks today is spent writing.

In order to address this market segment, we need a Chrome extension. To lint text in the browser, I need a way to:

  1. Cleanly read text from input fields.
  2. Locate the pixel coordinates of grammatical errors.
  3. Render suggestions in popups.
  4. Cleanly replace text in input fields when a suggestion is selected.

An example of Harper’s suggestion box.

Reading and Writing Text is Hard

The web may have standards, but there is nothing standard about it. The “standard” way to input text is with a <textarea /> element. Even so, most high-traffic sites implement their own text editors from scratch, using <div contenteditable="true" /> as a base. Each of these cases requires special care.
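Before any linting happens, the extension has to decide which of the two strategies below applies to a given element. A minimal sketch of that dispatch follows; the classifyEditor helper and the list of prose-friendly <input> types are hypothetical, not taken from Harper’s source. It accepts any object with the listed properties, so real HTMLElements work as well as plain test objects.

```typescript
// The minimal surface we need; real HTMLElements satisfy this interface.
interface ElementLike {
  tagName: string;
  isContentEditable: boolean;
  type?: string; // present on <input> and <textarea> elements
}

// "form-field": mirror the element to measure it (the <textarea /> strategy).
// "contenteditable": measure the live DOM directly.
type EditorKind = "form-field" | "contenteditable" | "unsupported";

// <input> types assumed to hold free-form prose (an illustrative list).
const PROSE_INPUT_TYPES = ["text", "search"];

function classifyEditor(el: ElementLike): EditorKind {
  const tag = el.tagName.toUpperCase();
  if (tag === "TEXTAREA") return "form-field";
  if (tag === "INPUT") {
    const type = (el.type ?? "text").toLowerCase();
    return PROSE_INPUT_TYPES.indexOf(type) >= 0 ? "form-field" : "unsupported";
  }
  // isContentEditable is true for the editing host and everything inside it.
  return el.isContentEditable ? "contenteditable" : "unsupported";
}
```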

<textarea />

<textarea />s are hard for one reason: it is difficult to tell how their contents are laid out on screen. I can obtain their content with input.value, but the text inside them isn’t exposed as DOM nodes that the Range API can measure, so I can’t directly infer the pixel coordinates of grammatical errors.

When the Harper Chrome Extension is offered an <input /> or <textarea /> element to analyze, here’s what it does:

  1. Creates a new <div />.
  2. Copies all styles from the provided element onto the <div />.
  3. Moves this <div /> directly on top of the provided element using position: absolute;.
  4. Copies the content of the provided element into the <div />.
  5. Uses the Range API to turn the text indices emitted by Harper’s engine into pixel coordinates on the <div />.

This mirroring strategy works, but it has a performance cost, and handling scrolling within the element adds complexity.
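The five steps can be sketched in TypeScript as follows. This is a minimal illustration under stated assumptions, not the extension’s code: createMirror, spanRects, and toUtf16Offset are hypothetical names, and the mirror simply copies every computed style property rather than a curated list. toUtf16Offset covers one subtlety of step five: if the engine reports spans in Unicode scalar values, as Rust’s char type would suggest (an assumption here), those indices drift from the UTF-16 offsets the Range API expects whenever the text contains characters such as many emoji.

```typescript
// A minimal sketch of the mirroring strategy. Names are hypothetical.

/** Steps 1-4: build a <div> that mimics the field's layout and text. */
function createMirror(field: HTMLTextAreaElement | HTMLInputElement): HTMLDivElement {
  // Step 1: create the mirror.
  const mirror = document.createElement("div");

  // Step 2: copy every computed style property from the field.
  const computed = window.getComputedStyle(field);
  for (let i = 0; i < computed.length; i++) {
    const name = computed.item(i);
    mirror.style.setProperty(name, computed.getPropertyValue(name));
  }

  // Step 3: place the mirror exactly over the field. Adding the page's scroll
  // offsets turns the viewport-relative rectangle into document coordinates.
  const rect = field.getBoundingClientRect();
  mirror.style.position = "absolute";
  mirror.style.left = `${rect.left + window.scrollX}px`;
  mirror.style.top = `${rect.top + window.scrollY}px`;
  mirror.style.margin = "0";
  // The mirror is only used for measuring, so hide it. Hidden elements
  // still take part in layout, so their text can still be measured.
  mirror.style.visibility = "hidden";

  // Step 4: copy the field's current value into a single text node.
  mirror.textContent = field.value;
  document.body.appendChild(mirror);

  // Match the field's internal scroll position so measurements line up.
  mirror.scrollTop = field.scrollTop;
  mirror.scrollLeft = field.scrollLeft;
  return mirror;
}

/** Step 5: map a [start, end) span of UTF-16 offsets to viewport rectangles. */
function spanRects(mirror: HTMLDivElement, start: number, end: number): DOMRect[] {
  const text = mirror.firstChild;
  if (!(text instanceof Text)) return [];
  const range = document.createRange();
  range.setStart(text, Math.min(start, text.length));
  range.setEnd(text, Math.min(end, text.length));
  return Array.from(range.getClientRects());
}

/** Convert an index counted in Unicode scalar values to a UTF-16 offset. */
function toUtf16Offset(text: string, scalarIndex: number): number {
  let offset = 0;
  for (let i = 0; i < scalarIndex && offset < text.length; i++) {
    const code = text.charCodeAt(offset);
    // A high surrogate starts a two-unit pair (e.g. most emoji).
    const isHighSurrogate = code >= 0xd800 && code <= 0xdbff;
    offset += isHighSurrogate && offset + 1 < text.length ? 2 : 1;
  }
  return offset;
}
```

The rectangles from getClientRects are viewport-relative, so they have to be recomputed whenever the page or the field scrolls, and the mirror’s scrollTop must be resynchronized each time the field scrolls. That bookkeeping is the extra complexity mentioned above.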

<div contenteditable="true" />

Since elements in a contenteditable text editor (like Trix, Lexical, Gutenberg, etc.) actually exist in the DOM, I can just use the Range API to get pixel coordinates. The trouble this time comes when I try to write a suggestion back into the editor.
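Before turning to that trouble, here is a minimal sketch of the easy half. It assumes, without confirmation from Harper’s source, that the engine’s offsets index the concatenation of the editor’s text nodes in document order (which is what root.textContent returns); rangeForSpan and locateOffset are hypothetical helpers. A TreeWalker collects the text nodes, and locateOffset maps a flat offset to a node and an offset within it, which is what Range.setStart and Range.setEnd need.

```typescript
// A minimal sketch: turn a flat [start, end) text span into a DOM Range
// inside a contenteditable editor. Helper names are hypothetical.

interface NodePosition {
  nodeIndex: number; // which text node the offset falls in
  offset: number; // offset within that node, in UTF-16 code units
}

/** Map a flat offset onto a sequence of text nodes with the given lengths. */
function locateOffset(lengths: number[], index: number): NodePosition | null {
  if (index < 0) return null;
  let consumed = 0;
  for (let i = 0; i < lengths.length; i++) {
    // An offset on a boundary resolves to the end of the earlier node,
    // which is an equally valid Range boundary.
    if (index <= consumed + lengths[i]) {
      return { nodeIndex: i, offset: index - consumed };
    }
    consumed += lengths[i];
  }
  return null; // past the end of the text
}

/** Build a Range covering [start, end) of the editor's text content. */
function rangeForSpan(root: HTMLElement, start: number, end: number): Range | null {
  // Collect the editor's text nodes in document order.
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  const nodes: Text[] = [];
  for (let n = walker.nextNode(); n !== null; n = walker.nextNode()) {
    nodes.push(n as Text);
  }

  const lengths = nodes.map((node) => node.length);
  const from = locateOffset(lengths, start);
  const to = locateOffset(lengths, end);
  if (from === null || to === null) return null;

  const range = document.createRange();
  range.setStart(nodes[from.nodeIndex], from.offset);
  range.setEnd(nodes[to.nodeIndex], to.offset);
  return range;
}
```

Calling rangeForSpan(editor, start, end)?.getClientRects() then yields the rectangles to highlight, with no mirror required.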

Most documentation I could find suggests that you:

  1. Select the content of the element you wish to edit. This can be done using window.getSelection().addRange().
  2. Call document.execCommand('insertText', false, "YOUR TEXT").

Much of this documentation acknowledges that document.execCommand is deprecated, but instructs you to use it anyway.

This is bad advice. Do not do this. I spent an embarrassing amount of time trying to get it to work consistently.

The better way to replace text programmatically comes directly from the W3C standard:

  1. Manually edit the DOM in the fashion outlined by the suggestion chosen by the user.
  2. Fire input events to instruct WYSIWYG editors to synchronize their internal state with the DOM.

I’ll admit that this is an oversimplification. Much of the complexity here lies in determining which DOM nodes to edit or fire events on.
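Here is a minimal sketch of those two steps for the simplest case, where the suggestion falls inside a single text node. The Replacement shape and the helper names are hypothetical, and the event details are assumptions: insertReplacementText is the inputType the Input Events specification defines for spellchecker-style replacements, but which event fields a given editor actually reads (data versus dataTransfer, for instance) varies from editor to editor.

```typescript
// A minimal sketch of "edit the DOM, then fire input events".
// Names and the Replacement shape are hypothetical.

interface Replacement {
  start: number; // first UTF-16 offset to replace
  end: number; // offset just past the replaced text
  text: string; // the suggested text
}

/** Replace the [start, end) span of a string with r.text. */
function applyReplacement(original: string, r: Replacement): string {
  return original.slice(0, r.start) + r.text + original.slice(r.end);
}

/** The editing host is the outermost contenteditable ancestor of a node. */
function findEditingHost(node: Node): HTMLElement | null {
  let host: HTMLElement | null = null;
  for (let el = node.parentElement; el !== null && el.isContentEditable; el = el.parentElement) {
    host = el;
  }
  return host;
}

function replaceInTextNode(node: Text, r: Replacement): void {
  // Step 1: edit the DOM directly.
  node.data = applyReplacement(node.data, r);

  // Step 2: tell the editor its DOM changed so it can resync its internal state.
  const host = findEditingHost(node);
  if (host === null) return;
  host.dispatchEvent(
    new InputEvent("input", {
      bubbles: true,
      inputType: "insertReplacementText",
      data: r.text,
    }),
  );
}
```

Keeping the pure string edit separate from the DOM work means the part that is easiest to get wrong can be tested outside a browser.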

Read the Source of Truth

If you take one thing away from this post, it should be this: always read from the source of truth. There is a lot of faulty information out there, especially when it comes to creating complex interactions with opaque systems. If there is a source of truth, read it. It may look intimidating or seem unnecessarily verbose. It is that way for a reason.