<h1>The Simplest Neovim Markdown Setup</h1>
I am not one who enjoys complexity.
I am also someone who likes to make their <a href="https://github.com/elijah-potter/harper">own tools</a>.
As a student, I write a lot.
That includes notes, papers, and documentation for my wide variety of projects.
Many of my classes ask for assignments to be submitted as PDF, HTML, or <code>.docx</code>.
For a while, I submitted PDFs.
I had a whole orchestrated setup.
I would write everything in <a href="https://neovim.io/">Neovim</a>, save it, and <a href="https://github.com/frabjous/knap">KNAP</a> would render it using <a href="https://pandoc.org/">Pandoc</a>.
Finally, <a href="https://sioyek.info/">Sioyek</a> would display the result on my screen.
This worked fine, I guess, but it was far from perfect.
<ol>
<li>It was slow. Each edit I made in Markdown could take as long as 10 seconds to show up in Sioyek.</li>
<li>It wasn't interesting. While I made modifications to my Pandoc settings, my PDFs still looked like every other <code>pdflatex</code> document ever made.</li>
<li>I couldn't make my documents interactive if I wanted to.</li>
</ol>
Markdown was designed to be turned into HTML, I reasoned.
So why not just do that?
<img src="/images/tatum_screenshot.webp" alt="A screenshot of Tatum at work">
That's why I created <a href="https://github.com/elijah-potter/tatum">Tatum</a>.
It does one thing, and does it really well.
Point it at a Markdown file, and it will run a tiny web server that serves the rendered HTML.
If the file changes, a WebSocket connection tells the browser to refresh.
Tatum renders in milliseconds and creates beautiful pages with <a href="https://katex.org/">KaTeX</a>, <a href="https://simplecss.org/">Simple.css</a> and <a href="https://highlightjs.org/"><code>highlight.js</code></a>.
I can embed interactive HTML, CSS and JavaScript elements directly into my Markdown to get the interactivity and aesthetics I desire.
Once I'm done working, I just run <code>tatum render &#x3C;file.md></code> and I get a single file (images and all) that I can submit for my assignments.
Tatum isn't for you to use.
Feel free to poke around at how it works, or even fork it and make your own modifications.
It fits my use case perfectly.

How I preview my Markdown quickly and reliably.


<h1>For The Love of <code>iframe</code>s.</h1>
I adore a good <code>iframe</code>.
They're so elegant as a web component.
Just expose an endpoint, say <a href="https://writewithharper.com/editor?initialText=This%20is%20an%20interactive%20buffer%20you%20can%20use%20to%20to%20check%20your%20work"><code>https://writewithharper.com/editor</code></a>, set it up to accept some query parameters and get it hosted.
Now you can access this web-component from any page that has a loose enough Content Security Policy.
For me, that means my <a href="https://elijahpotter.dev/articles/the_simplest_neovim_markdown_setup">school assignments</a> and other assorted documentation.
It also means that I can avoid setting up a complex build system for MDX, while still being able to include interactive components.
The example from earlier:
<pre><code class="hljs language-html">&#x3C;iframe
 src="https://writewithharper.com/editor?initialText=See, we can now embed the the Harper editor%0Arght into this document!%0A%0AIt's a little too easy."
 width="100%"
 height="400px"
 style="border-radius:10px;border:0px"
>&#x3C;/iframe>
</code></pre>
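If the text comes from a program rather than your keyboard, the <code>src</code> URL can be assembled in code. Below is a minimal sketch, assuming only the <code>initialText</code> query parameter shown above. The helpers <code>percent_encode</code> and <code>editor_url</code> are hypothetical names, not part of Harper. The sketch escapes every byte outside RFC 3986's unreserved set, which is stricter than browsers require but always safe.

```rust
/// Percent-encode every byte that is not an RFC 3986 "unreserved" character.
/// Hypothetical helper for illustration; not part of Harper.
fn percent_encode(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for byte in input.bytes() {
        match byte {
            // Unreserved characters pass through untouched.
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(byte as char)
            }
            // Everything else (spaces, newlines, quotes, UTF-8 bytes) becomes %XX.
            _ => out.push_str(&format!("%{:02X}", byte)),
        }
    }
    out
}

/// Build the URL for the editor endpoint used in the example above.
/// Assumes `initialText` is the only query parameter we need.
fn editor_url(initial_text: &str) -> String {
    format!(
        "https://writewithharper.com/editor?initialText={}",
        percent_encode(initial_text)
    )
}
```

Calling <code>editor_url("Hi there")</code> yields <code>https://writewithharper.com/editor?initialText=Hi%20there</code>, which can be dropped straight into the <code>src</code> attribute.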
One major caveat, though: when we pass arguments to the component through the URL's query string, they get sent to the component's server as well.
I certainly trust the Harper website's server, since I maintain it and the code is <a href="https://github.com/elijah-potter/harper/tree/master/packages/web">open source</a>, but that isn't always the case.
<iframe src="https://writewithharper.com/editor?initialText=See, we can now embed the the Harper editor%0Arght into this document!%0A%0AIt's a little too easy." width="100%" height="400px" style="border-radius:10px;border:0px" ></iframe>
You should also probably avoid doing this too much.
Many browsers run cross-site <code>iframe</code>s in a separate process, so if you want things to stay snappy, it is best to limit yourself to just one (maybe two) per page.

Chronically underrated, chronically over-prescribed


<h1>Adding a Programming Language to Harper</h1>
When I started the <a href="https://github.com/automattic/harper">Harper project</a> I knew I wanted to be able to use it for the comments in my code.
I had two reasons: I knew these comments would become part of our official documentation over time, and I hoped it would encourage me to write more.
Over time, this has become one of the most prized features of the software, attracting tens of thousands of developers.
The common problem, however, is that there have always been programming languages that our LSP doesn't support.
One of the <a href="https://github.com/Automattic/harper/issues/79">oldest issues</a> on GitHub is about this.
This post is a guide for adding a new programming language to the Harper language server.
Why isn't it in the official documentation?
While the information in this guide will remain relevant to the project for a long time, I don't expect every identifier or file path to stay the same.
If you think it would better serve potential contributors to place this guide on the <a href="https://writewithharper.com">main site</a>, let me know.
<h2>Introduction to Tree-sitter</h2>
<a href="https://tree-sitter.github.io/tree-sitter/">Tree-sitter</a> is a fantastic framework for building fault-tolerant language parsers.
That means it is still able to parse the majority of a document, even if it contains portions of invalid syntax.
This is important for Harper, since we expect people to use Harper while they're programming.
It should be OK if some of their code is incorrect, since we only care about their comments.
There is also a wide variety of Tree-sitter parsers available on <a href="https://crates.io/">crates.io</a>, ripe for our consumption.
If you want to add a language to Harper, this is the easiest way to do so.
<h2>Step 0: Avoid Duplicating Work</h2>
You're interested in adding support for a programming language.
If that's the case, it's possible other people are too.
Make sure no one else has <a href="https://elijahpotter.dev/articles/never_wait">opened a PR</a> or <a href="https://writewithharper.com/docs/integrations/language-server#Supported-Languages">has already merged support</a> for the language you have in mind.
<h2>Step 1: Find a Grammar</h2>
Look for an existing grammar on <a href="https://crates.io">crates.io</a>.
By convention, they tend to be named <code>tree-sitter-&#x3C;language></code>, where <code>&#x3C;language></code> is the language you're looking for. For example, <a href="https://crates.io/crates/tree-sitter-java"><code>tree-sitter-java</code></a> is for Java and <a href="https://crates.io/crates/tree-sitter-rust"><code>tree-sitter-rust</code></a> is for Rust.
If you would rather write your own grammar, make sure it is eventually published on <code>crates.io</code>.
<code>harper-ls</code> binaries are often consumed from <code>crates.io</code>, which requires that all upstream dependencies come from the same source.
<h2>Step 2: Import and Wire In</h2>
Harper's comment support lies in the <code>harper-comments</code> crate in <a href="https://github.com/automattic/harper/">the monorepo</a>.
Import the grammar's crate into the project with Cargo.
<pre><code class="hljs language-bash">cargo add &#x3C;CRATE-NAME>
</code></pre>
Then, add lines to the relevant functions in <code>harper-comments/src/comment_parser.rs</code>.
Make sure you visit the <a href="https://microsoft.github.io/language-server-protocol/">Language Server Protocol Specification</a> to obtain the correct language ID.
<pre><code class="hljs language-rust">pub fn new_from_language_id(
    language_id: &#x26;str,
    markdown_options: MarkdownOptions,
) -> Option&#x3C;Self> {
    let language = match language_id {
        "cmake" => tree_sitter_cmake::LANGUAGE,
        "cpp" => tree_sitter_cpp::LANGUAGE,
        "csharp" => tree_sitter_c_sharp::LANGUAGE,
        "c" => tree_sitter_c::LANGUAGE,
        "dart" => harper_tree_sitter_dart::LANGUAGE,
        "go" => tree_sitter_go::LANGUAGE, // Add a line here
</code></pre>
<pre><code class="hljs language-rust">/// Convert a provided path to a corresponding Language Server Protocol file
/// type.
///
/// Note to contributors: try to keep this in sync with
/// [`Self::new_from_language_id`]
fn filename_to_filetype(path: &#x26;Path) -> Option&#x3C;&#x26;'static str> {
    Some(match path.extension()?.to_str()? {
        "bash" => "shellscript",
        "c" => "c",
        "cmake" => "cmake",
        "cpp" => "cpp",
        "cs" => "csharp", // Add a line here
</code></pre>
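For reference, here is a self-contained sketch of what the complete shape of the second function might look like. It reuses only the arms visible in the excerpt above, plus one illustrative addition. The <code>"go"</code> arm and the catch-all arm are assumptions made for the example; the real function in the repository covers many more extensions and may be structured differently.

```rust
use std::path::Path;

/// Map a file extension to a Language Server Protocol language ID.
/// Sketch only: the arms mirror the excerpt above, plus one example addition.
fn filename_to_filetype(path: &Path) -> Option<&'static str> {
    Some(match path.extension()?.to_str()? {
        "bash" => "shellscript",
        "c" => "c",
        "cmake" => "cmake",
        "cpp" => "cpp",
        "cs" => "csharp",
        "go" => "go", // Example of an added line (assumed placement)
        // Any extension we do not recognize is unsupported.
        _ => return None,
    })
}
```

Note that the extension and the language ID often differ (<code>cs</code> versus <code>csharp</code>, <code>bash</code> versus <code>shellscript</code>), which is why the language ID should be checked against the specification rather than guessed from the file name.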
<h2>Step 3: Testing</h2>
To make sure everything behaves correctly, we need to add some integration tests.
You'll find all the existing ones under <code>harper-comments/tests/language_support_sources</code>.
Find or write several new files in the language you've added support for, and place them in this directory.
Add intentional grammatical errors to these files in syntactically relevant places.
We want to make sure that Harper flags the errors it should and ignores the text it should not check.
For example, we might put an error inside an <code>@param</code> tag in JSDoc.
That way we'll know if Harper is not properly ignoring those elements.
Add new entries to the bottom of <code>harper-comments/tests/language_support.rs</code>.
The second parameter of the <code>create_test!</code> macro is the number of grammatical errors that Harper should detect in that file.
<pre><code class="hljs language-rust">create_test!(ignore_shebang_3.sh, 0);
create_test!(ignore_shebang_4.sh, 1);
create_test!(common.mill, 1);
create_test!(basic_kotlin.kt, 0); // Add a line here
</code></pre>
From there, you can run <code>cargo test</code> to make sure everything passes.
<h2>Step 4: Document</h2>
To advertise support for the language, there are a couple of additional places that need modification.
Notably:
<ul>
<li>The supported languages table in <code>packages/web/src/routes/docs/integrations/language-server/+page.md</code></li>
<li>The <a href="https://github.com/Automattic/harper/issues/79">GitHub Issue</a></li>
<li>The <code>activationEvents</code> key in the VS Code plugin's manifest: <code>packages/vscode-plugin/package.json</code></li>
</ul>
<h2>Done!</h2>
That should be everything.
Open a draft pull request while you work and ping me (<a href="https://github.com/elijah-potter/">elijah-potter</a>) if you have any questions.
<h2>Additional Resources</h2>
<ul>
<li><a href="https://github.com/Automattic/harper/pull/1443">PR for adding Solidity support</a></li>
<li><a href="https://github.com/Automattic/harper/pull/970">PR for adding Scala support</a></li>
</ul>

A guide for adding a new programming language to the Harper language server.


<h1>Training a Chunker with Burn</h1>
<img src="/images/underpass.webp" alt="Graffiti in an Underpass">
In a previous post, I detailed how I implemented a basic nominal phrase chunker using Transformation-based learning.
Since then, I've taken another crack at the problem.
My main goal: improve the accuracy.
The end result is a portable neural network model that achieves ~95% accuracy on grammatically correct text.
<h2>The Failures of the Brill Chunker</h2>
The Brill Chunker was, by many accounts, a success.
It wasn't, however, a success in the main way that mattered: It wasn't reliable enough to be used in Harper's grammatical rule system.
While fast and small, it failed to catch most edge cases in English text.
In some senses, it overfit its training dataset.
<h2>Our Goal</h2>
We want Harper to be able to match against subjects and objects in sentences.
This is a prerequisite for checking a diverse array of grammatical rules.
For example, to catch the error in this sentence, we need to correctly identify which tokens represent our subject.
<pre><code>Neither of the big blue bottle would be broken by the fall.
</code></pre>
In this case, our user has accidentally written a singular noun, while the phrase "neither of" calls for a plural one ("bottles").
We call this an agreement error.
Because our subject, "big blue bottle", contains multiple tokens, we need a way to identify subjects at a higher level than per-token.
That is what a chunker does.
<h2>Why Train Our Own?</h2>
As our needs continue to expand alongside our user-base, I need the chunker to be flexible.
If its needed capabilities expand, I need to be able to retrain the model to meet them.
That would not be possible without having a deep understanding of how the system works.
<h2>Building a Neural Net</h2>
To build a new chunker, I just needed to implement the Harper <code>Chunker</code> trait.
Easy enough.
<pre><code class="hljs language-rust">/// An implementer of this trait is capable of identifying the noun phrases in a provided sentence.
pub trait Chunker {
 /// Iterate over the sentence, identifying the noun phrases contained within.
 /// A token marked `true` is a component of a noun phrase.
 /// A token marked `false` is not.
 fn chunk_sentence(&#x26;self, sentence: &#x26;[String], tags: &#x26;[Option&#x3C;UPOS>]) -> Vec&#x3C;bool>;
}
</code></pre>
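The boolean mask is easiest to work with once contiguous <code>true</code> runs are grouped into spans. The following is a small sketch of one way to do that. The helper <code>noun_phrase_spans</code> is a hypothetical name, not part of Harper's API, and the mask in the usage note is hand-written rather than real model output.

```rust
use std::ops::Range;

/// Group contiguous `true` entries of a chunker mask into token index ranges.
/// Hypothetical helper for illustration; not part of Harper.
fn noun_phrase_spans(mask: &[bool]) -> Vec<Range<usize>> {
    let mut spans = Vec::new();
    let mut start: Option<usize> = None;

    for (i, &in_phrase) in mask.iter().enumerate() {
        match (in_phrase, start) {
            // A noun phrase begins at this token.
            (true, None) => start = Some(i),
            // The current noun phrase ended at the previous token.
            (false, Some(s)) => {
                spans.push(s..i);
                start = None;
            }
            // Either still inside a phrase or still outside one.
            _ => {}
        }
    }

    // Close a phrase that runs to the end of the sentence.
    if let Some(s) = start {
        spans.push(s..mask.len());
    }
    spans
}
```

For a hand-labeled mask such as <code>[false, false, true, true, true, true, false]</code>, this returns the single span <code>2..6</code>, so a rule can treat four adjacent tokens as one phrase. One limitation of a plain boolean mask is that two noun phrases sitting directly next to each other would merge into one span.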
For the nerds in the crowd, I decided to use a <code>Word + POS embedding -> BiLSTM -> Linear</code> architecture.
To keep things portable and consistent with the rest of the Harper codebase, I used <a href="https://burn.dev/">Burn</a>, a Rust-native machine learning toolkit.
While I believe the BiLSTM to be good enough for this application, one advantage of Burn is the ability to easily swap it out for a transformer if the need arises.
It also makes it unbelievably easy to quantize models.
This architecture gives us some hyperparameters to tune against.
After dozens of experimental training runs, these worked best:
<table>
<thead>
<tr>
<th align="right">Dropout probability</th>
<th>Embedding dimensions</th>
<th align="right">Learning rate (I used Adam)</th>
<th>Dataset</th>
</tr>
</thead>
<tbody>
<tr>
<td align="right">30%</td>
<td>16 Word Embeddings + 8 UPOS Embeddings</td>
<td align="right">0.003</td>
<td>GUM + EWT + LINES</td>
</tr>
</tbody>
</table>
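Written out, the architecture amounts to roughly the following per-token computation. This is a sketch under assumptions: the 16- and 8-dimensional embeddings come from the table above, but the hidden size <code>d_h</code> is left symbolic, and the sigmoid output is an assumption about how the final linear layer is turned into a noun-phrase probability, not something stated here.

```latex
\begin{aligned}
x_t &= [\,E_{\text{word}}(w_t)\,;\,E_{\text{pos}}(p_t)\,] \in \mathbb{R}^{16 + 8} \\
\overrightarrow{h}_t &= \mathrm{LSTM}_{\rightarrow}(x_t, \overrightarrow{h}_{t-1}), \qquad
\overleftarrow{h}_t = \mathrm{LSTM}_{\leftarrow}(x_t, \overleftarrow{h}_{t+1}) \\
\hat{y}_t &= \sigma\!\left(W\,[\,\overrightarrow{h}_t\,;\,\overleftarrow{h}_t\,] + b\right), \qquad W \in \mathbb{R}^{1 \times 2 d_h}
\end{aligned}
```

Here <code>w_t</code> is the word at position <code>t</code>, <code>p_t</code> is its UPOS tag, and a token would be marked <code>true</code> when the predicted probability crosses a chosen threshold.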
<h2>What's Next?</h2>
Similar to the Brill Chunker, I'll be trying to use this new system in our grammar checker.
From there, I'll know what additional information we'd like for it to infer.
Once I've gotten it to work reliably for at least three rules, I'll declare it ready to merge.

The end result is a portable neural network that achieves ~95% accuracy on grammatically correct text.

Elijah Potter
