Commit

Merge branch 'master' into one_indefinite
claydugo authored Jan 6, 2025
2 parents 5e96304 + 2ff381a commit d501646
Showing 51 changed files with 1,035 additions and 709 deletions.
33 changes: 8 additions & 25 deletions README.md
@@ -4,9 +4,10 @@
</div>

[![Harper Binaries](https://github.com/automattic/harper/actions/workflows/build_harper_binaries.yml/badge.svg)](https://github.com/automattic/harper/actions/workflows/build_harper_binaries.yml)
[![Web](https://github.com/automattic/harper/actions/workflows/build_web.yml/badge.svg)](https://github.com/automattic/harper/actions/workflows/build_web.yml)
[![Website](https://github.com/automattic/harper/actions/workflows/build_web.yml/badge.svg)](https://github.com/automattic/harper/actions/workflows/build_web.yml)
[![Precommit](https://github.com/automattic/harper/actions/workflows/precommit.yml/badge.svg)](https://github.com/automattic/harper/actions/workflows/precommit.yml)
[![Crates.io](https://img.shields.io/crates/v/harper-ls)](https://crates.io/crates/harper-ls)
![NPM Version](https://img.shields.io/npm/v/harper.js)

Harper is an English grammar checker designed to be _just right._
I created it after years of dealing with the shortcomings of the competition.
@@ -31,25 +32,6 @@ Harper is even small enough to load via [WebAssembly.](https://writewithharper.c

Harper currently only supports American English, but the core is extensible to support other languages, so we welcome contributions that allow for other language support.

## Installation

If you want to use Harper on your machine, you have three choices.

### `harper-ls`

`harper-ls` provides an integration that works for most code editors.

[Read more here.](https://writewithharper.com/docs/integrations/language-server)

### Harper Obsidian Integration

If you use [Obsidian](https://obsidian.md/), you may install the [Harper Obsidian Plugin](https://github.com/automattic/harper-obsidian-plugin) by searching for "Harper" in the community plugin store.

### Zed Plugin

If you use [Zed](https://zed.dev/), [Stef16Robbe](https://github.com/Stef16Robbe) has developed a fantastic [plugin](https://github.com/Stef16Robbe/harper_zed) that works out-of-the box.
No setup required.

## Performance Issues

We consider long lint times bugs.
@@ -58,15 +40,16 @@ If you encounter any significant performance issues, please create an issue on t
If you find a fix to any performance issue, we are open the contribution.
Just make sure to read [our contribution guidelines first.](https://github.com/automattic/harper/blob/master/CONTRIBUTING.md)

## FAQs

### Where did the name Harper come from?
## Links

See [this blog post](https://elijahpotter.dev/articles/naming_harper).
- [Frequently Asked Questions](https://writewithharper.com/docs/faq)
- [`harper-ls` Documentation](https://writewithharper.com/docs/integrations/language-server)
- [Neovim Support](https://writewithharper.com/docs/integrations/neovim)
- [`harper.js` Documentation](https://writewithharper.com/docs/harperjs/introduction)

## Huge Thanks

This project would not be possible without the hard work from those who [contribute](/CONTRIBUTING.md).
This project would not be possible without the hard work from those who [contribute](https://writewithharper.com/docs/contributors/introduction).

<a href="https://github.com/automattic/harper/graphs/contributors">
<img src="https://contrib.rocks/image?repo=automattic/harper" />
2 changes: 1 addition & 1 deletion demo.md
@@ -4,7 +4,7 @@ checkers don't cut it. That s where Harper comes in handy.
Harper is an language checker for developers. it can detect
improper capitalization and misspellled words,
as well as a number of other issues.
Like if you break up words you shouldn't.
Like if you break up words you shoul dn't.

Harper works everywhere, even offline. Since you r data
never leaves your device, you don't ned to worry aout us
36 changes: 31 additions & 5 deletions harper-cli/src/main.rs
@@ -8,7 +8,7 @@ use clap::Parser;
use harper_comments::CommentParser;
use harper_core::linting::{LintGroup, LintGroupConfig, Linter};
use harper_core::parsers::Markdown;
use harper_core::{remove_overlaps, Dictionary, Document, FstDictionary};
use harper_core::{remove_overlaps, Dictionary, Document, FstDictionary, TokenKind};

#[derive(Debug, Parser)]
enum Args {
@@ -30,8 +30,13 @@ enum Args {
Spans {
/// The file you wish to display the spans.
file: PathBuf,
/// Include newlines in the output
#[arg(short, long)]
include_newlines: bool,
},
/// Emit decompressed, line-separated list of words in Harper's dictionary.
/// Get the metadata associated with a particular word.
Metadata { word: String },
/// Emit a decompressed, line-separated list of the words in Harper's dictionary.
Words,
}

@@ -89,11 +94,15 @@ fn main() -> anyhow::Result<()> {

Ok(())
}
Args::Spans { file } => {
Args::Spans {
file,
include_newlines,
} => {
let (doc, source) = load_file(&file)?;

let primary_color = Color::Blue;
let secondary_color = Color::Magenta;
let unlintable_color = Color::Red;
let filename = file
.file_name()
.map(|s| s.to_string_lossy().into())
@@ -102,11 +111,19 @@ fn main() -> anyhow::Result<()> {
let mut report_builder =
Report::build(ReportKind::Custom("Spans", primary_color), &filename, 0);
let mut color = primary_color;
for token in doc.tokens() {

for token in doc.tokens().filter(|t| {
include_newlines
|| !matches!(t.kind, TokenKind::Newline(_) | TokenKind::ParagraphBreak)
}) {
report_builder = report_builder.with_label(
Label::new((&filename, token.span.into()))
.with_message(format!("[{}, {})", token.span.start, token.span.end))
.with_color(color),
.with_color(if matches!(token.kind, TokenKind::Unlintable) {
unlintable_color
} else {
color
}),
);

// Alternate colors so spans are clear
@@ -134,6 +151,15 @@ fn main() -> anyhow::Result<()> {
println!("{}", word_str);
}

Ok(())
}
Args::Metadata { word } => {
let dict = FstDictionary::curated();
let metadata = dict.get_word_metadata_str(&word);
let json = serde_json::to_string_pretty(&metadata).unwrap();

println!("{json}");

Ok(())
}
}
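
Note on the `Spans` change above: the following is a minimal, standalone sketch of the new filtering and coloring rules, not the code from this commit. The `Kind` enum and both helper functions are hypothetical stand-ins for illustration; the real `TokenKind` in harper-core has more variants, and the payload of `Newline` is assumed here to be a simple count.

```rust
/// Simplified stand-in for `harper_core::TokenKind` (assumption: only the
/// variants relevant to the `Spans` subcommand are modeled here).
#[derive(Debug, PartialEq)]
enum Kind {
    Word,
    Newline(usize),
    ParagraphBreak,
    Unlintable,
}

/// Hypothetical helper mirroring the filter closure in the diff: newline-like
/// tokens are dropped unless `include_newlines` is set.
fn keep(kind: &Kind, include_newlines: bool) -> bool {
    include_newlines || !matches!(kind, Kind::Newline(_) | Kind::ParagraphBreak)
}

/// Hypothetical helper mirroring the color choice in the diff: unlintable
/// tokens always get their own color, everything else keeps the alternating one.
fn label_color(kind: &Kind, alternating: &'static str) -> &'static str {
    if matches!(kind, Kind::Unlintable) {
        "red"
    } else {
        alternating
    }
}
```

With `include_newlines` left at its default of `false`, only `Newline` and `ParagraphBreak` tokens are skipped; `Unlintable` tokens are still labeled, just in a distinct color.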
1 change: 0 additions & 1 deletion harper-comments/src/better.rs

This file was deleted.

1 change: 0 additions & 1 deletion harper-comments/src/lib.rs
@@ -1,6 +1,5 @@
#![doc = include_str!("../README.md")]

mod better;
mod comment_parser;
mod comment_parsers;
pub use comment_parser::CommentParser;
@@ -19,7 +19,7 @@ public static void main(String[] args) {
/**
* This doc has a link in it: {@link this sould b ignor} but not tis
*
* @param name this is an other test.
* @param name this is anoher test.
*/
public static void greet(String name) {
System.out.println("Hello " + name + ".");
15 changes: 14 additions & 1 deletion harper-core/affixes.json
@@ -483,7 +483,9 @@
"replacements": [],
"adds_metadata": {},
"gifts_metadata": {
"noun": {}
"noun": {
"is_plural": false
}
}
},
"2": {
@@ -557,6 +559,17 @@
}
}
},
"9": {
"suffix": true,
"cross_product": true,
"replacements": [],
"adds_metadata": {},
"gifts_metadata": {
"noun": {
"is_plural": true
}
}
},
"~": {
"suffix": true,
"cross_product": true,
10 changes: 4 additions & 6 deletions harper-core/dictionary.dict
@@ -28344,7 +28344,7 @@ inimitably/
iniquitous/5Y
iniquity/1SM
initial/514SGMDY
initialism/1
initialism/1MS
initialization/1
initialize/4DSG
initialized/4AU
@@ -45485,7 +45485,7 @@ therapeutically/
therapeutics/1M
therapist/1SM
therapy/14SM
there/18~
there/~
there's/
thereabout/S
thereafter/1
@@ -45519,7 +45519,7 @@ thermostatic/5
thermostatically/
thesauri/1
thesaurus/1MS
these/8S~
these/8S~9
thesis/1M
thespian/51SM
theta/1SM
@@ -45544,9 +45544,8 @@ thieve/4DSG
thievery/1M
thieving/451M
thievish/5
thigh/1M
thigh/1MS
thighbone/1MS
thighs/1
thimble/14MS
thimbleful/1SM
thin/514YSP
@@ -49637,7 +49636,6 @@ scatterplot/14SMG
Wikilink/MS1
stacktrace/SM1
scrollbar/1SM
break-up/1SM
sweetgrass/1SM
PowerShell/SM
WebSocket/SM
3 changes: 1 addition & 2 deletions harper-core/src/document.rs
@@ -7,10 +7,9 @@ use paste::paste;
use crate::parsers::{Markdown, Parser, PlainEnglish};
use crate::patterns::{PatternExt, RepeatingPattern, SequencePattern};
use crate::punctuation::Punctuation;
use crate::token::NumberSuffix;
use crate::vec_ext::VecExt;
use crate::Span;
use crate::{Dictionary, FatToken, FstDictionary, Lrc, Token, TokenKind, TokenStringExt};
use crate::{NumberSuffix, Span};

/// A document containing some amount of lexed and parsed English text.
#[derive(Debug, Clone)]
11 changes: 11 additions & 0 deletions harper-core/src/fat_token.rs
@@ -0,0 +1,11 @@
use serde::{Deserialize, Serialize};

use crate::TokenKind;

/// A [`Token`](crate::Token) that holds its content as a fat [`Vec<char>`] rather than as a
/// [`Span`](crate::Span).
#[derive(Debug, Clone, Serialize, Deserialize, PartialEq, PartialOrd)]
pub struct FatToken {
pub content: Vec<char>,
pub kind: TokenKind,
}
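
Note on `FatToken`: its doc comment contrasts holding the content directly with holding a `Span` into the source. The sketch below illustrates that difference under stated assumptions. `Span` here is a simplified local stand-in (a half-open range of character indices, matching the `[start, end)` format printed by the `Spans` subcommand), and `fatten` is a hypothetical helper; the conversion harper-core actually uses is not shown in this diff.

```rust
/// Simplified stand-in for `harper_core::Span`: a half-open range
/// `[start, end)` of character indices into the source.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Hypothetical helper: copies the characters covered by `span` out of
/// `source`, giving an owned `Vec<char>` like the `content` field of a fat
/// token. Returns `None` if the span does not fit inside the source.
fn fatten(source: &[char], span: Span) -> Option<Vec<char>> {
    source.get(span.start..span.end).map(|slice| slice.to_vec())
}
```

A span-based token stays cheap to copy but is only meaningful next to its source; the owned copy can be serialized or compared on its own, which fits the `Serialize`, `Deserialize`, and `PartialEq` derives on `FatToken`.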
4 changes: 2 additions & 2 deletions harper-core/src/language_detection.rs
@@ -106,7 +106,7 @@ mod tests {
#[test]
fn detects_python_fib() {
assert_not_english(
r#"
r"
def fibIter(n):
if n < 2:
return n
@@ -115,7 +115,7 @@ def fibIter(n):
for _ in range(2, n):
fibPrev, fib = fib, fib + fibPrev
return fib
"#,
",
);
}

28 changes: 14 additions & 14 deletions harper-core/src/lexing/email_address.rs
@@ -109,23 +109,23 @@ mod tests {

fn example_local_parts() -> impl Iterator<Item = Vec<char>> {
[
r#"simple"#,
r#"very.common"#,
r#"x"#,
r#"long.email-address-with-hyphens"#,
r#"user.name+tag+sorting"#,
r#"name/surname"#,
r#"admin"#,
r#"example"#,
r"simple",
r"very.common",
r"x",
r"long.email-address-with-hyphens",
r"user.name+tag+sorting",
r"name/surname",
r"admin",
r"example",
r#"" ""#,
r#""john..doe""#,
r#"mailhost!username"#,
r"mailhost!username",
r#""very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual""#,
r#"user%example.com"#,
r#"user-"#,
r#"postmaster"#,
r#"postmaster"#,
r#"_test"#,
r"user%example.com",
r"user-",
r"postmaster",
r"postmaster",
r"_test",
]
.into_iter()
.map(|s| s.chars().collect())
55 changes: 41 additions & 14 deletions harper-core/src/lexing/hostname.rs
@@ -1,3 +1,30 @@
use crate::TokenKind;

use super::FoundToken;

/// Lex a hostname token.
pub fn lex_hostname_token(source: &[char]) -> Option<FoundToken> {
let len = lex_hostname(source)?;

// Might be word, just skip it.
if len <= 1 {
return None;
}

if !source.get(1..len - 1)?.contains(&'.') {
return None;
}

if source.get(len - 1) == Some(&'.') {
return None;
}

Some(FoundToken {
next_index: len,
token: TokenKind::Hostname,
})
}

pub fn lex_hostname(source: &[char]) -> Option<usize> {
let mut passed_chars = 0;

@@ -25,20 +52,20 @@ pub mod tests {

pub fn example_domain_parts() -> impl Iterator<Item = Vec<char>> {
[
r#"example.com"#,
r#"example.com"#,
r#"example.com"#,
r#"and.subdomains.example.com"#,
r#"example.com"#,
r#"example.com"#,
r#"example"#,
r#"s.example"#,
r#"example.org"#,
r#"example.org"#,
r#"example.org"#,
r#"strange.example.com"#,
r#"example.org"#,
r#"example.org"#,
r"example.com",
r"example.com",
r"example.com",
r"and.subdomains.example.com",
r"example.com",
r"example.com",
r"example",
r"s.example",
r"example.org",
r"example.org",
r"example.org",
r"strange.example.com",
r"example.org",
r"example.org",
]
.into_iter()
.map(|s| s.chars().collect())
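
Note on `lex_hostname_token`: the new function rejects a candidate in three cases (length of one or less, no `.` strictly between the first and last character, or a trailing `.`). The sketch below reproduces those checks in isolation. `hostname_len` is a hypothetical stand-in for `lex_hostname`, whose body is not visible in this diff; it is assumed here to count leading ASCII alphanumerics, `-`, and `.`. The `chars` helper is likewise only for illustration.

```rust
/// Hypothetical stand-in for `lex_hostname` (assumption: the real character
/// rules are not shown in this diff). Counts leading characters that are
/// ASCII alphanumeric, `-`, or `.`.
fn hostname_len(source: &[char]) -> Option<usize> {
    let len = source
        .iter()
        .take_while(|c| c.is_ascii_alphanumeric() || matches!(**c, '-' | '.'))
        .count();

    if len == 0 {
        None
    } else {
        Some(len)
    }
}

/// Applies the same three rejection rules as `lex_hostname_token` and returns
/// the accepted length instead of a `FoundToken`.
fn hostname_token_len(source: &[char]) -> Option<usize> {
    let len = hostname_len(source)?;

    // A single character might just be a word.
    if len <= 1 {
        return None;
    }

    // Require a dot strictly inside the candidate (not first, not last).
    if !source.get(1..len - 1)?.contains(&'.') {
        return None;
    }

    // A trailing dot is more likely sentence punctuation.
    if source.get(len - 1) == Some(&'.') {
        return None;
    }

    Some(len)
}

/// Illustration-only helper for building `&[char]` inputs.
fn chars(s: &str) -> Vec<char> {
    s.chars().collect()
}
```

Under these assumptions, `example.com` is accepted in full, while `example`, `example.`, and `a.b.` are all rejected, so an ordinary word at the end of a sentence is not claimed as a hostname.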