Spelling and Grammar API

Summary

This incident examines why the sentence “I will be coming yesterday.” is not flagged as incorrect by several grammar-checking systems: Word, Excel, Grammarly, and a self-hosted LanguageTool instance without n-grams. Although the sentence is clearly wrong to a human reader, rule-based grammar engines often fail to detect semantic tense conflicts unless they have statistical or contextual models enabled.

Root Cause

The core issue is that rule‑based grammar checkers do not understand temporal logic. They validate grammar structure, not meaning.

Key factors:

  • Syntactically valid structure:
    • Subject + auxiliary verb + progressive verb + temporal adverb (“I” + “will” + “be coming” + “yesterday”)
    • This pattern is grammatically well‑formed even though the meaning is impossible.
  • No semantic reasoning:
    • Tools without n‑grams cannot detect that “will be coming” (future continuous) contradicts “yesterday” (past time).
  • Missing statistical language models:
    • LanguageTool’s n‑gram feature provides probability‑based detection of unlikely or contradictory word combinations.
    • Without n‑grams, the system cannot judge that this combination is improbable or nonsensical.
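To make the n-gram idea concrete, here is a toy sketch in Java. The class name, the trigram table, and its counts are all invented for illustration; a real n-gram model (like the corpus data LanguageTool downloads separately) holds billions of counts, but the principle is the same: a word sequence that never, or almost never, occurs in the corpus is treated as improbable.

```java
import java.util.Map;

public class NgramSketch {
    // Hypothetical trigram counts standing in for real corpus data:
    // "be coming tomorrow" is attested, "be coming yesterday" is not.
    static final Map<String, Integer> TRIGRAM_COUNTS = Map.of(
        "be coming tomorrow", 120,
        "be coming soon", 340,
        "came yesterday", 210
    );

    // A trigram unseen in the corpus is treated as statistically improbable.
    static boolean isImprobable(String trigram) {
        return TRIGRAM_COUNTS.getOrDefault(trigram, 0) == 0;
    }

    public static void main(String[] args) {
        System.out.println(isImprobable("be coming yesterday")); // true: unseen
        System.out.println(isImprobable("be coming tomorrow"));  // false: attested
    }
}
```

Nothing here parses grammar at all; the contradiction is caught purely because the word sequence is statistically unlikely.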

Why This Happens in Real Systems

Real grammar engines are typically built from:

  • Rule-based syntax checkers
    These detect structural errors (agreement, missing words, incorrect forms).
  • Statistical or ML-based semantic checkers
    These detect improbable or contradictory phrases.
  • Lexical databases
    These detect spelling and known word misuse.

When you disable or omit the statistical component:

  • The engine becomes blind to meaning.
  • Only structural correctness is evaluated.
  • Temporal contradictions go undetected.
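The blindness of a purely structural check can be shown with a toy example (the class and regex below are illustrative, not how any real engine is built). The checker verifies that the sentence has the well-formed future-continuous shape “will be <verb>ing” and never consults the temporal adverb at all:

```java
import java.util.regex.Pattern;

public class StructuralOnlyChecker {
    // Matches the future-continuous shape: "will be <verb>ing".
    static final Pattern FUTURE_CONTINUOUS =
        Pattern.compile("\\bwill be \\w+ing\\b");

    // Returns true when the structure is well-formed; adverbs like
    // "yesterday" are never examined, so meaning plays no role.
    static boolean isStructurallyValid(String sentence) {
        return FUTURE_CONTINUOUS.matcher(sentence).find();
    }
}
```

Both “I will be coming tomorrow.” and “I will be coming yesterday.” pass this check identically, which is exactly the failure mode described above.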

Real-World Impact

When semantic checks are missing, systems may:

  • Approve sentences that are syntactically correct but logically impossible.
  • Fail to catch tense–time adverb conflicts.
  • Miss contextual errors such as:
    • “He eats yesterday.”
    • “She will arrive last week.”
    • “I am going tomorrow, but I left already.”

These errors can slip into:

  • Customer‑facing documents
  • Automated content pipelines
  • NLP‑driven applications
  • Educational tools
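Conflicts of the kind listed above can be caught even without a statistical model, with a narrow hand-written rule. The sketch below is a hypothetical detector, not part of any real library: it flags a future-tense marker co-occurring with a past-time adverb. Note its limits; it would miss “He eats yesterday.”, which has no future marker at all.

```java
import java.util.regex.Pattern;

public class TenseConflictRule {
    // Future-tense markers and past-time adverbs (deliberately incomplete lists).
    static final Pattern FUTURE = Pattern.compile("\\bwill\\b|\\bgoing to\\b");
    static final Pattern PAST_ADVERB =
        Pattern.compile("\\byesterday\\b|\\blast (week|month|year)\\b");

    // Flags a sentence only when both a future marker and a
    // past-time adverb are present.
    static boolean hasConflict(String sentence) {
        String s = sentence.toLowerCase();
        return FUTURE.matcher(s).find() && PAST_ADVERB.matcher(s).find();
    }
}
```

This catches “I will be coming yesterday.” and “She will arrive last week.” while leaving “I will be coming tomorrow.” alone.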

Example

Below is a minimal LanguageTool example showing how a rule-based engine accepts the sentence:

import org.languagetool.JLanguageTool;
import org.languagetool.language.BritishEnglish;
import org.languagetool.rules.RuleMatch;
import java.util.List;

JLanguageTool lt = new JLanguageTool(new BritishEnglish());
List<RuleMatch> matches = lt.check("I will be coming yesterday.");

This returns zero matches when n‑grams are not configured.

How Senior Engineers Fix It

Experienced engineers address this by adding semantic probability layers on top of rule-based grammar:

  • Enable n‑grams in LanguageTool
    • N‑grams detect statistically improbable word sequences.
  • Use a hybrid grammar engine
    • Combine rules + ML models for contextual understanding.
  • Add custom rules
    • Example: flag future tense combined with past‑time adverbs.
  • Deploy a full language model backend
    • Modern LLMs detect semantic contradictions far more reliably.
  • Monitor false negatives
    • Log and analyze sentences that pass incorrectly.
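Several of the points above (hybrid checking, monitoring false negatives) can be sketched together. The class below is an illustrative composition, not a real engine: a structural layer runs first, a semantic layer runs second, and any sentence the structural layer alone would have accepted is logged for later analysis.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class HybridChecker {
    static final Pattern FUTURE = Pattern.compile("\\bwill\\b");
    static final Pattern PAST_ADVERB =
        Pattern.compile("\\byesterday\\b|\\blast (week|month|year)\\b");

    // Sentences the rule layer alone would have passed, kept for review.
    static final List<String> falseNegativeLog = new ArrayList<>();

    // Stand-in structural check: capitalized start, terminal period.
    static boolean structuralPass(String s) {
        return Character.isUpperCase(s.charAt(0)) && s.endsWith(".");
    }

    // Semantic check: reject future marker + past-time adverb.
    static boolean semanticPass(String s) {
        String lower = s.toLowerCase();
        return !(FUTURE.matcher(lower).find() && PAST_ADVERB.matcher(lower).find());
    }

    static boolean accept(String sentence) {
        if (!structuralPass(sentence)) return false;
        if (!semanticPass(sentence)) {
            falseNegativeLog.add(sentence); // rules alone missed this one
            return false;
        }
        return true;
    }
}
```

With this layering, “I will be coming yesterday.” is rejected and recorded, while “I will be coming tomorrow.” passes both layers.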

Key takeaway:
Semantic errors require statistical or ML-based detection—rules alone are insufficient.

Why Juniors Miss It

Junior engineers often assume:

  • Grammar checking = understanding meaning
    (It does not.)
  • Rule-based systems catch all errors
    (They only catch structural ones.)
  • Temporal contradictions are “obvious”
    (Only to humans, not to deterministic parsers.)
  • Running LanguageTool without n‑grams is “good enough”
    (It removes the semantic layer entirely.)

They typically overlook:

  • The difference between syntax and semantics
  • The importance of probabilistic models
  • The role of n‑grams in contextual error detection

