hunch

Compiled search for your static Markdown files.

🔎 Hunch

Compiled search for your static Markdown files.

Quick links to the docs: all docs, configuration, query params, results types, indexing examples, using examples.

Hunch supports these search features:

  • Full text lookup docs
  • Exact phrase matching docs
  • Fuzzy search docs
  • Include matched words for highlighting docs
  • Return only partial snippet docs
  • Search specific fields docs
  • Return specific fields docs
  • Prefix search docs
  • Search suggestions docs
  • Boosting metadata properties docs
  • Ranking docs
  • Facet Limiting docs
  • Facet Matching docs
  • Pagination docs
  • Stop-Words docs
  • Sort by alternate strategy docs

Hunch compiles a search index to store as a JSON file, which you load and use wherever you want to perform search: AWS Lambda, Cloudflare Worker, even directly in the browser!

Install

The usual ways:

npm install hunch

Generate the index

Use it as a CLI tool:

hunch
# shorthand for
hunch --config hunch.config.js

Or use it in code:

import { generate } from 'hunch'
const index = await generate({
  input: './site',
  // other options
})

Query the index

You'll need to load the index from the generated JSON file. In environments with disk access, that's could be as simple as:

import { readFile } from 'node:fs/promises'
const index = JSON.parse(await readFile('./dist/hunch.json', 'utf8'))
// or with upcoming JavaScript, eventually you could do
import index from './dist/hunch.json' assert { type: 'json' }

Then you create a search instance using Hunch, and query it:

import { hunch } from 'hunch'
const search = hunch({ index })
const results = search({ q: 'we get signal' })
/*
results = {
  items: [ ... ],
  page: { ... },
  facets: { ... },
}
*/

Overview

Many modern websites are backed by static Markdown files with some YAML-like metadata at the top, e.g. this file 2022-12-29/cats-and-dogs.md:

---
title: About Cats & Dogs
summary: Where I talk about pets.
published: 2022-12-29
tags: [ cats, dogs ]
series: Animals
---

Fancy words about cats and dogs.

As part of your deployment step, you would use Hunch to generate a pre-computed search index as a JSON file:

hunch --config hunch.config.js

A simple configuration file would specify the content folder (where the Markdown files are), the output filepath to write the JSON file, and other configuration details:

// hunch.config.js
export default {
  // Define the folder to scan.
  input: './site',
  // Define where to write the index file.
  output: './dist/hunch.json',
  // Property names of metadata to treat as "collections", like "tags" or "authors".
  facets: {
    // If it's just a flat string there's nothing to configure.
    series: true,
    // If it's more, like an array, you'll need to specify how Hunch
    // should treat the values. (See documentation for more details.)
    tags: {
      type: 'array',
    }
  },
  // All the facet fields are searchable by default, but you need
  // to specify additional searchable fields.
  searchableFields: [
    'title',
    'summary',
  ],
  // Fields that are not searchable that you want available for access
  // need to be specified. These fields are stored in the index JSON, but
  // not used by Hunch.
  storedFields: [
    'published',
  ],
}

To make a search using this index, you would create a Hunch instance with the index, and then query it:

// Load the generated JSON file in one way or another:
import { readFile } from 'node:fs/promises'
const index = JSON.parse(await readFile('./dist/hunch.json'))

// Create an instance of Hunch using that data:
import { hunch } from 'hunch'
const search = hunch({ index })

// Then query it:
const results = search({
  q: 'fancy words',
  facetMustMatch: { tags: [ 'cats' ] },
  facetMustNotMatch: { tags: [ 'rabbits' ] },
})
/*
results = {
  items: [
    {
      title: 'About Cats & Dogs',
      tags: [ 'cats', 'dogs' ],
      summary: 'Where I talk about pets.',
      published: '2022-12-29',
      series: 'Animals',
      _id: '2022-12-29/cats-and-dogs.md',
      _content: 'Fancy words about cats and dogs.',
    }
  ],
  page: {
    number: 0,
    size: 1,
    total: 1,
  },
  facets: {
    series: {
      Animals: {
        all: 3,
        search: 1,
      },
    },
    tags: {
      cats: {
        all: 5,
        search: 1
      }
      dogs: {
        all: 3,
        search: 1
      }
    },
  },
}
*/

URL Query docs

If you are using Hunch as an API with a URL query parameter interface, such as AWS Lambda, Cloudflare Worker, or even the browser, you can easily transform those query parameters into a Hunch query object:

// from the main
import { fromQuery } from 'hunch'
// or from the named export
import { fromQuery } from 'hunch/from-query'
const query = normalize({
  q: 'fancy words',
  'facet[tags]': 'cats,-rabbits',
})
/*
query = {
  q: 'fancy words',
  facetMustMatch: { tags: [ 'cats' ] },
  facetMustNotMatch: { tags: [ 'rabbits' ] },
}
*/

Additional Notes

Behind the scenes this libary uses MiniSearch for text searching, so look at that documentation if you need anything more esoteric.

⚠️ The output JSON file is an amalgamation of a MiniSearch index and other settings, optimized to save space. There is no guarantee as to the output structure or contents between Hunch versions: you must compile with the same version that you search with!

Some things left to do:

  • [ ] Stemming (undecided if I'll support this...)

License

Published and released under the Very Open License.