How are you implementing site search?

@tanzaho, nope, sorry, but I can create a gist if you can wait a couple of days :slight_smile:

I am in the process of getting back to Hugo after a long break. I can wait for sure :slight_smile:

That’s interesting. How are you generating the JSON for the search? This thread suggests Hugo doesn’t currently support that, so I’d like to see your workaround.

I look forward to your gist.

Now I’m under pressure :wink:
I’m writing the gist right now.


Here is the gist: https://gist.github.com/sebz/efddfc8fdcb6b480f567
It’s incomplete: so far it only explains how to generate the JSON index file. I’ll explain how to use it tomorrow :slight_smile:

Hope it helps.


Thanks for writing that up, sebz. Doing this outside of Hugo makes sense to me. Still, Hugo already has much of the logic needed to generate the index file, so in a way it’s a shame that Hugo can’t do the generation itself.

I left a few comments on your gist.

Thanks @JamesMcMahon!
The gist should now be complete. I’ve added the final part, showing how to use it at runtime.


Thanks, @sebz! lunr.js looks great, and your gist gave me a real boost in figuring out how to put everything together.

I’ve gone in a slightly different direction, though, and am using a slightly hacky but effective way to generate the JSON file with Hugo itself, so I don’t need Grunt or any other external tools.

Here’s what I’m doing:

I created a dummy content file, e.g. content/json.md, whose only job is to set its type to json.

Then I made layouts/json/single.html with something like this:

[{{ range $index, $page := where .Site.Pages "Type" "!=" "json" }}{{ if $index }},{{ end }}
{
    "href": "{{ $page.RelPermalink }}",
    "title": "{{ $page.Title }}",
    "tags": [{{ range $tindex, $tag := $page.Params.tags }}{{ if $tindex }}, {{ end }}"{{ $tag }}"{{ end }}],
    "content": "{{ $page.PlainWords }}"
}{{ end }}
]

When Hugo publishes the site, it creates public/json/index.html, whose content is actually just JSON data. From there, a simple cp or mv to public/static/js/lunr/PagesIndex.json and you’re in business (I just added that step to my deploy script). Everything else should work basically the same.

Obviously, that’s an ugly hack of a template, and anywhere else on your site where you loop over all the content, you need to be careful to exclude the dummy page. So far, it seems pretty reliable at escaping things and generating valid JSON, but it’s not ideal. The .PlainWords method on the Page object doesn’t seem to be documented anywhere, and I suppose it might disappear or change in future Hugo releases, but currently it’s the cleanest way I’ve found to get the text of the page content with all markup and rendering stripped out.

With increasingly powerful front-end technology, I think there’s a lot of potential for extending sites this way, publishing a JSON dump of the content (or some subset of it) and letting JS do interesting things with it.

A few features added to Hugo would make this a lot easier and cleaner:

  • built-in JSON output, e.g. if you could just do $page.JSON and get a nice JSON string out.
  • failing that, at least a cleaner way to get the raw, original text of the content without rendering it.
  • some way to have Hugo publish to a different path, or at least set the extension on the content (making the mv/cp unnecessary).

If you wrap a Go struct in script tags, you get JSON:

<script>
{{ . }}
</script>

This will not work with Page, however, because Go’s JSON encoder doesn’t handle cyclic references; see:

My pleasure @thraxil !

Your solution is really smart and solves one of my current issues with .PlainWords. My JSON file is pretty big… and lunr.js indexing is a bit slow right now…

Plain (string) may be a better choice than PlainWords (string slice).

I don’t see either of these disappearing from Hugo; they’re not documented because … no one remembered to do so.

I also don’t see a better way of getting the text with no markup. Hugo now supports many different renderers, and the only format they have in common is the end result: HTML. So stripping that HTML away to get plain text may seem hacky, but the alternatives are worse.

As to JSON output: there have been different but similar requests for other output formats (ical, xcal, json …). If someone could come up with a good design that supports these in Hugo, that would be great!


Oh, and to exclude/include content, there are where clauses:

{{ range (.Paginate (where .Data.Pages "Type" "!=" "json")).Pages }}

In my testing, when I used Plain, something about line breaks in the result made the JSON invalid. Switching to PlainWords fixed it. I didn’t really spend any time trying to figure out exactly why.

Thanks @sebz!

I used your gist to integrate search. I wrote my own index generator that preprocesses my Markdown (gets rid of all of the words lunr doesn’t use, html elements, …). It’s way less generic than the solution @thraxil provided, but it works for me (read: low chance of it being useful for others). I then created a search.html partial template with some parameters I keep in my site config.toml file. Works great!

I have everything deployed with a Fabric script, so it’s still as easy as hugo new .., edit the post, deploy.

See this:

It ends with a demo of a search prototype on the Hugo-built GopherCon site.


It seems to be building the index using https://github.com/blevesearch/hugoidx

http://www.blevesearch.com/news/Site-Search/

Just spent the afternoon experimenting with Bleve. Very impressed. The documentation is a little thin (I had to guess a bit with the document mapping), but that is coming.

But as it stands, with its fairly large optional language support, I think it has to stay a side project of Hugo (a plugin, maybe).

@sebz I’m just looking at Hugo for the first time, but I’ve been developing a portfolio site with Jekyll for a while now. I implemented search using Tipue, but I’m hoping to remove the jQuery dependency entirely and move to lunr.js. Although Hugo is obviously different, any chance you’d be willing to share some of the code for how you implemented lunr.js? Do you have the ability to customize or “promote” certain results for certain terms? Any help would be greatly appreciated. I’m considering moving a rather large public-facing site to static, and Jekyll is a little too slow for my tastes. Thanks!

In an earlier post he linked this Gist. The comments should help to give you an idea of how the script works.