How are you implementing site search?

Hi all,

How are you implementing search on your Hugo websites? I see Google Custom Search on the Hugo.io docs page. Is that the path of least friction?

Would you implement it as a partial? (I assume it’s a chunk of JavaScript.) So Google’s solution doesn’t need a database, just indexes static pages?

Thanks,

Joe

JoeWeb discuss@gohugo.io writes:

How are you implementing search on your Hugo websites? I see Google
Custom Search on the Hugo.io docs page. Is that the path of least
friction?

I plan to add custom search for StartPage and/or DuckDuckGo, but, yes,
same/similar thing.

Google Custom Search is an obvious and simple choice, and using a partial is smart. Have a look in the docs code to see how it’s done there.

Google search here too. Although I’m not exactly comfortable with Google’s data-harvesting ethos, I think their custom search is the best on offer [I looked at a few].

The results page is very customisable [just override their default styles in your CSS] and I like the way it overlays search results on top of the current page, as opposed to taking you off-site or showing the results in a new window as most of the others do.

One concern I do have is that the free version uses adverts. I don’t see any of these myself, as I always use an ad-blocker, but I do sometimes wonder if it looks a bit tacky for my visitors…

…then I remember no f**ker ever visits my website anyway, so it doesn’t really matter!

You can have a look at [IMHO] a nicely integrated CSS-wise Google Search on my site here: http://stiobhart.net

Thanks all. I didn’t know that DuckDuckGo offered site search.

stiobhart – It’s interesting that adverts only appear on some searches, at least on your site. Try searching for solution, then corporate. I only see adverts for the latter (I pulled both terms from your homepage content.)

Also, your search box disappears on mobile. I guess mobile requires some extra steps?

Joe

@JoeWeb cheers for the extra pair of eyes on the site. The mobile version is a bit of a permanent ‘work-in-progress’. At the moment the search box is set to display:none on mobile, as otherwise it partly obscures the header image and looks really crappy.

As I said, I never see ads at all as I use blockers on all my browsers. But I think what you’re seeing sounds about right. AFAIK Google ‘reserves the right’ to slap ads on your search results but that doesn’t necessarily mean they’d always do so. I’d have thought only for suitably ‘keywordy’ searches.

Re: DuckDuckGo. Although I use them for my day-to-day web searching, I decided to opt for big bad Google on my own site, just because it was the only one out of the three I looked at [Google CSE, DuckDuckGo Search Box and Yandex Site Search] that allowed you to embed the search results within your own site.

DuckDuckGo and Yandex both take you back to their own sites to show you the results which feels a bit ‘broken’ to me. DuckDuckGo say they do so because “…we do not have the syndication rights to allow you to host our results on your site…”, which is presumably because they pretty much just act as an anonymiser for search results from other search engines. It’s a pity Yandex don’t allow better integration. They could easily do so as unlike DDG, ‘they are their own man’ in this respect.

Correction: Yandex do allow you the option to display search results on your own site, but require you to have a dedicated search results page to display them on.

Hi everyone,

I opted for a custom solution based on Gruntjs + lunr.js to build a static index (a JSON file) out of the .md files. Then at runtime, I provide a simple input field that triggers a search, using lunr.js to search through the static index.

What I like with this solution is that it follows the “static FTW” philosophy of Hugo and is really flexible regarding the way your content is indexed by lunr.js (I’m using Hugo “tags” to prioritize the search results) and the design integration in your website.
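To make the build step concrete, here is a minimal sketch of turning Markdown sources into a JSON index. It is written in Go rather than as a Grunt task, purely for illustration, and it is not sebz's actual code: the `Entry` and `Source` types, the field names, and the tiny front matter parser (which only understands `---` delimiters with `title:` and `tags:` lines) are all assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Entry is one searchable record. The field names are illustrative only.
type Entry struct {
	Href    string   `json:"href"`
	Title   string   `json:"title"`
	Tags    []string `json:"tags"`
	Content string   `json:"content"`
}

// Source pairs a page URL with its raw Markdown (hypothetical helper type).
type Source struct {
	Href     string
	Markdown string
}

// splitFrontMatter separates a "---"-delimited front matter block from the body.
// Files without that block are returned unchanged as body.
func splitFrontMatter(src string) (front, body string) {
	const open = "---\n"
	const closing = "\n---\n"
	if !strings.HasPrefix(src, open) {
		return "", src
	}
	rest := src[len(open):]
	i := strings.Index(rest, closing)
	if i < 0 {
		return "", src
	}
	return rest[:i], rest[i+len(closing):]
}

// parseEntry reads "title:" and "tags:" lines from the front matter and
// collapses whitespace in the body. It does not strip Markdown markup.
func parseEntry(href, src string) Entry {
	e := Entry{Href: href, Tags: []string{}}
	front, body := splitFrontMatter(src)
	for _, line := range strings.Split(front, "\n") {
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			continue
		}
		value := strings.TrimSpace(parts[1])
		switch strings.TrimSpace(parts[0]) {
		case "title":
			e.Title = value
		case "tags":
			for _, t := range strings.Split(value, ",") {
				if t = strings.TrimSpace(t); t != "" {
					e.Tags = append(e.Tags, t)
				}
			}
		}
	}
	e.Content = strings.Join(strings.Fields(body), " ")
	return e
}

// buildIndex serialises all entries as one JSON array.
func buildIndex(sources []Source) ([]byte, error) {
	entries := make([]Entry, 0, len(sources))
	for _, s := range sources {
		entries = append(entries, parseEntry(s.Href, s.Markdown))
	}
	return json.Marshal(entries)
}

func main() {
	out, err := buildIndex([]Source{
		{Href: "/post/search/", Markdown: "---\ntitle: Site search\ntags: hugo, search\n---\nHow to add search."},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

The resulting array is the kind of file a client-side library such as lunr.js can load and index at runtime; keeping tags as a separate field is what makes it possible to weight them differently from the body text.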

My 2 cents :slight_smile:
Cheers!

Sebz.

Hi @sebz, I like your suggestion. By chance, have you posted an explanation (or your gruntfile) somewhere?

@tanzaho, nope sorry, but I can create a gist if you can wait a couple of days :slight_smile:

I am in the process of getting back to Hugo after a long break. I can wait for sure :slight_smile:

That’s interesting. How are you generating the JSON for the search? This thread suggests that feature is currently missing, so I’d like to see your workaround.

I look forward to your gist.

Now I’m under pressure :wink:
I’m writing the gist right now.

Here is the gist: https://gist.github.com/sebz/efddfc8fdcb6b480f567
It’s incomplete, I only explain how to generate the JSON index file. I’ll explain how to use it tomorrow :slight_smile:

Hope it helps.

Thanks for writing that up sebz. Doing this outside of Hugo makes sense to me. Though I do feel like Hugo has a lot of logic already built out that would be needed for generating the index file, so in a way it’s a shame that Hugo can’t do the generation.

I left a few comments on your gist.

Thanks @JamesMcMahon!
The gist should now be complete. I’ve added the last part, showing how to use it at runtime.

Thanks, @sebz, lunr.js looks great and your gist gave me a real boost on figuring out how to put everything together.

I’ve gone a slightly different direction with it though and am using a slightly hacky, but effective way to generate the JSON file with Hugo itself so I don’t need Grunt or any external tools like that.

Here’s what I’m doing:

I created a dummy content file like content/json.md that just has its type set to json.

Then I made layouts/json/single.html with something like this:

[{{ range $index, $page := where .Site.Pages "Type" "!=" "json" }}{{ if $index }},{{ end }}
{
    "href": "{{ $page.RelPermalink }}",
    "title": "{{ $page.Title }}",
    "tags": [{{ range $tindex, $tag := $page.Params.tags }}{{ if $tindex }}, {{ end }}"{{ $tag }}"{{ end }}],
    "content": "{{ $page.PlainWords }}"
}{{ end }}
]

When Hugo publishes the site, it will create public/json/index.html with content that is actually just JSON data. At that point, a simple cp or mv to public/static/js/lunr/PagesIndex.json and you’re in business (I just add that to my deploy script). Everything else should basically work the same.

Obviously, that’s an ugly hack of a template, and anywhere else on your site that you might loop over all the content, you need to be careful to exclude the dummy content. So far, it seems to be pretty reliable for escaping things and generating valid JSON, but it’s not ideal. The .PlainWords method on the Page object doesn’t seem to be documented anywhere and I suppose might disappear or change in future Hugo releases, but currently it’s the cleanest way I could figure out for getting the text of the page content with all markup/rendering stripped out.
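Since the template writes JSON by hand, a cheap safeguard is to parse the generated file in the deploy step before copying it into place. The following is a minimal sketch in Go, assuming the four fields from the template above; `indexEntry` and `checkIndex` are hypothetical names, and a real script would read the bytes from the published file rather than from literals.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// indexEntry mirrors the fields emitted by the template above.
type indexEntry struct {
	Href    string   `json:"href"`
	Title   string   `json:"title"`
	Tags    []string `json:"tags"`
	Content string   `json:"content"`
}

// checkIndex reports how many entries the index holds, or an error if the
// data is not a valid JSON array or an entry is missing its href.
func checkIndex(data []byte) (int, error) {
	var entries []indexEntry
	if err := json.Unmarshal(data, &entries); err != nil {
		return 0, fmt.Errorf("index is not valid JSON: %w", err)
	}
	for i, e := range entries {
		if e.Href == "" {
			return 0, fmt.Errorf("entry %d has no href", i)
		}
	}
	return len(entries), nil
}

func main() {
	good := []byte(`[{"href":"/a/","title":"A","tags":["x"],"content":"hello"}]`)
	// A stray leading comma is the kind of slip a hand-built template can produce.
	bad := []byte(`[,{"href":"/a/","title":"A","tags":[],"content":""}]`)

	fmt.Println(checkIndex(good))
	fmt.Println(checkIndex(bad))
}
```

Failing the deploy on a parse error is preferable to shipping an index that the browser silently refuses to load.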

With increasingly powerful front-end technology, I think there’s a lot of potential for extending sites this way, publishing a JSON dump of the content (or some subset of it) and letting JS do interesting things with it.

A couple features added to Hugo would make it a lot easier and cleaner:

  • built-in JSON output. E.g., if you could just do $page.JSON and get a nice JSON string out.
  • if not that, at least a cleaner way to get the raw, original text of the content without rendering it.
  • some way to have Hugo publish to a different path, or at least set the extension on the content (making the mv/cp unnecessary)

If you wrap a Go struct in script tags you get JSON:

<script>
{{ . }}
</script>

This will, however, not work with Page, because the Go JSON encoder doesn’t handle cyclic refs, see:

My pleasure @thraxil !

Your solution is really smart and solves one of my current issues with .PlainWords. My JSON file is pretty big… and lunr.js indexing is a bit slow right now…

Plain (string) may be a better choice than PlainWords (string slice).
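The difference shows up as soon as the value is printed inside a template. A minimal sketch using Go's html/template, with a stand-in struct (`fakePage` is hypothetical and uses plain fields, whereas Hugo exposes these as methods on its Page type):

```go
package main

import (
	"bytes"
	"fmt"
	"html/template"
)

// fakePage mimics only the two values discussed here; it is not Hugo's Page.
type fakePage struct {
	Plain      string
	PlainWords []string
}

// render executes a one-line template against p and returns the output.
func render(tmpl string, p fakePage) string {
	t := template.Must(template.New("t").Parse(tmpl))
	var buf bytes.Buffer
	if err := t.Execute(&buf, p); err != nil {
		return "error: " + err.Error()
	}
	return buf.String()
}

func main() {
	p := fakePage{Plain: "hello world", PlainWords: []string{"hello", "world"}}

	// A string slice is printed with Go's default formatting, brackets included.
	fmt.Println(render(`"content": "{{ .PlainWords }}"`, p))
	// prints: "content": "[hello world]"

	// A plain string is printed as-is.
	fmt.Println(render(`"content": "{{ .Plain }}"`, p))
	// prints: "content": "hello world"
}
```

With the slice, every indexed page would carry a stray leading "[" and trailing "]" in its content; the string form avoids that.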

I don’t see either of these disappearing from Hugo; they’re not documented because … no one remembered to do so.

I also don’t see a better way of getting the text with no markup. Hugo now supports many different renderers, and the only format they have in common is the end result: HTML. So stripping that HTML away to get plain text may seem hacky, but the alternatives are worse.

As to JSON output: there have been several similar requests for different output formats (ical, xcal, json …). If someone could come up with a good design that supports these in Hugo, that would be great!

Oh, and to exclude/include content, there are where clauses:

{{ range (.Paginate (where .Data.Pages "Type" "!=" "json")).Pages }}