Over the last few months, I have built and launched a free semantic search tool for GitHub called SemHub. In this blog post, I share what I’ve learned and why I’ve failed, so that other builders can learn from my experience. This post runs long, so I have signposted each section and marked the ones I consider particularly insightful with an asterisk (*).
I have also summarized my key lessons here:
GitHub is the default place to host open source projects. But this privileged position also means that GitHub does not really need to compete on improving its UX. I am sure many developers are like me and have schlep blindness to how bad GitHub’s UX is. Just in terms of searching issues:
At Coder.com, we manage multiple public and private repos on GitHub and we encounter these pain points daily. Wouldn’t it be nice to search across not just the repos we own but also the repos of similar projects, to see how they approach a given problem? What if I don’t know whether the issue I am searching for is open or closed? What if I want to perform a fuzzy search?
Fortunately, for all its flaws, GitHub has a fairly open API. Ammar, a cofounder at Coder, took matters into his own hands and built a semantic search feature in Coder’s internal tool, which works surprisingly well! Surely the wider open-source community could benefit from this? He brought me on to do exactly that and, thus, the idea for SemHub was born.
SemHub’s goal is to offer semantic issue search across multiple GitHub repos, free for anyone to use. Its core features are simple: