What I Built

Code Search is a local-first search API for my own repos and reference docs. It does not just grep strings. It indexes code in chunks, generates embeddings for those chunks, writes short natural-language summaries for each one, and lets me search by intent instead of exact wording.

That matters because most code search failures are not actually search failures. They are translation failures.

I remember the behavior I need, not the file name. I remember that something “handles auth rotation” or “renders the usage chart” or “parses markdown into nodes.” The codebase might describe that logic with totally different words. Traditional grep is great when I already know what I am looking for. It is much less useful when I only know the shape of the problem.

So I built a search layer that bridges that gap.

The Problem

Once you have enough active projects, code lookup becomes expensive in a very stupid way.

Not financially expensive at first. Cognitively expensive.

You lose time opening the wrong files. You ask an AI model to scan directories that should have been searchable. You keep repeating the same archaeology session across different repos because the knowledge is there, but retrieval is weak.

That got worse once I started treating my workspace like an actual platform instead of a pile of side projects. There are app repos, internal tools, dashboards, and a growing reference library. Some of the most valuable logic is buried in utility files or project-specific naming conventions that make sense in the moment and become opaque a month later.

I wanted something better than:

  • grep across everything and pray
  • burn premium model tokens on file scanning
  • rely on memory and guesswork

Code Search exists to make code retrieval cheap and local.

Architecture

The stack is intentionally boring in the best way:

  • FastAPI for the API layer
  • SQLite for the index
  • Ollama embeddings for semantic search
  • LLM-generated summaries for higher-quality retrieval
  • Python for ingestion, chunking, and ranking

The system scans project directories and a reference-docs library, then processes files through a code-aware chunking pipeline. Instead of embedding a whole file as one blob, it tries to split at useful boundaries: functions, classes, exports, major blocks. That produces chunks that are small enough to search well and meaningful enough to summarize.
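The boundary-splitting idea can be sketched with Python's standard `ast` module. This is a simplified illustration, not the actual pipeline: the real chunker also handles exports, major blocks, and non-Python files.

```python
import ast

def chunk_python_source(source: str) -> list[dict]:
    """Split a Python file at top-level function/class boundaries.

    A sketch of code-aware chunking: one chunk per top-level
    function or class, instead of one blob per file.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start, end = node.lineno - 1, node.end_lineno
            chunks.append({
                "type": type(node).__name__,   # chunk type metadata
                "name": node.name,
                "content": "\n".join(lines[start:end]),
            })
    return chunks
```

Splitting at definition boundaries is what keeps each chunk small enough to embed well while still being a complete, summarizable unit.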

Each chunk stores:

  • file path and project name
  • chunk index and raw content
  • content hash
  • code embedding
  • natural-language summary and summary embedding
  • chunk type metadata

That dual-index approach turned out to be the key design choice.

If you only embed raw code, search works well when the query matches implementation details. If you only embed summaries, you gain intent matching but lose precision and the ability to hit exact structure. Storing both lets me combine them with weighted scoring.

Right now the search weights favor summary meaning over raw code similarity. That sounds counterintuitive until you actually use the tool. When I search “where do I handle project health checks” I usually care more about intent than syntax. The summary layer is what understands that a function querying database counts, cache size, and service state is effectively a health-check endpoint even if the query never uses those exact words.
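The weighted scoring amounts to a blend of two cosine similarities. A minimal sketch; the 0.6/0.4 split is illustrative, since the post only says the weights favor summary meaning over raw code similarity:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Illustrative weights: summary meaning favored over raw code similarity.
SUMMARY_WEIGHT = 0.6
CODE_WEIGHT = 0.4

def hybrid_score(query_emb, code_emb, summary_emb) -> float:
    """Blend code-match precision with summary-level intent."""
    return (SUMMARY_WEIGHT * cosine(query_emb, summary_emb)
            + CODE_WEIGHT * cosine(query_emb, code_emb))
```

Ranking then means computing this score for every chunk and sorting descending; the exact-phrase matches still surface through the code term while intent flows through the summary term.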

Why Summaries Matter

This is the part that made Code Search feel like a real tool instead of an experiment.

Raw embeddings on code are useful. But code is noisy. Variable names, formatting, imports, framework boilerplate. All of it dilutes what a chunk is actually for.

So for each chunk, the system asks a model to describe what the code does in one or two sentences. Not what it contains. What it does.

That tiny distinction changes search quality a lot.

A summary like “Performs hybrid semantic search across code chunks and summaries using SQLite-stored embeddings and cosine similarity” is much more retrievable than a pile of Python imports and SQL strings. It gives the search index a clean semantic target.
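That summarization step could look something like this against a local Ollama server. The endpoint shape follows Ollama's `/api/generate` API; the model name is a placeholder, and the prompt wording is my own guess at the "what it does, not what it contains" instruction:

```python
import json
import urllib.request

def build_summary_prompt(code: str) -> str:
    # Ask for behavior, not contents: no import lists, no variable names.
    return (
        "Describe in one or two sentences what this code does, from the "
        "caller's point of view. Do not list imports or variable names.\n\n"
        + code
    )

def summarize_chunk(code: str, model: str = "llama3") -> str:
    """Call a local Ollama server; model name is an assumption."""
    payload = json.dumps({
        "model": model,
        "prompt": build_summary_prompt(code),
        "stream": False,
    }).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

The summary text then gets its own embedding and is stored alongside the code embedding for the same chunk.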

This also makes results easier to scan. Instead of clicking ten files to figure out which one matters, I can skim the summaries and open the right chunk first.

That is the real performance gain. Not just query speed. Decision speed.

Search Modes

Code Search supports three practical modes:

  • Hybrid for the default workflow
  • Code when I want raw implementation matches
  • Summary when I want conceptual retrieval

Hybrid is where most searches land. It combines the precision of code matching with the flexibility of semantic ranking. Exact phrases still surface strongly, but natural-language intent does not get lost.
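Under the hood, the three modes can collapse into weight presets over one scorer. A sketch, with the hybrid split assumed rather than quoted from the project:

```python
from enum import Enum

class SearchMode(str, Enum):
    hybrid = "hybrid"    # default workflow
    code = "code"        # raw implementation matches only
    summary = "summary"  # conceptual retrieval only

def mode_weights(mode: SearchMode) -> tuple[float, float]:
    """Return (code_weight, summary_weight) for a search mode.

    Hybrid values are illustrative; code and summary modes simply
    zero out the other index.
    """
    if mode is SearchMode.code:
        return 1.0, 0.0
    if mode is SearchMode.summary:
        return 0.0, 1.0
    return 0.4, 0.6
```

Treating modes as weight presets means one ranking path serves all three, which keeps the API surface small.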

This matters for the way I actually work.

Sometimes I know the exact term, like a component name or function. Sometimes I only know the behavior, like “the thing that batches query embeddings” or “the route that returns index stats.” Good search has to support both without making me choose a completely different tool.

Local-First Was the Point

I could have built this around hosted vector infrastructure.

I did not want to.

For a solo operator or small shop, local-first tooling is not ideology. It is a real advantage.

I want search that:

  • does not charge per query
  • does not send my codebase to a third party
  • does not break because a vendor changes pricing

SQLite is enough here. FastAPI is enough here. A local embedding model is enough here. That combination keeps the tool fast to restart, easy to inspect, and cheap to run.

The same design choice also changes how often I use it. If every search feels free, I search more aggressively. Less hesitation, less context-switching, fewer moments where I ask a bigger model to do glorified file hunting.

Operational Details That Actually Matter

A lot of the useful engineering is in the unglamorous parts.

The index uses content hashes so re-indexing can stay incremental instead of rebuilding everything from scratch every time a file changes. There is a query embedding cache for repeated searches. The API exposes health data so I can see chunk counts and summary coverage without poking the database manually.

There is also a nightly re-index job because search tools only stay useful if maintenance is automatic.

This is one of those projects where reliability matters more than novelty. Nobody cares if a search API is clever if it quietly drifts out of date.

What I Learned

Developer productivity problems are often retrieval problems wearing different clothes. A lot of “I should document this better” is actually “I need to be able to find what I already documented or built.”

Local infrastructure compounds. Once a local search API exists, other workflows start depending on it. AI agents can check it before scanning repos. Writing workflows can pull technical context from it. Project reviews can use it to find patterns across tools without opening every codebase manually.

Meaning beats raw text surprisingly often. I still use exact search. I still use grep. But for large-codebase recall, the summary layer does more work than I expected.

And the simplest takeaway: if a tool saves you money, keeps your code private, and cuts retrieval time, it probably deserves to exist.

Why This Project Matters

Code Search is not flashy. It is infrastructure.

But infrastructure projects are where the real value lives. This one made my workspace searchable, cut how often premium models get wasted on mechanical scanning, and shortened the path from “I know I built this somewhere” to the exact file that matters.

That is the kind of developer tool I care about most.

Not a demo that looks clever for five minutes. A tool that quietly makes the rest of the stack better every day.

And if your codebase has reached the point where you are relying on memory, vibes, and grep to navigate it, you probably do not need more discipline.

You need better retrieval.