WeSearch

How WeSearch actually works.

WeSearch is unusually transparent about its pipeline because the pipeline is the product. This is how a story gets from a publisher's RSS feed into your hub, what happens when you react, and where your data does and does not go.

Most news apps are mysterious by design. You scroll, headlines appear, you can't tell whether they were chosen by chronology or by a model. WeSearch is the opposite. Everything that happens between a publisher's RSS feed and your screen is deliberately simple, and we'll walk through it here without hand-waving.

1. The source list

We maintain a hand-curated catalog of 700+ editorial RSS and Atom feeds: legacy newspapers (NYT, Washington Post, Guardian, BBC, Le Monde, Al Jazeera), wire services (Reuters, AP, AFP, Bloomberg), dedicated tech and science press (Ars, Wired, Verge, Nature, Quanta), aggregator-native social signals (Hacker News, relevant subreddits, Mastodon communities, YouTube creators with RSS), and a long tail of editorial blogs. Each entry carries a stable id ("nyt-home", "verge", "r-localllama"), a category, a region, and the canonical feed URL.
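A catalog entry along those lines could look like this. This is a sketch: the field names, the category union, and the exact shape are assumptions, not our actual schema.

```typescript
// Sketch of one catalog entry. Field names and the category
// union are illustrative, not the production schema.
interface Source {
  id: string;        // stable id, e.g. "nyt-home" or "r-localllama"
  name: string;      // human-readable display name
  category: "news" | "tech" | "science" | "social" | "blog";
  region: string;    // e.g. "us", "uk", "global"
  feedUrl: string;   // canonical RSS/Atom URL
}

const example: Source = {
  id: "verge",
  name: "The Verge",
  category: "tech",
  region: "us",
  feedUrl: "https://www.theverge.com/rss/index.xml",
};
```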

Sources get added when they consistently publish substantive original reporting and removed when they don't. Adds and removes are recorded in a public changelog so the catalog is auditable.

2. The pull

Every five minutes, the backend pulls each subscribed feed using simple HTTP and a polite User-Agent. We respect ETag and Last-Modified, so a feed that hasn't changed costs us a 304 instead of a full body. We also respect publisher rate limits and rotate through the catalog so no single source gets hammered.

If a feed errors out, it's marked unhealthy and dropped to a longer interval until it recovers. We don't retry aggressively because aggressive retries are how aggregators get banned.
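The pull logic reduces to a conditional GET plus a backoff schedule. Here is a minimal sketch, assuming per-feed state is tracked elsewhere; the function names, the User-Agent string, and the backoff constants are illustrative, not our production values.

```typescript
// Sketch of one polite conditional pull. The bot URL is a
// placeholder; state tracking is assumed to live elsewhere.
interface FeedState {
  etag?: string;
  lastModified?: string;
  failures: number; // consecutive errors; drives the backoff interval
}

const BASE_INTERVAL_MS = 5 * 60 * 1000; // the five-minute cycle

async function pullFeed(url: string, state: FeedState): Promise<string | null> {
  const headers: Record<string, string> = {
    "User-Agent": "WeSearchBot/1.0 (+https://wesearch.example/bot)",
  };
  // Conditional request: an unchanged feed answers 304 with no body.
  if (state.etag) headers["If-None-Match"] = state.etag;
  if (state.lastModified) headers["If-Modified-Since"] = state.lastModified;

  const res = await fetch(url, { headers });
  if (res.status === 304) return null; // unchanged, nothing to parse

  state.etag = res.headers.get("etag") ?? undefined;
  state.lastModified = res.headers.get("last-modified") ?? undefined;
  return res.text();
}

// Unhealthy feeds back off exponentially instead of retrying hard.
function nextIntervalMs(state: FeedState): number {
  return BASE_INTERVAL_MS * 2 ** Math.min(state.failures, 6);
}
```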

3. The merge and dedupe

Every pulled item gets normalized into a canonical schema (title, description, link, publish time, source name, image, byline, language). We dedupe by canonical URL — when three wires post the same headline, you see the one that landed first, with the others available on the story page.

Then we sort by recency. That is the whole sort. No ranking model, no engagement velocity, no personalization vector. The feed you see is the feed any reader sees, modulo the category filter you pick.
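The merge step is small enough to sketch in full. The `Item` shape and the URL-canonicalization rules below are assumptions (here, stripping fragments and `utm_*` parameters); what matters is the shape of the logic: one winner per canonical URL, earliest copy wins, then a plain recency sort.

```typescript
// Sketch of merge + dedupe + sort. Item fields and the
// canonicalization rules are illustrative.
interface Item {
  title: string;
  link: string;
  publishedAt: number; // epoch ms
  source: string;
}

// Strip fragments and tracking params so syndicated copies of
// the same URL collapse to one key.
function canonicalUrl(link: string): string {
  const u = new URL(link);
  u.hash = "";
  for (const p of [...u.searchParams.keys()]) {
    if (p.startsWith("utm_")) u.searchParams.delete(p);
  }
  return u.toString();
}

// Keep the earliest-published item per canonical URL, then sort
// newest-first. That is the whole sort.
function mergeFeed(items: Item[]): Item[] {
  const byUrl = new Map<string, Item>();
  for (const item of items) {
    const key = canonicalUrl(item.link);
    const seen = byUrl.get(key);
    if (!seen || item.publishedAt < seen.publishedAt) byUrl.set(key, item);
  }
  return [...byUrl.values()].sort((a, b) => b.publishedAt - a.publishedAt);
}
```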

4. The story page

Every unique story gets a stable slug and a server-rendered page at /s/<slug>. The page is generated from the canonical metadata and includes NewsArticle JSON-LD so search engines see proper publication metadata, plus breadcrumbs and an OpenGraph card. The page links out to the original publisher — we never reproduce the full article body.
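The JSON-LD block the page embeds could be built like this. Property names follow schema.org's NewsArticle vocabulary; the story fields and the `wesearch.example` origin are placeholders, not our real markup.

```typescript
// Sketch of the NewsArticle JSON-LD for a story page.
// schema.org property names are real; values are illustrative.
function newsArticleJsonLd(story: {
  slug: string;
  title: string;
  publishedAt: string; // ISO 8601
  sourceName: string;
  sourceLink: string;
  image?: string;
}): string {
  return JSON.stringify({
    "@context": "https://schema.org",
    "@type": "NewsArticle",
    headline: story.title,
    datePublished: story.publishedAt,
    mainEntityOfPage: `https://wesearch.example/s/${story.slug}`, // placeholder origin
    url: story.sourceLink, // the publisher's link stays canonical
    publisher: { "@type": "Organization", name: story.sourceName },
    ...(story.image ? { image: [story.image] } : {}),
  });
}
```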

Asynchronously, an LLM extracts a TL;DR (3–5 sentences), a key-facts bullet list, and tags. None of this is shown as if it were the publisher's reporting; everything is clearly labeled. The publisher's headline and link remain canonical.

5. The community layer

The first time you tap a reaction or post a comment, your browser quietly generates a random API key (a 32-byte hex string) and a stable display handle ("Plain Loom 638") derived from that key via a deterministic word-list. No email, no name, no phone number — see our anonymity stance.

Reactions and comments key off your local API key. The server stores a hash of the key, not the key itself, so even our own logs can't link back to a specific device. Threaded comments support GIFs (via a public-library proxy), comment likes, deletes, and replies up to a few levels deep. The "voices in the room" panel surfaces the most-engaged anonymous commenters this week so you can follow people you appreciate without ever knowing who they are.
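The key-and-handle scheme fits in a few lines. This sketch uses Node's crypto for brevity (the browser would use Web Crypto), and the tiny word lists and digest-slicing rule are invented for illustration; only the principles are ours: a random 32-byte key, a handle derived deterministically from it, and a server that stores nothing but a hash.

```typescript
import { randomBytes, createHash } from "node:crypto";

// Illustrative word lists; the real ones are much larger.
const ADJECTIVES = ["Plain", "Quiet", "Amber", "Brisk"];
const NOUNS = ["Loom", "Reef", "Spruce", "Gale"];

// Client side: a random 32-byte key, hex-encoded (64 chars).
function generateApiKey(): string {
  return randomBytes(32).toString("hex");
}

// Deterministic handle: same key always yields the same
// "Adjective Noun NNN" display name.
function handleFor(apiKey: string): string {
  const d = createHash("sha256").update(apiKey).digest();
  const adj = ADJECTIVES[d[0] % ADJECTIVES.length];
  const noun = NOUNS[d[1] % NOUNS.length];
  const num = ((d[2] << 8) | d[3]) % 1000;
  return `${adj} ${noun} ${num}`;
}

// Server side: store only a hash of the key, never the key,
// so logs cannot be linked back to a device.
function storedKeyHash(apiKey: string): string {
  return createHash("sha256").update(apiKey).digest("hex");
}
```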

6. The pulse and trending

The 24-hour trending row counts community reactions, not engagement velocity or model-predicted virality. A story crosses into trending when enough distinct anonymous handles have reacted to it. The Pulse tab visualizes the platform's live emoji economy, surfaces hot threads, and shows the most-engaged anonymous voices.
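The trending rule reduces to counting distinct handles per story inside the window. In this sketch the `Reaction` shape and the threshold constant are assumptions (the real cutoff isn't public); the point is that the count is of distinct anonymous handles, not raw reaction volume.

```typescript
// Sketch of the trending cutoff. Threshold and field names
// are assumptions.
interface Reaction {
  storySlug: string;
  handleHash: string; // hashed key identifying a device, not a person
  at: number;         // epoch ms
}

const WINDOW_MS = 24 * 60 * 60 * 1000;
const TRENDING_THRESHOLD = 12; // illustrative; the real cutoff isn't public

function trendingSlugs(reactions: Reaction[], now: number): string[] {
  const voters = new Map<string, Set<string>>();
  for (const r of reactions) {
    if (now - r.at > WINDOW_MS) continue; // outside the 24-hour window
    const set = voters.get(r.storySlug) ?? new Set<string>();
    set.add(r.handleHash); // distinct handles, not raw reaction count
    voters.set(r.storySlug, set);
  }
  return [...voters.entries()]
    .filter(([, set]) => set.size >= TRENDING_THRESHOLD)
    .map(([slug]) => slug);
}
```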

7. The push pipeline

If you opt in to push notifications, you can configure exactly what wakes your phone: front-page only, specific sources, specific categories, or keyword matches. The notification system uses VAPID-keyed Web Push, which means OS-level alerts even when the tab is closed. Quiet hours are respected. No third-party push provider sits in the middle.
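The filtering step before anything wakes your phone could look like this. The preference shape, the matching rules, and the quiet-hours logic are assumptions sketched from the options above, not our actual implementation.

```typescript
// Sketch of per-user notification matching. Field names and
// rules are illustrative.
interface PushPrefs {
  frontPageOnly: boolean;
  sources: string[];    // source ids, e.g. ["verge"]
  categories: string[]; // e.g. ["tech"]
  keywords: string[];   // matched case-insensitively against the title
  quietHours?: { startHour: number; endHour: number }; // local time
}

interface Story {
  title: string;
  sourceId: string;
  category: string;
  onFrontPage: boolean;
}

function inQuietHours(prefs: PushPrefs, hour: number): boolean {
  if (!prefs.quietHours) return false;
  const { startHour, endHour } = prefs.quietHours;
  return startHour <= endHour
    ? hour >= startHour && hour < endHour
    : hour >= startHour || hour < endHour; // window wraps past midnight
}

function shouldNotify(prefs: PushPrefs, story: Story, hour: number): boolean {
  if (inQuietHours(prefs, hour)) return false; // quiet hours win over everything
  if (prefs.frontPageOnly) return story.onFrontPage;
  const title = story.title.toLowerCase();
  return (
    prefs.sources.includes(story.sourceId) ||
    prefs.categories.includes(story.category) ||
    prefs.keywords.some((k) => title.includes(k.toLowerCase()))
  );
}
```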

8. The tracker stack

Open your browser's Network tab on any WeSearch page. You will see one origin: ours. No Google Analytics, no Meta Pixel, no AppNexus, no Doubleclick, no Chartbeat, no Quantcast, no Criteo. Every byte you load comes from our server. We don't have an audience-measurement vendor because we don't measure audiences. Read more on our tracking stance.

9. The hosting

WeSearch runs on a single DigitalOcean droplet — a TypeScript Hono service on Node 25, with libsql/SQLite for storage and a small set of static assets served directly. There is no Cloudflare in the path (we considered it; we like the latency floor better without). There is no CDN reading your traffic. Operations are handled by a single human.

10. What we'd change

The pipeline is intentionally simple, but it isn't perfect. We'd like better deduplication across syndicated wire copy (right now we dedupe by URL; we don't catch the same AP story republished under three different titles by three different desks). We'd like richer per-story summaries that aren't just AI-generated. And we'd like a real comments-search index so people can find old threads. Those are the next iterations; nothing about them changes the principles above.

Frequently asked

How often does the feed update?

Every five minutes. The backend pulls each subscribed RSS feed on a rotating schedule and merges new items into the deduplicated chronological hub feed.

Is the feed personalized?

No. The home feed is chronological and identical for everyone. You can filter by category yourself, but the underlying sort is publish time. There is no personalization model.

Where is your code?

WeSearch runs as a single Hono service in TypeScript with libsql/SQLite for persistence. The architecture is documented and the project's source is publicly readable.

How do reactions stay anonymous?

Your reactions and comments are stored against a hash of a random API key generated on your device. The key never leaves your browser in plaintext, and we cannot reverse the hash to find the original key.