149,000 URLs, 5 Minutes, Zero Dollars: How We Replaced an Enterprise SEO Engagement
Part 2 of the Redirect Lifeguard series. An enterprise SEO firm couldn't scope the timeline. Premium tools gate this behind expensive tiers. We solved it in an afternoon with TypeScript and clear thinking.
We hired an enterprise SEO firm to fix our redirect situation. After an extended engagement, they couldn’t begin to execute. The timeline was unknown — not long, not aggressive, literally unknown. They couldn’t tell us when or if they’d deliver.
The problem wasn’t complicated to state: our ecommerce site had accumulated years of URL changes from platform migrations, product additions and removals, category restructuring, and seasonal inventory rotation. Canonical URLs that we owned, that existed on the internet, that search engines had indexed, and that customers had bookmarked were returning 404s or pointing to the wrong pages. Every broken URL was a small leak in our authority and a small failure in customer experience. Aggregated across a domain with tens of thousands of product pages, the leaks were material.
The enterprise firm understood the problem. They had dashboards and crawlers and reporting tools. What they didn’t have was a systematic approach to resolving it at scale. They could identify broken URLs one at a time, or in small batches, but the volume overwhelmed their process.
Premium SEO tools offer redirect management as a feature, but only at their higher pricing tiers. The basic principle is straightforward: crawl the domain, identify broken URLs, match them to valid destinations, generate redirect rules. The tools charge hundreds of dollars per month for this because they’ve wrapped a fundamentally simple process in a product with accounts, dashboards, team management, and incremental feature gates designed to push you toward enterprise pricing.
We found this absurd.
The Actual Problem
Strip away the tooling and the enterprise process, and the problem reduces to three operations.
First, enumerate all canonical URLs that the internet believes belong to your domain. These exist in search engine indexes, in backlink databases, in the Wayback Machine, in customer bookmarks, in affiliate links, in social media posts. They are URLs that point to your domain and expect to find content. The total count for our domain was approximately 149,000 URLs.
Second, compare that enumeration against your current sitemap: the URLs that actually resolve to valid pages on your live site. The delta between “URLs the internet expects” and “URLs that actually work” is your redirect problem. Everything in the first set but not in the second set is either a URL that needs a redirect or a URL that should be discarded.
Third, for each URL in the delta, determine the correct disposition: redirect to a matching current page, redirect to a category or collection page, or discard as junk (spam URLs, malformed crawl artifacts, test pages that were never meant to be public).
That’s it. Three operations. Enumeration, comparison, disposition. The enterprise firm was struggling with the first operation because their tools weren’t designed for bulk canonical enumeration at this scale. The premium SaaS tools handle it but gate it behind pricing tiers that assume you’re a large organization with a large budget.
The Solution
The tool we built does all three operations. It’s TypeScript, it runs locally, and it produces a complete redirect map.
The enumeration phase aggregates canonical URLs from multiple sources: search console exports, backlink databases, sitemap archives, and crawl data. The key insight is that you don’t need to crawl the internet yourself. The data already exists in free or low-cost sources. You just need to aggregate and deduplicate it. For our domain, this produced approximately 149,000 unique canonical URLs that the internet associated with our domain.
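The aggregate-and-deduplicate step is small enough to sketch in TypeScript, the tool's own language. The in-memory arrays here are illustrative stand-ins for the real exports, which in practice would be read from CSV or sitemap files:

```typescript
// Sketch of the enumeration phase: merge exported URL lists from
// multiple sources into one deduplicated set. A Set handles the
// dedup; trimming guards against whitespace artifacts in exports.
function enumerateCanonicalUrls(sources: string[][]): Set<string> {
  const seen = new Set<string>();
  for (const source of sources) {
    for (const url of source) {
      const trimmed = url.trim();
      if (trimmed.length > 0) seen.add(trimmed);
    }
  }
  return seen;
}

// Usage: each inner array stands in for one exported list
// (search console, backlink database, sitemap archive, crawl data).
const all = enumerateCanonicalUrls([
  ["https://example.com/a", "https://example.com/b"],
  ["https://example.com/b ", "https://example.com/c"],
]);
```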
The comparison phase scrubs the enumerated URLs against the current sitemap. This is a set difference operation, computationally trivial once both sets are normalized. URL normalization (lowercasing, trailing slash handling, query parameter sorting, fragment removal) is the only non-obvious step, and it’s well-understood.
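A minimal sketch of those normalization rules and the set difference, using the standard WHATWG URL parser; the actual tool's normalization may differ in its details:

```typescript
// Normalize a URL per the rules above. The URL parser already
// lowercases the hostname for us.
function normalizeUrl(raw: string): string {
  const u = new URL(raw);
  u.hash = ""; // fragment removal
  const path = u.pathname.toLowerCase().replace(/\/+$/, ""); // trailing slashes
  u.pathname = path === "" ? "/" : path;
  // Query parameter sorting: rebuild the query string in key order.
  const params = [...u.searchParams.entries()].sort(([a], [b]) => a.localeCompare(b));
  u.search = "";
  for (const [k, v] of params) u.searchParams.append(k, v);
  return u.toString();
}

// The delta: URLs the internet expects, minus URLs that actually resolve.
function orphanedUrls(expected: string[], live: string[]): string[] {
  const liveSet = new Set(live.map(normalizeUrl));
  return [...new Set(expected.map(normalizeUrl))].filter((u) => !liveSet.has(u));
}
```

Everything left in the returned array is a candidate for the disposition phase.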
The disposition phase is where the intelligence lives. The tool uses pattern matching, path similarity scoring, and SKU extraction to match orphaned URLs to their most likely current-page equivalent. A URL like /products/blue-ornament-2023 matches to /products/blue-ornament with high confidence. A URL like /collections/holiday-sale-2022 matches to /collections/holiday if the seasonal qualifier is stripped.
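The matching heuristics can be illustrated with a small sketch. The qualifier patterns and the token-overlap score below are assumptions for illustration, and the real engine layers SKU extraction on top:

```typescript
// Strip qualifiers that date a URL, per the examples above. The
// specific patterns (trailing year, "-sale") are illustrative.
function stripSeasonalQualifiers(path: string): string {
  return path
    .replace(/-(19|20)\d{2}(?=\/|$)/g, "") // trailing year, e.g. -2023
    .replace(/-sale(?=\/|$)/g, "");        // an assumed seasonal word
}

// Score two paths by token overlap: 1.0 means identical token sets.
function similarity(a: string, b: string): number {
  const ta = new Set(a.split(/[\/-]/).filter(Boolean));
  const tb = new Set(b.split(/[\/-]/).filter(Boolean));
  const overlap = [...ta].filter((t) => tb.has(t)).length;
  return overlap / Math.max(ta.size, tb.size, 1);
}

// Pick the live page whose path scores highest against the orphan.
function bestMatch(
  orphan: string,
  candidates: string[],
): { url: string; score: number } | null {
  const stripped = stripSeasonalQualifiers(orphan);
  let best: { url: string; score: number } | null = null;
  for (const c of candidates) {
    const score = similarity(stripped, c);
    if (best === null || score > best.score) best = { url: c, score };
  }
  return best;
}
```

A confidence threshold on the score is what separates "redirect automatically" from "flag for manual disposition."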
The Results
Of 149,000 enumerated URLs, the scrub against the current sitemap identified approximately 21,000 that mapped directly to existing pages on the website. These needed no redirects. They were already working.
The remaining URLs fell into two categories: junk (spam referrals, malformed crawl artifacts, test URLs, URLs from domains we no longer operate) and legitimate redirects. The junk URLs were discarded programmatically based on pattern rules.
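Programmatic junk discarding amounts to a list of pattern rules applied in bulk. These particular regexes are illustrative, not the tool's actual rules:

```typescript
// Each rule is a regex; any URL matching one is discarded as junk.
// The patterns below are examples of the categories named above.
const junkRules: RegExp[] = [
  /\?utm_/,        // spam/tracking referral artifacts
  /\/wp-admin\//,  // crawl artifacts from a platform we never ran
  /\/test[-_]/,    // test pages never meant to be public
];

// Split URLs into discards and redirect candidates in one pass.
function partitionJunk(urls: string[]): { junk: string[]; keep: string[] } {
  const junk: string[] = [];
  const keep: string[] = [];
  for (const url of urls) {
    (junkRules.some((r) => r.test(url)) ? junk : keep).push(url);
  }
  return { junk, keep };
}
```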
The legitimate redirects were matched to destination pages automatically. The exception count was three: three URLs out of 149,000 that required manual disposition. One was an extreme long-tail URL, so specific and low-traffic that the correct action was to let it 404 rather than create a misleading redirect. The other two were edge cases that pointed to product types rather than specific products, so they were redirected to the relevant collection pages.
Three manual decisions. Everything else was automated.
The Economics
Development time was measured in hours, not weeks or months. The tool was built in a single focused session. Once built and validated, a complete run (enumeration, comparison, disposition, and redirect map generation) executes in under five minutes.
The ongoing cost is zero. The tool runs locally. The data sources are free. There are no subscriptions, no API costs, no per-URL charges.
Compare this to the enterprise engagement: unknown timeline, unknown cost, unknown completion criteria. Compare it to the premium SaaS tools: $200-500/month for redirect management features, ongoing subscription required, vendor lock-in on the redirect rules themselves.
The gap isn’t a matter of degree. It’s a matter of kind. The enterprise approach treats redirect management as a service to be purchased. The lean approach treats it as a problem to be understood and solved. Once solved, the solution is an asset. Reusable, modifiable, and free to operate.
The Deeper Point
This isn’t really a story about redirects. It’s a story about the gap between understanding a problem and purchasing a solution to a problem.
The enterprise firm had tools. They had people. They had process. What they didn’t have was a clear decomposition of the problem into its fundamental operations. They were trying to solve “fix the redirects” as a monolithic engagement rather than decomposing it into enumeration, comparison, and disposition, each of which is individually straightforward.
The premium SaaS tools understood the decomposition but monetized it by adding friction. They gate the enumeration behind API limits, the comparison behind export restrictions, and the disposition behind tier-locked features. The value isn’t in the computation. It’s in the access control.
When you understand the problem at the level of its actual operations, you realize that the computation is trivial, the data is available, and the entire enterprise and SaaS apparatus around redirect management exists to charge rent on a solved problem. Building your own tool isn’t heroic. It’s just clear thinking applied to a problem that the market has an incentive to keep opaque.
In Part 3, we’ll examine the technical architecture of Redirect Lifeguard in detail. The URL normalization pipeline, the similarity scoring algorithm, and the pattern-based disposition engine that reduced 149,000 URLs to three manual decisions.