How it works
The multi-step Cloudflare Workflows pipeline that powers every ShopSniffer report, from sitemap to exported CSV.
Overview
When you create a report, ShopSniffer runs a six-stage pipeline orchestrated by Cloudflare Workflows with built-in retry logic. Each stage produces intermediate data the next stage depends on, and real-time progress is pushed to the browser via a Durable Object-backed WebSocket.
The pipeline
```mermaid
graph LR
    A[Sitemap extraction] --> B[Store audit]
    B --> C[Product indexing]
    C --> D[PageSpeed audit]
    D --> E[Insight generation]
    E --> F[Export compilation]
    F --> G((Completed))
    A -.->|sitemap.xml| H[(D1)]
    B -.->|theme, apps, scripts| H
    C -.->|products, collections, pages| H
    D -.->|Lighthouse JSON| H
    E -.->|top vendors, price stats| H
    F -.->|CSV + JSON files| I[(R2)]
    classDef stage fill:#6366f1,stroke:#4f46e5,color:#fff
    classDef done fill:#10b981,stroke:#059669,color:#fff
    class A,B,C,D,E,F stage
    class G done
```
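In code, the pipeline reads as a single run function chaining six durable steps. A minimal TypeScript sketch with a stubbed step interface standing in for the Workflows API (step names and stage stubs here are illustrative, not the production implementation):

```typescript
// Minimal stand-in for the Workflows step API: each step runs once and its
// name is recorded, mimicking durable, ordered execution. The real runtime
// also checkpoints each step's result so a crashed run can resume.
interface Step {
  do<T>(name: string, fn: () => Promise<T>): Promise<T>;
}

function makeRecordingStep(log: string[]): Step {
  return {
    async do(name, fn) {
      log.push(name);
      return fn();
    },
  };
}

// The six ShopSniffer stages, stubbed; the real handlers hit D1, R2, etc.
async function runReport(storeUrl: string, step: Step): Promise<string[]> {
  const urls = await step.do("sitemap-extraction", async () => [`${storeUrl}/products/a`]);
  await step.do("store-audit", async () => ({ theme: "Dawn" }));
  const products = await step.do("product-indexing", async () => urls.map((u) => ({ url: u })));
  await step.do("pagespeed-audit", async () => ({ performance: 0.9 }));
  await step.do("insight-generation", async () => ({ count: products.length }));
  return step.do("export-compilation", async () => [`${storeUrl}.csv`]);
}
```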
Step by step
Sitemap extraction
We parse the store's sitemap.xml to discover every product, collection, and page URL. This is the fastest way to get a complete URL inventory without crawling.
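Conceptually the step reduces to two small functions: pull every loc entry out of the XML, then classify each URL by its path so later stages know which endpoint to hit. A simplified sketch (the regex-based parse is illustrative; a production parser would also follow child sitemaps):

```typescript
// Pull every <loc> entry out of a sitemap XML document.
function extractLocs(xml: string): string[] {
  return [...xml.matchAll(/<loc>\s*([^<]+?)\s*<\/loc>/g)].map((m) => m[1]);
}

// Classify a Shopify URL by path prefix: /products/, /collections/, /pages/.
type UrlKind = "product" | "collection" | "page" | "other";
function classify(url: string): UrlKind {
  const path = new URL(url).pathname;
  if (path.startsWith("/products/")) return "product";
  if (path.startsWith("/collections/")) return "collection";
  if (path.startsWith("/pages/")) return "page";
  return "other";
}
```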
Store audit
A headless browser renders the homepage via Cloudflare Browser Rendering. We detect the active theme (from Shopify.theme), installed apps (from loaded scripts and DOM signatures), JavaScript libraries, and meta tags.
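Shopify storefronts expose the active theme through a Shopify.theme global assigned in an inline script, so theme detection can work on the rendered HTML. A minimal sketch (the regex and the ThemeInfo shape are simplifications of the real detector, which also fingerprints apps via script URLs and DOM signatures):

```typescript
interface ThemeInfo {
  name: string;
  id?: number;
}

// Recover the `Shopify.theme = {...}` object literal from rendered HTML.
// Returns null when the global is missing or the payload is not valid JSON.
function detectTheme(html: string): ThemeInfo | null {
  const m = html.match(/Shopify\.theme\s*=\s*(\{[^;]*\})/);
  if (!m) return null;
  try {
    return JSON.parse(m[1]) as ThemeInfo;
  } catch {
    return null;
  }
}
```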
Product indexing
Every product, collection, and page URL discovered in step 1 is fetched in parallel batches of 20, pulling the full JSON data from Shopify's public /products.json, /collections.json, and /pages.json endpoints.
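The batching pattern is straightforward: run each batch of 20 concurrently with Promise.all, and run batches sequentially so concurrency never exceeds the batch size. A generic sketch with the fetcher injected, keeping endpoint details out of the concurrency logic:

```typescript
// Fetch a list of URLs in parallel batches: each batch of `size` runs
// concurrently via Promise.all; batches run sequentially to cap concurrency.
async function fetchInBatches<T>(
  urls: string[],
  fetchOne: (url: string) => Promise<T>,
  size = 20,
): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < urls.length; i += size) {
    const batch = urls.slice(i, i + size);
    results.push(...(await Promise.all(batch.map(fetchOne))));
  }
  return results;
}
```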
PageSpeed audit
Google PageSpeed Insights runs a full Lighthouse audit on the store's homepage across performance, accessibility, best practices, and SEO categories.
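The PageSpeed Insights v5 API takes the target URL plus a repeatable category parameter. A sketch of building the request URL for the four categories the report covers (the optional API key is an assumption about how the call is authenticated):

```typescript
// Build a PageSpeed Insights v5 request URL covering all four Lighthouse
// categories used in the report. The API accepts `category` repeatedly.
function pageSpeedUrl(target: string, apiKey?: string): string {
  const u = new URL("https://www.googleapis.com/pagespeedonline/v5/runPagespeed");
  u.searchParams.set("url", target);
  for (const c of ["PERFORMANCE", "ACCESSIBILITY", "BEST_PRACTICES", "SEO"]) {
    u.searchParams.append("category", c);
  }
  if (apiKey) u.searchParams.set("key", apiKey);
  return u.toString();
}
```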
Insight generation
We aggregate the indexed data to produce insights: top vendors by product count, price range statistics (min / max / avg), product type distribution, and change detection versus the previous snapshot.
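The aggregation itself is a single pass over the indexed products. A sketch covering two of the insights, top vendors and price statistics (the Product field names are assumptions; change detection against the previous snapshot is omitted):

```typescript
interface Product {
  vendor: string;
  price: number;
  productType: string;
}

interface Insights {
  topVendors: { vendor: string; count: number }[];
  price: { min: number; max: number; avg: number };
}

// Aggregate indexed products into headline insights. Assumes a non-empty list.
function generateInsights(products: Product[], topN = 5): Insights {
  const byVendor = new Map<string, number>();
  for (const p of products) byVendor.set(p.vendor, (byVendor.get(p.vendor) ?? 0) + 1);
  const topVendors = [...byVendor.entries()]
    .map(([vendor, count]) => ({ vendor, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, topN);
  const prices = products.map((p) => p.price);
  return {
    topVendors,
    price: {
      min: Math.min(...prices),
      max: Math.max(...prices),
      avg: prices.reduce((s, x) => s + x, 0) / prices.length,
    },
  };
}
```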
Export compilation
Products are compiled into Shopify-compatible CSV. We also generate JSON exports for products, collections, and pages. All files are uploaded to Cloudflare R2 for global CDN access.
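CSV compilation mostly comes down to correct field escaping plus Shopify's expected headers. A sketch using a small subset of the product-import columns (the production export emits many more):

```typescript
// Escape a value per RFC 4180: wrap in quotes when it contains a comma,
// quote, or newline, doubling any embedded quotes.
function csvField(v: string): string {
  return /[",\n]/.test(v) ? `"${v.replace(/"/g, '""')}"` : v;
}

interface ProductRow {
  handle: string;
  title: string;
  vendor: string;
  price: number;
}

// Compile rows into CSV using a subset of Shopify's product-import headers.
function toShopifyCsv(rows: ProductRow[]): string {
  const header = ["Handle", "Title", "Vendor", "Variant Price"].join(",");
  const lines = rows.map((r) =>
    [r.handle, r.title, r.vendor, r.price.toFixed(2)].map(csvField).join(","),
  );
  return [header, ...lines].join("\n");
}
```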
Every step runs with its own retry policy and exponential backoff. A transient failure (e.g. a single product fetch timing out) won't fail the whole job: the step retries, and if retries are ultimately exhausted, the workflow surfaces a structured error on the affected item while letting the rest of the job succeed.
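Cloudflare Workflows supplies this behavior natively through per-step retry configuration, but the underlying pattern can be illustrated standalone:

```typescript
// Generic retry-with-exponential-backoff, the same pattern Workflows applies
// per step. Delays grow as baseMs * 2^attempt; a final failure rethrows so
// the caller can surface a structured error for that item alone.
async function withRetries<T>(
  fn: () => Promise<T>,
  limit = 3,
  baseMs = 100,
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 0; attempt <= limit; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (attempt < limit) {
        await new Promise((r) => setTimeout(r, baseMs * 2 ** attempt));
      }
    }
  }
  throw lastErr;
}
```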
Real-time progress
Each step pushes progress updates to a JobProgressDO Durable Object keyed by job ID. The browser subscribes via WebSocket at wss://shopsniffer.com/api/ws/:jobId and receives status, step, and progress messages in real time — no polling required.
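On the client, the main work is validating each frame before trusting it. A sketch of a defensive message parser (the exact message shape is an assumption based on the fields listed above):

```typescript
// Assumed shape of the progress messages pushed by the JobProgressDO;
// field names here are illustrative.
interface ProgressMessage {
  status: "queued" | "running" | "completed" | "failed";
  step: string;
  progress: number; // 0-100
}

// Parse and validate one WebSocket frame; returns null on malformed input
// so a bad frame never crashes the UI.
function parseProgress(raw: string): ProgressMessage | null {
  try {
    const msg = JSON.parse(raw);
    if (typeof msg.step !== "string" || typeof msg.progress !== "number") return null;
    return msg as ProgressMessage;
  } catch {
    return null;
  }
}

// Browser usage (sketch): new WebSocket(`wss://shopsniffer.com/api/ws/${jobId}`)
// with an onmessage handler calling parseProgress(event.data).
```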
See GET /api/ws/:jobId for the WebSocket protocol.
Why Cloudflare Workflows
We use Cloudflare Workflows instead of a traditional queue + worker setup for three reasons:
- Durable step execution — if the worker crashes mid-pipeline, the workflow resumes from the last completed step, not from scratch.
- Built-in retries — each step gets per-step retry policies with exponential backoff. No custom retry logic.
- First-class observability — every step is logged with structured timing, which is how the status endpoint knows how far along a job is.
Most jobs complete within 30-60 seconds. Large stores (thousands of products) can take up to 5 minutes. The longest step is usually product indexing; PageSpeed auditing is the biggest source of variance because it depends on Google's API latency.