How it works

The multi-step Cloudflare Workflows pipeline that powers every ShopSniffer report, from sitemap to exported CSV.

Overview

When you create a report, ShopSniffer runs a six-stage pipeline orchestrated by Cloudflare Workflows with built-in retry logic. Each stage produces intermediate data the next stage depends on, and real-time progress is pushed to the browser via a Durable Object-backed WebSocket.

The pipeline

graph LR
A[Sitemap extraction] --> B[Store audit]
B --> C[Product indexing]
C --> D[PageSpeed audit]
D --> E[Insight generation]
E --> F[Export compilation]
F --> G((Completed))
A -.->|sitemap.xml| H[(D1)]
B -.->|theme, apps, scripts| H
C -.->|products, collections, pages| H
D -.->|Lighthouse JSON| H
E -.->|top vendors, price stats| H
F -.->|CSV + JSON files| I[(R2)]
classDef stage fill:#6366f1,stroke:#4f46e5,color:#fff
classDef done fill:#10b981,stroke:#059669,color:#fff
class A,B,C,D,E,F stage
class G done

Step by step

1. Sitemap extraction

We parse the store's sitemap.xml to discover every product, collection, and page URL. This is the fastest way to get a complete URL inventory without crawling.
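As a rough illustration of the bucketing this step performs, the sketch below pulls `<loc>` entries out of a sitemap document and sorts them by path prefix. The `UrlInventory` type and `classifySitemapUrls` function are hypothetical names, not ShopSniffer's actual code, and a store's sitemap.xml may be an index pointing at child sitemaps, which this sketch does not follow.

```typescript
// Hypothetical shape for the URL inventory (illustrative only).
interface UrlInventory {
  products: string[];
  collections: string[];
  pages: string[];
}

// Extract every <loc> value and bucket it by its path prefix.
// A regex keeps the sketch short; real code should use an XML parser.
function classifySitemapUrls(xml: string): UrlInventory {
  const inventory: UrlInventory = { products: [], collections: [], pages: [] };
  const locPattern = /<loc>\s*([^<\s]+)\s*<\/loc>/g;
  let match: RegExpExecArray | null;
  while ((match = locPattern.exec(xml)) !== null) {
    const url = match[1];
    // Strip scheme and host so only the path is inspected.
    const path = url.replace(/^https?:\/\/[^/]+/, "");
    if (path.startsWith("/products/")) inventory.products.push(url);
    else if (path.startsWith("/collections/")) inventory.collections.push(url);
    else if (path.startsWith("/pages/")) inventory.pages.push(url);
    // Anything else (homepage, blogs, ...) is ignored in this sketch.
  }
  return inventory;
}
```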

2. Store audit

A headless browser renders the homepage via Cloudflare Browser Rendering. We detect the active theme (from Shopify.theme), installed apps (from loaded scripts and DOM signatures), JavaScript libraries, and meta tags.
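Detecting apps from loaded scripts can be pictured as matching script URLs against a table of known signatures. The sketch below assumes a simple substring match; the `AppSignatures` type, the `detectApps` function, and the example signatures in the usage note are all hypothetical and do not reflect the real detection rules.

```typescript
// Hypothetical signature table: app name -> substring expected in a script URL.
type AppSignatures = Record<string, string>;

// Return the names of apps whose signature appears in at least one script URL.
function detectApps(scriptUrls: string[], signatures: AppSignatures): string[] {
  const found: string[] = [];
  for (const app of Object.keys(signatures)) {
    const needle = signatures[app];
    if (scriptUrls.some((src) => src.includes(needle))) {
      found.push(app);
    }
  }
  return found.sort();
}
```

For example, with a signature table of `{ "Reviews widget": "reviews.example-app.test" }`, a page that loads `https://reviews.example-app.test/loader.js` would be reported as having "Reviews widget" installed.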

3. Product indexing

We fetch every product, collection, and page URL discovered in step 1 in parallel batches of 20 and pull the full JSON data from Shopify's public /products.json, /collections.json, and /pages.json endpoints.
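Batched parallel fetching can be sketched as follows: split the URL list into groups of 20, then wait for each group to finish before starting the next. The `chunk` and `fetchInBatches` helpers are illustrative names, and the fetcher is passed in as a parameter so the sketch makes no assumption about the real HTTP client.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new Error("size must be at least 1");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Run one batch at a time; all requests inside a batch are in flight together.
// `fetchJson` is injected by the caller (hypothetical parameter).
async function fetchInBatches<T>(
  urls: string[],
  fetchJson: (url: string) => Promise<T>,
  batchSize: number = 20
): Promise<T[]> {
  const results: T[] = [];
  for (const batch of chunk(urls, batchSize)) {
    const settled = await Promise.all(batch.map((url) => fetchJson(url)));
    results.push(...settled);
  }
  return results;
}
```

Capping the batch size bounds the number of concurrent requests against the store, at the cost of waiting for the slowest request in each batch.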

4. PageSpeed audit

Google PageSpeed Insights runs a full Lighthouse audit on the store's homepage across performance, accessibility, best practices, and SEO categories.
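A request for all four categories against the PageSpeed Insights v5 API might be built like this. The `buildPageSpeedUrl` helper is a hypothetical name, and whether ShopSniffer sends an API key or a `strategy` parameter is not stated here, so the key is optional and the strategy is left at the API's default.

```typescript
const PSI_ENDPOINT = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed";

// The four Lighthouse categories named in this step.
const CATEGORIES = ["PERFORMANCE", "ACCESSIBILITY", "BEST_PRACTICES", "SEO"];

// Build the request URL; `category` is repeated once per requested category.
function buildPageSpeedUrl(storeUrl: string, apiKey?: string): string {
  const params = ["url=" + encodeURIComponent(storeUrl)];
  for (const category of CATEGORIES) {
    params.push("category=" + category);
  }
  if (apiKey) {
    params.push("key=" + encodeURIComponent(apiKey));
  }
  return PSI_ENDPOINT + "?" + params.join("&");
}
```

In the JSON response, per-category scores are reported under `lighthouseResult.categories` as values between 0 and 1.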

5. Insight generation

We aggregate the indexed data to produce insights: top vendors by product count, price range statistics (min / max / avg), product type distribution, and change detection versus the previous snapshot.
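Two of these aggregations, top vendors and price statistics, can be sketched in a few lines. The `IndexedProduct` shape is a hypothetical simplification with one numeric price per product; the real indexed records are richer, and prices would need parsing before reaching a function like this.

```typescript
// Hypothetical minimal product record (illustrative only).
interface IndexedProduct {
  vendor: string;
  price: number;
}

interface VendorCount { vendor: string; count: number; }
interface PriceStats { min: number; max: number; avg: number; }

// Count products per vendor and return the `limit` largest, ties broken by name.
function topVendors(products: IndexedProduct[], limit: number): VendorCount[] {
  const counts = new Map<string, number>();
  products.forEach((p) => counts.set(p.vendor, (counts.get(p.vendor) || 0) + 1));
  return Array.from(counts, ([vendor, count]) => ({ vendor, count }))
    .sort((a, b) => b.count - a.count || a.vendor.localeCompare(b.vendor))
    .slice(0, limit);
}

// Compute min / max / avg price, or null when there is nothing to aggregate.
function priceStats(products: IndexedProduct[]): PriceStats | null {
  if (products.length === 0) return null;
  let min = Infinity;
  let max = -Infinity;
  let sum = 0;
  for (const p of products) {
    min = Math.min(min, p.price);
    max = Math.max(max, p.price);
    sum += p.price;
  }
  return { min, max, avg: sum / products.length };
}
```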

6. Export compilation

Products are compiled into Shopify-compatible CSV. We also generate JSON exports for products, collections, and pages. All files are uploaded to Cloudflare R2 for global CDN access.
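The CSV side of this step comes down to emitting a header row plus correctly quoted fields. The sketch below covers only a small subset of Shopify's product CSV columns; the `ExportRow` shape and the `toCsv` helper are hypothetical, and the real export writes many more columns.

```typescript
// Hypothetical flattened row; one row per product for brevity.
interface ExportRow {
  handle: string;
  title: string;
  vendor: string;
  type: string;
  price: string;
}

// Subset of Shopify's product CSV header names.
const CSV_COLUMNS = ["Handle", "Title", "Vendor", "Type", "Variant Price"];

// Quote a field when it contains a comma, a double quote, or a line break,
// doubling any embedded quotes (RFC 4180 style).
function escapeCsvField(value: string): string {
  return /[",\r\n]/.test(value) ? '"' + value.replace(/"/g, '""') + '"' : value;
}

function toCsv(rows: ExportRow[]): string {
  const lines = [CSV_COLUMNS.join(",")];
  for (const r of rows) {
    const fields = [r.handle, r.title, r.vendor, r.type, r.price];
    lines.push(fields.map((f) => escapeCsvField(f)).join(","));
  }
  return lines.join("\r\n") + "\r\n";
}
```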

Steps run with per-step retries and exponential backoff. A transient failure in one step (e.g. a single product fetch timing out) won't fail the whole job — the step retries, and if it ultimately exhausts retries, the workflow surfaces a structured error on the affected item while letting the rest succeed.
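To make the retry behaviour concrete, here is a generic sketch of exponential backoff: each wait doubles the previous one until the retry limit is reached, after which the last error is rethrown. This is not Cloudflare's implementation, which handles retries inside the Workflows runtime; `backoffDelays` and `withRetries` are hypothetical helpers, and the base delay and retry limit are assumed values.

```typescript
// Delay before each retry: base, 2x base, 4x base, ...
function backoffDelays(retryLimit: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < retryLimit; attempt++) {
    delays.push(baseDelayMs * Math.pow(2, attempt));
  }
  return delays;
}

// Run `task`, retrying on failure with the schedule above.
// `sleep` is injected so callers (and tests) control how waiting happens.
async function withRetries<T>(
  task: () => Promise<T>,
  retryLimit: number,
  baseDelayMs: number,
  sleep: (ms: number) => Promise<void>
): Promise<T> {
  const delays = backoffDelays(retryLimit, baseDelayMs);
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (error) {
      if (attempt >= delays.length) throw error; // retries exhausted
      await sleep(delays[attempt]);
    }
  }
}
```

With a retry limit of 4 and a base delay of one second, the waits would be 1, 2, 4, and 8 seconds.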

Real-time progress

Each step pushes progress updates to a JobProgressDO Durable Object keyed by job ID. The browser subscribes via WebSocket at wss://shopsniffer.com/api/ws/:jobId and receives status, step, and progress messages in real time — no polling required.

See GET /api/ws/:jobId for the WebSocket protocol.
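On the client side, handling these updates might look like the sketch below. It assumes each message is a JSON object carrying `status`, `step`, and `progress` fields; the field types, the `ProgressMessage` interface, and both helper functions are assumptions, and the linked protocol reference is the authority on the actual message format.

```typescript
// Assumed message shape: field names come from the text above, types are guesses.
interface ProgressMessage {
  status: string;
  step: string;
  progress: number;
}

// Build the subscription URL for a job.
function progressSocketUrl(jobId: string): string {
  return "wss://shopsniffer.com/api/ws/" + encodeURIComponent(jobId);
}

// Parse one raw WebSocket payload, returning null for anything unexpected.
function parseProgressMessage(raw: string): ProgressMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const { status, step, progress } = data as Record<string, unknown>;
  if (typeof status !== "string" || typeof step !== "string" || typeof progress !== "number") {
    return null;
  }
  return { status, step, progress };
}
```

In a browser, these would typically be wired up by opening `new WebSocket(progressSocketUrl(jobId))` and passing each `message` event's `data` to `parseProgressMessage`.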

Why Cloudflare Workflows

We use Cloudflare Workflows instead of a traditional queue + worker setup for three reasons:

  1. Durable step execution — if the worker crashes mid-pipeline, the workflow resumes from the last completed step, not from scratch.
  2. Built-in retries — each step gets per-step retry policies with exponential backoff. No custom retry logic.
  3. First-class observability — every step is logged with structured timing, which is how the status endpoint knows how far along a job is.
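The first point, durable step execution, can be illustrated with a small in-memory simulation: each step's result is stored under its name, so a re-run after a crash skips steps that already completed. This is a conceptual sketch only, not the Cloudflare Workflows API; `Checkpoints`, `runStep`, and `runPipeline` are hypothetical names, and a real engine persists results durably rather than in a plain object.

```typescript
// In-memory stand-in for the step results a durable engine would persist.
type Checkpoints = Record<string, unknown>;

// Run `fn` only if `name` has no stored result; otherwise reuse the stored one.
function runStep<T>(checkpoints: Checkpoints, name: string, fn: () => T): T {
  if (Object.prototype.hasOwnProperty.call(checkpoints, name)) {
    return checkpoints[name] as T;
  }
  const result = fn();
  checkpoints[name] = result; // checkpoint only after the step succeeds
  return result;
}

// Run three stages in order; `crashAt` simulates a failure inside one stage.
// `executed` records which stage bodies actually ran.
function runPipeline(checkpoints: Checkpoints, executed: string[], crashAt?: string): string {
  const stages = ["sitemap extraction", "store audit", "product indexing"];
  for (const stage of stages) {
    runStep(checkpoints, stage, () => {
      if (stage === crashAt) throw new Error("simulated crash in " + stage);
      executed.push(stage);
      return stage + " done";
    });
  }
  return "completed";
}
```

If the first run crashes during the store audit, a second call with the same `checkpoints` object reuses the stored sitemap result and resumes at the store audit.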

Most jobs complete within 30-60 seconds. Large stores (thousands of products) can take up to 5 minutes. The longest step is usually product indexing; PageSpeed auditing is the biggest source of variance because it depends on Google's API latency.

Next steps

Store reports

What's inside a completed report.

Jobs API

Create, poll, and subscribe to progress.

Monitoring workflows

The background cron schedule that re-runs this pipeline daily.

Downloads

What files come out the other end.
