COOK agency v1.0.0 · MIT

SEO Audit Engine

Lighthouse + content audit + competitor diff. Generates branded PDF or markdown report.


Install in your agent

Tell your agent: "install the recipes skill, then add seo-audit-engine"
Or via curl: curl -sL https://recipes.wisechef.ai/skill -o ~/.claude/skills/recipes/SKILL.md

Full skill source · SKILL.md

SEO Audit Engine

Full-spectrum SEO audit in a single Python file: Lighthouse scores, on-page crawl analysis, and optional competitor comparison. Outputs a branded PDF (or markdown fallback) to /tmp/.


When to use this skill

Trigger this skill when the user:

  • Says "audit my website" / "check the SEO of [URL]" / "why isn't [site] ranking?"
  • Asks for a technical SEO report covering meta tags, headings, broken links, sitemap, schema markup
  • Wants Lighthouse scores for performance, accessibility, SEO, or best-practices
  • Asks to compare SEO against a competitor ("how does our SEO stack up vs. [rival]?")
  • Needs to find quick-win SEO fixes before a launch or site migration
  • Mentions missing alt text, broken links, missing meta descriptions, robots.txt issues

When NOT to use this skill

  • Content SEO / keyword strategy: this audits structure and technical signals, not content quality or keyword gap analysis (use a dedicated keyword-research tool)
  • Ongoing rank tracking: this is a point-in-time snapshot, not a monitoring service
  • Sites behind authentication: the crawler uses unauthenticated HTTP, so login-gated pages return a 401 or a redirect instead of real content
  • Very large sites: the depth-1 crawl analyses only the landing URL, and link checking is capped by max_links_to_check
  • JavaScript-heavy SPAs: BeautifulSoup parses the raw HTML, so content injected by JS at runtime is seen by Lighthouse but missed by the crawl checks (see the sketch below)
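
A minimal sketch of that last point, assuming a plain requests + BeautifulSoup fetch (the URL is illustrative). The static response below is all the crawl checks ever see; Lighthouse, by contrast, renders the page in Chrome:

import requests
from bs4 import BeautifulSoup

# Depth-1 static fetch: a single GET, no JavaScript execution
resp = requests.get("https://example.com", timeout=15)
soup = BeautifulSoup(resp.text, "html.parser")

# On a JS-rendered SPA this often prints a bare shell title, even though
# Lighthouse (which runs Chrome) would see the fully rendered content
print(soup.title.string if soup.title else "No <title> in static HTML")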

Pitfalls

  1. npx lighthouse first-run download. On a machine that hasn't run Lighthouse before, npx --yes lighthouse downloads ~120 MB of Chrome/Lighthouse to the npm cache. Allow 1–3 minutes; subsequent runs are instant. To skip this, install globally with npm install -g lighthouse, then verify with lighthouse --version.
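
A hedged sketch of sidestepping the cold start programmatically; the shutil.which check and npx fallback are assumptions about the wiring, not the skill's verbatim source:

import shutil, subprocess

# Prefer a globally installed binary; fall back to npx, whose first run
# downloads ~120 MB and can take 1-3 minutes
if shutil.which("lighthouse"):
    cmd = ["lighthouse", "--version"]
else:
    cmd = ["npx", "--yes", "lighthouse", "--version"]
subprocess.run(cmd, check=True)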

  2. Headless Chrome sandbox in Docker/CI. Inside Docker containers, --no-sandbox is required (already included in LH_CHROME_FLAGS). If you still get CHROME_CRASH errors, also pass --disable-setuid-sandbox or run the container with --cap-add=SYS_ADMIN. On WSL2 you may need --disable-gpu --single-process.
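
For reference, a sketch of how those flags reach Chrome via Lighthouse's --chrome-flags option. The flag strings come from this pitfall; the variable names are illustrative, and the base set is assumed to match the skill's LH_CHROME_FLAGS constant:

base_flags = "--headless --no-sandbox"   # assumed LH_CHROME_FLAGS content

# Workarounds from this pitfall, appended per environment
docker_flags = base_flags + " --disable-setuid-sandbox"
wsl2_flags = base_flags + " --disable-gpu --single-process"

cmd = ["npx", "--yes", "lighthouse", "https://example.com",
       f"--chrome-flags={docker_flags}", "--output=json", "--quiet"]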

  3. weasyprint system dependencies. pip install weasyprint alone is not enough on Ubuntu/Debian; weasyprint needs Pango: sudo apt install -y libpango-1.0-0 libpangoft2-1.0-0. On macOS: brew install pango. Without these, importing weasyprint raises OSError: cannot load library 'libgobject-2.0-0'. The skill catches the failed import and falls back gracefully to a .md file, but no PDF is generated until the system dependency is in place.
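
A sketch of that fallback pattern. Since a missing libpango surfaces as OSError at import time, a robust handler catches both exception types (function and file names here are illustrative):

def write_report(html: str, markdown: str, out_stem: str) -> str:
    try:
        # ImportError if weasyprint is absent, OSError if libpango is
        from weasyprint import HTML
        HTML(string=html).write_pdf(f"{out_stem}.pdf")
        return f"{out_stem}.pdf"
    except (ImportError, OSError):
        # Graceful markdown fallback, as described above
        with open(f"{out_stem}.md", "w") as fh:
            fh.write(markdown)
        return f"{out_stem}.md"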

  4. Lighthouse timeout on slow sites. The default --timeout 120 (seconds) covers most sites; very slow servers or large pages may need --timeout 180. CI runners in shared environments can be slow, so bump to 240s if you see consistent timeouts.
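
The --timeout flag presumably wraps the Lighthouse subprocess; a sketch of that wiring (command and values are illustrative):

import subprocess

cmd = ["npx", "--yes", "lighthouse", "https://example.com",
       "--output=json", "--quiet"]
try:
    # 120 s by default; bump to 180-240 s for slow sites or shared CI runners
    subprocess.run(cmd, capture_output=True, timeout=180, check=True)
except subprocess.TimeoutExpired:
    print("Lighthouse timed out; retry with a higher --timeout")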

  5. robots.txt blocks the audit bot. Some sites explicitly Disallow unknown user agents. If robots.txt returns 200 but the main page returns 403 or redirects to a CAPTCHA, the crawl will be empty. Check robots.txt manually and whitelist the audit user-agent if needed, or pass --skip-lighthouse and only check what's publicly accessible.
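
A quick manual check for this situation (URL illustrative): robots.txt answering 200 while the page itself 403s, or redirects to a CAPTCHA, is the tell:

import requests

base = "https://example.com"
robots = requests.get(f"{base}/robots.txt", timeout=15)
page = requests.get(base, timeout=15, allow_redirects=True)

print("robots.txt:", robots.status_code)
# A 403 here, or a final URL on a CAPTCHA page, means the crawl will be empty
print("page:", page.status_code, "->", page.url)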

  6. requests.head() vs requests.get() for link-checking. Some CDNs return 405 (Method Not Allowed) on HEAD requests but 200 on GET. The skill uses HEAD for efficiency; a 405 is NOT counted as a broken link, but if you see false positives you can switch to GET in the source. The max_links_to_check cap (default 50) prevents runaway HTTP calls on link-heavy pages.
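
A sketch of that link-check logic; the retry_with_get flag is the suggested source modification, not a current CLI option:

import requests

MAX_LINKS = 50  # mirrors the skill's max_links_to_check default

def is_broken(url: str, retry_with_get: bool = False) -> bool:
    try:
        r = requests.head(url, timeout=10, allow_redirects=True)
        if r.status_code == 405 and retry_with_get:
            # Some CDNs reject HEAD; stream=True avoids downloading the body
            r = requests.get(url, timeout=10, stream=True)
        return r.status_code >= 400 and r.status_code != 405
    except requests.RequestException:
        return True

print(is_broken("https://example.com/"))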

  7. JSON-LD on dynamic pages. Schema markup added by JS (e.g., injected by Google Tag Manager) will not appear in the BeautifulSoup crawl, because BS4 parses the static HTML. Lighthouse's SEO audit will catch it, since Chrome executes JS. If Lighthouse reports an SEO score of 90+ but the crawl reports "No schema.org JSON-LD", the markup is JS-injected: treat the crawl (--skip-lighthouse) result as ground truth for the static structure, and Lighthouse as the final signal.
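
The static side of that comparison looks roughly like this (URL illustrative); an empty result here combined with a high Lighthouse SEO score is the JS-injection signature:

import json
import requests
from bs4 import BeautifulSoup

html = requests.get("https://example.com", timeout=15).text
soup = BeautifulSoup(html, "html.parser")

# Only JSON-LD present in the raw HTML shows up here
for block in soup.find_all("script", type="application/ld+json"):
    try:
        data = json.loads(block.string or "")
    except json.JSONDecodeError:
        continue
    print(data.get("@type") if isinstance(data, dict) else data)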


Connector status

Connector                           Status              Notes
Google Lighthouse                   ✅ Live             Via npx lighthouse subprocess; requires Node 18+ and Chrome
Page crawler (requests + BS4)       ✅ Live             Static HTML only; no JS execution
PDF output (weasyprint)             ✅ Live (optional)  Falls back to .md if weasyprint/libpango not installed
Competitor comparison               ✅ Live             Pass --competitor URL; runs full audit on rival, diffs issues
Multi-page deep crawl               📋 Planned          Currently depth-1 only; follow-links crawl coming
Google Search Console integration   📋 Planned          Would add real impression/click data to the report

Output sample

Running against a typical SaaS landing page:

🔎 SEO Audit Engine
   Target     : https://example.com
   Output     : Markdown

🏗️  Lighthouse audit …
   performance        🟡 72
   accessibility      🟢 94
   seo                🟢 91
   best-practices     🟡 83

🕷️  Crawling https://example.com …
    🏷️  Meta tags …
    🖼️  Images …
    📐 Heading hierarchy …
    🔗 Links …
    🤖 robots.txt …
    🗺️  Sitemap …
    📋 Schema.org …
   ↳ 5 issue(s) found

📝 Building report …
📝 Markdown report → /tmp/seo-audit-example-com-2026-04-25.md

✅ Done!  /tmp/seo-audit-example-com-2026-04-25.md
   Issues: 5

Report excerpt:

# SEO Audit Report: example.com

> **Generated:** 2026-04-25  |  **Target:** https://example.com

## 🚀 Lighthouse Scores

| Category      | example.com |
|---|:---:|
| Performance   | 🟡 72       |
| Seo           | 🟢 91       |
| Accessibility | 🟢 94       |
| Best Practices| 🟡 83       |

## ⚠️  Issues Summary: 5 total

1. Meta description too long (178 chars, ideal ≤160)
2. No canonical <link> tag (duplicate content risk)
3. 3/12 images missing alt attribute (accessibility + SEO fail)
4. No schema.org JSON-LD found (missing rich-result eligibility)
5. robots.txt lacks a Sitemap: directive

Setup

1. Python dependencies

cd skills/seo-audit-engine
pip install requests beautifulsoup4 pyyaml

# PDF output (optional; falls back to markdown if unavailable):
pip install weasyprint
# Ubuntu/Debian system dep for weasyprint:
sudo apt install -y libpango-1.0-0 libpangoft2-1.0-0
# macOS:
brew install pango

2. Node.js / Lighthouse

# Verify Node 18+:
node --version

# Option A (recommended for repeated use): install globally:
npm install -g lighthouse
lighthouse --version

# Option B: let npx auto-download on first run (skill handles this):
# No action needed; npx --yes lighthouse is called automatically.

3. (Optional) config.yaml

cp config.yaml.example config.yaml
# Edit audit.crawl_timeout, audit.max_links_to_check, etc.
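
If you want to confirm which values the skill picks up, here is a hedged sketch of reading those keys with pyyaml (already installed in step 1). The defaults shown are assumptions, except the link cap of 50 noted in pitfall 6:

import yaml

try:
    with open("config.yaml") as fh:
        cfg = yaml.safe_load(fh) or {}
except FileNotFoundError:
    cfg = {}

audit = cfg.get("audit", {})
crawl_timeout = audit.get("crawl_timeout", 15)    # assumed default
max_links = audit.get("max_links_to_check", 50)   # default per pitfall 6
print(crawl_timeout, max_links)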

4. Run

# Basic audit
python skill.py --url https://example.com

# With competitor comparison
python skill.py --url https://example.com --competitor https://iana.org

# Crawl-only (no Lighthouse, no Chrome needed)
python skill.py --url https://example.com --skip-lighthouse

# Force markdown output
python skill.py --url https://example.com --format markdown

# Custom Lighthouse timeout
python skill.py --url https://example.com --timeout 180

License

MIT. See LICENSE at the repo root.

Want it run for you?

WiseChef Framework ships managed AI employees that run SEO Audit Engine on a schedule, alert on regressions, and deliver the PDF to Slack or email. From €199/month.