Troubleshooting¶
Installation¶
Playwright¶
If playwright install chromium fails:
# Install playwright browser for news-watch
conda activate newswatch-env
playwright install chromium
# For system dependencies
playwright install-deps chromium
# For Debian/Ubuntu-based Docker/Linux environments (run as root or with sudo)
apt-get update && apt-get install -y \
libnss3 libatk-bridge2.0-0 libdrm2 libxcomposite1 \
libxdamage1 libxrandr2 libgbm1 libxss1 libasound2
Package install¶
If install/import fails:
# If uv is not available, fall back to pip
pip install news-watch
# Development setup (recommended)
git clone https://github.com/okkymabruri/news-watch.git
cd news-watch
uv sync --all-extras
uv run playwright install chromium
Runtime¶
No results¶
Quick checks:
Common causes:
- keywords too specific → try ekonomi,bisnis,indonesia
- date too old → try a recent date first
- blocked in cloud/Linux → try fewer scrapers or run locally
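The first two fixes in the list can be combined into one retry: broad keywords, a recent start date, and verbose output. The sketch below uses only flags that appear elsewhere on this page. The helper names `recent_date` and `retry_broad` and the 7-day window are assumptions for illustration, not part of news-watch.

```shell
# Print the date N days ago as YYYY-MM-DD.
# Tries GNU date first, then falls back to BSD/macOS date.
recent_date() {
  rd_days="${1:-7}"
  date -d "${rd_days} days ago" +%Y-%m-%d 2>/dev/null || date -v-"${rd_days}"d +%Y-%m-%d
}

# Hypothetical helper: rerun with broad keywords and a recent start date.
# An optional first argument narrows the run to specific scrapers.
retry_broad() {
  rb_start="$(recent_date 7)"
  if [ -n "${1:-}" ]; then
    newswatch --keywords ekonomi,bisnis,indonesia --start_date "$rb_start" --scrapers "$1" -v
  else
    newswatch --keywords ekonomi,bisnis,indonesia --start_date "$rb_start" -v
  fi
}
```

Call `retry_broad` for all scrapers, or `retry_broad kompas` to limit the run to one site. If the broad run returns articles, the original keywords or date were probably the cause.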
Timeout¶
Try:
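One generic option is to retry the run with a growing pause between attempts. This is a sketch, not a news-watch feature: `retry_with_backoff` is a hypothetical helper, and it assumes `newswatch` exits with a non-zero status when a run times out, which this page does not confirm.

```shell
# Run a command up to MAX attempts, doubling the pause between attempts.
# Usage: retry_with_backoff MAX FIRST_DELAY_SECONDS command [args...]
retry_with_backoff() {
  rwb_max="$1"; rwb_delay="$2"; shift 2
  rwb_attempt=1
  while :; do
    # Stop as soon as the command succeeds.
    "$@" && return 0
    if [ "$rwb_attempt" -ge "$rwb_max" ]; then
      echo "giving up after ${rwb_attempt} attempts" >&2
      return 1
    fi
    echo "attempt ${rwb_attempt} failed; retrying in ${rwb_delay}s" >&2
    sleep "$rwb_delay"
    rwb_attempt=$((rwb_attempt + 1))
    rwb_delay=$((rwb_delay * 2))
  done
}
```

For example, `retry_with_backoff 3 10 newswatch --keywords ekonomi --start_date 2025-01-01 --scrapers kompas -v` makes up to three attempts, waiting 10 and then 20 seconds between them. Limiting the run to a single scraper keeps each attempt short.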
Memory¶
For large runs, write to a file:
Platform notes¶
Linux / cloud¶
Some sites block server and cloud IP addresses more aggressively than residential ones.
Try fewer scrapers, or run locally.
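To find out which sites are blocking the current machine, it can help to run each scraper on its own. The sketch below loops over scraper names using the `--scrapers` and `-v` flags shown on this page. `probe_scrapers` is a hypothetical helper, and it assumes a failed run exits with a non-zero status; a blocked scraper might also exit cleanly with no articles, so the verbose output is still worth reading.

```shell
# Hypothetical helper: run each named scraper on its own and report which ones fail.
# Usage: probe_scrapers KEYWORDS START_DATE scraper [scraper...]
probe_scrapers() {
  ps_keywords="$1"; ps_start="$2"; shift 2
  for ps_scraper in "$@"; do
    if newswatch --keywords "$ps_keywords" --start_date "$ps_start" --scrapers "$ps_scraper" -v; then
      echo "OK: $ps_scraper"
    else
      echo "FAILED: $ps_scraper"
    fi
  done
}
```

For example, `probe_scrapers ekonomi 2025-01-01 kompas detik` runs the two scrapers named on this page one after the other. Scrapers that fail on a cloud host but work on a local machine are likely being blocked by IP.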
Data quality¶
Missing/truncated content¶
Causes:
- HTML structure changed
- paywall
- blocked
Check with verbose + single scraper:
newswatch --keywords ekonomi --start_date 2025-01-01 -v
newswatch --keywords ekonomi --start_date 2025-01-01 --scrapers kompas -v
newswatch --keywords ekonomi --start_date 2025-01-01 --scrapers detik -v
Duplicates¶
Normal when multiple sites cover the same story. Deduplicate in post-processing.
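A minimal post-processing sketch, under strong assumptions: the results have been exported as tab-separated text with a header row and one article per line, and one column (for example the article URL) identifies a duplicate. This page does not document the output layout, so the column number and file names are placeholders. Files with quoted or multi-line fields need a CSV-aware tool instead of `awk`.

```shell
# Keep the header row plus the first row seen for each value in column COL.
# Usage: dedupe_by_column COL < input.tsv > output.tsv
dedupe_by_column() {
  awk -F '\t' -v col="$1" 'NR == 1 || !seen[$col]++'
}
```

For example, `dedupe_by_column 2 < articles.tsv > articles_unique.tsv` keeps the first article for each distinct value in the second column. Stories covered by several sites have different URLs, so removing those requires a looser key such as a normalized title.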
Encoding¶
If text has broken characters, try another source:
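Before switching sources, it may be worth checking whether the saved file is valid UTF-8 at all, since broken characters can also come from the program used to open the file. The sketch below relies on the standard `iconv` utility; `is_valid_utf8` is a hypothetical helper and the file name in the usage note is a placeholder.

```shell
# Return 0 if FILE decodes cleanly as UTF-8, non-zero otherwise.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}
```

For example, `is_valid_utf8 output.csv && echo "valid UTF-8"`. If the file is valid, try opening it with an explicit UTF-8 setting. If it is not, rerun with a different scraper, for example `--scrapers detik` instead of `--scrapers kompas`.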
CLI¶
Command not found¶
If newswatch is not found:
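A common cause is that the environment where the package was installed is not active in the current shell. The sketch below checks whether a command is reachable on `PATH`; `check_cmd` is a hypothetical helper, and the follow-up commands in the comments assume the `newswatch-env` conda environment and the `news-watch` package name used earlier on this page.

```shell
# Report whether a command is reachable from the current shell.
check_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1 found at: $(command -v "$1")"
  else
    echo "$1 not found on PATH"
    return 1
  fi
}

# Typical follow-up, left commented so the sketch has no side effects:
# check_cmd newswatch || python -m pip show news-watch
# conda activate newswatch-env
```

If `pip show` lists the package but the command is still missing, the package is installed for a different interpreter or environment than the one on `PATH`.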
Arguments¶
Check:
Tests¶
Running tests¶
Reporting bugs¶
Include:
- OS + Python version
- command you ran
- full error output
- one example URL if relevant
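The first item in the list can be collected with a few standard commands. This is a sketch: `bug_report_env` is a hypothetical helper, and it assumes the interpreter is available as `python3`; inside an activated environment, `python --version` may be the right command instead.

```shell
# Print the OS and Python version to paste into a bug report.
bug_report_env() {
  echo "OS: $(uname -srm)"
  if command -v python3 >/dev/null 2>&1; then
    python3 --version 2>&1
  else
    echo "Python: not found"
  fi
}
```

To capture the full error output alongside it, append `2>&1 | tee error.log` to the failing `newswatch` command and attach the resulting file.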