
Configuration

config

Configuration system.

Configuration is read from YAML files and validated by Pydantic models, which gives type-safe settings with validation, sensible defaults, and clear error messages. The main entry point is load_config().

Classes

PathPattern

Bases: BaseModel

URL pattern matching configuration.

Supports three pattern types:

- regex: Regular expression matching
- glob: Shell-style glob patterns (e.g., "*.html")
- prefix: Simple prefix matching (e.g., "/docs/")
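The three pattern types above can be illustrated with a small standalone function. This is a minimal sketch, not the library's implementation: the name `path_matches` is hypothetical, and it assumes that glob patterns behave like the standard library's `fnmatch` and that regex patterns are applied with `re.search`.

```python
import fnmatch
import re


def path_matches(pattern: str, pattern_type: str, path: str) -> bool:
    """Hypothetical helper: test a URL path against one pattern."""
    if pattern_type == "prefix":
        # Simple prefix matching: the path must start with the pattern.
        return path.startswith(pattern)
    if pattern_type == "glob":
        # Shell-style matching (assumption: fnmatch semantics, where "*"
        # also matches "/" characters).
        return fnmatch.fnmatchcase(path, pattern)
    if pattern_type == "regex":
        # Assumption: the regex may match anywhere unless it is anchored.
        return re.search(pattern, path) is not None
    raise ValueError(f"Unknown pattern type: {pattern_type!r}")
```

Under these assumptions the sketch agrees with the documented examples for matches(), for instance `path_matches("*.html", "glob", "/page.html")` evaluates to `True`.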

Functions

matches
matches(path: str) -> bool

Check if URL path matches this pattern.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | str | URL path component (e.g., "/docs/guide/") | required |

Returns:

| Type | Description |
| --- | --- |
| bool | True if path matches the pattern |

Examples:

>>> pattern = PathPattern(pattern="/docs/", type="prefix")
>>> pattern.matches("/docs/guide/")
True
>>> pattern.matches("/blog/")
False
>>> pattern = PathPattern(pattern="*.html", type="glob")
>>> pattern.matches("/page.html")
True
>>> pattern = PathPattern(pattern=r"^/api/v\d+/", type="regex")
>>> pattern.matches("/api/v2/users")
True

SiteConfig

Bases: BaseModel

Website configuration for crawling.

CrawlingRules

Bases: BaseModel

Crawling behavior configuration.

Controls which URLs to follow, concurrency limits, rate limiting, and retry behavior.

PathMappingConfig

Bases: BaseModel

URL to file path mapping configuration.

MarkdownConfig

Bases: BaseModel

Markdown conversion options.

OutputConfig

Bases: BaseModel

Output directory structure configuration.

AssetConfig

Bases: BaseModel

Asset download configuration.

SusConfig

Bases: BaseModel

Main configuration model for SUS scraper.

This is the root configuration object that contains all settings for a scraping project.

Functions

validate_name classmethod
validate_name(v: str) -> str

Validate that name is a valid directory name.

The name must not contain path separators or special characters that would be invalid in directory names.
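The check described above might look like the following. This is a sketch under stated assumptions: the function name `check_directory_name` and the exact set of rejected characters are hypothetical, since the documentation only says that path separators and characters invalid in directory names are rejected. In a Pydantic model, a function like this would typically be attached to the field through a validator decorator such as `field_validator`.

```python
import re

# Assumed set of rejected characters: path separators plus characters
# that are invalid in directory names on common filesystems.
_INVALID_CHARS = re.compile(r'[\\/:*?"<>|\x00]')


def check_directory_name(v: str) -> str:
    """Hypothetical check: return v unchanged if it is usable as a directory name."""
    if not v or v in {".", ".."}:
        raise ValueError("name must be a non-empty directory name")
    if _INVALID_CHARS.search(v):
        raise ValueError(f"name contains invalid characters: {v!r}")
    return v
```

Returning the value unchanged on success follows the usual validator convention, so the validated string can be stored on the model as-is.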

Functions

load_config

load_config(path: Path) -> SusConfig

Load and validate YAML configuration file.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| path | Path | Path to YAML configuration file | required |

Returns:

| Type | Description |
| --- | --- |
| SusConfig | Validated SusConfig instance |

Raises:

| Type | Description |
| --- | --- |
| ConfigError | If the config file is not found, contains invalid YAML, or fails validation |
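The error contract above (three distinct failure modes reported through one exception type) can be sketched as follows. This is an illustration only, not the library's code: `load_config_sketch` and the local `ConfigError` class are stand-ins, `json.loads` substitutes for a YAML parser to keep the example free of third-party dependencies, and the `validate` callable substitutes for constructing a SusConfig.

```python
import json
from pathlib import Path
from typing import Any, Callable


class ConfigError(Exception):
    """Stand-in for the ConfigError named above; the real class may differ."""


def load_config_sketch(
    path: Path,
    parse: Callable[[str], Any] = json.loads,  # stand-in for a YAML parser
    validate: Callable[[Any], Any] = lambda data: data,  # stand-in for model validation
) -> Any:
    """Hypothetical loader: read, parse, and validate, raising only ConfigError."""
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError as exc:
        raise ConfigError(f"Config file not found: {path}") from exc
    try:
        # json.JSONDecodeError is a ValueError; a real YAML parser raises
        # its own error type, which would be caught here instead.
        data = parse(text)
    except ValueError as exc:
        raise ConfigError(f"Could not parse {path}: {exc}") from exc
    try:
        return validate(data)
    except (TypeError, ValueError) as exc:
        raise ConfigError(f"Invalid configuration in {path}: {exc}") from exc
```

Chaining with `from exc` keeps the original error available for debugging while callers only need to handle a single exception type.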