Configuration
config
Configuration system.
YAML configuration files are validated against Pydantic models, which gives type-safe configuration with validation, sensible defaults, and clear error messages. The main entry point is load_config().
Classes
PathPattern
Bases: BaseModel
URL pattern matching configuration.
Supports three pattern types:
- regex: Regular expression matching
- glob: Shell-style glob patterns (e.g., "*.html")
- prefix: Simple prefix matching (e.g., "/docs/")
Functions
matches
Check if URL path matches this pattern.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | URL path component (e.g., "/docs/guide/") | required |
Returns:
| Type | Description |
|---|---|
| bool | True if path matches the pattern |
Examples:
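A minimal sketch of how the three pattern types could be dispatched, using only the standard library. The class name `PathPatternSketch` and the field names `pattern` and `type` are assumptions made for illustration, not the library's actual definition, and the choice of `re.search` and `fnmatch.fnmatchcase` is likewise a guess at reasonable semantics.

```python
import fnmatch
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PathPatternSketch:
    """Illustrative stand-in for PathPattern (field names are assumed)."""

    pattern: str
    type: str  # one of "regex", "glob", "prefix"

    def matches(self, path: str) -> bool:
        """Return True if the URL path matches this pattern."""
        if self.type == "regex":
            # Assumption: the regex may match anywhere in the path.
            return re.search(self.pattern, path) is not None
        if self.type == "glob":
            # Shell-style matching; "*" also matches "/" characters.
            return fnmatch.fnmatchcase(path, self.pattern)
        if self.type == "prefix":
            return path.startswith(self.pattern)
        raise ValueError(f"unknown pattern type: {self.type!r}")


docs = PathPatternSketch(pattern="/docs/", type="prefix")
html = PathPatternSketch(pattern="*.html", type="glob")
api = PathPatternSketch(pattern=r"^/api/v\d+/", type="regex")

print(docs.matches("/docs/guide/"))
print(html.matches("/docs/index.html"))
print(api.matches("/api/v2/users"))
```

Under these assumptions, all three calls return True, while `docs.matches("/blog/")` returns False.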
SiteConfig
Bases: BaseModel
Website configuration for crawling.
CrawlingRules
Bases: BaseModel
Crawling behavior configuration.
Controls which URLs to follow, concurrency limits, rate limiting, and retry behavior.
PathMappingConfig
Bases: BaseModel
URL to file path mapping configuration.
MarkdownConfig
Bases: BaseModel
Markdown conversion options.
OutputConfig
Bases: BaseModel
Output directory structure configuration.
AssetConfig
Bases: BaseModel
Asset download configuration.
SusConfig
Bases: BaseModel
Main configuration model for SUS scraper.
This is the root configuration object that contains all settings for a scraping project.
Functions
load_config
Load and validate YAML configuration file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | Path | Path to YAML configuration file | required |
Returns:
| Type | Description |
|---|---|
| SusConfig | Validated SusConfig instance |
Raises:
| Type | Description |
|---|---|
| ConfigError | If the config file is not found, contains invalid YAML, or fails validation |
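The single ConfigError covers three distinct failure modes. The sketch below shows one way a loader could fold them into one exception type. It is not the library's implementation: `load_mapping` and this `ConfigError` class are hypothetical stand-ins, `json.loads` replaces the YAML parser so that the example runs with the standard library alone, and a top-level type check stands in for Pydantic model validation.

```python
import json
from pathlib import Path
from typing import Any, Callable


class ConfigError(Exception):
    """Stand-in for the library's ConfigError (hypothetical definition)."""


def load_mapping(path: Path, parse: Callable[[str], Any] = json.loads) -> dict:
    """Read, parse, and minimally validate a config file.

    Each failure mode is re-raised as ConfigError so that callers
    only need a single except clause.
    """
    # Failure mode 1: the file does not exist.
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError as exc:
        raise ConfigError(f"config file not found: {path}") from exc

    # Failure mode 2: the text cannot be parsed.
    # json.JSONDecodeError subclasses ValueError; a YAML parser would
    # raise its own error type, which would be caught here instead.
    try:
        data = parse(text)
    except ValueError as exc:
        raise ConfigError(f"could not parse {path}: {exc}") from exc

    # Failure mode 3: validation. Assumption: a top-level mapping check
    # stands in for validating against the SusConfig model.
    if not isinstance(data, dict):
        raise ConfigError(f"{path}: expected a mapping at the top level")
    return data
```

A caller can then handle every configuration problem in one place with `except ConfigError as exc:` and report `exc` to the user.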