Converter
converter
HTML to Markdown conversion.
Converts HTML documentation pages to Markdown with YAML frontmatter. Provides SusMarkdownConverter (a custom markdownify converter that preserves image alt text) and ContentConverter (the high-level orchestrator).
Classes
SusMarkdownConverter
Bases: MarkdownConverter
Custom Markdown converter tuned for documentation pages.
Overrides specific conversion methods (image and code-block handling) to improve output quality.
Functions
convert_img
Override image conversion for better alt text handling.
Preserves the alt text when present and substitutes an empty string when it is absent, so a missing attribute never shows up as the literal `None` in the Markdown output.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `el` | `Any` | HTML image element | required |
| `text` | `str` | Converted text content (unused for images) | required |
| `**kwargs` | `Any` | Additional arguments from parent class (e.g., `convert_as_inline`) | `{}` |
Returns:
| Type | Description |
|---|---|
| `str` | Markdown image syntax: `![alt](src)` |
Examples:

    <img alt="alt" src="src">   →  ![alt](src)
    <img> (no alt, no src)      →  ![]()
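The alt-text rule above can be illustrated without markdownify. This is a minimal sketch using only the standard library's `html.parser`; `img_to_markdown` and `ImgCollector` are hypothetical names, not part of this module. It shows why an empty-string fallback matters: `html.parser` reports a valueless attribute such as a bare `alt` as `None`, which an f-string would otherwise render as `![None](...)`.

```python
from __future__ import annotations

from collections.abc import Mapping
from html.parser import HTMLParser


def img_to_markdown(attrs: Mapping[str, str | None]) -> str:
    """Hypothetical helper mirroring the documented rule: keep alt text, else use ''."""
    # Missing or valueless attributes become '' instead of the string 'None'.
    alt = attrs.get("alt") or ""
    src = attrs.get("src") or ""  # assumption: src gets the same fallback
    return f"![{alt}]({src})"


class ImgCollector(HTMLParser):
    """Collects one Markdown image string for every <img> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.images: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "img":
            self.images.append(img_to_markdown(dict(attrs)))


parser = ImgCollector()
parser.feed('<img src="logo.png" alt="Logo"><img src="a.png"><img alt src="b.png">')
print(parser.images)  # ['![Logo](logo.png)', '![](a.png)', '![](b.png)']
```

In this sketch an `<img>` carrying neither attribute yields `![]()`.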
convert_pre
Override code block conversion with language detection.
Detects the language from the class attribute (e.g., class="language-python") and formats the content as a fenced code block with that language specifier.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `el` | `Any` | HTML pre element | required |
| `text` | `str` | Converted text content | required |
| `**kwargs` | `Any` | Additional arguments from parent class (e.g., `parent_tags`) | `{}` |
Returns:
| Type | Description |
|---|---|
| `str` | Markdown fenced code block with language |
Examples:

    <pre class="language-python">print("hello")</pre>
    → ```python
    → print("hello")
    → ```

    <pre>plain text</pre>
    → ```
    → plain text
    → ```
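A stdlib-only sketch of the language detection and fencing described above. `detect_language` and `pre_to_markdown` are hypothetical helpers, and the rule "take the suffix of the first `language-*` class" is an assumption inferred from the `class="language-python"` example. The sketch takes the class attribute as a plain string; BeautifulSoup, which markdownify builds on, returns `class` as a list, so a real override would need to join it first.

```python
from __future__ import annotations

FENCE = "`" * 3  # three backticks, built at runtime


def detect_language(class_attr: str | None) -> str:
    """Return the suffix of the first 'language-*' class, or '' if there is none."""
    for cls in (class_attr or "").split():
        if cls.startswith("language-"):
            return cls[len("language-"):]
    return ""


def pre_to_markdown(code: str, class_attr: str | None = None) -> str:
    """Wrap code in a fenced block tagged with the detected language (if any)."""
    lang = detect_language(class_attr)
    body = code.rstrip("\n")  # avoid a blank line before the closing fence
    return f"{FENCE}{lang}\n{body}\n{FENCE}"


print(pre_to_markdown('print("hello")', "highlight language-python"))
print(pre_to_markdown("plain text"))
```

Without a `language-*` class the fence is emitted bare, matching the plain-text example.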
ContentConverter
Converts HTML to Markdown with frontmatter.
Handles HTML cleaning, Markdown conversion, frontmatter generation, and Markdown post-processing.
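The four responsibilities listed above suggest a staged pipeline run in that order. The sketch below is a hypothetical structure, not ContentConverter's actual code: the `Pipeline` dataclass, its stage names, and the whitespace-normalizing post-processor are all assumptions, since the source does not say what the cleaning or post-processing steps do.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Callable


def normalize_whitespace(markdown: str) -> str:
    """Hypothetical post-processing: strip trailing spaces, keep at most one blank line in a row."""
    text = "\n".join(line.rstrip() for line in markdown.splitlines())
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip() + "\n"


@dataclass
class Pipeline:
    """Runs the stages in the listed order: clean, convert, add frontmatter, post-process."""

    clean: Callable[[str], str]
    to_markdown: Callable[[str], str]
    postprocess: Callable[[str], str] = normalize_whitespace

    def run(self, html: str, frontmatter: str) -> str:
        markdown = self.to_markdown(self.clean(html))
        document = frontmatter + "\n" + markdown
        return self.postprocess(document)


pipeline = Pipeline(
    clean=lambda html: html.replace("<script>track()</script>", ""),  # stand-in cleaner
    to_markdown=lambda html: html.replace("<h1>", "# ").replace("</h1>", "\n\n\n"),  # stand-in converter
)
print(pipeline.run("<h1>Hello</h1><script>track()</script>Body   ", "---\ntitle: Test\n---\n"))
```

Passing stages in as callables keeps each step testable on its own. Note that this naive post-processor would also collapse blank lines inside fenced code blocks and strip Markdown's trailing-double-space line breaks; a real one would need to skip those.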
Attributes:
| Name | Type | Description |
|---|---|---|
| `config` | `MarkdownConfig` | MarkdownConfig containing conversion options |
| `converter` | `SusMarkdownConverter` | SusMarkdownConverter instance for HTML→Markdown conversion |
Initialize the converter with a Markdown config.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `MarkdownConfig` | MarkdownConfig from SusConfig | required |
Functions
convert
convert(html: str, url: str, title: str | None = None, metadata: dict[str, Any] | None = None) -> str
Convert HTML to Markdown with frontmatter.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `html` | `str` | HTML content to convert | required |
| `url` | `str` | Source URL (for frontmatter) | required |
| `title` | `str \| None` | Page title (extracted from `<title>` if not provided) | `None` |
| `metadata` | `dict[str, Any] \| None` | Additional metadata for frontmatter | `None` |
Returns:
| Type | Description |
|---|---|
| `str` | Markdown content with YAML frontmatter |
Examples:
>>> converter = ContentConverter(MarkdownConfig())
>>> html = '<html><head><title>Test</title></head><body><h1>Hello</h1></body></html>'
>>> result = converter.convert(html, "https://example.com/test")
>>> "# Hello" in result
True
>>> "title: Test" in result
True
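The doctest above shows the title being taken from `<title>` and emitted as a `title:` frontmatter field. The sketch below reproduces that step with the standard library only; `TitleExtractor` and `build_frontmatter` are hypothetical names, and the field order (`title`, `url`, then metadata) is an assumption. It quotes every value with `json.dumps`, a conservative choice because a JSON string is also a valid YAML double-quoted scalar; the real output shown above (`title: Test`) is unquoted, so the module's own serializer evidently differs.

```python
from __future__ import annotations

import json
from html.parser import HTMLParser
from typing import Any


class TitleExtractor(HTMLParser):
    """Records the text of the first <title> element."""

    def __init__(self) -> None:
        super().__init__()
        self.title: str | None = None
        self._in_title = False
        self._parts: list[str] = []

    def handle_starttag(self, tag: str, attrs: list[tuple[str, str | None]]) -> None:
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_data(self, data: str) -> None:
        if self._in_title:
            self._parts.append(data)

    def handle_endtag(self, tag: str) -> None:
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._parts).strip()


def build_frontmatter(url: str, title: str | None = None, metadata: dict[str, Any] | None = None) -> str:
    """Hypothetical builder: None values are skipped and metadata keys may override title/url."""
    fields: dict[str, Any] = {"title": title, "url": url, **(metadata or {})}
    lines = ["---"]
    for key, value in fields.items():
        if value is not None:
            # JSON scalars, lists and objects are generally valid YAML flow values.
            lines.append(f"{key}: {json.dumps(value)}")
    lines.append("---")
    return "\n".join(lines) + "\n"


html = "<html><head><title>Test</title></head><body><h1>Hello</h1></body></html>"
extractor = TitleExtractor()
extractor.feed(html)
print(build_frontmatter("https://example.com/test", extractor.title))
```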
Steps: 1. Extract title from HTML if not provided (from