Python HTML to PDF with xhtml2pdf: Step-by-Step Guide

This guide walks through generating a PDF report from HTML in Python using xhtml2pdf, Jinja2 templates, and matplotlib charts. By the end, you will have a working script that reads sales data from a JSON file, renders it into styled HTML, and converts it to a PDF ready for distribution or printing.
The Libraries: xhtml2pdf and Jinja2
xhtml2pdf (formerly known as pisa) is a Python library built on the ReportLab Toolkit. It converts HTML and CSS into PDF documents without a headless browser. It supports HTML5 and CSS 2.1 (with some CSS3), which works well for reports, invoices, and similar layouts. Complex CSS like flexbox or grid is not supported.
Jinja2 is a templating engine for Python. In this tutorial it generates the HTML that xhtml2pdf then converts to PDF. Variables like {{ total_sales }} and loops like {% for month in months %} keep the template reusable across different data sets.
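As a minimal sketch of how those two constructs behave, the snippet below renders an inline template string with Jinja2's `Template` class. The template text and the sample values are made up for illustration; the tutorial itself loads its template from a file.

```python
from jinja2 import Template

# Illustrative inline template; the tutorial loads its template from a file instead.
template = Template(
    "<p>Total: {{ total_sales }}</p>\n"
    "<ul>{% for month in months %}<li>{{ month }}</li>{% endfor %}</ul>"
)

# Sample values chosen only for this sketch.
html = template.render(total_sales=1200, months=["January", "February"])
print(html)
```

Rendering the same template with a different `months` list produces a different list without touching the markup.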
xhtml2pdf also appears in the top Python HTML to PDF libraries comparison. For more on templating engines, see the HTML template engines guide.
From HTML to PDF: Step-by-Step with xhtml2pdf
Prerequisites
- Python
  - Ensure you have Python installed on your system.
  - You can download the latest version from Python.org.
- Code Editor
  - Choose your preferred code editor.
  - Popular options include Visual Studio Code or PyCharm.
Setting Up the Environment
Start by setting up the project environment and installing the required libraries.
Project Folder Structure
The project uses the following file structure:
html-to-pdf-project/ # Root directory
├── data/ # Data files
│ └── annual_data.json # Example data file for report
├── image/ # Image assets
│ └── logo.png
├── templates/ # Jinja2 HTML templates
│ ├── annual_report.html # Report template
│ └── styles.css # CSS styles for reports
├── utils/
│ └── chart_generator.py # Module for creating charts
└── generate_annual_report.py # Main script
Installing Required Libraries
Install the libraries used for Python PDF generation in this project:
pip install xhtml2pdf jinja2 matplotlib
- xhtml2pdf: Converts HTML/CSS to PDF.
- jinja2: Creates dynamic HTML templates.
- matplotlib: Produces charts and visualizations.
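One way to confirm the installation worked is to check that each package can be located for import. This is a small sketch using only the standard library; `find_missing_modules` is a hypothetical helper name, not part of the project.

```python
from importlib.util import find_spec

# Import names of the three packages installed above.
REQUIRED_MODULES = ["xhtml2pdf", "jinja2", "matplotlib"]


def find_missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names if find_spec(name) is None]


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print(f"Missing packages: {', '.join(missing)}")
else:
    print("All required packages are installed.")
```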
Creating the Jinja2 HTML Template
The HTML template defines the report layout. Jinja2 syntax injects data into the predefined structure at render time.
View code – annual_report.html
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<title>{{ company_name }} - {{ title }} - {{ year }}</title>
<style>
{% include 'styles.css' %}
</style>
</head>
<body>
<div class="header">
<div class="logo-container">
<img src="{{ logo_path }}" class="company-logo" alt="Company Logo">
</div>
<h1>{{ title }} - {{ year }}</h1>
</div>
<div>
<h3>Executive Summary</h3>
<p class="summary-section">
This report presents the sales performance for the fiscal year {{ year }}.
Total sales reached ${{ '{:,.2f}'.format(total_sales) }}, representing a
{{ '{:.1f}'.format(growth_vs_prev_year) }}% growth compared to the previous year.
</p>
</div>
<div>
<h3>Annual Sales Performance</h3>
<div class="chart">
<img src="{{ sales_chart_path }}" alt="Monthly Sales Chart">
<p class="caption">Fig 1: Monthly Sales for {{ year }}</p>
</div>
<div class="chart">
<img src="{{ quarterly_chart_path }}" alt="Quarterly Sales Chart" style="width: 500px;">
<p class="caption">Fig 2: Quarterly Sales Breakdown</p>
</div>
<h3>Product Category Breakdown</h3>
<div class="chart">
<img src="{{ product_chart_path }}" alt="Product Sales Chart" style="width: 500px;">
<p class="caption">Fig 3: Sales by Product Category</p>
</div>
</div>
<div class="data-section">
<h3>Monthly Sales Data</h3>
<table>
<thead>
<tr>
<th>Month</th>
<th>Sales ($)</th>
<th>Orders</th>
<th>Avg. Order Value</th>
<th>Month-over-Month Growth (%)</th>
</tr>
</thead>
<tbody>
{% for month in months %}
<tr>
<td>{{ month }}</td>
<td>${{ '{:,.2f}'.format(monthly_sales[month]) }}</td>
<td>{{ '{:,}'.format(monthly_orders[month]) }}</td>
<td>${{ '{:.2f}'.format(monthly_sales[month] / monthly_orders[month]) }}</td>
<td class="{{ growth_classes[month] }}">{{ growth_values[month]|safe }}</td>
</tr>
{% endfor %}
</tbody>
</table>
</div>
<div>
<h3>Conclusion</h3>
<p class="summary-section">{{ final_summary }}</p>
</div>
</body>
</html>
The template uses Jinja2 syntax ({{ variable }} and {% for %} blocks) to insert data dynamically. One template can generate different reports depending on the input data.
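The number formatting inside the template relies on Python's `str.format()`, which Jinja2 lets you call on string literals. The sketch below runs the same format specifications in plain Python on values taken from the sample data, so you can see what each one does.

```python
# The same str.format() calls the template uses, run on values from the sample data.
total_sales = 3875200.50
growth_vs_prev_year = 8.7
january_orders = 1854

formatted_total = '{:,.2f}'.format(total_sales)           # thousands separators, two decimals
formatted_growth = '{:.1f}'.format(growth_vs_prev_year)   # one decimal place
formatted_orders = '{:,}'.format(january_orders)          # thousands separators for integers

print(formatted_total)   # 3,875,200.50
print(formatted_growth)  # 8.7
print(formatted_orders)  # 1,854
```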
Styling with CSS
The CSS file defines the visual appearance of the report.
View code – styles.css
* {
box-sizing: border-box;
}
h1, h2, h3, div, section {
border: none !important;
}
body {
font-family: Helvetica, Arial, sans-serif;
margin: 0;
padding: 20px;
color: #333;
}
.header {
text-align: center;
margin-bottom: 30px;
padding-bottom: 10px;
}
.logo-container {
text-align: center;
margin-bottom: 15px;
}
.company-logo {
height: 150px;
}
h1 {
font-size: 32px;
color: #663399;
}
h3 {
margin-bottom: 15px;
font-size: 18px;
padding-bottom: 5px;
background-color: #f7f7f7;
border-radius: 10px;
}
.summary-section {
font-size: 14px;
}
.chart {
margin-bottom: 30px;
background-color: white;
padding: 10px;
border: 1px solid #ddd;
border-radius: 5px;
text-align: center;
}
.caption {
text-align: center;
font-size: 12px;
color: #555;
margin-top: 5px;
}
table {
width: 100%;
border-collapse: collapse;
margin: 15px 0;
}
th {
background-color: #582888;
color: white;
padding: 10px 5px 5px;
text-align: center;
font-size: 14px;
}
tr:nth-child(even) {
background-color: #f2f2f2;
}
td {
padding: 10px 5px 5px;
font-size: 14px;
border-bottom: 1px solid #ddd;
}
.positive-growth {
color: #28a745;
font-weight: bold;
}
.negative-growth {
color: #dc3545;
font-weight: bold;
}
Creating the Data Source
The report reads its data from a JSON file. This separates data from presentation, so generating a different report only requires changing the input file.
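Because the data lives in its own file, it can be sanity-checked before any rendering happens. The sketch below is one possible check, not part of the project: it verifies that the quarterly figures add up to `total_sales`. The helper name `totals_match` and the one-cent tolerance are assumptions; the numbers are copied from the example file.

```python
import math

# Figures copied from the example annual_data.json.
data = {
    "total_sales": 3875200.50,
    "quarterly_breakdown": {
        "Q1": 847250.50,
        "Q2": 980971.55,
        "Q3": 985531.35,
        "Q4": 1061447.10,
    },
}


def totals_match(report_data, tolerance=0.01):
    """Check that the quarterly figures add up to total_sales within the tolerance."""
    quarterly_sum = sum(report_data["quarterly_breakdown"].values())
    return math.isclose(quarterly_sum, report_data["total_sales"], abs_tol=tolerance)


print(totals_match(data))
```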
View example data – annual_data.json
{
"title": "Annual Sales Report",
"company_name": "Lorem Ipsum Company",
"year": 2024,
"total_sales": 3875200.50,
"growth_vs_prev_year": 8.7,
"generation_date": "auto",
"monthly_sales": {
"January": 265120.75,
"February": 278350.45,
"March": 303779.30,
"April": 312540.25,
"May": 325680.50,
"June": 342750.80,
"July": 328950.40,
"August": 318760.30,
"September": 337820.65,
"October": 346950.75,
"November": 348621.35,
"December": 365875.00
},
"monthly_orders": {
"January": 1854,
"February": 1932,
"March": 2102,
"April": 2185,
"May": 2253,
"June": 2376,
"July": 2215,
"August": 2104,
"September": 2267,
"October": 2398,
"November": 2425,
"December": 2576
},
"category_sales": {
"Electronics": 1254300.25,
"Clothing": 886700.50,
"Home & Kitchen": 724800.75,
"Sports": 568200.00,
"Books": 356250.00,
"Toys & Games": 285000.00
},
"quarterly_breakdown": {
"Q1": 847250.50,
"Q2": 980971.55,
"Q3": 985531.35,
"Q4": 1061447.10
},
"final_summary": "The Annual Sales Report for 2024 demonstrates robust growth across multiple metrics. Monthly trends indicate strong peaks and sustained improvements, while the quarterly and category breakdowns highlight key areas of strength. Overall, this report provides a comprehensive view of our successful year and lays a strong foundation for future strategies. Moving forward, the company can leverage these insights to refine sales strategies, optimize product offerings, and enhance customer engagement, thereby laying a solid foundation for sustained growth in the upcoming fiscal years."
}
Creating Data Visualizations
The chart generator module creates three matplotlib visualizations:
- A monthly sales bar chart.
- A quarterly sales breakdown chart.
- A product category pie chart.
Each chart is rendered as a base64-encoded PNG image that can be embedded directly in the HTML.
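The embedding technique itself does not depend on matplotlib. As a minimal sketch, the hypothetical helper `to_data_uri` below wraps arbitrary image bytes in a data URI of the kind an `<img src="...">` attribute accepts; the 8-byte PNG signature stands in for a real image.

```python
import base64


def to_data_uri(raw_bytes, mime_type="image/png"):
    """Wrap raw image bytes in a data URI usable as an <img> src value."""
    encoded = base64.b64encode(raw_bytes).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


# The 8-byte PNG file signature stands in for a real image here.
png_signature = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(png_signature)
print(uri)

# Decoding the part after the comma gives back the original bytes.
decoded = base64.b64decode(uri.split(",", 1)[1])
print(decoded == png_signature)
```

Embedding images this way keeps the HTML self-contained, at the cost of a larger HTML string (base64 adds roughly a third to the size of the data).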
View code – chart_generator.py
import base64
import io

import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter


def remove_spines(ax):
    # Remove axis borders for a clean look
    for spine in ax.spines.values():
        spine.set_visible(False)


def set_thousands_formatter(ax):
    # Format y-axis ticks to display values in thousands
    def thousands_formatter(x, pos):
        return f'${x / 1000:.0f}k'
    ax.yaxis.set_major_formatter(FuncFormatter(thousands_formatter))


def fig_to_base64(fig):
    # Convert the Matplotlib figure to a Base64-encoded PNG string
    buffer = io.BytesIO()
    fig.savefig(buffer, format='png', dpi=300, bbox_inches='tight', transparent=True)
    plt.close(fig)
    buffer.seek(0)
    encoded = base64.b64encode(buffer.getvalue()).decode('utf-8')
    buffer.close()
    return f"data:image/png;base64,{encoded}"


def create_sales_chart(sales_data, year):
    # Create a bar chart for monthly sales data
    months = list(sales_data['monthly_sales'].keys())
    values = list(sales_data['monthly_sales'].values())
    fig, ax = plt.subplots(figsize=(10, 5))
    bars = ax.bar(months, values, color='#AB8DC1')
    set_thousands_formatter(ax)
    # Annotate each bar with its sales value
    for i, bar in enumerate(bars):
        height = bar.get_height()
        ax.text(
            bar.get_x() + bar.get_width() / 2., height + 5000,
            f'${values[i] / 1000:.0f}k', ha='center', va='bottom', fontsize=11
        )
    remove_spines(ax)  # Clean up the chart appearance
    plt.xticks(rotation=45)  # Rotate x-axis labels for readability
    plt.ylabel('Sales')
    plt.title(f'Monthly Sales for {year}', fontsize=14)
    plt.tight_layout()
    return fig_to_base64(fig)  # Return the chart image as a Base64 string


def create_quarterly_chart(sales_data):
    # Create a bar chart for quarterly sales data
    quarters = list(sales_data['quarterly_breakdown'].keys())
    values = list(sales_data['quarterly_breakdown'].values())
    fig, ax = plt.subplots(figsize=(8, 5))
    bars = ax.bar(quarters, values, color='#795473')
    set_thousands_formatter(ax)
    plt.ylabel('Sales')
    # Annotate each bar with its quarterly sales value
    for i, bar in enumerate(bars):
        height = bar.get_height()
        ax.text(
            bar.get_x() + bar.get_width() / 2., height + 20000,
            f'${values[i] / 1000:.0f}k', ha='center', va='bottom', fontsize=11
        )
    remove_spines(ax)
    plt.title('Quarterly Sales Breakdown', fontsize=14)
    plt.tight_layout()
    return fig_to_base64(fig)


def create_product_breakdown_chart(sales_data):
    # Create a pie chart for product category sales breakdown
    categories = list(sales_data['category_sales'].keys())
    values = list(sales_data['category_sales'].values())
    fig, ax = plt.subplots(figsize=(9, 8))
    # Generate the pie chart with percentage labels
    wedges, texts, autotexts = ax.pie(
        values, labels=None, autopct='%1.1f%%',
        colors=['#F0C7E5', '#E296D6', '#E4B4E4', '#D7A5EC', '#AB8DC1', '#9d70d5']
    )
    for autotext in autotexts:
        autotext.set_fontsize(12)
        autotext.set_weight('bold')
    # Create a legend with sales values formatted in thousands
    legend_labels = [f'{cat}: ${val / 1000:.0f}k' for cat, val in zip(categories, values)]
    ax.legend(legend_labels, loc='center left', bbox_to_anchor=(1.05, 0.5), frameon=False, fontsize=13)
    ax.axis('equal')  # Ensure the pie chart is circular
    plt.title('Product Category Breakdown', fontsize=14)
    plt.tight_layout()
    return fig_to_base64(fig)
The Main Script
The main script ties everything together:
- Loading data from the JSON file.
- Generating charts with matplotlib.
- Preparing template data with calculated values.
- Rendering the HTML template with Jinja2.
- Converting the rendered HTML to PDF using xhtml2pdf.
- Saving the final PDF report with a timestamp.
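The "calculated values" in the third step are mostly month-over-month growth figures. The sketch below isolates that arithmetic from the script's formatting logic; `month_over_month_growth` is a hypothetical helper and the input figures are made up so the result is easy to verify by hand.

```python
def month_over_month_growth(values):
    """Return the percentage change from each value to the next; None where undefined."""
    growth = []
    previous = None
    for value in values:
        if previous is not None and previous > 0:
            growth.append((value - previous) / previous * 100)
        else:
            growth.append(None)  # first entry, or no positive base to compare against
        previous = value
    return growth


# Made-up figures chosen so the arithmetic is easy to follow.
result = month_over_month_growth([100.0, 110.0, 99.0])
print(result)
```

The first entry has no previous month to compare against, which is why the report shows "N/A" for January.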
View code – generate_annual_report.py
import os
import sys
import json
import datetime
import base64

from xhtml2pdf import pisa
from utils.chart_generator import create_sales_chart, create_product_breakdown_chart, create_quarterly_chart
from jinja2 import Environment, FileSystemLoader


# Added PDF conversion function
def convert_html_to_pdf(html_content, output_filename):
    # Open the output file in write mode
    with open(output_filename, "wb") as output_file:
        # Convert HTML to PDF
        conversion_status = pisa.CreatePDF(
            html_content,  # HTML content string
            dest=output_file  # Output file handle
        )
    # Return True if successful
    return conversion_status.err == 0


def load_data(json_file_path):
    # Check if file exists before attempting to read it
    if not os.path.exists(json_file_path):
        raise FileNotFoundError(f"Data file not found: {json_file_path}")
    # Load JSON data from file
    with open(json_file_path, 'r') as file:
        data = json.load(file)
    # Auto-set generation date if marked as 'auto'
    if data.get('generation_date') == 'auto':
        data['generation_date'] = datetime.datetime.now().strftime('%Y-%m-%d')
    return data


def prepare_logo_path(logo_file_path):
    # Check if logo file exists
    if not os.path.exists(logo_file_path):
        print(f"Warning: Logo file not found: {logo_file_path}")
        return None
    # Get file extension to determine MIME type
    _, ext = os.path.splitext(logo_file_path)
    ext = ext.lower().replace('.', '')
    # Map file extensions to MIME types
    mime_types = {
        'png': 'image/png',
        'jpg': 'image/jpeg',
        'jpeg': 'image/jpeg',
        'gif': 'image/gif',
        'svg': 'image/svg+xml'
    }
    mime_type = mime_types.get(ext, 'image/png')
    # Convert logo to base64 format for embedding in HTML
    with open(logo_file_path, 'rb') as f:
        logo_data = f.read()
    encoded = base64.b64encode(logo_data).decode('utf-8')
    return f"data:{mime_type};base64,{encoded}"


def prepare_template_data(sales_data, year):
    # Get list of months from sales data
    months = list(sales_data.get('monthly_sales', {}).keys())
    # Initialize dictionaries for growth tracking
    growth_classes = {}
    growth_values = {}
    # Calculate month-over-month growth for each month
    previous_sales = None
    for month in months:
        sales = sales_data['monthly_sales'][month]
        # Set default values and calculate growth percentage if previous month exists
        growth_values[month] = "N/A"
        growth_classes[month] = ""
        if previous_sales is not None and previous_sales > 0:
            growth_val = ((sales - previous_sales) / previous_sales) * 100
            growth_icon = "▲" if growth_val > 0 else "▼"
            growth_classes[month] = "positive-growth" if growth_val > 0 else "negative-growth"
            growth_values[month] = f"{growth_icon} {abs(growth_val):.1f}%"
        previous_sales = sales
    # Return prepared data with defaults for missing keys
    return {
        'title': sales_data.get('title', f'Annual Sales Report - {year}'),
        'company_name': sales_data.get('company_name', 'Company Name'),
        'year': year,
        'total_sales': sales_data.get('total_sales', 0),
        'growth_vs_prev_year': sales_data.get('growth_vs_prev_year', 0),
        'generation_date': sales_data.get('generation_date', datetime.datetime.now().strftime('%Y-%m-%d')),
        'months': months,
        'monthly_sales': sales_data.get('monthly_sales', {}),
        'monthly_orders': sales_data.get('monthly_orders', {}),
        'quarterly_breakdown': sales_data.get('quarterly_breakdown', {}),
        'growth_classes': growth_classes,
        'growth_values': growth_values,
        'final_summary': sales_data.get('final_summary', ''),
    }


def generate_html_from_template(template_data, sales_chart_path, quarterly_chart_path,
                                product_chart_path, logo_path=None, template_name='annual_report.html'):
    # Set up Jinja2 environment for template rendering
    template_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'templates')
    env = Environment(loader=FileSystemLoader(template_dir))
    # Try loading the specified template or fallback to default
    try:
        template = env.get_template(template_name)
    except Exception as e:
        print(f"Error loading template '{template_name}': {str(e)}")
        template = env.get_template('annual_report.html')
    # Add chart paths to the template data
    template_data['sales_chart_path'] = sales_chart_path
    template_data['quarterly_chart_path'] = quarterly_chart_path
    template_data['product_chart_path'] = product_chart_path
    # Add logo path if provided, otherwise set empty placeholder
    if logo_path:
        template_data['logo_path'] = logo_path
    else:
        template_data['logo_path'] = ''
    # Render the template with the provided data
    return template.render(**template_data)


def verify_data(sales_data):
    # Define the keys that must be present for the report to work correctly
    required_keys = ['monthly_sales', 'monthly_orders', 'quarterly_breakdown', 'category_sales']
    missing_keys = []
    # Check for missing required keys
    for key in required_keys:
        if key not in sales_data:
            missing_keys.append(key)
    # Warn if required keys are missing
    if missing_keys:
        print(f"Warning: Required data sections missing: {', '.join(missing_keys)}")
        print("Report may be incomplete or charts may not render correctly.")
        return False
    return True


def generate_annual_report(sales_data, year, output_filename, logo_file=None, template_name='annual_report.html'):
    # Check data integrity before proceeding
    verify_data(sales_data)
    try:
        # Generate all chart images needed for the report
        sales_chart_path = create_sales_chart(sales_data, year)
        quarterly_chart_path = create_quarterly_chart(sales_data)
        product_chart_path = create_product_breakdown_chart(sales_data)
        # Prepare logo data if a logo file was provided
        logo_path = None
        if logo_file:
            logo_path = prepare_logo_path(logo_file)
        # Prepare all template data with calculations for growth, etc.
        template_data = prepare_template_data(sales_data, year)
        # Generate complete HTML from template and all data
        html_content = generate_html_from_template(
            template_data,
            sales_chart_path,
            quarterly_chart_path,
            product_chart_path,
            logo_path,
            template_name
        )
        # Convert HTML to PDF and save to file
        success = convert_html_to_pdf(html_content, output_filename)
        # Report success or failure
        if success:
            print(f"Annual report successfully generated: {output_filename}")
        else:
            print("Error generating PDF annual report")
    except Exception as e:
        # Catch and report any errors that occur during report generation
        print(f"Error generating report: {str(e)}")
        raise


if __name__ == "__main__":
    # Parse command line arguments with defaults
    year = int(sys.argv[1]) if len(sys.argv) > 1 else 2024
    output_file = sys.argv[2] if len(sys.argv) > 2 else 'annual_sales_report.pdf'
    data_file = sys.argv[3] if len(sys.argv) > 3 else 'data/annual_data.json'
    logo_file = sys.argv[4] if len(sys.argv) > 4 else 'image/logo.png'
    template_name = sys.argv[5] if len(sys.argv) > 5 else 'annual_report.html'
    # Load data from JSON file or exit if failed
    try:
        sales_data = load_data(data_file)
    except Exception as e:
        print(f"Error loading data: {str(e)}")
        sys.exit(1)
    # Add timestamp to the output filename
    timestamp = datetime.datetime.now().strftime('%Y%m%d_%H%M%S')
    filename, file_extension = os.path.splitext(output_file)
    output_file = f"{filename}_{timestamp}{file_extension}"
    # Create output directory if it doesn't exist
    os.makedirs('output', exist_ok=True)
    output_path = os.path.join('output', output_file)
    # Generate the report and handle any errors
    try:
        generate_annual_report(sales_data, year, output_path, logo_file, template_name)
    except Exception as e:
        print(f"Failed to generate report: {str(e)}")
        sys.exit(1)
Running the Script
With all files in place, run the script to generate the PDF report.
python generate_annual_report.py
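The script also accepts up to five optional positional arguments, in this order: year, output filename, data file, logo file, and template name. It reads them directly from `sys.argv`. As an illustration only, the sketch below expresses the same interface with `argparse`, which makes the order and defaults explicit; `build_parser` is a hypothetical helper and `q_report.pdf` is a made-up filename.

```python
import argparse


def build_parser():
    """Mirror the script's positional arguments and defaults using argparse."""
    parser = argparse.ArgumentParser(description="Generate the annual sales report PDF.")
    parser.add_argument("year", nargs="?", type=int, default=2024)
    parser.add_argument("output_file", nargs="?", default="annual_sales_report.pdf")
    parser.add_argument("data_file", nargs="?", default="data/annual_data.json")
    parser.add_argument("logo_file", nargs="?", default="image/logo.png")
    parser.add_argument("template_name", nargs="?", default="annual_report.html")
    return parser


# Equivalent to running: python generate_annual_report.py 2023 q_report.pdf
args = build_parser().parse_args(["2023", "q_report.pdf"])
print(args.year, args.output_file, args.data_file)
```

Whatever output filename is passed, the script appends a timestamp to it and writes the file into the output/ directory.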
Exploring the Generated Report
After running the script:
- Check the output/ directory.
- Open the generated PDF file.
Troubleshooting and Best Practices
Common Issues
When working with xhtml2pdf, you might run into a few common problems:
- Missing images: Check that all paths are correct and images are encoded properly.
- CSS styling problems: Not all CSS properties are supported by xhtml2pdf. Stick to basic properties.
- Font issues: Use web-safe fonts or embed custom fonts properly.
- Page breaks: Control page breaks with CSS directives like page-break-before.
- Unicode characters: Ensure your HTML has proper charset declarations.
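For the missing-images case, `pisa.CreatePDF` accepts a `link_callback` argument: a function called with each resource URI and a relative base, which returns the path to load. The sketch below is one possible callback, assuming images referenced by relative path live under a known base directory; `make_link_callback` is a hypothetical helper, and the final wiring is shown only as a comment because it needs the variables from the main script.

```python
import os


def make_link_callback(base_dir):
    """Build a callback that maps relative URIs to absolute paths under base_dir."""
    def link_callback(uri, rel):
        # Leave data URIs and remote URLs untouched.
        if uri.startswith(("data:", "http://", "https://")):
            return uri
        path = os.path.abspath(os.path.join(base_dir, uri))
        if not os.path.isfile(path):
            print(f"Warning: resource not found: {path}")
        return path
    return link_callback


resolve = make_link_callback(os.getcwd())
print(resolve("image/logo.png", None))

# Hypothetical wiring, assuming html_content and output_file already exist:
# pisa.CreatePDF(html_content, dest=output_file, link_callback=resolve)
```

The warning makes a wrong path visible at conversion time instead of leaving a silent gap in the PDF.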
Performance Optimization
For large reports, consider these optimizations:
- Compress images before embedding them.
- Split very large reports into multiple PDFs.
- Use web-safe fonts to avoid embedding large font files.
- Cache generated charts if they're used in multiple reports.
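Chart caching can be as simple as keying rendered charts by a hash of their input data. The sketch below shows the idea with an in-memory dictionary; `cached_chart` and `fake_render` are hypothetical names, and the stand-in renderer only counts calls where a real one would return a base64 data URI.

```python
import hashlib
import json

_chart_cache = {}


def cached_chart(chart_data, render_chart):
    """Return a cached chart for chart_data, calling render_chart only on a cache miss."""
    # Serialize with sorted keys so equal dictionaries always produce the same cache key.
    serialized = json.dumps(chart_data, sort_keys=True).encode("utf-8")
    key = hashlib.sha256(serialized).hexdigest()
    if key not in _chart_cache:
        _chart_cache[key] = render_chart(chart_data)
    return _chart_cache[key]


# Stand-in renderer that records its calls; a real one would return a base64 data URI.
calls = []


def fake_render(chart_data):
    calls.append(chart_data)
    return f"chart-with-{len(chart_data)}-points"


first = cached_chart({"Q1": 847250.50, "Q2": 980971.55}, fake_render)
second = cached_chart({"Q2": 980971.55, "Q1": 847250.50}, fake_render)
print(first == second, len(calls))
```

An in-memory cache only helps within a single run; sharing charts across runs would need the results written to disk.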
Alternative: HTML to PDF API
xhtml2pdf works well for standard layouts, but it does not support modern CSS (flexbox, grid) or JavaScript. If your reports need full browser rendering or you want to skip managing local dependencies, a PDF generation API like PDFBolt is an option.
Using Templates for PDF Generation
Instead of managing HTML templates locally, you can design reports in a visual editor and generate PDFs through API calls:
import requests
import json

url = "https://api.pdfbolt.com/v1/direct"
headers = {
    "API-KEY": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX",
    "Content-Type": "application/json"
}
data_json = '''{
    "templateId": "your-template-id",
    "templateData": {
        "title": "Annual Sales Report",
        "company_name": "Lorem Ipsum Company",
        "year": "2024",
        "total_sales": "3875200.50",
        "final_summary": "Total sales for 2024 reached $3,875,200.50, an 8.7% increase over the prior year."
    }
}'''
data = json.loads(data_json)
try:
    response = requests.post(url, headers=headers, json=data)
    response.raise_for_status()
    with open('report_pdfbolt.pdf', 'wb') as f:
        f.write(response.content)
    print("PDF generated successfully")
except requests.exceptions.HTTPError as e:
    print(f"HTTP {response.status_code}")
    print(f"Error Message: {response.text}")
except requests.exceptions.RequestException as e:
    print(f"Error: {e}")
- API Documentation – reference for all endpoints and parameters.
- Template Management Guide – creating and customizing templates.
- Python Quick Start Guide – code samples for Python integration.
xhtml2pdf uses its own CSS engine and does not run JavaScript. PDFBolt HTML to PDF API uses headless Chrome, so pages render exactly as they do in a browser.
Conclusion
This tutorial covered the full pipeline for Python HTML to PDF generation with xhtml2pdf: loading data from JSON, generating matplotlib charts, rendering a Jinja2 template, and converting the result to a PDF file. The same approach works for invoices, certificates, or any document where the layout stays the same but the data changes.
xhtml2pdf handles standard CSS well and runs without a headless browser, which keeps dependencies minimal. For layouts that need modern CSS or JavaScript, the PDFBolt API is an alternative that uses headless Chrome.
Happy report-making, and may your PDF reports be as engaging as a Netflix series! 📺
