How to Convert a DOC File to PDF Using Python
Converting Microsoft Word documents (DOC or DOCX) to PDF is a common requirement in business and automation workflows. Python makes this task simple and efficient with the right libraries. In this guide, we'll explore the most popular Python module for DOC-to-PDF conversion and provide a step-by-step implementation.
Why Convert DOC to PDF?
PDFs are widely used because they preserve formatting across devices and operating systems. Converting a DOC file to PDF ensures:
- Consistent document appearance for all users
- Less risk of accidental edits, with optional password protection
- Often smaller file sizes than the source document, depending on content
- Easy sharing and printing
The Best Python Module for DOC-to-PDF Conversion
The most popular and reliable Python library for this task is docx2pdf. It provides a simple interface to convert Word documents to PDF while maintaining formatting, images, and other elements.
Key Features of docx2pdf
- Converts DOCX files; legacy DOC files may need to be saved as DOCX in Word first
- Preserves document formatting, tables, and images
- Works on Windows and macOS (Linux is not supported, because the library relies on Microsoft Word)
- Simple API with just a few lines of code
Installation
First, install the docx2pdf package using pip:
pip install docx2pdf
Note: This library requires Microsoft Word to be installed on your system as it uses Word's built-in conversion capabilities.
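Because the conversion is delegated to Word, a script can fail early on systems where Word cannot run. The guard below is a minimal sketch: word_backend_supported is a hypothetical helper name, not part of docx2pdf, and it checks only the operating system, not whether Word is actually installed.

```python
import sys


def word_backend_supported(platform_name=sys.platform):
    """Return True on operating systems where Microsoft Word can run.

    Hypothetical helper: it checks the OS only and cannot tell
    whether Word is actually installed.
    """
    # sys.platform is "win32" on Windows and "darwin" on macOS
    return platform_name in ("win32", "darwin")


if not word_backend_supported():
    print("docx2pdf needs Microsoft Word, which is unavailable on this OS.")
```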
Converting a Single DOC File to PDF
Here's a simple script to convert a single Word document:
from docx2pdf import convert
# Convert a single file
convert("input.docx", "output.pdf")
print("Conversion successful!")
Explanation:
- Import the convert function from docx2pdf
- Call convert() with input and output file paths
- If the output path is omitted, the PDF is created next to the input file with the same base name
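In an automated workflow it may help to validate the paths before handing them to Word, so that a missing file produces a clear error. The wrapper below is a sketch under stated assumptions: convert_checked and its converter parameter are hypothetical names introduced here, and the input is assumed to be a .docx file. The converter is any callable taking an input path and an output path, which also lets the path logic be exercised without Word installed.

```python
from pathlib import Path


def convert_checked(docx_path, pdf_path=None, converter=None):
    """Validate paths, then hand off to a converter callable.

    Hypothetical wrapper: `converter` is any function taking
    (input_path, output_path) as strings, such as docx2pdf's convert.
    """
    src = Path(docx_path)
    if not src.is_file():
        raise FileNotFoundError(f"Input document not found: {src}")
    if src.suffix.lower() != ".docx":
        raise ValueError(f"Expected a .docx file, got: {src.name}")
    # Default to a PDF with the same name next to the input file
    dst = Path(pdf_path) if pdf_path else src.with_suffix(".pdf")
    dst.parent.mkdir(parents=True, exist_ok=True)
    if converter is None:
        # Imported lazily so the path checks work without Word installed
        from docx2pdf import convert as converter
    converter(str(src), str(dst))
    return dst
```

Calling convert_checked("input.docx") would then write input.pdf beside the source file, while a wrong path raises FileNotFoundError before Word is ever started.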
Converting Multiple DOC Files in a Folder
To batch convert all Word documents in a folder:
from docx2pdf import convert
import os
# Folder containing DOC files
input_folder = "doc_files"
output_folder = "pdf_files"
# Create output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)
# Convert all DOCX files in the folder
for file in os.listdir(input_folder):
    # docx2pdf expects .docx input, so legacy .doc files are skipped here
    if file.lower().endswith(".docx"):
        input_path = os.path.join(input_folder, file)
        # splitext swaps only the final extension, unlike chained str.replace
        output_path = os.path.join(output_folder, os.path.splitext(file)[0] + ".pdf")
        convert(input_path, output_path)
print("Batch conversion completed!")
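For larger folders it can be useful to separate deciding what to convert from the conversion itself, and to keep going when one file fails. The following is a rough sketch of that idea: plan_batch and run_batch are hypothetical helper names, the converter argument is assumed to be a callable such as docx2pdf's convert, and only .docx files are considered.

```python
from pathlib import Path


def plan_batch(input_folder, output_folder):
    """Return sorted (input_path, output_path) pairs for every .docx file.

    Hypothetical helper: it only builds the list and converts nothing.
    """
    jobs = []
    for src in sorted(Path(input_folder).iterdir()):
        # Word creates "~$name.docx" lock files while a document is open
        if src.name.startswith("~$") or src.suffix.lower() != ".docx":
            continue
        if src.is_file():
            jobs.append((src, Path(output_folder) / (src.stem + ".pdf")))
    return jobs


def run_batch(jobs, converter):
    """Convert each job, collecting failures instead of stopping at the first."""
    failed = []
    for src, dst in jobs:
        try:
            converter(str(src), str(dst))
        except Exception as exc:  # keep going so one bad file does not end the run
            failed.append((src, exc))
    return failed
```

With docx2pdf installed, run_batch(plan_batch("doc_files", "pdf_files"), convert) would attempt every file and return the ones that failed, which can then be logged or retried.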
Alternative Method: Using python-docx and reportlab
For more control over the conversion process, you can combine python-docx to read Word documents and reportlab to generate PDFs:
from docx import Document
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

def docx_to_pdf(docx_path, pdf_path):
    doc = Document(docx_path)
    c = canvas.Canvas(pdf_path, pagesize=letter)
    y = 750  # Starting y position, in points from the bottom of the page
    for para in doc.paragraphs:
        if y < 50:  # Bottom margin reached: start a new page
            c.showPage()
            y = 750
        c.drawString(50, y, para.text)
        y -= 15  # Move to next line
    c.save()

# Usage
docx_to_pdf("document.docx", "output.pdf")
Note: This method gives you more control, but it copies only plain paragraph text. Fonts, tables, images, and other complex formatting must be handled manually.
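One piece of that manual handling is line wrapping: drawString places text on a single line, so long paragraphs run off the right edge of the page. The sketch below shows one possible way to wrap and paginate text before drawing it, using only the standard library. The name paginate and the default limits of 90 characters and 47 lines are illustrative assumptions, not values measured from a real font.

```python
import textwrap


def paginate(paragraphs, max_chars=90, lines_per_page=47):
    """Wrap paragraphs into lines and group the lines into pages.

    The limits are illustrative assumptions, not values measured
    from a real font; tune them for the font and margins in use.
    """
    lines = []
    for text in paragraphs:
        # textwrap.wrap returns [] for empty text, so keep a blank line
        lines.extend(textwrap.wrap(text, width=max_chars) or [""])
    # Slice the flat list of lines into page-sized chunks
    return [lines[i:i + lines_per_page] for i in range(0, len(lines), lines_per_page)]
```

Each page's lines could then be drawn one by one with c.drawString, calling c.showPage() between pages. Wrapping by character count is only an approximation for proportional fonts, so a width-based measure would be more accurate.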