How to Convert a DOC File to PDF Using Python

Converting Microsoft Word documents to PDF is a common requirement in business and automation workflows. With the right library, Python can do it in a few lines. This guide covers docx2pdf, a popular module for the job, and walks through converting a single file, converting a whole folder, and a lower-level alternative. Note that docx2pdf reads DOCX files only; a legacy DOC file has to be saved as DOCX before it can be converted.


Why Convert DOC to PDF?

PDFs are widely used because they preserve formatting across devices and operating systems. Converting a DOC file to PDF ensures:

  • Consistent document appearance for all users
  • Less risk of accidental edits, with optional password protection
  • File sizes that are often smaller, depending on the content
  • Easy sharing and printing

The Best Python Module for DOC-to-PDF Conversion

A popular and reliable Python library for this task is docx2pdf. It provides a simple interface that hands the document to Microsoft Word and asks Word to export the PDF, so formatting, images, and other elements come out as Word itself renders them.

Key Features of docx2pdf

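  • Converts DOCX files (legacy DOC files are not supported and must be saved as DOCX first)
  • Preserves document formatting, tables, and images, because Word performs the export
  • Works on Windows and macOS; Linux is not supported because Microsoft Word is not available there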
  • Simple API with just a few lines of code

Installation

First, install the docx2pdf package using pip:

pip install docx2pdf

Note: This library requires Microsoft Word to be installed on your system as it uses Word's built-in conversion capabilities.
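Because of this dependency, a script can fail in confusing ways on a machine that cannot run Word. A minimal pre-flight check might look like the sketch below. It assumes that Windows and macOS are the only platforms where docx2pdf can drive Word. The helper name word_automation_available is hypothetical and not part of docx2pdf; it only inspects the operating system and cannot confirm that Word itself is installed.

```python
import platform

# Operating systems on which Microsoft Word can be automated.
# platform.system() reports "Darwin" on macOS.
SUPPORTED_SYSTEMS = {"Windows", "Darwin"}


def word_automation_available(system_name=None):
    """Return True if the OS is one where docx2pdf can drive Microsoft Word.

    Hypothetical helper: pass system_name to check a specific OS name,
    or leave it as None to check the machine the script is running on.
    """
    if system_name is None:
        system_name = platform.system()
    return system_name in SUPPORTED_SYSTEMS


if __name__ == "__main__":
    if not word_automation_available():
        print("docx2pdf needs Microsoft Word on Windows or macOS.")
```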


Converting a Single DOC File to PDF

Here's a simple script to convert a single Word document:

from docx2pdf import convert

# Convert a single file
convert("input.docx", "output.pdf")

print("Conversion successful!")

Explanation:

  1. Import the convert function from docx2pdf
  2. Call convert() with input and output file paths
  3. The PDF is written to the output path you pass. If you omit the output path, it is created next to the input file with the same base name
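In a real script it helps to validate the input and derive the output path before calling convert(). The sketch below is one way to do that, not the library's own approach: pdf_path_for and convert_one are hypothetical helper names, and the docx2pdf import is placed inside the function so that the path logic can be used on a machine without the library.

```python
from pathlib import Path


def pdf_path_for(docx_path, output_dir=None):
    """Return the PDF path that corresponds to a .docx file (hypothetical helper)."""
    source = Path(docx_path)
    if source.suffix.lower() != ".docx":
        raise ValueError(f"Expected a .docx file, got: {source.name}")
    # Default to the folder that holds the source document.
    target_dir = Path(output_dir) if output_dir is not None else source.parent
    return target_dir / (source.stem + ".pdf")


def convert_one(docx_path, output_dir=None):
    """Convert one document and return the path of the PDF that was requested."""
    # Imported here so pdf_path_for() works even without docx2pdf installed.
    from docx2pdf import convert

    source = Path(docx_path)
    if not source.is_file():
        raise FileNotFoundError(source)
    target = pdf_path_for(source, output_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    convert(str(source), str(target))
    return target
```

Calling convert_one("input.docx", "pdf_files") would then write pdf_files/input.pdf, while a file with any other extension is rejected before Word is started.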

Converting Multiple DOC Files in a Folder

To batch convert all Word documents in a folder:

from docx2pdf import convert
import os

# Folder containing the Word files, and folder that receives the PDFs
input_folder = "doc_files"
output_folder = "pdf_files"

# Create output folder if it doesn't exist
os.makedirs(output_folder, exist_ok=True)

# Convert all DOCX files in the folder (docx2pdf does not read legacy .doc files)
for file in os.listdir(input_folder):
    name, ext = os.path.splitext(file)
    if ext.lower() == ".docx":
        input_path = os.path.join(input_folder, file)
        # splitext swaps only the final extension, so "my.docs.docx" becomes "my.docs.pdf"
        output_path = os.path.join(output_folder, name + ".pdf")
        convert(input_path, output_path)

print("Batch conversion completed!")
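Two things commonly go wrong in batch runs: Word leaves temporary lock files (names starting with "~$") next to open documents, and one damaged document can abort the whole loop. The following sketch, under the assumption that skipping lock files and collecting failures is the behaviour you want, addresses both. The names find_docx_files and convert_folder are hypothetical helpers, not docx2pdf functions.

```python
from pathlib import Path


def find_docx_files(folder):
    """Return the .docx files in folder, sorted by path.

    Word's temporary lock files, whose names start with "~$", are skipped.
    """
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file()
        and p.suffix.lower() == ".docx"
        and not p.name.startswith("~$")
    )


def convert_folder(input_folder, output_folder):
    """Convert every .docx in input_folder; return (converted, failed) lists."""
    # Imported here so find_docx_files() works without docx2pdf installed.
    from docx2pdf import convert

    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    converted, failed = [], []
    for source in find_docx_files(input_folder):
        target = out_dir / (source.stem + ".pdf")
        try:
            convert(str(source), str(target))
            converted.append(target)
        except Exception as exc:  # keep going when a single document fails
            failed.append((source, exc))
    return converted, failed
```

For the simplest case, convert() also accepts a folder path in place of a file path, for example convert("doc_files/"), and then converts the .docx files in that folder in place.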

Alternative Method: Using python-docx and reportlab

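For more control over the conversion process, or when Microsoft Word is not available, you can combine python-docx to read a DOCX file and reportlab to generate the PDF. This approach rebuilds the document yourself, so the example below copies only the plain text of each paragraph: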

from docx import Document
from reportlab.pdfgen import canvas
from reportlab.lib.pagesizes import letter

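def docx_to_pdf(docx_path, pdf_path):
    doc = Document(docx_path)
    c = canvas.Canvas(pdf_path, pagesize=letter)

    y = 750  # Starting y position, near the top of a letter page
    for para in doc.paragraphs:
        if y < 50:  # Bottom margin reached: start a new page
            c.showPage()
            y = 750
        # Draws plain text only; long paragraphs are not wrapped
        c.drawString(50, y, para.text)
        y -= 15  # Move to next line

    c.save()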

# Usage
docx_to_pdf("document.docx", "output.pdf")

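Note: This method gives you more control and does not need Microsoft Word, but it carries over paragraph text only. Fonts, tables, images, headers, line wrapping, and page breaks all have to be handled manually.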
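As an illustration of that manual handling, the sketch below wraps long paragraphs and decides where page breaks fall before anything is drawn. It is a rough approximation under stated assumptions: wrapping is done by character count (max_chars=80 is a guess for the default font on a letter page, not a measured width), and layout_lines and write_pdf are hypothetical helper names. Measuring real text width with reportlab would be more accurate.

```python
import textwrap


def layout_lines(paragraphs, max_chars=80, top=750, bottom=50, line_height=15):
    """Yield (page_number, y, text) for every wrapped line of every paragraph."""
    page, y = 1, top
    for text in paragraphs:
        # An empty paragraph still occupies one blank line.
        lines = textwrap.wrap(text, width=max_chars) or [""]
        for line in lines:
            if y < bottom:  # below the bottom margin: continue on a new page
                page, y = page + 1, top
            yield page, y, line
            y -= line_height


def write_pdf(paragraphs, pdf_path):
    """Draw the laid-out lines with reportlab, starting new pages as needed."""
    # Imported here so layout_lines() can be used without reportlab installed.
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    c = canvas.Canvas(pdf_path, pagesize=letter)
    current_page = 1
    for page, y, line in layout_lines(paragraphs):
        if page != current_page:
            c.showPage()  # finish the current page and begin the next one
            current_page = page
        c.drawString(50, y, line)
    c.save()
```

With python-docx, the paragraphs argument could be built as [p.text for p in Document("document.docx").paragraphs]. Keeping the layout step separate from the drawing step makes the pagination logic easy to check without producing a PDF.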


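Summary: Converting DOCX to PDF in Python is straightforward with the docx2pdf library, provided Microsoft Word is installed on Windows or macOS. For conversions that must keep the original formatting, it is the simplest solution. When Word is unavailable or you need to control the output yourself, consider combining python-docx with a PDF generation library such as reportlab.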
