How to Convert DOCX to RTF Using Python

Converting a .docx file to .rtf (Rich Text Format) is a common requirement for compatibility, archiving, or sharing. In this guide, we'll use python-docx to read the document and pyth to write the RTF. Note that this approach carries over the paragraph text only: fonts, styles, tables, and images are not preserved.


Prerequisites

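Before proceeding, ensure you have Python installed (preferably Python 3.6 or later). You'll also need to install the following libraries:

  • python-docx – For reading DOCX files.
  • pyth – For building the RTF output. The original pyth package was written for Python 2; on Python 3, install the pyth3 fork from PyPI instead.

Install them using pip (shown here with pyth3 for Python 3):

pip install python-docx pyth3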

Step-by-Step Conversion Process

1. Read the DOCX File

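First, we'll use python-docx to extract the text of each paragraph from the DOCX file. Only plain text is collected here; run-level formatting (bold, italics, fonts) and text inside tables are skipped.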

from docx import Document

def read_docx(file_path):
    doc = Document(file_path)
    text = []
    for para in doc.paragraphs:
        text.append(para.text)
    return "\n".join(text)
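To make it clearer what the reading step actually pulls out, here is a minimal standard-library sketch of the same idea. It relies on the fact that a .docx file is a ZIP archive whose main body lives in word/document.xml. The function name read_docx_stdlib is hypothetical, and this is an illustration of the file layout, not how python-docx is implemented.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def read_docx_stdlib(file_path):
    """Return a list with the text of every paragraph (hypothetical helper)."""
    with zipfile.ZipFile(file_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    # iter() walks the whole tree, so paragraphs inside table cells are visited too
    for para in root.iter(W_NS + "p"):
        # A paragraph's visible text is spread over one or more <w:t> elements
        pieces = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(pieces))
    return paragraphs
```

Unlike the doc.paragraphs loop above, this sketch also visits paragraphs nested in tables, which may or may not be what you want.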

2. Convert to RTF Using Pyth

Next, we'll use the pyth library to convert the extracted text into RTF format.

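from pyth.plugins.python.reader import PythonReader, P
from pyth.plugins.rtf15.writer import Rtf15Writer

def convert_to_rtf(text, output_path):
    # Build one pyth paragraph per line of text using the PythonReader helpers
    document = PythonReader.read([P[line] for line in text.split("\n")])
    rtf_content = Rtf15Writer.write(document).getvalue()

    # Depending on the pyth version, the buffer may hold str or bytes;
    # the writer is expected to emit ASCII-only RTF, so encode if needed
    if isinstance(rtf_content, str):
        rtf_content = rtf_content.encode("ascii")

    with open(output_path, "wb") as rtf_file:
        rtf_file.write(rtf_content)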
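If pyth is not available for your Python version, the same text-only output can be produced without it. The following is a minimal sketch, not a full RTF writer: the names escape_rtf, text_to_rtf, and write_rtf are hypothetical, and the header uses a single default font as an assumption. It escapes the three RTF control characters and writes non-ASCII characters as \uN? escapes.

```python
def escape_rtf(text):
    """Escape one line of text for use inside an RTF document."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)          # RTF control characters
        elif ch == "\t":
            out.append("\\tab ")
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit number per UTF-16 code unit,
            # followed by a fallback character ("?") for old readers
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                unit = int.from_bytes(units[i:i + 2], "little")
                if unit > 32767:
                    unit -= 65536
                out.append("\\u%d?" % unit)
    return "".join(out)

def text_to_rtf(text):
    """Wrap each line of text in its own RTF paragraph."""
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 Times New Roman;}}\n"
    body = "".join(
        "{\\pard " + escape_rtf(line) + "\\par}\n" for line in text.split("\n")
    )
    return header + body + "}"

def write_rtf(text, output_path):
    # The generated RTF is pure ASCII because of the escaping above
    with open(output_path, "w", encoding="ascii", newline="\n") as rtf_file:
        rtf_file.write(text_to_rtf(text))
```

With this in place, write_rtf(read_docx("sample.docx"), "output.rtf") would be a drop-in alternative to the pyth-based step.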

3. Combine Both Steps

Now, let's combine these functions to convert a DOCX file to RTF in one go.

def docx_to_rtf(docx_path, rtf_path):
    text = read_docx(docx_path)
    convert_to_rtf(text, rtf_path)
    print(f"Successfully converted {docx_path} to {rtf_path}")
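To run the conversion from a terminal, a small command-line wrapper can help. This is a sketch under assumptions: the function name parse_args and the choice to default the output path to the input name with an .rtf suffix are illustrative, not part of the original script.

```python
import argparse
from pathlib import Path

def parse_args(argv=None):
    """Parse the input path and an optional output path."""
    parser = argparse.ArgumentParser(description="Convert a DOCX file to RTF.")
    parser.add_argument("docx_path", type=Path, help="input .docx file")
    parser.add_argument(
        "rtf_path", type=Path, nargs="?",
        help="output .rtf file (default: input name with an .rtf suffix)",
    )
    args = parser.parse_args(argv)
    if args.rtf_path is None:
        # sample.docx -> sample.rtf, next to the input file
        args.rtf_path = args.docx_path.with_suffix(".rtf")
    return args
```

A script could then call args = parse_args() followed by docx_to_rtf(str(args.docx_path), str(args.rtf_path)).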

Testing the Script

To test the script, save a sample DOCX file (e.g., sample.docx) and run:

docx_to_rtf("sample.docx", "output.rtf")

If successful, you'll find output.rtf in your working directory.
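The same function can be applied to a whole folder. Below is a minimal batch sketch; the name batch_convert and its convert parameter (any callable taking a source and destination path, such as docx_to_rtf) are assumptions made for illustration.

```python
from pathlib import Path

def batch_convert(input_dir, output_dir, convert):
    """Run convert(src, dst) for every .docx file in input_dir."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    converted, failed = [], []
    for src in sorted(input_dir.glob("*.docx")):
        if src.name.startswith("~$"):
            continue  # skip the lock files Word creates while a document is open
        dst = output_dir / (src.stem + ".rtf")
        try:
            convert(str(src), str(dst))
            converted.append(dst)
        except Exception as exc:
            # Keep going so one bad file does not stop the whole batch
            failed.append((src, exc))
    return converted, failed
```

For example, converted, failed = batch_convert("docs", "rtf_out", docx_to_rtf) returns the written paths and a list of files that raised an error.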


Alternative Method: Using LibreOffice CLI

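If you need the formatting preserved, or prefer a system-level approach, you can call LibreOffice's command-line converter from Python. This requires LibreOffice to be installed and on your PATH (on some systems the executable is named soffice rather than libreoffice). The converted file is written to the current working directory.

import subprocess

def convert_with_libreoffice(input_path, output_format="rtf"):
    # check=True raises CalledProcessError if LibreOffice exits with an error
    subprocess.run(
        ["libreoffice", "--headless", "--convert-to", output_format, input_path],
        check=True,
    )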
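Because the executable name and the output location vary between setups, a slightly more defensive variant may be useful. This is a sketch under assumptions: the helper names build_convert_command and convert_checked are hypothetical, and the 120-second timeout is an arbitrary choice. It uses the --outdir option to control where the result lands.

```python
import shutil
import subprocess
from pathlib import Path

def build_convert_command(input_path, output_dir=".", output_format="rtf",
                          executable=None):
    """Build the LibreOffice command line without running it."""
    if executable is None:
        # Try both common executable names
        executable = shutil.which("soffice") or shutil.which("libreoffice")
    if executable is None:
        raise FileNotFoundError("LibreOffice executable not found on PATH")
    return [
        executable, "--headless",
        "--convert-to", output_format,
        "--outdir", str(output_dir),
        str(input_path),
    ]

def convert_checked(input_path, output_dir=".", timeout=120):
    """Run the conversion and return the expected output path."""
    command = build_convert_command(input_path, output_dir)
    subprocess.run(command, check=True, timeout=timeout)
    # LibreOffice keeps the base name and swaps the extension
    return Path(output_dir) / (Path(input_path).stem + ".rtf")
```

Separating command construction from execution makes the command easy to inspect or log before anything is run.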

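Summary: This guide explains how to convert DOCX to RTF using Python with python-docx and pyth. The process involves reading the DOCX text, building an RTF document from it, and saving the output. When the original formatting must be kept, the LibreOffice command line is the alternative.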
