How to Convert PDF to Docx in Python

In this article, we will learn how to convert PDF files to DOCX (Microsoft Word) files in Python.

In this tutorial, we will use the pywin32 Python library to convert a PDF file into DOCX (Microsoft Word) format.

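The main goal is to develop a lightweight command-line Python script that converts a PDF file to DOCX without relying on an online conversion service. Because pywin32 drives Microsoft Word through COM automation, the script needs Windows and an installed copy of Word that can open PDF files (Word 2013 or later).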

Install the pywin32 package by running the following command in the command line:

python -m pip install pywin32

Let's start by importing the modules:

import win32com.client
import os

Next, define the input file path and the output folder:

# Input and Output path
pdf_path = r"""C:\sample.pdf"""
output_path = r"""C:\output_folder"""

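Now start Word, open the PDF, and save it as DOCX in the output folder under the same base name as the input file:

# Start Word (hidden), then get the file name and the normalized input path
word = win32com.client.Dispatch("Word.Application")
word.visible = 0
filename = pdf_path.split('\\')[-1]
in_file = os.path.abspath(pdf_path)
wb = word.Documents.Open(in_file)
out_file = os.path.abspath(output_path + '\\' + filename[0:-4] + ".docx")
wb.SaveAs2(out_file, FileFormat=16)
wb.Close()
word.Quit()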
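
The output path above is built by splitting on backslashes and slicing off the last four characters, which assumes the name ends in a four-character extension such as ".pdf". As a minimal sketch of an alternative, the standard library's pathlib can do the same job. The helper name build_docx_path is hypothetical and not part of the original script:

```python
from pathlib import PureWindowsPath


def build_docx_path(pdf_path, output_dir):
    """Return the DOCX path in output_dir that matches the PDF's base name.

    Hypothetical helper for illustration; PureWindowsPath parses
    Windows-style paths regardless of the operating system it runs on.
    """
    stem = PureWindowsPath(pdf_path).stem  # file name without its extension
    return str(PureWindowsPath(output_dir) / (stem + ".docx"))


print(build_docx_path(r"C:\sample.pdf", r"C:\output_folder"))
# prints C:\output_folder\sample.docx
```

The returned string could be passed to SaveAs2 in place of out_file.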

Full code:

import win32com.client
import os

# Input and Output path
pdf_path = r"""C:\sample.pdf"""
output_path = r"""C:\output_folder"""

word = win32com.client.Dispatch("Word.Application")
word.visible = 0  # CHANGE TO 1 IF YOU WANT TO SEE WORD APPLICATION RUNNING AND ALL MESSAGES OR WARNINGS SHOWN BY WORD

# GET FILE NAME AND NORMALIZED PATH
filename = pdf_path.split('\\')[-1]
in_file = os.path.abspath(pdf_path)

# convert pdf to docx and save it on the output path with the same input file name
wb = word.Documents.Open(in_file)
out_file = os.path.abspath(output_path + '\\' + filename[0:-4] + ".docx")
wb.SaveAs2(out_file, FileFormat=16)
wb.Close()
word.Quit()
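
If opening or saving fails in the script above, wb.Close() and word.Quit() are never reached, and a hidden Word process may be left running. One possible way to guard against that is try/finally, sketched below. The function names convert_with_cleanup and run are hypothetical, and the Word calls are the same ones used in the full code:

```python
def convert_with_cleanup(word, in_file, out_file, file_format=16):
    """Open in_file in the given Word application and save it as out_file.

    The document is closed even if saving raises an exception.
    file_format=16 matches the FileFormat value used in the script above.
    """
    doc = word.Documents.Open(in_file)
    try:
        doc.SaveAs2(out_file, FileFormat=file_format)
    finally:
        doc.Close()
    return out_file


def run(in_file, out_file):
    """Start Word, convert one file, and always quit Word afterwards."""
    # Imported here because pywin32 is only available on Windows.
    import win32com.client

    word = win32com.client.Dispatch("Word.Application")
    word.Visible = False
    try:
        return convert_with_cleanup(word, in_file, out_file)
    finally:
        word.Quit()
```

On a Windows machine with Word installed, run(r"C:\sample.pdf", r"C:\output_folder\sample.docx") would perform the same conversion as the full code.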

To run the code, open a command prompt, navigate to the folder where you saved the script, and run:

python code_file_name.py

A new DOCX file with the same name as the PDF file will now be created in the output folder.
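
The script hard-codes pdf_path and output_path, so converting a different file means editing the source. As a rough sketch of how the paths could be taken from the command line instead, the standard argparse module can be used. The function name parse_args and the argument names are assumptions chosen to mirror the variables in the script:

```python
import argparse


def parse_args(argv=None):
    """Read the input PDF path and the output folder from the command line."""
    parser = argparse.ArgumentParser(
        description="Convert a PDF file to DOCX using Microsoft Word."
    )
    parser.add_argument("pdf_path", help="path to the input PDF file")
    parser.add_argument("output_path", help="folder where the DOCX file is saved")
    return parser.parse_args(argv)


# Example with an explicit argument list instead of sys.argv
args = parse_args([r"C:\sample.pdf", r"C:\output_folder"])
print(args.pdf_path, args.output_path)
```

If the script assigned pdf_path and output_path from parse_args(), it could be run as: python code_file_name.py C:\sample.pdf C:\output_folder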