Google Apps Script – extract text from PDF file – part 012.

By | 14/10/2018

This tutorial has two parts.
The first part points to a book about JavaScript that is available in several formats.
The second part is about processing PDF files with Google Apps Script.
Let’s start with the first part.
The book is at this web address.
The author introduces it with this brief text:
Eloquent JavaScript
3rd edition
This is a book about JavaScript, programming, and the wonders of the digital. You can read it online here, or get your own paperback copy of the second edition. A paper third edition is expected to be available this November.

Licensed under a Creative Commons attribution-noncommercial license. All code in this book may also be considered licensed under an MIT license.

Let’s move on to the second part of the tutorial.
Create a document in Google Drive and open the script editor from Tools – Script editor.
In the script editor, enable Drive API v2 under Resources – Advanced Google services.
Add the source code, using the link to the PDF version of the book mentioned above.
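
A minimal sketch of what such a script might look like, assuming the Drive advanced service (v2) is enabled as described above. The function name extractTextFromPdf and its pdfUrl parameter are illustrative choices, not necessarily the code used for this tutorial. It downloads the PDF, asks Drive to convert it to a Google Docs document through the ocr option, and reads the text back.

```javascript
// Sketch only: assumes the Drive advanced service (v2) is enabled in the project.
// extractTextFromPdf and pdfUrl are hypothetical names chosen for this example.
function extractTextFromPdf(pdfUrl) {
  // Download the PDF as a blob.
  var blob = UrlFetchApp.fetch(pdfUrl).getBlob();

  // Metadata for the file that Drive will create from the upload.
  var resource = {
    title: blob.getName(),
    mimeType: blob.getContentType()
  };

  // With ocr: true, Drive API v2 converts the uploaded PDF into a Google Docs document.
  var file = Drive.Files.insert(resource, blob, { ocr: true, ocrLanguage: 'en' });

  // Open the converted document and return its plain text.
  return DocumentApp.openById(file.id).getBody().getText();
}
```

One way to try it is to call extractTextFromPdf with the link to the PDF book and pass the length of the returned text to Logger.log.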

Running this script creates a new document containing the text of the PDF book.
The script ran, but I got this error: Empty response (line 11, file “Code”).
I looked in Google Drive and found the document created by the script, named Eloquent_JavaScript_small.
The PDF file has 698 pages, and the created document has 859 pages.
The last page in the document is page 682 of the book.
This is probably a problem with the encryption of the PDF document. I tried other PDF documents and did not have this problem.
Extracting the text and creating the document took at most 10 seconds, which is very fast for a conversion of 698 pages.
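
The checks described above (catching the error, looking in Drive for the created document, and timing the run) could be sketched as follows. This is an illustration under assumptions: checkConversion and its convert callback are hypothetical names, and the title passed in is whatever name the created document is expected to have.

```javascript
// Sketch only: checkConversion and the convert callback are hypothetical names.
// It runs a conversion, records any error, and then lists Drive files with the given title.
function checkConversion(title, convert) {
  var started = Date.now();
  var error = null;

  try {
    convert();
  } catch (e) {
    // Keep the message (for example "Empty response") instead of stopping the script.
    error = String(e);
  }

  var seconds = (Date.now() - started) / 1000;

  // Look for documents with the expected title, whether or not an error was raised.
  var documentIds = [];
  var files = DriveApp.getFilesByName(title);
  while (files.hasNext()) {
    documentIds.push(files.next().getId());
  }

  return { error: error, seconds: seconds, documentIds: documentIds };
}
```

If the returned error is not null but documentIds is not empty, the document was created even though the call reported a failure, which matches what happened here.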
