1 Reader
You can load a PDF document into a JavaScript PDF object from any object convertible to an InputStream (File, Blob, URL, base64 string).
<script>
    var pdf = new Ax.pdf.Reader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/sample.pdf'));
    console.log(pdf.getNumberOfPages());
</script>
2
You can use the PdfDocument methods below to interact with the PDF.
Return | Method | Description |
---|---|---|
Blob | toBlob() | Returns the document as a java.sql.Blob |
Reader | join(Reader other) | Joins the current PDF with the provided one, returning a new Reader |
ArrayList<Reader> | split(int size) | Splits the PDF into chunks of size pages |
Reader | slice(int start, int end) | Selects the pages starting at the given start argument and ending at the given end argument |
Reader | rotate(int page, double angle) | Rotates a specific page by angle and returns the new document |
int | getNumberOfPages() | Returns the number of pages |
int | getNumberOfImages() | Returns the number of images |
String | getTextFromPage(int page) | Returns the text layer for the given page |
ArrayList<String> | getTextFromDocument() | Returns an array with the text of each page in the document |
ArrayList<PdfImage> | getImages() | Returns an array of the document images |
Reader | insertPage(int position) | Inserts a new blank page at position. Position 1 inserts the page before any existing page in the document. |
2 Splitting documents
The split method breaks the PDF document into documents of the given size in pages.
For example, to split a 12-month calendar into two pieces of 6 months each:
<script>
    var pdf = new Ax.pdf.Reader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/calendar-2020.pdf'));

    // Split into 6-page documents
    var docs = pdf.split(6);

    // Return as a resultset
    return new Ax.rs.Reader().build(docs);
</script>
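The chunking that split performs can be modeled on plain page numbers. The sketch below does not call Ax.pdf.Reader; chunkPages is a hypothetical helper, and the handling of a final chunk with fewer than size pages is an assumption, since the text above only covers the evenly divisible case.

```javascript
// Plain-JavaScript model of page chunking; it does not touch any PDF.
// Assumption: when pageCount is not a multiple of size, the last chunk is shorter.
function chunkPages(pageCount, size) {
    if (!Number.isInteger(size) || size < 1) {
        throw new RangeError("size must be a positive integer");
    }
    const chunks = [];
    for (let first = 1; first <= pageCount; first += size) {
        const last = Math.min(first + size - 1, pageCount);
        const pages = [];
        for (let p = first; p <= last; p++) {
            pages.push(p);
        }
        chunks.push(pages);
    }
    return chunks;
}

// A 12-page calendar split in chunks of 6 pages gives two groups.
const calendarChunks = chunkPages(12, 6);
console.log(calendarChunks.length);        // 2
console.log(calendarChunks[1].join(","));  // 7,8,9,10,11,12
```

Under this model the first document holds January to June and the second July to December.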
3 Slicing documents
The slice method selects the pages starting at the given start argument and ending at the given end argument.
For example, to extract the months October to December from the calendar:
<script>
    var pdf = new Ax.pdf.Reader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/calendar-2020.pdf'));

    // Extract pages 10 to 12 (October to December)
    return pdf.slice(10, 12);
</script>
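As a rough illustration of the page range selected by slice(10, 12), the sketch below maps a start and end to page numbers. It assumes pages are numbered from 1 and that end is inclusive, which matches the October to December example; slicePages and the months array are hypothetical and are not part of the Ax API.

```javascript
// Model only. Assumptions: 1-based page numbers, inclusive end.
function slicePages(pageCount, start, end) {
    if (start < 1 || end > pageCount || start > end) {
        throw new RangeError("invalid page range");
    }
    const pages = [];
    for (let p = start; p <= end; p++) {
        pages.push(p);
    }
    return pages;
}

const months = ["January", "February", "March", "April", "May", "June",
                "July", "August", "September", "October", "November", "December"];

// Page p of the calendar corresponds to months[p - 1].
const selected = slicePages(months.length, 10, 12).map(p => months[p - 1]);
console.log(selected.join(", ")); // October, November, December
```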
4 Joining documents
To join two PDF documents, simply use the join method on the source document.
<script>
    var pdf2019 = new Ax.pdf.Reader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/calendar-2019.pdf'));
    var pdf2020 = new Ax.pdf.Reader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/calendar-2020.pdf'));
    return pdf2019.join(pdf2020);
</script>
5 Removing pages
To remove pages from a document, pass the page numbers to the remove method.
<script>
    var pdf = new Ax.pdf.Reader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/calendar-2020.pdf'));

    // Remove January, March and July from the calendar
    return pdf.remove(1, 3, 7);
</script>
6 Text extraction
On a PDF containing text, you can extract the text from a page by using the getTextFromPage method.
<script>
    var pdf = new Ax.pdf.Reader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/sample.pdf'));
    console.log(pdf.getTextFromPage(2));
</script>
Simple PDF File 2
...continued from page 1. Yet more text. And more text. And more text.
And more text. And more text. And more text. And more text. And more
text. Oh, how boring typing this stuff. But not as boring as watching
paint dry. And more text. And more text. And more text. And more text.
Boring. More, a little more text. The end, and just as well.
7 Byte Code extraction
On a PDF containing text, you can extract layout information, such as the position of each text fragment on a page, by using the getBitmapFromPage method.
This function returns a string representing a JSON multidimensional array.
<script>
    var pdf = new Ax.pdf.Reader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/sample.pdf'));
    let str = pdf.getBitmapFromPage(2);

    console.log("========JSON DATA FROM TEXT IN PAGE ==========");
    console.log(str);
    console.log("==============================================");

    // Each chunk is an array describing one text fragment of the page
    let arrPdfByte = eval(str);
    arrPdfByte.forEach(chunk => {
        console.log("******************************************");
        console.log("COL POSITION ON PDF DOCUMENT: " + chunk[0]);
        console.log("ROW POSITION ON PDF DOCUMENT: " + chunk[1]);
        console.log("CHAR WIDTH : " + chunk[2]);
        console.log("COL POSITION ON TEXT LAYOUT : " + chunk[3]);
        console.log("ROW POSITION ON TEXT LAYOUT : " + chunk[4]);
        console.log("TEXT : " + chunk[5]);
        console.log("WORD WIDTH : " + chunk[6]);
        console.log("WORD HEIGHT : " + chunk[7]);
    });
</script>
========JSON DATA FROM TEXT IN PAGE ==========
[
[57,722,12.239525,10,0," Simple PDF File 2 ",233,19],
[69,689,4.4004154,12,3," ...continued from page 1. Yet more text. And more text. And more text. ",317,7],
[69,677,4.5735703,12,4," And more text. And more text. And more text. And more text. And more ",320,7],
[69,665,4.2432384,12,5," text. Oh, how boring typing this stuff. But not as boring as watching ",301,7],
[69,653,4.4156933,12,6," paint dry. And more text. And more text. And more text. And more text. ",318,7],
[69,641,4.137618,12,8," Boring. More, a little more text. The end, and just as well. ",261,7]]
==============================================
******************************************
COL POSITION ON PDF DOCUMENT: 57
ROW POSITION ON PDF DOCUMENT: 722
CHAR WIDTH : 12.239525
COL POSITION ON TEXT LAYOUT : 10
ROW POSITION ON TEXT LAYOUT : 0
TEXT : Simple PDF File 2
WORD WIDTH : 233
WORD HEIGHT : 19
******************************************
COL POSITION ON PDF DOCUMENT: 69
ROW POSITION ON PDF DOCUMENT: 689
CHAR WIDTH : 4.4004154
COL POSITION ON TEXT LAYOUT : 12
ROW POSITION ON TEXT LAYOUT : 3
TEXT : ...continued from page 1. Yet more text. And more text. And more text.
WORD WIDTH : 317
WORD HEIGHT : 7
....
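Because the returned string is described as JSON, it may also be parsed with JSON.parse and mapped to named fields, which avoids positional indexes in later code. The sketch below is an illustration under that assumption: it works on two rows copied from the sample output rather than on a live call, and the field names and the toRecords helper are hypothetical, chosen to follow the labels printed by the example.

```javascript
// Two rows copied from the sample output; in a real script this string
// would come from pdf.getBitmapFromPage(2).
const str =
    '[[57,722,12.239525,10,0," Simple PDF File 2 ",233,19],' +
    '[69,689,4.4004154,12,3," ...continued from page 1. Yet more text. And more text. And more text. ",317,7]]';

// Illustrative names for the eight positions of each row (not part of the API).
const FIELDS = ["pdfCol", "pdfRow", "charWidth", "layoutCol",
                "layoutRow", "text", "wordWidth", "wordHeight"];

// Assumption: the string is strict JSON, so JSON.parse can replace eval.
function toRecords(json) {
    return JSON.parse(json).map(row => {
        const record = {};
        FIELDS.forEach((name, i) => {
            record[name] = row[i];
        });
        return record;
    });
}

const records = toRecords(str);
console.log(records[0].text.trim()); // Simple PDF File 2
console.log(records[1].pdfRow);      // 689
```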
8 Image preview generation
You can obtain an image preview of a PDF page using the getPreviewFromPage method.
<script>
    var pdf = new Ax.pdf.Reader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/sample.pdf'));

    // Generate and save one preview image per page
    for (var page = 1; page <= pdf.getNumberOfPages(); page++) {
        var data = pdf.getPreviewFromPage(page);
        var name = "/tmp/image" + page + ".jpg";
        console.log("===== PAGE PREVIEW " + page + " saved to " + name);
        console.log(data);
        new Ax.io.File(name).write(data);
    }
</script>
===== PAGE PREVIEW 1 saved to /tmp/image1.jpg
00000000 FF D8 FF E0 00 10 4A 46 49 46 00 01 02 00 00 01 ......JFIF......
00000010 00 01 00 00 FF DB 00 43 00 08 06 06 07 06 05 08 .......C........
00000020 07 07 07 09 09 08 0A 0C 14 0D 0C 0B 0B 0C 19 12 ................
00000030 13 0F 14 1D 1A 1F 1E 1D 1A 1C 1C 20 24 2E 27 20 ........... $.'
00000040 22 2C 23 1C 1C 28 37 29 2C 30 31 34 34 34 1F 27 ",#..(7),01444.'
00000050 39 3D 38 32 3C 2E 33 34 32 FF DB 00 43 01 09 09 9=82<.342...C...
00000060 09 0C 0B 0C 18 0D 0D 18 32 21 1C 21 32 32 32 32 ........2!.!2222
00000070 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 2222222222222222
00000080 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 2222222222222222
00000090 32 32 32 32 32 32 32 32 32 32 32 32 32 32 FF C0 22222222222222..
259582 byte(s) more
===== PAGE PREVIEW 2 saved to /tmp/image2.jpg
00000000 FF D8 FF E0 00 10 4A 46 49 46 00 01 02 00 00 01 ......JFIF......
00000010 00 01 00 00 FF DB 00 43 00 08 06 06 07 06 05 08 .......C........
00000020 07 07 07 09 09 08 0A 0C 14 0D 0C 0B 0B 0C 19 12 ................
00000030 13 0F 14 1D 1A 1F 1E 1D 1A 1C 1C 20 24 2E 27 20 ........... $.'
00000040 22 2C 23 1C 1C 28 37 29 2C 30 31 34 34 34 1F 27 ",#..(7),01444.'
00000050 39 3D 38 32 3C 2E 33 34 32 FF DB 00 43 01 09 09 9=82<.342...C...
00000060 09 0C 0B 0C 18 0D 0D 18 32 21 1C 21 32 32 32 32 ........2!.!2222
00000070 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 2222222222222222
00000080 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 32 2222222222222222
00000090 32 32 32 32 32 32 32 32 32 32 32 32 32 32 FF C0 22222222222222..
214524 byte(s) more
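The dumps above start with the bytes FF D8 FF, the usual JPEG signature, followed by the JFIF identifier. A small check like the one below can confirm that a preview is JPEG data before saving it with a .jpg extension. This is a sketch on a plain Uint8Array filled from the first dump; how the value returned by getPreviewFromPage exposes its bytes to a script is not covered here, and looksLikeJpeg is a hypothetical helper.

```javascript
// First ten bytes of the page 1 preview, copied from the hex dump.
const header = Uint8Array.from([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10, 0x4A, 0x46, 0x49, 0x46]);

// A JPEG stream begins with the start-of-image marker FF D8,
// immediately followed by the FF byte that opens the next marker.
function looksLikeJpeg(bytes) {
    return bytes.length >= 3 &&
           bytes[0] === 0xFF &&
           bytes[1] === 0xD8 &&
           bytes[2] === 0xFF;
}

// Bytes 6 to 9 hold the ASCII identifier of the APP0 segment.
const tag = String.fromCharCode(...header.slice(6, 10));

console.log(looksLikeJpeg(header)); // true
console.log(tag);                   // JFIF
```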
9 Image extraction
Images contained in a PDF can be extracted. Each image is returned as a PdfImage object.
You can determine whether a document contains only an image layer (a scanned document with no text layer) by using the isImageOnly method.
<script>
    var pdf = new Ax.pdf.Reader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/scanned.pdf'));
    console.log(pdf.isImageOnly());
</script>
true
9.1 Getting images
To extract all images from a PDF document:
<script>
    var pdf = new Ax.pdf.Reader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/scanned.pdf'));

    for (var image of pdf.getImages()) {
        console.log("Name : " + image.getName());
        console.log("Type : " + image.getType());
        console.log("FileName: " + image.getFileName());
        console.log("Length : " + image.getLength());

        // Simply write the image to /tmp using its file name
        image.writeTo("/tmp");
    }
</script>
Name : image0001
Type : png
FileName: image0001.png
Length : 30022
9.2 PdfImage
The PdfImage object provides methods to get information about an image and to write it to disk or convert it to a blob.
Return | Method | Description |
---|---|---|
byte[] | getBytes() | Returns the image bytes |
String | getType() | Returns the image type (png, jpg, gif) |
String | getName() | Returns the name of the image as given by the image extractor |
String | getFileName() | Returns the name plus the type |
int | getLength() | Returns the image length in bytes |
void | writeTo(String file) | Writes the image to the given file name or directory (if file points to a directory) |
void | writeTo(File file) | Writes the image to the given file or directory (if file points to a directory) |
Blob | toBlob() | Converts the image to a java.sql.Blob ready to be used in SQL operations |
10 Insert page
The insertPage method inserts a new blank page into the document at the position passed as a parameter. Position 1 inserts the page before any existing page in the document.
<script>
    var pdf = new Ax.pdf.Reader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/sample.pdf'));

    pdf = pdf.insertPage(1); // Now the document has one more page
    pdf = pdf.insertPage(3); // New page after the original first page
    return pdf.toBlob();
</script>
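The effect of the two insertPage calls on the two-page sample can be traced with a plain array standing in for the page list. This is only a model of the position semantics described above; insertBlank is a hypothetical helper, and allowing a position one past the last page (to append) is an assumption.

```javascript
// Model of 1-based insertion: position 1 places the blank page first.
// Assumption: position may be at most pages.length + 1, which appends.
function insertBlank(pages, position) {
    if (!Number.isInteger(position) || position < 1 || position > pages.length + 1) {
        throw new RangeError("position out of range");
    }
    const copy = pages.slice();          // leave the input list untouched
    copy.splice(position - 1, 0, "blank");
    return copy;
}

// sample.pdf has two pages.
let pages = ["page 1", "page 2"];
pages = insertBlank(pages, 1); // blank, page 1, page 2
pages = insertBlank(pages, 3); // blank, page 1, blank, page 2
console.log(pages.join(" | ")); // blank | page 1 | blank | page 2
```

Because the first insertion shifts every original page by one, position 3 in the second call lands right after the original first page.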
11 Watermarks
You can add text and image watermarks to a PDF document.
<script> var pdf = new Ax.pdf.Reader(new Ax.net.URL('https://bitbucket.org/deister/axional-docs-resources/raw/master/PDF/sample.pdf')); var out = pdf.addWatermark(options => { options.setOpacity(0.40); // Apply only on specific pages ... (1,3,5,7) options.setPages(1); options.addText("This is center watermark").setFontSize(14).setFontColor(0, 255, 0); options.addText("This is top watermark", -1, options.getTop() - 20, 0).setFontFamily("COURIER").setFontSize(14).setFontColor(0, 0, 255); options.addText("This is bottom watermark", -1, options.getBottom() + 20, 0); options.addText("This is left watermark", options.getLeft() + 20, -1, 90); options.addText("This is right watermark", options.getRight() - 20, -1, 90).setFontFamily("HELVETICA").setFontSize(14).setFontColor(255, 0, 0); var image = options.setImage(Ax.util.Base64.decode('iVBORw0KGgoAAAANSUhEUgAAACAAAAAgCAYAAABzenr0AAAHVUlEQVR42rWXW1DU1x3Hv/+9sOwFlsuyXMpFyWiVq6CiMqNt1bFGpZJd1Eg1wbzlpTN5y7TP7eQtM33JWySJtZk0XCzWsY6SBjJYQUHkotUJIiDsLosILLsse/n3e/67i2jQrJn2zPxnz57/Oef3Ob/b+f0lPGuappaWVkjSYTkchoyf2sRKCSpJQigUunC8rq6OA8GXzZZW9ZMIMG+rrf3Jol9sLpcLV9vbW3578uQx/g39GEDG183NLvtbb+Hh6CMqQopPyovnl2WsX1eAc+fP41R9PRwOB77p6Pi2/sSJ/WtpYrUU69+am511BHg0Nq4AvC6EEC6egvw8/OXLL3H44JtISTFjamoKnV1dUyfq6go4LfCjAGPjE1CpVC+XJMBehKNg8YTpP/l5uThPALvNBs+CB+npaYomCDF53G5ftxpiTYDxicc/BIgKVTSj1UJKSHhe/vIywoEA5FAIuTnZCkD922/Dz3Gf16doQvjEd11d0zTzz2IQawJMPJ4igPSccEmjgUqnUwSHvIvoTktD4SefKK9H3n8fVU+eQKU3IrzsR3ZyEnpu3cL9Bw8UASIuhGZO0yeaW1tBgFQOPV0DoIkANjyecihhtCJcrYbKYMDgzh3QFhQgOD8Pw8gIJicmlCk5ubnwFhZCnZyM5dFRVA8MwPAS633d0oJjNlsmu64fAjQRgHZzOJwRB4ydXG+AWq/DzeJiWKhmn9eL6g8+eOYHtH3Xxx8jkZAzNM/WoSFqyYcw58nBEF+HlWnZ2VlgpOGY3f5qAKfTFQGgH6i5af+unQhxo7nhIdg+/BAh/zLmxsbgvNWrLMzcWglzfj7UugQ0f/QRzEXFUGtUKLt+IwIRCik+mpVljQ/A5ZqOOBtVrzabMWyrRWL/HZQzSWmsmej8w+9XbBvbRPR3//FPCLqc6Kedl8rLUNTcitDcXBRAhtWaER/AtNsNiacXTicWDhw9Cs2dO9h+pAZ9jWdBo6C4MQv6qkRloa97CUMNDmaZMCoazqDnYhuCZWUovXBBOUjY74dI7xkWS3wA7pmZyOlNJnxLm1rp1TtO1mN+dAwj/7yCosZMJJbqgKWoDhIlLA34MdzgROGvDyB5XT5u
/PU8XPML+AV9JuTxKFqwpKfHBzDDkIoBOD//HIPvvYejDQ2Y+FcH5kYnUTacDcxSeCAKoOU2qRLuFE3BvC4Hub/cgwuNjSj59FNkvvPOCkA6QzcugNmns3TACIDjs88UgJrjx+G82QvPiAObhiyMYvlZPtPySZFwr9gNU2EWMrdVou2rrxSArHffVQAQDiE1JTU+gKdz85EIIEA7E5KFyafqwAEs0jlnev+D/LMmaDdrAH9UAzoJgbtBjJ1h2q38OYx0tu4rV+BmFtwblqMAYaSYk+MDmKftRIwrTsgZfYePQOrswPaqKrhvD7Jy0CDrzzpoKiLpOtgXhuN3fnaCsGwpQU93N+Tde1Dxj4uQyCicUMRhMn0pLoAFXiDKS7UGGkMibtvtkC5dQlFpKXQGI2YH7kIOSEwy0TCkMiStjNTSzfAzTQ8zE8qHDmFLUxOC3iXaPzIxKckUH4DHsxgZURKRHje2boXM/tPbt7Fv82Zo6aChAINu0ReZZtRDrdUgQEe7dvcuUrZsgUSV7+B9IDKiUL9oJpMxPoDFRe/KPaBiGGrE5lTvJfbzkpKwzH7F+vXPJaK+hw+RQNOMLyzgEENPy35QQLKvpEE2o9EQH4DX51vZWaUhADf79759cLe3w8pY9jBPGBITMb20pEzLYN/LvonvXHxn2bsXO69do0sQIBhYSZkGvT4+AF90Y8UEjIAn16/jm+pq2OgLc8yIi7zXB5liN5aXK9Pu9/ejhCnbaLXCzAzYTNv/qqsLabt2IcRIiJlAT9C4AJZF6hTyhfoJ0U7nM/HqLd+2De7eXiW9plLQ8P37ysKijRsxSzCRti2Vlei/eRMeXtF76YxBChdmEIISGFVxAQREZUMhaqr+SU8PrjL8anbvhn9yEgHGtJpg5k2b0Hr1qrKwdv9+zN27R8ek7Zk7dDk5aOvsxH6GY9r27bxJgwxHGVquiwtA2E7UAWoOttEEFvYrSkrgJYCS+jMy8GB8HEFWT6JpWGRsyMvD0vR0xNYE6BschJv71NAEoh6X2Re+FBdAiGoTNeH4uXPoOn0aB4uKoKVZxAnFLZlAe1+m3fdQ7aJ10BwH6Q/L4urlWqGhANV9eXgY1V98gbxTp5SSTM21cQHI0Rd/pxms/K3csEF85ayMT/Ck31PVB6MaucwTv0HT5FIzsTlq5ope1oRCym9EuR4dfzVAtCiNte6aGjguXoS49VfHvMgSFWfPooA3pGiPePP1nTmj1IGr54lYyjpyBFVtbSt7vrImbGptddpYfPw/WzOLFHtt7doALJm/ZyiZwtHMJeyNtb6Ooh8hcmxe7EPlJXPlaB5QRb4rPPz+fGMtgBQ+6/mkvjD+v2yCmMUGHmKN7wJRVhj5JLz+vq/VmBohbjulnPkvzMTLTpQMST8AAAAASUVORK5CYII=')); image.scaleToFit(512, 512); }); new Ax.io.File("/tmp/watermark.pdf").write(out); </script>
The sample PDF with watermarks applied only on the first page.
