Java libraries for PDF

apache/pdfbox

README.md

The Apache PDFBox library is an open source Java tool for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents. PDFBox also includes several command line utilities. PDFBox is published under the Apache License, Version 2.0.

You can download binary versions for releases currently under development or older releases from our Download Page.

You need Java 8 (or higher) and Maven 3 to build PDFBox. The recommended build command is `mvn clean install`.

The default build will compile the Java sources and package the binary classes into jar packages. See the Maven documentation for all the other available build options.

There are various ways to help us improve PDFBox.

  • Look at the Issue Tracker and help us fix bugs.
  • Answer questions on our Users Mailing List.
  • Help us enhance the Examples.
  • Help us enhance the PDFBox Documentation, including on GitHub.

Please follow the guidelines at our Support Page.

If you have questions about how to use PDFBox, please ask on the Users Mailing List. This will get you help from the entire community.

The PDFBox examples and the test code in the sources will also provide additional information.

And there are additional resources available on sites such as Stack Overflow.

If you are sure you have found a bug, then please report the issue in our Issue Tracker.

Known Limitations and Problems

See the Issue Tracker for the full list of known issues and requested features. Some of the more common issues are:

  1. You get text like "G38G43G36G51G5" instead of what you expect when extracting text. This happens because the characters are a meaningless internal encoding that points to glyphs embedded in the PDF document. The only way to access the text is to use OCR. This may be a future enhancement.
  2. You get an error message like `java.io.IOException: Can't handle font width`. This might be because the org/apache/pdfbox/resources directory is not in your classpath. The easiest solution is to include the apache-pdfbox-x.x.x.jar in your classpath.
  3. You get text that has the correct characters, but in the wrong order. This might be because you have not enabled sorting. The text in PDF files is stored in chunks, and the chunks do not need to be stored in the order in which they are displayed on a page. By default, PDFBox does not sort the text.
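The ordering problem in item 3 can be illustrated with a small, self-contained sketch. It is not PDFBox code: the `Chunk` class and the assumption that `y` grows down the page are hypothetical stand-ins, used only to show why unsorted chunks read in the wrong order. In PDFBox itself, sorting is enabled by calling `setSortByPosition(true)` on a `PDFTextStripper`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ChunkSortSketch {

    /** Hypothetical positioned text chunk; not a PDFBox class. */
    static final class Chunk {
        final double x;   // horizontal position, left to right
        final double y;   // vertical position; assumed to grow down the page
        final String text;

        Chunk(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Joins chunk texts in the order given, separated by single spaces. */
    static String join(List<Chunk> chunks) {
        StringBuilder sb = new StringBuilder();
        for (Chunk c : chunks) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(c.text);
        }
        return sb.toString();
    }

    /** Returns a copy sorted top to bottom, then left to right. */
    static List<Chunk> sortByPosition(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator.comparingDouble((Chunk c) -> c.y)
                .thenComparingDouble(c -> c.x));
        return sorted;
    }

    public static void main(String[] args) {
        // Chunks in the order they happen to be stored in the file.
        List<Chunk> stored = new ArrayList<>();
        stored.add(new Chunk(50, 10, "world"));
        stored.add(new Chunk(0, 20, "again"));
        stored.add(new Chunk(0, 10, "Hello"));

        System.out.println(join(stored));                  // storage order
        System.out.println(join(sortByPosition(stored)));  // reading order
    }
}
```

Real pages are messier than this sketch (rotated text, columns, slightly different baselines on one line), so a plain sort on exact coordinates is only an approximation of what a text extractor has to do.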

Collective work: Copyright 2015 The Apache Software Foundation.

Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to You under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 https://www.apache.org/licenses/LICENSE-2.0 

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country’s laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See https://www.wassenaar.org/ for more information.

The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache Software Foundation distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code.

The following provides more details on the included cryptographic software:

Apache PDFBox uses the Java Cryptography Architecture (JCA) and the Bouncy Castle libraries for handling encryption in PDF documents.

Apache PDFBox ® — A Java PDF Library

The Apache PDFBox ® library is an open source Java tool for working with PDF documents. This project allows creation of new PDF documents, manipulation of existing documents and the ability to extract content from documents. Apache PDFBox also includes several command-line utilities. Apache PDFBox is published under the Apache License v2.0.

Apache PDFBox 3.0.0-beta1 released
2023-07-14

The Apache PDFBox community is pleased to announce the first beta release of Apache PDFBox version 3.0.0. It is available for download at:

See the full release notes for details about this release.

The Migration Guide gives users coming from PDFBox 2.0.x an overview of things to check when switching over. More details to come.

Getting Help

To get help on using PDFBox, please Subscribe to the Users Mailing List and post your questions there. We’re happy to help.

The project is a volunteer effort and we’re always looking for interested people to help us improve PDFBox. There are a multitude of ways that you can help us depending on your skills. Subscribe to the Mailing Lists and find out how you can help.

Features

Extract Unicode text from PDF files.

Split a single PDF into many files or merge multiple PDF files.

Extract data from PDF forms or fill a PDF form.

Validate PDF files against the PDF/A-1b standard.

Print a PDF file using the standard Java printing API.

Save PDFs as image files, such as PNG or JPEG.

Create a PDF from scratch, with embedded fonts and images.

Download

The Apache PDFBox community provides feature and bugfix releases.

  • Beta release for PDFBox 3.0.0 — 3.0.0-beta1 (requires Java 8)
  • Feature release for PDFBox 2.0.x — 2.0.29 (requires Java 6)
  • Bugfix release for PDFBox 1.8.x — 1.8.17 (requires Java 5)
  • Feature release of JBIG2 ImageIO plugin 3.0.x — 3.0.4 (requires Java 6)
  • Previous releases

See also the export control information related to the encryption features included in Apache PDFBox.

Current releases

Binary Distribution

| Version | Description | Download Link | PGP Signature | SHA512 Checksum |
|---|---|---|---|---|
| PDFBox 3.0.0-beta1 (release candidate) | Command line tools | | | |
| | PDFBox standalone | pdfbox-app-3.0.0-beta1.jar | ASC | SHA512 |
| | Debugger standalone | debugger-app-3.0.0-beta1.jar | ASC | SHA512 |
| | Preflight standalone | preflight-app-3.0.0-beta1.jar | ASC | SHA512 |
| | Libraries of each subproject | | | |
| | pdfbox | pdfbox-3.0.0-beta1.jar | ASC | SHA512 |
| | fontbox | fontbox-3.0.0-beta1.jar | ASC | SHA512 |
| | preflight | preflight-3.0.0-beta1.jar | ASC | SHA512 |
| | xmpbox | xmpbox-3.0.0-beta1.jar | ASC | SHA512 |
| | pdfbox-tools | pdfbox-tools-3.0.0-beta1.jar | ASC | SHA512 |
| | pdfbox-debugger | pdfbox-debugger-3.0.0-beta1.jar | ASC | SHA512 |
| PDFBox 2.0.29 (feature) | Command line tools | | | |
| | PDFBox standalone | pdfbox-app-2.0.29.jar | ASC | SHA512 |
| | Debugger standalone | debugger-app-2.0.29.jar | ASC | SHA512 |
| | Preflight standalone | preflight-app-2.0.29.jar | ASC | SHA512 |
| | Libraries of each subproject | | | |
| | pdfbox | pdfbox-2.0.29.jar | ASC | SHA512 |
| | fontbox | fontbox-2.0.29.jar | ASC | SHA512 |
| | preflight | preflight-2.0.29.jar | ASC | SHA512 |
| | xmpbox | xmpbox-2.0.29.jar | ASC | SHA512 |
| | pdfbox-tools | pdfbox-tools-2.0.29.jar | ASC | SHA512 |
| | pdfbox-debugger | pdfbox-debugger-2.0.29.jar | ASC | SHA512 |
| PDFBox 1.8.17 (bugfix) | Command line tools | | | |
| | PDFBox standalone | pdfbox-app-1.8.17.jar | ASC | SHA512 |
| | Preflight standalone | preflight-app-1.8.17.jar | ASC | SHA512 |
| | Libraries of each subproject | | | |
| | pdfbox | pdfbox-1.8.17.jar | ASC | SHA512 |
| | fontbox | fontbox-1.8.17.jar | ASC | SHA512 |
| | preflight | preflight-1.8.17.jar | ASC | SHA512 |
| | jempbox | jempbox-1.8.17.jar | ASC | SHA512 |
| | xmpbox | xmpbox-1.8.17.jar | ASC | SHA512 |
| JBIG2 3.0.4 (feature) | JBIG2 ImageIO plugin | jbig2-imageio-3.0.4.jar | ASC | SHA512 |

Source Distribution

| Version | Description | Download Link | PGP Signature | SHA512 Checksum |
|---|---|---|---|---|
| PDFBox 3.0.0-beta1 (release candidate) | Source ZIP file incl. examples | pdfbox-3.0.0-beta1-src.zip | ASC | SHA512 |
| PDFBox 2.0.29 (feature) | Source ZIP file incl. examples | pdfbox-2.0.29-src.zip | ASC | SHA512 |
| PDFBox 1.8.17 (bugfix) | Source ZIP file incl. examples | pdfbox-1.8.17-src.zip | ASC | SHA512 |
| JBIG2 3.0.4 (feature) | Source ZIP file | jbig2-imageio-3.0.4-src.zip | ASC | SHA512 |

Verify

It is essential that you verify the integrity of the downloaded files using the PGP signatures or SHA512 checksums. Please read Verifying Apache HTTP Server Releases for more information on why you should verify our releases.

The PGP signatures can be verified using PGP or GPG. First download the KEYS file as well as the .asc signature files for the relevant release packages. Make sure you get these files from the main distribution directory, rather than from a mirror. Then verify the signatures using

    % pgpk -a KEYS
    % pgpv pdfbox-X.Y.Z-src.zip.asc

or

    % pgp -ka KEYS
    % pgp pdfbox-X.Y.Z-src.zip.asc

or

    % gpg --import KEYS
    % gpg --verify pdfbox-X.Y.Z-src.zip.asc pdfbox-X.Y.Z-src.zip
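The SHA512 checksum mentioned above can also be checked from Java with the standard library alone. The following is a minimal sketch, not a project-provided tool: the class name `Sha512Check` is made up for illustration, and it assumes the checksum file holds either the bare hex digest or the digest followed by a file name. Other checksum layouts would need different parsing.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Sha512Check {

    /** Encodes bytes as a lowercase hex string. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /** SHA-512 is a required JCA algorithm, so failure here is unexpected. */
    static MessageDigest newSha512() {
        try {
            return MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 not available", e);
        }
    }

    /** Digest of an in-memory byte array. */
    static String sha512Hex(byte[] data) {
        return toHex(newSha512().digest(data));
    }

    /** Digest of a file, streamed so large downloads are not loaded at once. */
    static String sha512Hex(Path file) throws IOException {
        MessageDigest md = newSha512();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    /** Compares against the first whitespace-separated token of the checksum file. */
    static boolean matches(String actualHex, String checksumFileContent) {
        String expected = checksumFileContent.trim().split("\\s+")[0];
        return actualHex.equalsIgnoreCase(expected);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Sha512Check <file> <file.sha512>");
            return;
        }
        String actual = sha512Hex(Paths.get(args[0]));
        String expected = new String(
                Files.readAllBytes(Paths.get(args[1])), StandardCharsets.UTF_8);
        System.out.println(matches(actual, expected) ? "OK" : "MISMATCH");
    }
}
```

A matching checksum only shows that the file was not corrupted in transit. Checking the PGP signature is still what ties the download to the release keys in the KEYS file.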

Previous releases

Previous Apache releases (starting with version `0.8.0-incubating`) are available in the release archive. Older releases (up to version `0.7.3`) published from SourceForge are still available on SourceForge Files.

Export control information

This distribution includes cryptographic software. The country in which you currently reside may have restrictions on the import, possession, use, and/or re-export to another country, of encryption software. BEFORE using any encryption software, please check your country’s laws, regulations and policies concerning the import, possession, or use, and re-export of encryption software, to see if this is permitted. See https://www.wassenaar.org/ for more information.

The U.S. Government Department of Commerce, Bureau of Industry and Security (BIS), has classified this software as Export Commodity Control Number (ECCN) 5D002.C.1, which includes information security software using or performing cryptographic functions with asymmetric algorithms. The form and manner of this Apache Software Foundation distribution makes it eligible for export under the License Exception ENC Technology Software Unrestricted (TSU) exception (see the BIS Export Administration Regulations, Section 740.13) for both object code and source code.

The following provides more details on the included cryptographic software:

  • Apache PDFBox uses the Java Cryptography Architecture (JCA) and the Bouncy Castle libraries for handling encryption in PDF documents.

Copyright © 2009–2023 The Apache Software Foundation. Licensed under the Apache License, Version 2.0.
Apache PDFBox, PDFBox, Apache, the Apache feather logo and the Apache PDFBox project logos are trademarks of The Apache Software Foundation.
