com.aspose.words

Class PdfLoadOptions

  • java.lang.Object
public class PdfLoadOptions 
extends LoadOptions

Specifies additional options for loading a PDF document into a Document object.

Constructor Summary
PdfLoadOptions()

Property Getters/Setters Summary
java.lang.String getBaseUri()
void setBaseUri(java.lang.String value)
           Gets or sets the string used to resolve relative URIs found in the document into absolute URIs when required. Can be null or an empty string. Default is null.
boolean getConvertShapeToOfficeMath()
void setConvertShapeToOfficeMath(boolean value)
           Gets or sets whether to convert shapes with EquationXML to Office Math objects.
java.nio.charset.Charset getEncoding()
void setEncoding(java.nio.charset.Charset value)
           Gets or sets the encoding used to load an HTML or TXT document if the encoding is not specified in the HTML/TXT itself. Can be null. Default is null.
FontSettings getFontSettings()
void setFontSettings(FontSettings value)
           Gets or sets the document font settings.
LanguagePreferences getLanguagePreferences()
           Gets the language preferences that will be used when the document is loading.
int getLoadFormat()
void setLoadFormat(int value)
           Specifies the format of the document to be loaded. Default is LoadFormat.AUTO. The value of the property is a LoadFormat integer constant.
int getMswVersion()
void setMswVersion(int value)
           Specifies that the document loading process should match a specific MS Word version. Default value is MsWordVersion.WORD_2007. The value of the property is an MsWordVersion integer constant.
int getPageCount()
void setPageCount(int value)
           Gets or sets the number of pages to read. Default is MaxValue, which means all pages of the document will be read.
int getPageIndex()
void setPageIndex(int value)
           Gets or sets the 0-based index of the first page to read. Default is 0.
java.lang.String getPassword()
void setPassword(java.lang.String value)
           Gets or sets the password for opening an encrypted document. Can be null or an empty string. Default is null.
boolean getPreserveIncludePictureField()
void setPreserveIncludePictureField(boolean value)
           Gets or sets whether to preserve the INCLUDEPICTURE field when reading Microsoft Word formats. The default value is false.
IResourceLoadingCallback getResourceLoadingCallback()
void setResourceLoadingCallback(IResourceLoadingCallback value)
           Controls how external resources (images, style sheets) are loaded when a document is imported from HTML or MHTML.
boolean getSkipPdfImages()
void setSkipPdfImages(boolean value)
           Gets or sets the flag indicating whether images must be skipped while loading a PDF document. Default is false.
java.lang.String getTempFolder()
void setTempFolder(java.lang.String value)
           Allows the use of temporary files when reading a document. By default this property is null and no temporary files are used.
boolean getUpdateDirtyFields()
void setUpdateDirtyFields(boolean value)
           Specifies whether to update the fields with the dirty attribute.
IWarningCallback getWarningCallback()
void setWarningCallback(IWarningCallback value)
           Called during a load operation when an issue is detected that might result in data or formatting fidelity loss.
 

    • Constructor Detail

      • PdfLoadOptions

        public PdfLoadOptions()
    • Property Getters/Setters Detail

      • getBaseUri/setBaseUri

        public java.lang.String getBaseUri() / public void setBaseUri(java.lang.String value)
        
        Gets or sets the string used to resolve relative URIs found in the document into absolute URIs when required. Can be null or an empty string. Default is null.

        This property is used to resolve relative URIs into absolute ones in the following cases:

        1. When loading an HTML document from a stream and the document contains images with relative URIs and does not have a base URI specified in the BASE HTML element.
        2. When saving a document to PDF and other formats, to retrieve images linked using relative URIs so the images can be saved into the output document.

        Example:

        Shows how to open an HTML document with images from a stream using a base URI.
        // Open the stream; try-with-resources closes it even if loading fails
        try (InputStream stream = new FileInputStream(getMyDir() + "Document.html")) {
            // Pass the URI of the base folder so any images with relative URIs in the HTML document can be found
            // Note the Document constructor detects HTML format automatically
            LoadOptions loadOptions = new LoadOptions();
            loadOptions.setBaseUri(getImageDir());
        
            Document doc = new Document(stream, loadOptions);
        }
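
The idea of resolving a relative URI against a base URI can be illustrated with the JDK alone. The following is a minimal sketch, not the library's implementation: the class name `BaseUriSketch` and the method `resolve` are hypothetical, and the rule that a null or empty base leaves the reference untouched is an assumption made for the illustration.

```java
import java.net.URI;

final class BaseUriSketch {
    /**
     * Resolves a possibly relative reference against a base URI.
     * Absolute references are returned unchanged. A null or empty base
     * leaves the reference as-is (an assumption for this sketch).
     */
    static String resolve(String baseUri, String reference) {
        URI ref = URI.create(reference);
        if (ref.isAbsolute() || baseUri == null || baseUri.isEmpty()) {
            return ref.toString();
        }
        return URI.create(baseUri).resolve(ref).toString();
    }

    public static void main(String[] args) {
        // A relative image reference is resolved against the base folder
        System.out.println(resolve("https://example.com/docs/", "images/logo.png"));
        // Parent-directory segments are normalized during resolution
        System.out.println(resolve("https://example.com/docs/page.html", "../img/a.png"));
        // An absolute reference ignores the base
        System.out.println(resolve("https://example.com/docs/", "https://cdn.example.org/x.png"));
    }
}
```

In the API described here, the base string would be supplied through setBaseUri on the load options rather than passed to a helper like this one.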
      • getConvertShapeToOfficeMath/setConvertShapeToOfficeMath

        public boolean getConvertShapeToOfficeMath() / public void setConvertShapeToOfficeMath(boolean value)
        
        Gets or sets whether to convert shapes with EquationXML to Office Math objects.

        Example:

        Shows how to convert shapes with EquationXML to Office Math objects.
        LoadOptions loadOptions = new LoadOptions();
        // Pass 'true' to convert shapes with EquationXML to Office Math objects, or 'false' to keep them as shapes
        loadOptions.setConvertShapeToOfficeMath(true);
        
        // Specify load option to convert math shapes to office math objects on loading stage
        Document doc = new Document(getMyDir() + "Math shapes.docx", loadOptions);
      • getEncoding/setEncoding

        public java.nio.charset.Charset getEncoding() / public void setEncoding(java.nio.charset.Charset value)
        
        Gets or sets the encoding that will be used to load an HTML or TXT document if the encoding is not specified in HTML/TXT. Can be null. Default is null.

        This property is used only when loading HTML or TXT documents.

        If encoding is not specified in HTML/TXT and this property is null, then the system will try to automatically detect the encoding.
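
The precedence described above (encoding declared in the document, then this property, then automatic detection) can be sketched as follows. The class `EncodingChoiceSketch` and its method are hypothetical names, and the UTF-8 fallback is only a placeholder standing in for automatic detection; the library's actual detector is not described in this reference.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingChoiceSketch {
    /**
     * Picks a charset in the order the text describes:
     * 1. the encoding declared inside the HTML/TXT document, if any;
     * 2. the encoding supplied through the load options, if any;
     * 3. a detection fallback (UTF-8 here, purely as a placeholder).
     */
    static Charset choose(Charset declaredInDocument, Charset optionEncoding) {
        if (declaredInDocument != null) {
            return declaredInDocument;
        }
        if (optionEncoding != null) {
            return optionEncoding;
        }
        return StandardCharsets.UTF_8; // placeholder for automatic detection
    }

    public static void main(String[] args) {
        System.out.println(choose(StandardCharsets.ISO_8859_1, StandardCharsets.UTF_16));
        System.out.println(choose(null, StandardCharsets.UTF_16));
        System.out.println(choose(null, null));
    }
}
```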

      • getFontSettings/setFontSettings

        public FontSettings getFontSettings() / public void setFontSettings(FontSettings value)
        
        Gets or sets the document font settings.

        When loading some formats, Aspose.Words may need to resolve fonts. For example, when loading HTML documents, Aspose.Words may resolve fonts to perform font fallback.

        If set to null, the default static font settings FontSettings.DefaultInstance will be used.

        The default value is null.

        Example:

        Shows how to set font settings and apply them during the loading of a document.
        // Create a FontSettings object that will substitute the "Times New Roman" font with the font "Arvo" from our "MyFonts" folder
        FontSettings fontSettings = new FontSettings();
        fontSettings.setFontsFolder(getFontsDir(), false);
        fontSettings.getSubstitutionSettings().getTableSubstitution().addSubstitutes("Times New Roman", "Arvo");
        
        // Set that FontSettings object as a member of a newly created LoadOptions object
        LoadOptions loadOptions = new LoadOptions();
        loadOptions.setFontSettings(fontSettings);
        
        // We can now open a document while also passing the LoadOptions object into the constructor so the font substitution occurs upon loading
        Document doc = new Document(getMyDir() + "Document.docx", loadOptions);
        
        // The effects of our font settings can be observed after rendering
        doc.save(getArtifactsDir() + "Document.LoadOptionsFontSettings.pdf");

        Example:

        Shows how to designate font substitutes during loading.
        LoadOptions loadOptions = new LoadOptions();
        loadOptions.setFontSettings(new FontSettings());
        
        // Set a font substitution rule for a LoadOptions object that replaces a font that's not installed in our system with one that is
        TableSubstitutionRule substitutionRule = loadOptions.getFontSettings().getSubstitutionSettings().getTableSubstitution();
        substitutionRule.addSubstitutes("MissingFont", new String[]{"Comic Sans MS"});
        
        // If we pass that object while loading a document, text that uses "MissingFont" will be rendered with "Comic Sans MS"
        Document doc = new Document(getMyDir() + "Missing font.html", loadOptions);
        
        // The font name stored in the document is still "MissingFont"; the substitution is carried out once we save
        Assert.assertEquals("MissingFont", doc.getFirstSection().getBody().getFirstParagraph().getRuns().get(0).getFont().getName());
        
        doc.save(getArtifactsDir() + "Font.ResolveFontsBeforeLoadingDocument.pdf");
      • getLanguagePreferences

        public LanguagePreferences getLanguagePreferences()
        
        Gets the language preferences that will be used when the document is loading.

        Example:

        Shows how to set up language preferences that will be used when the document is loading.
        LoadOptions loadOptions = new LoadOptions();
        loadOptions.getLanguagePreferences().addEditingLanguage(EditingLanguage.JAPANESE);
        
        Document doc = new Document(getMyDir() + "No default editing language.docx", loadOptions);
        
        int localeIdFarEast = doc.getStyles().getDefaultFont().getLocaleIdFarEast();
        if (localeIdFarEast == EditingLanguage.JAPANESE)
            System.out.println("The document either has no FarEast language set in its defaults or it was originally set to Japanese.");
        else
            System.out.println("The document's default FarEast language was originally set to a language other than Japanese, so it is not overridden.");
      • getLoadFormat/setLoadFormat

        public int getLoadFormat() / public void setLoadFormat(int value)
        
        Specifies the format of the document to be loaded. Default is LoadFormat.AUTO. The value of the property is a LoadFormat integer constant.

        It is recommended that you specify the LoadFormat.AUTO value and let Aspose.Words detect the file format automatically. If you know the format of the document you are about to load, you can specify it explicitly; this slightly reduces loading time by avoiding the overhead of format auto-detection. If the explicit load format turns out to be wrong, auto-detection is invoked and a second attempt to load the file is made.

        Example:

        Shows how to load a document as HTML without automatic file format detection.
        LoadOptions loadOptions = new LoadOptions();
        loadOptions.setLoadFormat(com.aspose.words.LoadFormat.HTML);
        
        Document doc = new Document(getMyDir() + "Document.html", loadOptions);
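
The "explicit format first, auto-detection as a second attempt" flow described for this property can be sketched in plain Java. Everything in this block is hypothetical and for illustration only: the `Parser` interface, the `loadWithFallback` method, and the use of IllegalArgumentException to signal a wrong format are assumptions, not part of the Aspose.Words API.

```java
final class LoadFormatFallbackSketch {
    /** Hypothetical parser abstraction; throws IllegalArgumentException on a format mismatch. */
    interface Parser {
        String parse(byte[] data);
    }

    /**
     * Tries the parser for the explicitly requested format first.
     * If there is no hint, or the hinted parser rejects the data,
     * the auto-detecting parser makes the second attempt.
     */
    static String loadWithFallback(Parser hinted, Parser autoDetect, byte[] data) {
        if (hinted != null) {
            try {
                return hinted.parse(data);
            } catch (IllegalArgumentException wrongFormat) {
                // The explicit format was wrong; fall through to auto-detection
            }
        }
        return autoDetect.parse(data);
    }

    public static void main(String[] args) {
        Parser wrongHint = data -> { throw new IllegalArgumentException("not this format"); };
        Parser auto = data -> "loaded via auto-detection (" + data.length + " bytes)";
        System.out.println(loadWithFallback(wrongHint, auto, new byte[] {1, 2, 3}));
    }
}
```

The sketch also shows why a correct hint saves time: the auto-detecting parser is never invoked when the hinted parser succeeds.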
      • getMswVersion/setMswVersion

        public int getMswVersion() / public void setMswVersion(int value)
        
        Specifies that the document loading process should match a specific MS Word version. Default value is MsWordVersion.WORD_2007. The value of the property is an MsWordVersion integer constant. Different Word versions may handle certain aspects of document content and formatting slightly differently during the loading process, which may result in minor differences in the Document Object Model.

        Example:

        Shows how to emulate the loading procedure of a specific Microsoft Word version during document loading.
        // Create a new LoadOptions object, which will load documents according to MS Word 2019 specification by default
        LoadOptions loadOptions = new LoadOptions();
        Assert.assertEquals(MsWordVersion.WORD_2019, loadOptions.getMswVersion());
        
        Document doc = new Document(getMyDir() + "Document.docx", loadOptions);
        Assert.assertEquals(12.95, doc.getStyles().getDefaultParagraphFormat().getLineSpacing(), 0.005f);
        
        // We can change the loading version like this, to Microsoft Word 2007
        loadOptions.setMswVersion(MsWordVersion.WORD_2007);
        
        // This document is missing the default paragraph format style,
        // so when it is opened with either Microsoft Word or Aspose Words, that default style will be regenerated,
        // and will show up in the Styles collection, with values according to Microsoft Word 2007 specifications
        doc = new Document(getMyDir() + "Document.docx", loadOptions);
        Assert.assertEquals(13.8, doc.getStyles().getDefaultParagraphFormat().getLineSpacing(), 0.005f);
      • getPageCount/setPageCount

        public int getPageCount() / public void setPageCount(int value)
        
        Gets or sets the number of pages to read. Default is MaxValue, which means all pages of the document will be read.
      • getPageIndex/setPageIndex

        public int getPageIndex() / public void setPageIndex(int value)
        
        Gets or sets the 0-based index of the first page to read. Default is 0.
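
How PageIndex and PageCount combine into a page range can be sketched without the library. This is an illustration under stated assumptions: `PageRangeSketch` and `pagesToRead` are hypothetical names, "MaxValue" is taken to mean Integer.MAX_VALUE, and clamping out-of-range values to the document's page count is a guess, since this reference does not say how such values are handled.

```java
final class PageRangeSketch {
    /**
     * Returns {firstPage, lastPageExclusive}, both 0-based, for the given
     * options, clamped to the document's total page count.
     */
    static int[] pagesToRead(int totalPages, int pageIndex, int pageCount) {
        if (pageIndex < 0 || pageCount < 0) {
            throw new IllegalArgumentException("pageIndex and pageCount must be non-negative");
        }
        int first = Math.min(pageIndex, totalPages);
        // Use long arithmetic so that first + Integer.MAX_VALUE cannot overflow
        long end = Math.min((long) first + pageCount, (long) totalPages);
        return new int[] {first, (int) end};
    }

    public static void main(String[] args) {
        // Defaults (index 0, count MAX_VALUE) cover the whole 10-page document
        int[] all = pagesToRead(10, 0, Integer.MAX_VALUE);
        System.out.println(all[0] + ".." + all[1]);

        // Three pages starting at the third page (index 2)
        int[] slice = pagesToRead(10, 2, 3);
        System.out.println(slice[0] + ".." + slice[1]);
    }
}
```

With the API documented here, the two values would be set through setPageIndex and setPageCount on a PdfLoadOptions instance before it is passed to the Document constructor.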
      • getPassword/setPassword

        public java.lang.String getPassword() / public void setPassword(java.lang.String value)
        
        Gets or sets the password for opening an encrypted document. Can be null or an empty string. Default is null.

        You need to know the password to open an encrypted document. If the document is not encrypted, set this to null or an empty string.

        Example:

        Shows how to sign an encrypted document file.
        // Create certificate holder from a file
        CertificateHolder certificateHolder = CertificateHolder.create(getMyDir() + "morzal.pfx", "aw");
        
        SignOptions signOptions = new SignOptions();
        signOptions.setComments("Comment");
        signOptions.setSignTime(new Date());
        signOptions.setDecryptionPassword("docPassword");
        
        // Digitally sign the document at the specified path, which is encrypted with "docPassword"
        String inputFileName = getMyDir() + "Encrypted.docx";
        String outputFileName = getArtifactsDir() + "DigitalSignatureUtil.DecryptionPassword.docx";
        
        DigitalSignatureUtil.sign(inputFileName, outputFileName, certificateHolder, signOptions);
      • getPreserveIncludePictureField/setPreserveIncludePictureField

        public boolean getPreserveIncludePictureField() / public void setPreserveIncludePictureField(boolean value)
        
        Gets or sets whether to preserve the INCLUDEPICTURE field when reading Microsoft Word formats. The default value is false.

        By default, the INCLUDEPICTURE field is converted into a shape object. You can override that if you need the field to be preserved, for example, if you wish to update it programmatically. Note, however, that this approach is not common for Aspose.Words. Use it at your own risk.

        One of the possible use cases may be using a MERGEFIELD as a child field to dynamically change the source path of the picture. In this case you need the INCLUDEPICTURE to be preserved in the model.

        Example:

        Shows a way to update a field ignoring the MERGEFORMAT switch.
        // Keep INCLUDEPICTURE fields as fields instead of converting them to shapes
        LoadOptions loadOptions = new LoadOptions();
        loadOptions.setPreserveIncludePictureField(true);
        
        Document doc = new Document(getMyDir() + "Field sample - INCLUDEPICTURE.docx", loadOptions);
        
        for (Field field : doc.getRange().getFields()) {
            if (field.getType() == FieldType.FIELD_INCLUDE_PICTURE) {
                FieldIncludePicture includePicture = (FieldIncludePicture) field;
                includePicture.setSourceFullName(getImageDir() + "Transparent background logo.png");
                // Passing 'true' updates the field while ignoring the MERGEFORMAT switch
                includePicture.update(true);
            }
        }
        
        doc.updateFields();
        doc.save(getArtifactsDir() + "Field.UpdateFieldIgnoringMergeFormat.docx");
      • getResourceLoadingCallback/setResourceLoadingCallback

        public IResourceLoadingCallback getResourceLoadingCallback() / public void setResourceLoadingCallback(IResourceLoadingCallback value)
        
        Controls how external resources (images, style sheets) are loaded when a document is imported from HTML or MHTML.

        Example:

        Shows how to handle external resources in HTML documents during loading.
        public void loadOptionsCallback() throws Exception {
            // Create a new LoadOptions object and set its ResourceLoadingCallback attribute
            // as an instance of our IResourceLoadingCallback implementation
            LoadOptions loadOptions = new LoadOptions();
            loadOptions.setResourceLoadingCallback(new HtmlLinkedResourceLoadingCallback());
        
            // When we open an HTML document, external resources such as references to CSS stylesheet files and external images
            // will be handled in a custom manner by the loading callback as the document is loaded
            Document doc = new Document(getMyDir() + "Images.html", loadOptions);
            doc.save(getArtifactsDir() + "Document.LoadOptionsCallback.pdf");
        }
        
        /**
         * Resource loading callback that, upon encountering external resources,
         * acknowledges CSS style sheets and replaces all images with a substitute.
         */
        private static class HtmlLinkedResourceLoadingCallback implements IResourceLoadingCallback {
            public int resourceLoading(ResourceLoadingArgs args) throws IOException {
                switch (args.getResourceType()) {
                    case ResourceType.CSS_STYLE_SHEET:
                        System.out.println("External CSS Stylesheet found upon loading: " + args.getOriginalUri());
                        return ResourceLoadingAction.DEFAULT;
                    case ResourceType.IMAGE:
                        System.out.println("External Image found upon loading: " + args.getOriginalUri());
        
                        final String NEW_IMAGE_FILENAME = "Logo.jpg";
                        System.out.println("\tImage will be substituted with: " + NEW_IMAGE_FILENAME);
        
                        // Close the stream once the substitute image bytes have been read
                        try (InputStream imageStream = new FileInputStream(getImageDir() + NEW_IMAGE_FILENAME)) {
                            args.setData(DocumentHelper.getBytesFromStream(imageStream));
                        }
        
                        return ResourceLoadingAction.USER_PROVIDED;
                }
                return ResourceLoadingAction.DEFAULT;
            }
        }
      • getSkipPdfImages/setSkipPdfImages

        public boolean getSkipPdfImages() / public void setSkipPdfImages(boolean value)
        
        Gets or sets the flag indicating whether images must be skipped while loading a PDF document. Default is false.
      • getTempFolder/setTempFolder

        public java.lang.String getTempFolder() / public void setTempFolder(java.lang.String value)
        
        Allows the use of temporary files when reading a document. By default this property is null and no temporary files are used.

        The folder must exist and be writable, otherwise an exception will be thrown.

        Aspose.Words automatically deletes all temporary files when reading is complete.

        Example:

        Shows how to load a document using temporary files.
        // Note that such an approach can reduce memory usage but degrades speed
        LoadOptions loadOptions = new LoadOptions();
        loadOptions.setTempFolder("C:\\TempFolder\\");
        
        // Ensure that the directory exists and load
        new File(loadOptions.getTempFolder()).mkdir();
        
        Document doc = new Document(getMyDir() + "Document.docx", loadOptions);
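
Because a missing or read-only folder causes an exception, it can help to check the folder before assigning it. The following JDK-only sketch shows one way to do that; `TempFolderCheckSketch` and `isUsableTempFolder` are hypothetical names, and the check mirrors only the two stated requirements (the folder exists and is writable), not any additional validation the library may perform.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class TempFolderCheckSketch {
    /** Returns true when the path is an existing directory the current process can write to. */
    static boolean isUsableTempFolder(String folder) {
        if (folder == null || folder.isEmpty()) {
            return false;
        }
        Path path = Paths.get(folder);
        return Files.isDirectory(path) && Files.isWritable(path);
    }

    public static void main(String[] args) throws IOException {
        // A freshly created temporary directory should pass the check
        Path dir = Files.createTempDirectory("load-temp");
        System.out.println(isUsableTempFolder(dir.toString()));

        // A path that does not exist fails the check
        System.out.println(isUsableTempFolder(dir.resolve("missing").toString()));

        Files.delete(dir);
    }
}
```

A caller could run this check first and pass the folder to setTempFolder only when it returns true.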
      • getUpdateDirtyFields/setUpdateDirtyFields

        public boolean getUpdateDirtyFields() / public void setUpdateDirtyFields(boolean value)
        
        Specifies whether to update the fields with the dirty attribute.

        Example:

        Shows how to use the UpdateDirtyFields property to refresh field results on loading.
        Document doc = new Document();
        DocumentBuilder builder = new DocumentBuilder(doc);
        
        // Give the document's built-in property "Author" a value and display it with a field
        doc.getBuiltInDocumentProperties().setAuthor("John Doe");
        FieldAuthor field = (FieldAuthor) builder.insertField(FieldType.FIELD_AUTHOR, true);
        
        Assert.assertFalse(field.isDirty());
        Assert.assertEquals("John Doe", field.getResult());
        
        // Update the "Author" property
        doc.getBuiltInDocumentProperties().setAuthor("John & Jane Doe");
        
        // AUTHOR is one of the field types whose fields do not update according to their source values in real time,
        // so the field still shows the old result until it is updated
        Assert.assertEquals("John Doe", field.getResult());
        
        // Since the field's value is out of date, we can mark it as "Dirty"
        field.isDirty(true);
        
        // Save to a file so the document can be re-opened with load options
        String fileName = getArtifactsDir() + "Filed.UpdateDirtyFields.docx";
        doc.save(fileName, SaveFormat.DOCX);
        
        // Choose whether fields marked as "Dirty" are updated while the document loads
        boolean doUpdateDirtyFields = true;
        LoadOptions options = new LoadOptions();
        options.setUpdateDirtyFields(doUpdateDirtyFields);
        
        doc = new Document(fileName, options);
        
        Assert.assertEquals("John & Jane Doe", doc.getBuiltInDocumentProperties().getAuthor());
        
        field = (FieldAuthor) doc.getRange().getFields().get(0);
        
        if (doUpdateDirtyFields) {
            Assert.assertEquals("John & Jane Doe", field.getResult());
            Assert.assertFalse(field.isDirty());
        } else {
            Assert.assertEquals("John Doe", field.getResult());
            Assert.assertTrue(field.isDirty());
        }
      • getWarningCallback/setWarningCallback

        public IWarningCallback getWarningCallback() / public void setWarningCallback(IWarningCallback value)
        
        Called during a load operation, when an issue is detected that might result in data or formatting fidelity loss.

        Example:

        Shows how to print and store warnings that occur during document loading.
        public void loadOptionsWarningCallback() throws Exception {
            // Create a new LoadOptions object and set its WarningCallback attribute as an instance of our IWarningCallback implementation
            LoadOptions loadOptions = new LoadOptions();
            loadOptions.setWarningCallback(new DocumentLoadingWarningCallback());
        
            // Warnings that occur during loading of the document will now be printed and stored
            Document doc = new Document(getMyDir() + "Document.docx", loadOptions);
        
            ArrayList<WarningInfo> warnings = ((DocumentLoadingWarningCallback) loadOptions.getWarningCallback()).getWarnings();
            Assert.assertEquals(3, warnings.size());
        }
        
        /**
         * IWarningCallback that prints warnings and their details as they arise during document loading.
         */
        private static class DocumentLoadingWarningCallback implements IWarningCallback {
            public void warning(WarningInfo info) {
                System.out.println("Warning: " + info.getWarningType());
                System.out.println("\tSource: " + info.getSource());
                System.out.println("\tDescription: " + info.getDescription());
                mWarnings.add(info);
            }
        
            public ArrayList<WarningInfo> getWarnings() {
                return mWarnings;
            }
        
            private final ArrayList<WarningInfo> mWarnings = new ArrayList<>();
        }