com.aspose.ocr

Class DocumentRecognitionSettings



  • public class DocumentRecognitionSettings
    extends Object
    Settings for PDF recognition. Contains options for customizing the recognition process.
    Author:
    Aspose
    • Constructor Detail

      • DocumentRecognitionSettings

        public DocumentRecognitionSettings(int pagesNumber)
        Initializes a new instance of the DocumentRecognitionSettings class with default properties. Requires pagesNumber; set it to 0 to recognize all pages in the document.
        Parameters:
        pagesNumber - The number of pages to recognize in a multipage PDF file.
      • DocumentRecognitionSettings

        public DocumentRecognitionSettings(int startPage,
                                           int pagesNumber)
        Initializes a new instance of the DocumentRecognitionSettings class with a short set of properties.
        Parameters:
        startPage - The first page to recognize.
        pagesNumber - The number of pages to recognize in a multipage PDF file.
      • DocumentRecognitionSettings

        public DocumentRecognitionSettings(int startPage,
                                           int pagesNumber,
                                           Language language,
                                           boolean detectAreas,
                                           boolean autoSkew,
                                           int threshold)
        Initializes a new instance of the DocumentRecognitionSettings class with the full set of properties.
        Parameters:
        startPage - The first page to recognize. 0 by default.
        pagesNumber - The number of pages to recognize in a multipage PDF file.
        language - The language used for OCR.
        detectAreas - Enables automatic detection of text areas.
        autoSkew - Enables automatic image skew correction.
        threshold - Custom image binarization threshold.
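The startPage and pagesNumber parameters described above together select a window of pages. The sketch below illustrates one plausible reading of that rule. It does not call Aspose.OCR: selectedPages and totalPages are hypothetical names introduced for illustration, and it assumes that page indexes are zero-based, that pagesNumber = 0 means "every page from startPage onward", and that the window is clamped at the end of the document.

```java
import java.util.ArrayList;
import java.util.List;

public class PageWindowSketch {

    /**
     * Hypothetical helper, not part of Aspose.OCR. Lists the zero-based page
     * indexes selected by startPage and pagesNumber in a document that has
     * totalPages pages.
     */
    public static List<Integer> selectedPages(int startPage, int pagesNumber, int totalPages) {
        if (startPage < 0 || pagesNumber < 0 || totalPages < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        // Assumption: pagesNumber == 0 requests every page from startPage to the end.
        int requested = (pagesNumber == 0) ? totalPages : pagesNumber;
        // Clamp the window so it never runs past the last page.
        int end = (int) Math.min((long) startPage + requested, totalPages);
        List<Integer> pages = new ArrayList<>();
        for (int page = startPage; page < end; page++) {
            pages.add(page);
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(selectedPages(0, 0, 5)); // whole document
        System.out.println(selectedPages(2, 2, 5)); // two pages starting at index 2
        System.out.println(selectedPages(3, 10, 5)); // request longer than the document
    }
}
```

Checking the actual behavior of the library against a small test PDF is advisable before relying on any of these assumptions.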
    • Method Detail

      • setDetectAreas

        public void setDetectAreas(boolean detectAreas)
      • setAutoSkew

        public void setAutoSkew(boolean autoSkew)
      • setLanguage

        public void setLanguage(Language language)
      • setThresholdValue

        public void setThresholdValue(int thresholdValue)
      • setIgnoredCharacters

        public void setIgnoredCharacters(String ignoredCharacters)
      • setLinesFiltration

        public void setLinesFiltration(boolean linesFiltration)
      • setStartPage

        public void setStartPage(int startPage)
      • setPagesNumber

        public void setPagesNumber(int pagesNumber)
      • setThreadsCount

        public void setThreadsCount(int threadsCount)
        Sets the number of threads used for processing. The default value, 0, means that the image is processed with as many threads as there are processors. A value of 1 means that the image is processed in the main thread.
        Parameters:
        threadsCount - the number of threads that will be created for parallel recognition of image fragments.
      • setAutoContrast

        public void setAutoContrast(boolean autoContrast)
        Enables an additional contrast correction algorithm that is applied to the image before recognition.
        Parameters:
        autoContrast - true to apply the contrast correction filter.
      • setAllowedCharacters

        public void setAllowedCharacters(CharactersAllowedType allowedCharacters)
        Sets the allowed character set, which determines the types of characters permitted in the recognition result.
        Parameters:
        allowedCharacters - a CharactersAllowedType enum value.
      • getStartPage

        public int getStartPage()
        Gets the first page of the PDF file from which images are extracted.
        Returns:
        the first page to recognize.
      • getPagesNumber

        public int getPagesNumber()
        Gets the total number of pages of the PDF file from which images are extracted (starting at startPage).
        Returns:
        the number of pages to recognize.
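The threadsCount rule described under setThreadsCount can be sketched as a small resolution function. This is an illustration of the documented semantics only, not the library's implementation: effectiveThreads is a hypothetical helper, and passing the processor count as a parameter is a choice made here to keep the example deterministic.

```java
public class ThreadCountSketch {

    /**
     * Hypothetical helper, not part of Aspose.OCR. Resolves a threadsCount
     * setting to the number of threads that would be used.
     */
    public static int effectiveThreads(int threadsCount, int availableProcessors) {
        if (threadsCount < 0 || availableProcessors < 1) {
            throw new IllegalArgumentException("invalid thread or processor count");
        }
        // 0 is the documented default: use one thread per processor.
        // Any positive value is taken as-is; 1 means main-thread processing.
        return (threadsCount == 0) ? availableProcessors : threadsCount;
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        System.out.println("default (0) resolves to " + effectiveThreads(0, processors));
        System.out.println("explicit 1 resolves to " + effectiveThreads(1, processors));
    }
}
```

On a shared server, setting an explicit value below the processor count may be preferable to the default, since the default would claim every processor for a single recognition job.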