com.aspose.pdf.devices

Class TextDevice



  • public final class TextDevice
    extends PageDevice

    Converts PDF document pages into text.


     The example demonstrates how to extract text from the first page of a PDF document.
     
                   Document doc = new Document(inFile);
                   String extractedText;
                   ByteArrayOutputStream ms = new ByteArrayOutputStream();
                   try 
                   {
                       // create text device
                       TextDevice device = new TextDevice();
                       // convert the first page (index 1) and save text to the stream
                       device.process(doc.getPages().get_Item(1), ms);
                       // use the extracted text
                       extractedText = Encoding.getUnicode().getString(ms.toByteArray());
                       ms.close();
                   } catch (IOException e) {
                       e.printStackTrace();
                   }
     

    A TextDevice object is used to extract text from a PDF page.
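
The decoding step in the example above can be sketched with the standard library alone. This is a minimal sketch, assuming that the default "Unicode" encoding corresponds to UTF-16 little-endian (as the name does in .NET); this page does not state the byte order or whether a byte order mark is written, so the hypothetical helper `decode` tolerates a leading mark. The class name `Utf16DecodeSketch` and the sample bytes are illustrative and not part of the library.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class Utf16DecodeSketch {
    // Decodes a captured byte buffer as UTF-16LE (an assumption about the
    // device's default encoding) and drops a leading byte order mark, if any.
    static String decode(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_16LE);
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            text = text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // Stand-in for the stream that device.process(page, ms) would fill.
        ByteArrayOutputStream ms = new ByteArrayOutputStream();
        byte[] sample = "Hello, PDF".getBytes(StandardCharsets.UTF_16LE);
        ms.write(sample, 0, sample.length);
        System.out.println(decode(ms.toByteArray()));
    }
}
```

If the device is constructed with a different encoding, the same charset would have to be used when decoding the bytes.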

    • Constructor Detail

      • TextDevice

        public TextDevice(TextExtractionOptions extractionOptions)

        Initializes a new instance of the TextDevice with text extraction options.

        Parameters:
        extractionOptions - Text extraction options.
      • TextDevice

        public TextDevice()

        Initializes a new instance of the TextDevice with the Raw text formatting mode and Unicode text encoding.

      • TextDevice

        public TextDevice(TextEncodingInternal encoding)

        Initializes a new instance of the TextDevice for the specified encoding.

        Parameters:
        encoding - Encoding of extracted text.
      • TextDevice

        public TextDevice(Charset encoding)

        Initializes a new instance of the TextDevice for the specified encoding.

        Parameters:
        encoding - Encoding of extracted text.
      • TextDevice

        public TextDevice(TextExtractionOptions extractionOptions,
                  TextEncodingInternal encoding)

        Initializes a new instance of the TextDevice for the specified encoding with text extraction options.

        Parameters:
        extractionOptions - Text extraction options.
        encoding - Encoding of extracted text.
      • TextDevice

        public TextDevice(TextExtractionOptions extractionOptions,
                  Charset encoding)

        Initializes a new instance of the TextDevice for the specified encoding with text extraction options.

        Parameters:
        extractionOptions - Text extraction options.
        encoding - Encoding of extracted text.
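
Several of the constructors above accept a java.nio.charset.Charset. The following sketch uses only the standard library to show two equivalent ways of obtaining the UTF-8 charset, and how the choice of charset changes the number of bytes produced for the same text. The class `CharsetChoiceSketch` and the helper `encodedLength` are hypothetical names introduced for illustration; nothing here calls TextDevice itself.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetChoiceSketch {
    // Number of bytes the given text occupies once encoded with the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Lookup by name and the predefined constant refer to the same charset.
        Charset byName = Charset.forName("UTF-8");
        Charset constant = StandardCharsets.UTF_8;
        System.out.println(byName.equals(constant));

        // The same five characters need a different number of bytes per charset.
        String sample = "na\u00EFve";
        System.out.println(encodedLength(sample, StandardCharsets.UTF_8));
        System.out.println(encodedLength(sample, StandardCharsets.UTF_16LE));
    }
}
```

The predefined constants in StandardCharsets avoid the runtime lookup and the possibility of a misspelled charset name.
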
    • Method Detail

      • getExtractionOptions

        public TextExtractionOptions getExtractionOptions()

        Gets text extraction options.

        Returns:
        TextExtractionOptions element
         The example demonstrates how to extract text in raw order.
         
                       Document doc = new Document(inFile);
                       // create text device
                       TextDevice device = new TextDevice(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw));
                       // convert the page and save text to the output file
                       device.process(doc.getPages().get_Item(1), outFile);
         
      • setExtractionOptions

        public void setExtractionOptions(TextExtractionOptions value)

        Sets text extraction options.

        Parameters:
        value - TextExtractionOptions element
         The example demonstrates how to extract text in raw order.
         
                       Document doc = new Document(inFile);
                       // create text device
                       TextDevice device = new TextDevice(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw));
                       // convert the page and save text to the output file
                       device.process(doc.getPages().get_Item(1), outFile);
         
      • getEncodingInternal

        public TextEncodingInternal getEncodingInternal()

        Gets encoding of extracted text.

        Returns:
        TextEncodingInternal element
         The example demonstrates how to represent extracted text in UTF-8 encoding.
         
                       Document doc = new Document(inFile);
                       // create text device
                       TextDevice device = new TextDevice(java.nio.charset.Charset.forName("UTF-8"));
                       // convert the page and save text to the output file
                       device.process(doc.getPages().get_Item(1), outFile);
         
      • getEncoding

        public Charset getEncoding()

        Gets encoding of extracted text.

        Returns:
        Charset element
         The example demonstrates how to represent extracted text in UTF-8 encoding.
         
                       Document doc = new Document(inFile);
                       // create text device
                       TextDevice device = new TextDevice(java.nio.charset.Charset.forName("UTF-8"));
                       // convert the page and save text to the output file
                       device.process(doc.getPages().get_Item(1), outFile);
         
      • setEncodingInternal

        public void setEncodingInternal(TextEncodingInternal value)

        Sets encoding of extracted text.

        Parameters:
        value - TextEncodingInternal element
         The example demonstrates how to represent extracted text in UTF-8 encoding.
         
                       Document doc = new Document(inFile);
                       // create text device
                       TextDevice device = new TextDevice(TextEncodingInternal.getUTF8());
                       // convert the page and save text to the output file
                       device.process(doc.getPages().get_Item(1), outFile);
         
      • setEncoding

        public void setEncoding(Charset value)

        Sets encoding of extracted text.

        Parameters:
        value - Charset element
         The example demonstrates how to represent extracted text in UTF-8 encoding.
         
                       Document doc = new Document(inFile);
                       // create text device
                       TextDevice device = new TextDevice(java.nio.charset.Charset.forName("UTF-8"));
                       // convert the page and save text to the output file
                       device.process(doc.getPages().get_Item(1), outFile);
         
      • processInternal

        public void processInternal(Page page,
                           com.aspose.ms.System.IO.Stream output)

        Converts the page and saves it as a text stream.


         The example demonstrates how to extract text from the first page of a PDF document.
         
                       Document doc = new Document(inFile);
                       String extractedText;
                       ByteArrayOutputStream ms = new ByteArrayOutputStream();
                       try
                       {
                           // create text device
                           TextDevice device = new TextDevice();
                           // convert the page and save text to the stream
                           device.process(doc.getPages().get_Item(1), ms);
                           // use the extracted text
                           extractedText = Encoding.getUnicode().getString(ms.toByteArray());
                           ms.close();
                       } catch (IOException e) {
                           e.printStackTrace();
                       }
         
        Specified by:
        processInternal in class PageDevice
        Parameters:
        page - The page to convert.
        output - Result stream.
      • process

        public void process(Page page,
                   OutputStream output)

        Converts the page and saves it as a text stream.


         The example demonstrates how to extract text from the first page of a PDF document.
         
                       Document doc = new Document(inFile);
                       String extractedText;
                       ByteArrayOutputStream ms = new ByteArrayOutputStream();
                       try
                       {
                           // create text device
                           TextDevice device = new TextDevice();
                           // convert the page and save text to the stream
                           device.process(doc.getPages().get_Item(1), ms);
                           // use the extracted text
                           extractedText = Encoding.getUnicode().getString(ms.toByteArray());
                           ms.close();
                       } catch (IOException e) {
                           e.printStackTrace();
                       }
         
        Overrides:
        process in class PageDevice
        Parameters:
        page - The page to convert.
        output - Result stream.
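
Because process writes to any java.io.OutputStream, the capture-and-decode pattern from the example can be wrapped in a small helper. The following is a sketch using only the standard library: the callback interface `StreamWriter`, the class `CaptureTextSketch`, and the method `capture` are hypothetical names, and the lambda in `main` stands in for a call such as device.process(page, out). The charset passed to `capture` is assumed to match the encoding the device was configured with.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CaptureTextSketch {
    // Hypothetical callback type: anything that writes encoded text to a stream.
    interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    // Runs the writer against an in-memory buffer and decodes the result.
    // The buffer is closed automatically by try-with-resources.
    static String capture(StreamWriter writer, Charset charset) {
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream()) {
            writer.writeTo(buffer);
            return new String(buffer.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for: capture(out -> device.process(page, out), charset)
        byte[] fake = "page text".getBytes(StandardCharsets.UTF_8);
        String text = capture(out -> out.write(fake), StandardCharsets.UTF_8);
        System.out.println(text);
    }
}
```

Rethrowing as UncheckedIOException is one design choice among several; callers that prefer checked exceptions could declare IOException on `capture` instead.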