TextDevice

Inheritance: java.lang.Object, com.aspose.pdf.devices.Device, com.aspose.pdf.devices.PageDevice

public final class TextDevice extends PageDevice

Converts pages of a PDF document into text.


The example demonstrates how to extract text from the first page of a PDF document.

    Document doc = new Document(inFile);
    String extractedText;
    ByteArrayOutputStream ms = new ByteArrayOutputStream();
    try
    {
        // create text device
        TextDevice device = new TextDevice();
        // convert the page and save text to the stream
        device.process(doc.getPages().get_Item(1), ms);
        // use the extracted text
        extractedText = Encoding.getUnicode().getString(ms.toByteArray());

        ms.close();
    } catch (IOException e) {
        e.printStackTrace();
    }

A TextDevice object is used to extract text from a PDF page.
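The example above decodes the captured bytes with a "Unicode" encoding. A minimal sketch of the same step using only the JDK follows. It assumes that "Unicode" here means UTF-16 little-endian, which this page does not state; the class `Utf16Decoding` and the method `decodeUtf16Le` are hypothetical names, not part of Aspose.PDF, and the sample bytes stand in for `ms.toByteArray()`.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Aspose.PDF. */
final class Utf16Decoding {

    /**
     * Decodes bytes as UTF-16LE and drops a leading byte order mark if present.
     * Assumption: the device's "Unicode" encoding is UTF-16 little-endian.
     */
    static String decodeUtf16Le(byte[] bytes) {
        String text = new String(bytes, StandardCharsets.UTF_16LE);
        // A UTF-16LE BOM (FF FE) decodes to U+FEFF; strip it so it does not leak into the text.
        if (!text.isEmpty() && text.charAt(0) == '\uFEFF') {
            return text.substring(1);
        }
        return text;
    }

    public static void main(String[] args) {
        // Stand-in for ms.toByteArray(): "Hi" encoded as UTF-16LE with a BOM.
        byte[] sample = {(byte) 0xFF, (byte) 0xFE, 'H', 0, 'i', 0};
        System.out.println(decodeUtf16Le(sample));
    }
}
```

If the device was constructed with a different encoding, the same charset should be passed to the `String` constructor instead.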

Constructors

| Constructor | Description |
| --- | --- |
| TextDevice(TextExtractionOptions extractionOptions) | Initializes a new instance of the TextDevice with text extraction options. |
| TextDevice() | Initializes a new instance of the TextDevice with the Raw text formatting mode and Unicode text encoding. |
| TextDevice(TextEncodingInternal encoding) | Initializes a new instance of the TextDevice for the specified encoding. |
| TextDevice(Charset encoding) | Initializes a new instance of the TextDevice for the specified encoding. |
| TextDevice(TextExtractionOptions extractionOptions, TextEncodingInternal encoding) | Initializes a new instance of the TextDevice for the specified encoding with text extraction options. |
| TextDevice(TextExtractionOptions extractionOptions, Charset encoding) | Initializes a new instance of the TextDevice for the specified encoding with text extraction options. |

Methods

| Method | Description |
| --- | --- |
| getExtractionOptions() | Gets text extraction options. |
| setExtractionOptions(TextExtractionOptions value) | Sets text extraction options. |
| getEncodingInternal() | Gets encoding of extracted text. |
| getEncoding() | Gets encoding of extracted text. |
| setEncodingInternal(TextEncodingInternal value) | Sets encoding of extracted text. |
| setEncoding(Charset value) | Sets encoding of extracted text. |
| processInternal(Page page, System.IO.Stream output) | Converts the page and saves it as a text stream. |
| process(Page page, OutputStream output) | Converts the page and saves it as a text stream. |

TextDevice(TextExtractionOptions extractionOptions)

public TextDevice(TextExtractionOptions extractionOptions)

Initializes a new instance of the TextDevice with text extraction options.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| extractionOptions | TextExtractionOptions | Text extraction options. |

TextDevice()

public TextDevice()

Initializes a new instance of the TextDevice with the Raw text formatting mode and Unicode text encoding.

TextDevice(TextEncodingInternal encoding)

public TextDevice(TextEncodingInternal encoding)

Initializes a new instance of the TextDevice for the specified encoding.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| encoding | TextEncodingInternal | Encoding of extracted text. |

TextDevice(Charset encoding)

public TextDevice(Charset encoding)

Initializes a new instance of the TextDevice for the specified encoding.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| encoding | java.nio.charset.Charset | Encoding of extracted text. |

TextDevice(TextExtractionOptions extractionOptions, TextEncodingInternal encoding)

public TextDevice(TextExtractionOptions extractionOptions, TextEncodingInternal encoding)

Initializes a new instance of the TextDevice for the specified encoding with text extraction options.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| extractionOptions | TextExtractionOptions | Text extraction options. |
| encoding | TextEncodingInternal | Encoding of extracted text. |

TextDevice(TextExtractionOptions extractionOptions, Charset encoding)

public TextDevice(TextExtractionOptions extractionOptions, Charset encoding)

Initializes a new instance of the TextDevice for the specified encoding with text extraction options.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| extractionOptions | TextExtractionOptions | Text extraction options. |
| encoding | java.nio.charset.Charset | Encoding of extracted text. |

getExtractionOptions()

public TextExtractionOptions getExtractionOptions()

Gets text extraction options.

Returns: TextExtractionOptions - TextExtractionOptions element


The example demonstrates how to extract text in raw order.

    Document doc = new Document(inFile);
    String extractedText;
    // create text device
    TextDevice device = new TextDevice(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw));
    // convert the page and save text to the stream
    device.process(doc.getPages().get_Item(1), outFile);

setExtractionOptions(TextExtractionOptions value)

public void setExtractionOptions(TextExtractionOptions value)

Sets text extraction options.

Parameters:

| Parameter | Type | Description |
| --- | --- | --- |
| value | TextExtractionOptions | TextExtractionOptions element |

The example demonstrates how to extract text in raw order.

              Document doc = new Document(inFile);
              String extractedText;
              // create text device
              TextDevice device = new TextDevice(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Raw));
              // convert the page and save text to the stream
              device.process(doc.getPages().get_Item(1), outFile);

### getEncodingInternal() {#getEncodingInternal--}

public TextEncodingInternal getEncodingInternal()



Gets encoding of extracted text.

**Returns:**
[TextEncodingInternal](../../com.aspose.pdf/textencodinginternal) - TextEncodingInternal element

--------------------

The example demonstrates how to represent extracted text in UTF-8 encoding.

          Document doc = new Document(inFile);
          String extractedText;
          // create text device
          TextDevice device = new TextDevice(java.nio.charset.Charset.forName("UTF-8"));
          // convert the page and save text to the stream
          device.process(doc.getPages().get_Item(1), outFile);

### getEncoding() {#getEncoding--}

public Charset getEncoding()



Gets encoding of extracted text.

**Returns:**
java.nio.charset.Charset - Charset element

--------------------

The example demonstrates how to represent extracted text in UTF-8 encoding.

          Document doc = new Document(inFile);
          String extractedText;
          // create text device
          TextDevice device = new TextDevice(java.nio.charset.Charset.forName("UTF-8"));
          // convert the page and save text to the stream
          device.process(doc.getPages().get_Item(1), outFile);

### setEncodingInternal(TextEncodingInternal value) {#setEncodingInternal-com.aspose.pdf.TextEncodingInternal-}

public void setEncodingInternal(TextEncodingInternal value)



Sets encoding of extracted text.

**Parameters:**
| Parameter | Type | Description |
| --- | --- | --- |
| value | [TextEncodingInternal](../../com.aspose.pdf/textencodinginternal) | TextEncodingInternal element |

--------------------

The example demonstrates how to represent extracted text in UTF-8 encoding.

          Document doc = new Document(inFile);
          String extractedText;
          // create text device
          TextDevice device = new TextDevice(TextEncodingInternal.getUTF8());
          // convert the page and save text to the stream
          device.process(doc.getPages().get_Item(1), outFile);

### setEncoding(Charset value) {#setEncoding-java.nio.charset.Charset-}

public void setEncoding(Charset value)



Sets encoding of extracted text.

**Parameters:**
| Parameter | Type | Description |
| --- | --- | --- |
| value | java.nio.charset.Charset | Charset element |

--------------------

The example demonstrates how to represent extracted text in UTF-8 encoding.

          Document doc = new Document(inFile);
          String extractedText;
          // create text device
          TextDevice device = new TextDevice(java.nio.charset.Charset.forName("UTF-8"));
          // convert the page and save text to the stream
          device.process(doc.getPages().get_Item(1), outFile);
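Because `setEncoding` takes a standard `java.nio.charset.Charset`, the JDK constants in `StandardCharsets` can be passed in place of a name lookup, and the same charset should be used later to decode the bytes the device writes. The sketch below uses only JDK classes and makes no Aspose calls; `CharsetChoice` and `roundTrip` are hypothetical names chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical illustration, not part of Aspose.PDF. */
final class CharsetChoice {

    /** Encodes text with the given charset and decodes it again with the same charset. */
    static String roundTrip(String text, Charset charset) {
        byte[] encoded = text.getBytes(charset);
        return new String(encoded, charset);
    }

    public static void main(String[] args) {
        // Both expressions refer to the same charset; the constant avoids a lookup by name.
        Charset byName = Charset.forName("UTF-8");
        Charset constant = StandardCharsets.UTF_8;
        System.out.println(byName.equals(constant));

        // Decoding with the charset used for encoding preserves non-ASCII characters.
        System.out.println(roundTrip("R\u00e9sum\u00e9", constant));
    }
}
```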

### processInternal(Page page, System.IO.Stream output) {#processInternal-com.aspose.pdf.Page-com.aspose.ms.System.IO.Stream-}

public void processInternal(Page page, System.IO.Stream output)



Converts the page and saves it as a text stream.

--------------------

The example demonstrates how to extract text from the first page of a PDF document.

          Document doc = new Document(inFile);
          String extractedText;
          ByteArrayOutputStream ms = new ByteArrayOutputStream();
          try
          {
              // create text device
              TextDevice device = new TextDevice();
              // convert the page and save text to the stream
              device.process(doc.getPages().get_Item(1), ms);
              // use the extracted text
              extractedText = Encoding.getUnicode().getString(ms.toByteArray());

              ms.close();
          } catch (IOException e) {
              e.printStackTrace();
          }


**Parameters:**
| Parameter | Type | Description |
| --- | --- | --- |
| page | [Page](../../com.aspose.pdf/page) | The page to convert. |
| output | com.aspose.ms.System.IO.Stream | Result stream. |

### process(Page page, OutputStream output) {#process-com.aspose.pdf.Page-java.io.OutputStream-}

public void process(Page page, OutputStream output)



Converts the page and saves it as a text stream.

--------------------

The example demonstrates how to extract text from the first page of a PDF document.

          Document doc = new Document(inFile);
          String extractedText;
          ByteArrayOutputStream ms = new ByteArrayOutputStream();
          try
          {
              // create text device
              TextDevice device = new TextDevice();
              // convert the page and save text to the stream
              device.process(doc.getPages().get_Item(1), ms);
              // use the extracted text
              extractedText = Encoding.getUnicode().getString(ms.toByteArray());

              ms.close();
          } catch (IOException e) {
              e.printStackTrace();
          }


**Parameters:**
| Parameter | Type | Description |
| --- | --- | --- |
| page | [Page](../../com.aspose.pdf/page) | The page to convert. |
| output | java.io.OutputStream | Result stream. |
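Since `process` writes into any `java.io.OutputStream`, the capture-and-decode step can be factored into a small helper. The following is a sketch under stated assumptions rather than part of the library: `StreamCapture`, `StreamWriter` and `capture` are hypothetical names, and the lambda in `main` is a stand-in for a call such as `out -> device.process(page, out)`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Aspose.PDF. */
final class StreamCapture {

    /** Hypothetical callback: anything that writes text bytes to a stream. */
    @FunctionalInterface
    interface StreamWriter {
        void writeTo(OutputStream output) throws IOException;
    }

    /** Runs the writer against an in-memory stream and decodes what it wrote. */
    static String capture(StreamWriter writer, Charset charset) {
        // try-with-resources closes the buffer even if the writer fails
        try (ByteArrayOutputStream buffer = new ByteArrayOutputStream()) {
            writer.writeTo(buffer);
            return new String(buffer.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for: out -> device.process(page, out)
        String text = capture(
                out -> out.write("page text".getBytes(StandardCharsets.UTF_8)),
                StandardCharsets.UTF_8);
        System.out.println(text);
    }
}
```

The charset passed to `capture` has to match the encoding the device was configured with; otherwise the decoded text will be garbled.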