Aspose::Pdf::Text::TextAbsorber Class Reference

Represents a text absorber object. Performs text extraction and provides access to the result via the TextAbsorber::Text object.

Inherits System::Object.

Inherited by Aspose::Pdf::Text::TextFragmentAbsorber, and Aspose::Pdf::Text::TextParagraphAbsorber.

Public Member Functions

virtual System::String get_Text ()
 Gets the text that the TextAbsorber extracted from the PDF document or page.
 
bool get_HasErrors ()
 Indicates whether errors were found during text extraction. Errors are searched for only if TextSearchOptions.LogTextExtractionErrors = true, which may decrease performance.
 
System::SharedPtr< System::Collections::Generic::List< System::SharedPtr< TextExtractionError > > > get_Errors ()
 List of TextExtractionError objects describing the errors found during text extraction. Errors are searched for only if TextSearchOptions.LogTextExtractionErrors = true, which may decrease performance.
 
virtual System::SharedPtr< TextExtractionOptions > get_ExtractionOptions ()
 Gets text extraction options.
 
virtual void set_ExtractionOptions (System::SharedPtr< TextExtractionOptions > value)
 Sets text extraction options.
 
virtual System::SharedPtr< Aspose::Pdf::Text::TextSearchOptions > get_TextSearchOptions ()
 Gets text search options.
 
virtual void set_TextSearchOptions (System::SharedPtr< Aspose::Pdf::Text::TextSearchOptions > value)
 Sets text search options.
 
virtual void Visit (System::SharedPtr< Page > page)
 Extracts text from the specified page.
 
virtual void Visit (System::SharedPtr< XForm > form)
 Extracts text from the specified XForm.
 
virtual void Visit (System::SharedPtr< Document > pdf)
 Extracts text from the specified document.
 
 TextAbsorber ()
 Initializes a new instance of the TextAbsorber.
 
 TextAbsorber (System::SharedPtr< TextExtractionOptions > extractionOptions)
 Initializes a new instance of the TextAbsorber with extraction options.
 
 TextAbsorber (System::SharedPtr< TextExtractionOptions > extractionOptions, System::SharedPtr< Aspose::Pdf::Text::TextSearchOptions > textSearchOptions)
 Initializes a new instance of the TextAbsorber with extraction and text search options.
 
 TextAbsorber (System::SharedPtr< Aspose::Pdf::Text::TextSearchOptions > textSearchOptions)
 Initializes a new instance of the TextAbsorber with text search options.
 
- Public Member Functions inherited from System::Object
ASPOSECPP_SHARED_API Object ()
 Creates object. Initializes all internal data structures.
 
virtual ASPOSECPP_SHARED_API ~Object ()
 Destroys object. Frees all internal data structures.
 
ASPOSECPP_SHARED_API Object (Object const &x)
 Copy constructor. Doesn't copy anything, really, just initializes new object and enables copy constructing subclasses.
 
Object & operator= (Object const &x)
 Assignment operator. Doesn't copy anything, really, just initializes new object and enables copy constructing subclasses.
 
Object * SharedRefAdded ()
 Increments shared reference count. Shouldn't be called directly; instead, use smart pointers or ThisProtector.
 
int SharedRefRemovedSafe ()
 Decrements and returns shared reference count. Shouldn't be called directly; instead, use smart pointers or ThisProtector.
 
int RemovedSharedRefs (int count)
 Decreases shared reference count by specified value.
 
Detail::SmartPtrCounter * WeakRefAdded ()
 Increments weak reference count. Shouldn't be called directly; instead, use smart pointers or ThisProtector.
 
void WeakRefRemoved ()
 Decrements weak reference count. Shouldn't be called directly; instead, use smart pointers or ThisProtector.
 
Detail::SmartPtrCounter * GetCounter ()
 Gets reference counter data structure associated with the object.
 
int SharedCount () const
 Gets current value of shared reference counter.
 
ASPOSECPP_SHARED_API void Lock ()
 Implements C# lock() statement locking. Call directly or use LockContext sentry object.
 
ASPOSECPP_SHARED_API void Unlock ()
 Implements C# lock() statement unlocking. Call directly or use LockContext sentry object.
 
virtual ASPOSECPP_SHARED_API bool Equals (ptr obj)
 Compares objects using C# Object.Equals semantics.
 
virtual ASPOSECPP_SHARED_API int GetHashCode () const
 Analog of C# Object.GetHashCode() method. Enables hashing of custom objects.
 
virtual ASPOSECPP_SHARED_API String ToString () ASPOSE_CONST
 Analog of C# Object.ToString() method. Enables converting custom objects to string.
 
virtual ASPOSECPP_SHARED_API ptr MemberwiseClone () const
 Analog of C# Object.MemberwiseClone() method. Enables cloning custom types.
 
virtual ASPOSECPP_SHARED_API const TypeInfo & GetType () const
 Gets actual type of object. Analog of C# System.Object.GetType() call.
 
virtual ASPOSECPP_SHARED_API bool Is (const TypeInfo &targetType) const
 Checks whether the object is an instance of the type described by targetType. Analog of C# 'is' operator.
 
virtual ASPOSECPP_SHARED_API void SetTemplateWeakPtr (unsigned int argument)
 Sets the n'th template argument to a weak pointer (rather than shared). Allows switching pointers in containers to weak mode.
 
template<>
bool Equals (float const &objA, float const &objB)
 
template<>
bool Equals (double const &objA, double const &objB)
 
template<>
bool ReferenceEquals (String const &str, std::nullptr_t)
 
template<>
bool ReferenceEquals (String const &str1, String const &str2)
 

Protected Member Functions

System::String GetTotalText (System::SharedPtr< Aspose::Pdf::Engine::CommonData::Text::Segmenting::TextSegmenter > segmenter, bool isFormatted)
 

Protected Attributes

System::SharedPtr< System::Text::StringBuilder > extractedText
 

Additional Inherited Members

- Public Types inherited from System::Object
typedef SmartPtr< Object > ptr
 Alias for smart pointer type.
 
typedef System::Details::SharedMembersType shared_members_type
 Structure to keep list of shared pointers contained in object.
 
- Static Public Member Functions inherited from System::Object
static bool ReferenceEquals (ptr const &objA, ptr const &objB)
 Compares objects by reference.
 
template<typename T >
static std::enable_if<!IsSmartPtr< T >::value, bool >::type ReferenceEquals (T const &objA, T const &objB)
 Compares objects by reference.
 
template<typename T >
static std::enable_if<!IsSmartPtr< T >::value, bool >::type ReferenceEquals (T const &objA, std::nullptr_t)
 Reference-compares value type object with nullptr.
 
template<typename T1 , typename T2 >
static std::enable_if< IsSmartPtr< T1 >::value &&IsSmartPtr< T2 >::value, bool >::type Equals (T1 const &objA, T2 const &objB)
 Compares reference type objects in C# style.
 
template<typename T1 , typename T2 >
static std::enable_if<!IsSmartPtr< T1 >::value &&!IsSmartPtr< T2 >::value, bool >::type Equals (T1 const &objA, T2 const &objB)
 Compares value type objects in C# style.
 
static const TypeInfo & Type ()
 Implements C# typeof(System.Object) construct.
 

Detailed Description

Represents a text absorber object. Performs text extraction and provides access to the result via the TextAbsorber::Text object.

The TextAbsorber object is used to extract text from a PDF document or from one of the document's pages.

The example demonstrates how to extract text from the first page of a PDF document.

// open document
Document doc = new Document(inFile);
// create TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// accept the absorber for first page
doc.Pages[1].Accept(absorber);
// get the extracted text
string extractedText = absorber.Text;
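
The Accept/Visit relationship used in the example above is a visitor-style hand-off: the page passes itself to the absorber, and the absorber collects the text. The following is a minimal, self-contained C++ sketch of that idea only. `SketchPage`, `SketchAbsorber` and `ExtractPage` are hypothetical stand-ins, not Aspose types, and the newline separator between visited pages is an assumption made for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins, NOT the Aspose classes: they only mimic the
// Accept/Visit relationship between a page and an absorber.
struct SketchPage;

class SketchAbsorber {
public:
    // Plays the role of TextAbsorber::Visit(page): collect text from one page.
    void Visit(const SketchPage& page);
    // Plays the role of TextAbsorber::get_Text(): return what was collected.
    const std::string& get_Text() const { return text_; }

private:
    std::string text_;  // accumulated text, comparable to extractedText
};

struct SketchPage {
    std::string content;
    // Plays the role of Page::Accept(absorber): the page hands itself over.
    void Accept(SketchAbsorber& absorber) const { absorber.Visit(*this); }
};

inline void SketchAbsorber::Visit(const SketchPage& page) {
    // Assumption: text from successive visits is appended, one page per line.
    if (!text_.empty()) text_ += '\n';
    text_ += page.content;
}

// Extracts the text of one page, addressed with a 1-based index as in the
// example above (Pages[1] is the first page).
inline std::string ExtractPage(const std::vector<SketchPage>& pages,
                               std::size_t oneBasedIndex) {
    assert(oneBasedIndex >= 1 && oneBasedIndex <= pages.size());
    SketchAbsorber absorber;
    pages[oneBasedIndex - 1].Accept(absorber);
    return absorber.get_Text();
}
```

Calling `Accept` on several pages with the same absorber accumulates their text in visiting order, which is the behaviour the all-pages examples in this reference rely on.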

Constructor & Destructor Documentation

◆ TextAbsorber() [1/4]

Aspose::Pdf::Text::TextAbsorber::TextAbsorber ( )

Initializes a new instance of the TextAbsorber.

Performs text extraction and provides access to the extracted text via the TextAbsorber::Text object.

The example demonstrates how to extract text from all pages of the PDF document.

// open document
Document doc = new Document(inFile);
// create TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// accept the absorber for all document's pages
doc.Pages.Accept(absorber);
// get the extracted text
string extractedText = absorber.Text;

◆ TextAbsorber() [2/4]

Aspose::Pdf::Text::TextAbsorber::TextAbsorber ( System::SharedPtr< TextExtractionOptions > extractionOptions)

Initializes a new instance of the TextAbsorber with extraction options.

Performs text extraction and provides access to the extracted text via the TextAbsorber::Text object.

The example demonstrates how to extract text from all pages of the PDF document.

// open document
Document doc = new Document(inFile);
// create TextAbsorber object that extracts text in Pure formatting mode
TextAbsorber absorber = new TextAbsorber(new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure));
// accept the absorber for all document's pages
doc.Pages.Accept(absorber);
// get the extracted text
string extractedText = absorber.Text;

Parameters
extractionOptions    Text extraction options.

◆ TextAbsorber() [3/4]

Aspose::Pdf::Text::TextAbsorber::TextAbsorber ( System::SharedPtr< TextExtractionOptions > extractionOptions,
System::SharedPtr< Aspose::Pdf::Text::TextSearchOptions > textSearchOptions
)

Initializes a new instance of the TextAbsorber with extraction and text search options.

Performs text extraction and provides access to the extracted text via the TextAbsorber::Text object.

Parameters
extractionOptions    Text extraction options.
textSearchOptions    Text search options.

◆ TextAbsorber() [4/4]

Aspose::Pdf::Text::TextAbsorber::TextAbsorber ( System::SharedPtr< Aspose::Pdf::Text::TextSearchOptions > textSearchOptions)

Initializes a new instance of the TextAbsorber with text search options.

Performs text extraction and provides access to the extracted text via the TextAbsorber::Text object.

Parameters
textSearchOptions    Text search options.

Member Function Documentation

◆ get_Errors()

System::SharedPtr<System::Collections::Generic::List<System::SharedPtr<TextExtractionError> > > Aspose::Pdf::Text::TextAbsorber::get_Errors ( )

List of TextExtractionError objects describing the errors found during text extraction. Errors are searched for only if TextSearchOptions.LogTextExtractionErrors = true, which may decrease performance.

◆ get_ExtractionOptions()

virtual System::SharedPtr<TextExtractionOptions> Aspose::Pdf::Text::TextAbsorber::get_ExtractionOptions ( )
virtual

Gets text extraction options.

The TextExtractionOptions object defines the text formatting mode used during extraction. The default mode is TextExtractionOptions::TextFormattingMode::Pure.

The example demonstrates how to set Pure text formatting mode and perform text extraction.

// open document
Document doc = new Document(inFile);
// create TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// set pure text formatting mode
absorber.ExtractionOptions = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure);
// accept the absorber for all document's pages
doc.Pages.Accept(absorber);
// get the extracted text
string extractedText = absorber.Text;

Reimplemented in Aspose::Pdf::Text::TextFragmentAbsorber.

◆ get_HasErrors()

bool Aspose::Pdf::Text::TextAbsorber::get_HasErrors ( )

Indicates whether errors were found during text extraction. Errors are searched for only if TextSearchOptions.LogTextExtractionErrors = true, which may decrease performance.
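
The opt-in nature of error logging can be illustrated with a small, self-contained C++ sketch. Everything in it is a hypothetical stand-in rather than the Aspose implementation: `SketchSearchOptions`, `SketchExtractionError` and `SketchErrorAwareAbsorber` are invented names, and treating an empty chunk as an "error" is purely an assumption for the demonstration. The point is only that the error list stays empty unless the flag is switched on, so `get_HasErrors()` is meaningful only when logging was enabled beforehand.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins, NOT the Aspose classes.
struct SketchSearchOptions {
    bool logTextExtractionErrors = false;  // plays the role of LogTextExtractionErrors
};

struct SketchExtractionError {
    std::string description;
};

class SketchErrorAwareAbsorber {
public:
    explicit SketchErrorAwareAbsorber(SketchSearchOptions options)
        : options_(options) {}

    // Assumption for illustration: an empty chunk counts as an extraction error.
    void Visit(const std::vector<std::string>& chunks) {
        for (const std::string& chunk : chunks) {
            if (chunk.empty()) {
                // The extra bookkeeping happens only when logging is enabled.
                if (options_.logTextExtractionErrors) {
                    errors_.push_back({"empty chunk"});
                }
                continue;
            }
            text_ += chunk;
        }
    }

    bool get_HasErrors() const { return !errors_.empty(); }
    const std::vector<SketchExtractionError>& get_Errors() const { return errors_; }
    const std::string& get_Text() const { return text_; }

private:
    SketchSearchOptions options_;
    std::vector<SketchExtractionError> errors_;
    std::string text_;
};

// Usage: with the default options the same input reports no errors.
inline void ErrorLoggingDemo() {
    SketchErrorAwareAbsorber quiet{SketchSearchOptions{}};
    quiet.Visit({"a", "", "b"});
    assert(!quiet.get_HasErrors());
    assert(quiet.get_Text() == "ab");
}
```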

◆ get_Text()

virtual System::String Aspose::Pdf::Text::TextAbsorber::get_Text ( )
virtual

Gets the text that the TextAbsorber extracted from the PDF document or page.

The example demonstrates how to extract text from all pages of the PDF document.

// open document
Document doc = new Document(inFile);
// create TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// accept the absorber for all document's pages
doc.Pages.Accept(absorber);
// get the extracted text
string extractedText = absorber.Text;

Reimplemented in Aspose::Pdf::Text::TextFragmentAbsorber.

◆ get_TextSearchOptions()

virtual System::SharedPtr<Aspose::Pdf::Text::TextSearchOptions> Aspose::Pdf::Text::TextAbsorber::get_TextSearchOptions ( )
virtual

Gets text search options.

The search options can define a rectangle that delimits the extracted text. By default the rectangle is empty, which means that only the page boundaries define the text extraction region.
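
The effect of a delimiting rectangle can be sketched in plain C++. The types below (`SketchRect`, `SketchFragment`) and the function `ExtractWithin` are hypothetical stand-ins, not part of the Aspose API, and the space separator between fragments is an assumption. The sketch only shows the rule stated above: an empty rectangle filters nothing, while a non-empty one keeps just the text positioned inside it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins, NOT the Aspose classes.
struct SketchRect {
    double llx = 0, lly = 0, urx = 0, ury = 0;  // lower-left and upper-right corners
    bool IsEmpty() const { return urx <= llx || ury <= lly; }
    bool Contains(double x, double y) const {
        return x >= llx && x <= urx && y >= lly && y <= ury;
    }
};

struct SketchFragment {
    double x;
    double y;
    std::string text;
};

// Joins the fragments that fall inside `region`.
// An empty region means "no limit": every fragment on the page is kept.
inline std::string ExtractWithin(const std::vector<SketchFragment>& fragments,
                                 const SketchRect& region) {
    std::string result;
    for (const SketchFragment& f : fragments) {
        if (region.IsEmpty() || region.Contains(f.x, f.y)) {
            if (!result.empty()) result += ' ';
            result += f.text;
        }
    }
    return result;
}

// Usage: a default-constructed (empty) rectangle keeps the whole page.
inline void RegionDemo() {
    const std::vector<SketchFragment> fragments{{10.0, 10.0, "header"},
                                                {300.0, 500.0, "body"}};
    SketchRect wholePage;
    assert(ExtractWithin(fragments, wholePage) == "header body");
}
```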

Reimplemented in Aspose::Pdf::Text::TextFragmentAbsorber.

◆ GetTotalText()

System::String Aspose::Pdf::Text::TextAbsorber::GetTotalText ( System::SharedPtr< Aspose::Pdf::Engine::CommonData::Text::Segmenting::TextSegmenter >  segmenter,
bool  isFormatted 
)
protected

◆ set_ExtractionOptions()

virtual void Aspose::Pdf::Text::TextAbsorber::set_ExtractionOptions ( System::SharedPtr< TextExtractionOptions > value)
virtual

Sets text extraction options.

The TextExtractionOptions object defines the text formatting mode used during extraction. The default mode is TextExtractionOptions::TextFormattingMode::Pure.

The example demonstrates how to set Pure text formatting mode and perform text extraction.

// open document
Document doc = new Document(inFile);
// create TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// set pure text formatting mode
absorber.ExtractionOptions = new TextExtractionOptions(TextExtractionOptions.TextFormattingMode.Pure);
// accept the absorber for all document's pages
doc.Pages.Accept(absorber);
// get the extracted text
string extractedText = absorber.Text;

Reimplemented in Aspose::Pdf::Text::TextFragmentAbsorber.

◆ set_TextSearchOptions()

virtual void Aspose::Pdf::Text::TextAbsorber::set_TextSearchOptions ( System::SharedPtr< Aspose::Pdf::Text::TextSearchOptions > value)
virtual

Sets text search options.

The search options can define a rectangle that delimits the extracted text. By default the rectangle is empty, which means that only the page boundaries define the text extraction region.

Reimplemented in Aspose::Pdf::Text::TextFragmentAbsorber.

◆ Visit() [1/3]

virtual void Aspose::Pdf::Text::TextAbsorber::Visit ( System::SharedPtr< Page > page)
virtual

Extracts text from the specified page.

The example demonstrates how to extract text from the first page of a PDF document.

// open document
Document doc = new Document(inFile);
// create TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// visit the first page of the document
absorber.Visit(doc.Pages[1]);
// get the extracted text
string extractedText = absorber.Text;

Parameters
page    PDF document page object.

Reimplemented in Aspose::Pdf::Text::TextFragmentAbsorber, and Aspose::Pdf::Text::TextParagraphAbsorber.

◆ Visit() [2/3]

virtual void Aspose::Pdf::Text::TextAbsorber::Visit ( System::SharedPtr< XForm > form)
virtual

Extracts text from the specified XForm.

The example demonstrates how to extract text from an XForm in the resources of the first PDF document page.

// open document
Document doc = new Document(inFile);
// create TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// visit the XForm named "Xform1" in the first page's resources
absorber.Visit(doc.Pages[1].Resources.Forms["Xform1"]);
// get the extracted text
string extractedText = absorber.Text;

Parameters
form    PDF form object.

Reimplemented in Aspose::Pdf::Text::TextFragmentAbsorber.

◆ Visit() [3/3]

virtual void Aspose::Pdf::Text::TextAbsorber::Visit ( System::SharedPtr< Document > pdf)
virtual

Extracts text from the specified document.

The example demonstrates how to extract text from a whole PDF document.

// open document
Document doc = new Document(inFile);
// create TextAbsorber object to extract text
TextAbsorber absorber = new TextAbsorber();
// visit the whole document
absorber.Visit(doc);
// get the extracted text
string extractedText = absorber.Text;

Parameters
pdf    PDF document object.

Reimplemented in Aspose::Pdf::Text::TextFragmentAbsorber.

Member Data Documentation

◆ extractedText

System::SharedPtr<System::Text::StringBuilder> Aspose::Pdf::Text::TextAbsorber::extractedText
protected