Class for extracting images and text from a PDF document.

Inherits Aspose::Pdf::Facades::Facade.

Public Member Functions

int32_t get_StartPage ()
 Gets the start page of the page range on which extraction is performed.
 
void set_StartPage (int32_t value)
 Sets the start page of the page range on which extraction is performed.
 
int32_t get_EndPage ()
 Gets the end page of the page range on which extraction is performed.
 
void set_EndPage (int32_t value)
 Sets the end page of the page range on which extraction is performed.
 
int32_t get_ExtractTextMode ()
 Gets the mode for the text extraction result.
 
void set_ExtractTextMode (int32_t value)
 Sets the mode for the text extraction result.
 
System::SharedPtr< Aspose::Pdf::Text::TextSearchOptions > get_TextSearchOptions ()
 Gets text search options.
 
void set_TextSearchOptions (System::SharedPtr< Aspose::Pdf::Text::TextSearchOptions > value)
 Sets text search options.
 
Aspose::Pdf::ExtractImageMode get_ExtractImageMode ()
 Gets the mode used by the image extraction process.
 
void set_ExtractImageMode (Aspose::Pdf::ExtractImageMode value)
 Sets the mode used by the image extraction process.
 
bool get_IsBidi ()
 Is true when the text contains Hebrew or Arabic characters. This case needs special handling because string functions change their behavior and process text from right to left (except numbers and other non-text characters).
 
int32_t get_Resolution ()
 Gets the resolution for extracted images. The default value is 150. Images with a higher resolution are clearer; however, increasing the resolution also increases the time and memory needed to extract images. A resolution of 150 or 300 is usually enough to get a clear image.
 
void set_Resolution (int32_t value)
 Sets the resolution for extracted images. The default value is 150. Images with a higher resolution are clearer; however, increasing the resolution also increases the time and memory needed to extract images. A resolution of 150 or 300 is usually enough to get a clear image.
 
System::String get_Password ()
 Gets the input file's password.
 
void set_Password (System::String value)
 Sets the input file's password.
 
void ExtractText ()
 Extracts text from a PDF document using Unicode encoding.
 
void ExtractText (System::SharedPtr< System::Text::Encoding > encoding)
 Extracts text from a PDF document using the specified encoding.
 
void GetText (System::String outputFile)
 Saves text to a file. See also: ExtractText.
 
void GetText (System::SharedPtr< System::IO::Stream > outputStream)
 Saves text to a stream. See also: ExtractText.
 
virtual void BindPdf (System::String inputFile)
 Binds the input PDF file.
 
virtual void BindPdf (System::SharedPtr< System::IO::Stream > inputStream)
 Binds a PDF document from a stream.
 
void ExtractImage ()
 Extracts images from the PDF file.
 
bool HasNextImage ()
 Checks whether more images are available in the PDF document. Note: ExtractImage must be called before this method.
 
bool GetNextImage (System::String outputFile)
 Retrieves the next image from the PDF document. Note: ExtractImage must be called before this method.
 
bool GetNextImage (System::String outputFile, System::SharedPtr< System::Drawing::Imaging::ImageFormat > format)
 Retrieves the next image from the PDF document in the given image format. Note: ExtractImage must be called before this method.
 
bool GetNextImage (System::SharedPtr< System::IO::Stream > outputStream, System::SharedPtr< System::Drawing::Imaging::ImageFormat > format)
 Retrieves the next image from the PDF file and stores it in a stream in the given image format.
 
bool GetNextImage (System::SharedPtr< System::IO::Stream > outputStream)
 Retrieves the next image from the PDF file and stores it in a stream.
 
System::SharedPtr< System::Collections::Generic::IList< System::String > > GetAttachNames ()
 Returns the list of attachment names in the PDF file. Note: ExtractAttachment must be called before this method.
 
void ExtractAttachment ()
 Extracts attachments from a PDF document.
 
void ExtractAttachment (System::String attachmentFileName)
 Extracts the attachment with the given name from the PDF file.
 
void GetAttachment (System::String outputPath)
 Stores the extracted attachment(s) in files in the given directory.
 
bool HasNextPageText ()
 Indicates whether more page text is available.
 
void GetNextPageText (System::String outputFile)
 Saves one page's text to a file.
 
void GetNextPageText (System::SharedPtr< System::IO::Stream > outputStream)
 Saves one page's text to a stream.
 
 PdfExtractor ()
 Initializes a new PdfExtractor object.
 
 PdfExtractor (System::SharedPtr< Aspose::Pdf::Document > document)
 Initializes a new PdfExtractor object based on the given document.
 
void GetText (System::SharedPtr< System::IO::Stream > outputStream, bool filterNotAscii)
 Saves text to a stream, optionally removing non-ASCII characters. See also: ExtractText.
 
System::ArrayPtr< System::SharedPtr< System::IO::MemoryStream > > GetAttachment ()
 Saves all attachment files to streams.
 
System::SharedPtr< System::Collections::Generic::List< System::SharedPtr< FileSpecification > > > GetAttachmentInfo ()
 Gets the list of attachments.
 
- Public Member Functions inherited from Aspose::Pdf::Facades::Facade
System::SharedPtr< Aspose::Pdf::Document > get_Document () const
 Gets the document the facade is working on.
 
virtual void BindPdf (System::SharedPtr< Aspose::Pdf::Document > srcDoc)
 Initializes the facade.
 
virtual void Close ()
 Disposes the Aspose.Pdf.Document bound to the facade.
 
void Dispose ()
 Disposes the facade.
 
- Public Member Functions inherited from Aspose::Pdf::Facades::IFacade
virtual void BindPdf (System::SharedPtr< Document > srcDoc)=0
 Binds a PDF document for editing.
 
- Public Member Functions inherited from System::Object
ASPOSECPP_SHARED_API Object ()
 Creates object. Initializes all internal data structures.
 
virtual ASPOSECPP_SHARED_API ~Object ()
 Destroys object. Frees all internal data structures.
 
ASPOSECPP_SHARED_API Object (Object const &x)
 Copy constructor. Doesn't copy anything, really, just initializes new object and enables copy constructing subclasses.
 
Object & operator= (Object const &x)
 Assignment operator. Doesn't copy anything, really, just initializes new object and enables copy constructing subclasses.
 
Object * SharedRefAdded ()
 Increments shared reference count. Shouldn't be called directly; instead, use smart pointers or ThisProtector.
 
int SharedRefRemovedSafe ()
 Decrements and returns shared reference count. Shouldn't be called directly; instead, use smart pointers or ThisProtector.
 
int RemovedSharedRefs (int count)
 Decreases shared reference count by specified value.
 
Detail::SmartPtrCounter * WeakRefAdded ()
 Increments weak reference count. Shouldn't be called directly; instead, use smart pointers or ThisProtector.
 
void WeakRefRemoved ()
 Decrements weak reference count. Shouldn't be called directly; instead, use smart pointers or ThisProtector.
 
Detail::SmartPtrCounter * GetCounter ()
 Gets reference counter data structure associated with the object.
 
int SharedCount () const
 Gets current value of shared reference counter.
 
ASPOSECPP_SHARED_API void Lock ()
 Implements C# lock() statement locking. Call directly or use LockContext sentry object.
 
ASPOSECPP_SHARED_API void Unlock ()
 Implements C# lock() statement unlocking. Call directly or use LockContext sentry object.
 
virtual ASPOSECPP_SHARED_API bool Equals (ptr obj)
 Compares objects using C# Object.Equals semantics.
 
virtual ASPOSECPP_SHARED_API int GetHashCode () const
 Analog of C# Object.GetHashCode() method. Enables hashing of custom objects.
 
virtual ASPOSECPP_SHARED_API String ToString () ASPOSE_CONST
 Analog of C# Object.ToString() method. Enables converting custom objects to string.
 
virtual ASPOSECPP_SHARED_API ptr MemberwiseClone () const
 Analog of C# Object.MemberwiseClone() method. Enables cloning custom types.
 
virtual ASPOSECPP_SHARED_API const TypeInfo & GetType () const
 Gets actual type of object. Analog of C# System.Object.GetType() call.
 
virtual ASPOSECPP_SHARED_API bool Is (const TypeInfo &targetType) const
 Checks if object represents an instance of type described by targetType. Analog of C# 'is' operator.
 
virtual ASPOSECPP_SHARED_API void SetTemplateWeakPtr (unsigned int argument)
 Sets the n'th template argument to a weak pointer (rather than shared). Allows switching pointers in containers to weak mode.
 
template<>
bool Equals (float const &objA, float const &objB)
 
template<>
bool Equals (double const &objA, double const &objB)
 
template<>
bool ReferenceEquals (String const &str, std::nullptr_t)
 
template<>
bool ReferenceEquals (String const &str1, String const &str2)

Protected Member Functions

bool get__IsObjectLicensed ()
 Gets the licensed state of the system. Returns true if the system works in licensed mode and false otherwise.
 
void SetVentureLicense (System::SharedPtr< Aspose::Pdf::LicenseManagement::VentureLicense > license)
 
System::SharedPtr< Aspose::Pdf::LicenseManagement::VentureLicense > GetVentureLicense ()
 
void InitPageImages (System::SharedPtr< System::Collections::Generic::List< System::SharedPtr< XImage >>> images, System::SharedPtr< Aspose::Pdf::Document > document, int32_t page, int32_t endPage)
 
void InitPageXFormImages_DefinedInResources (System::SharedPtr< System::Collections::Generic::List< System::SharedPtr< XImage >>> images, System::SharedPtr< Aspose::Pdf::Document > document, int32_t page, int32_t endPage)
 
- Protected Member Functions inherited from Aspose::Pdf::Facades::Facade
 Facade ()
 The constructor.
 
 Facade (System::SharedPtr< Aspose::Pdf::Document > srcDoc)
 The constructor.
 
virtual void BindPdf (System::String srcFile, System::String password)
 Initializes the facade.
 
virtual void BindPdf (System::SharedPtr< System::IO::Stream > srcStream, System::String password)
 Initializes the facade.
 
virtual void AssertDocument () const
 Asserts that the facade is initialized.
 

Static Protected Member Functions

static void InitXFormImages (System::SharedPtr< System::Collections::Generic::List< System::SharedPtr< XImage >>> images, System::SharedPtr< XForm > xform)
 
static void InitPageXFormImages_ActuallyUsed (System::SharedPtr< System::Collections::Generic::List< System::SharedPtr< XImage >>> images, System::SharedPtr< Aspose::Pdf::Document > document, int32_t page, int32_t endPage)
 

Additional Inherited Members

- Public Types inherited from System::Object
typedef SmartPtr< Object > ptr
 Alias for smart pointer type.
 
typedef System::Details::SharedMembersType shared_members_type
 Structure to keep list of shared pointers contained in object.
 
- Static Public Member Functions inherited from System::Object
static bool ReferenceEquals (ptr const &objA, ptr const &objB)
 Compares objects by reference.
 
template<typename T >
static std::enable_if<!IsSmartPtr< T >::value, bool >::type ReferenceEquals (T const &objA, T const &objB)
 Compares objects by reference.
 
template<typename T >
static std::enable_if<!IsSmartPtr< T >::value, bool >::type ReferenceEquals (T const &objA, std::nullptr_t)
 Reference-compares value type object with nullptr.
 
template<typename T1 , typename T2 >
static std::enable_if< IsSmartPtr< T1 >::value &&IsSmartPtr< T2 >::value, bool >::type Equals (T1 const &objA, T2 const &objB)
 Compares reference type objects in C# style.
 
template<typename T1 , typename T2 >
static std::enable_if<!IsSmartPtr< T1 >::value &&!IsSmartPtr< T2 >::value, bool >::type Equals (T1 const &objA, T2 const &objB)
 Compares value type objects in C# style.
 
static const TypeInfo & Type ()
 Implements C# typeof(System.Object) construct.
 

Detailed Description

Class for extracting images and text from PDF document.

Constructor & Destructor Documentation

◆ PdfExtractor() [1/2]

Aspose::Pdf::Facades::PdfExtractor::PdfExtractor ( )

Initializes a new PdfExtractor object.

◆ PdfExtractor() [2/2]

Aspose::Pdf::Facades::PdfExtractor::PdfExtractor ( System::SharedPtr< Aspose::Pdf::Document > document)

Initializes a new PdfExtractor object based on the given document.

Parameters
document: The PDF document.

Member Function Documentation

◆ BindPdf() [1/2]

virtual void Aspose::Pdf::Facades::PdfExtractor::BindPdf ( System::String  inputFile)
virtual

Binds the input PDF file.

Parameters
inputFile: The PDF file to bind.
PdfExtractor ext = new PdfExtractor();
ext.BindPdf("sample.pdf");

Reimplemented from Aspose::Pdf::Facades::Facade.

◆ BindPdf() [2/2]

virtual void Aspose::Pdf::Facades::PdfExtractor::BindPdf ( System::SharedPtr< System::IO::Stream > inputStream)
virtual

Binds a PDF document from a stream.

Parameters
inputStream: Stream containing the PDF document data.
PdfExtractor ext = new PdfExtractor();
Stream stream = new FileStream("sample.pdf", FileMode.Open, FileAccess.Read);
ext.BindPdf(stream);

Reimplemented from Aspose::Pdf::Facades::Facade.

◆ ExtractAttachment() [1/2]

void Aspose::Pdf::Facades::PdfExtractor::ExtractAttachment ( )

Extracts attachments from a Pdf document.

◆ ExtractAttachment() [2/2]

void Aspose::Pdf::Facades::PdfExtractor::ExtractAttachment ( System::String  attachmentFileName)

Extracts the attachment with the given name from the PDF file.

Parameters
attachmentFileName: Name of the attachment to extract.

◆ ExtractImage()

void Aspose::Pdf::Facades::PdfExtractor::ExtractImage ( )

Extracts images from the PDF file.

PdfExtractor extractor = new PdfExtractor();
extractor.BindPdf("sample.pdf");
extractor.ExtractImage();
int i = 1;
while (extractor.HasNextImage())
{
    // Give each image its own file name; GetNextImage advances to the next image.
    extractor.GetNextImage("image-" + i + ".jpg");
    i++;
}
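The HasNextImage/GetNextImage pair behaves like a forward-only cursor: each GetNextImage call consumes one image, so the loop needs its own counter to give every image a distinct file name. The C++ sketch below models that pattern with a hypothetical `FakeImageCursor` class that records file names instead of writing image data. It is not the PdfExtractor API, and the `.jpg` extension is only an illustrative choice.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the image cursor PdfExtractor exposes after
// ExtractImage(): it records output file names instead of writing image data.
class FakeImageCursor {
public:
    explicit FakeImageCursor(std::size_t imageCount) : remaining_(imageCount) {}

    bool HasNextImage() const { return remaining_ > 0; }

    // Mirrors GetNextImage(outputFile): returns true if an image was "saved".
    bool GetNextImage(const std::string& outputFile) {
        if (remaining_ == 0) {
            return false;
        }
        --remaining_;
        saved_.push_back(outputFile);
        return true;
    }

    const std::vector<std::string>& Saved() const { return saved_; }

private:
    std::size_t remaining_;
    std::vector<std::string> saved_;
};

// The loop from the example: one distinct, 1-based file name per image.
void SaveAllImages(FakeImageCursor& cursor, const std::string& prefix,
                   const std::string& extension) {
    int i = 1;
    while (cursor.HasNextImage()) {
        cursor.GetNextImage(prefix + std::to_string(i) + extension);
        ++i;  // without this increment, every image would target the same file
    }
}
```

Calling `SaveAllImages(cursor, "image-", ".jpg")` on a cursor holding three images records `image-1.jpg`, `image-2.jpg` and `image-3.jpg`.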

◆ ExtractText() [1/2]

void Aspose::Pdf::Facades::PdfExtractor::ExtractText ( )

Extracts text from a PDF document using Unicode encoding.

The first example demonstrates how to extract all the text from a PDF file.

[C#]
PdfExtractor extractor = new PdfExtractor();
extractor.BindPdf(@"D:\Text\text.pdf");
extractor.ExtractText();
extractor.GetText(@"D:\Text\text.txt");
[Visual Basic]
Dim extractor As PdfExtractor = New PdfExtractor()
extractor.BindPdf("D:\Text\text.pdf")
extractor.ExtractText()
extractor.GetText("D:\Text\text.txt")

The second example demonstrates how to extract each page's text into a separate .txt file.

[C#]
PdfExtractor extractor = new PdfExtractor();
extractor.BindPdf(TestPath + @"Aspose.Pdf.Kit.Pdf");
extractor.ExtractText();
String prefix = TestPath + @"Aspose.Pdf.Kit";
String suffix = ".txt";
int pageCount = 1;
while (extractor.HasNextPageText())
{
    extractor.GetNextPageText(prefix + pageCount + suffix);
    pageCount++;
}
[Visual Basic]
Dim extractor As PdfExtractor = New PdfExtractor()
extractor.BindPdf(TestPath + "Aspose.Pdf.Kit.Pdf")
extractor.ExtractText()
Dim prefix As String = TestPath + "Aspose.Pdf.Kit"
Dim suffix As String = ".txt"
Dim pageCount As Integer = 1
While extractor.HasNextPageText()
    ' Use & so the page number is concatenated rather than added numerically.
    extractor.GetNextPageText(prefix & pageCount & suffix)
    pageCount = pageCount + 1
End While

◆ ExtractText() [2/2]

void Aspose::Pdf::Facades::PdfExtractor::ExtractText ( System::SharedPtr< System::Text::Encoding > encoding)

Extracts text from a PDF document using the specified encoding.

The first example demonstrates how to extract all the text from a PDF file.

[C#]
PdfExtractor extractor = new PdfExtractor();
extractor.BindPdf(@"D:\Text\text.pdf");
extractor.ExtractText(Encoding.Unicode);
extractor.GetText(@"D:\Text\text.txt");
[Visual Basic]
Dim extractor As PdfExtractor = New PdfExtractor()
extractor.BindPdf("D:\Text\text.pdf")
extractor.ExtractText(Encoding.Unicode)
extractor.GetText("D:\Text\text.txt")

The second example demonstrates how to extract each page's text into a separate .txt file.

[C#]
PdfExtractor extractor = new PdfExtractor();
extractor.BindPdf(TestPath + @"Aspose.Pdf.Kit.Pdf");
extractor.ExtractText(Encoding.Unicode);
String prefix = TestPath + @"Aspose.Pdf.Kit";
String suffix = ".txt";
int pageCount = 1;
while (extractor.HasNextPageText())
{
    extractor.GetNextPageText(prefix + pageCount + suffix);
    pageCount++;
}
[Visual Basic]
Dim extractor As PdfExtractor = New PdfExtractor()
extractor.BindPdf(TestPath + "Aspose.Pdf.Kit.Pdf")
extractor.ExtractText(Encoding.Unicode)
Dim prefix As String = TestPath + "Aspose.Pdf.Kit"
Dim suffix As String = ".txt"
Dim pageCount As Integer = 1
While extractor.HasNextPageText()
    ' Use & so the page number is concatenated rather than added numerically.
    extractor.GetNextPageText(prefix & pageCount & suffix)
    pageCount = pageCount + 1
End While
Parameters
encoding: The encoding of the extracted text.

◆ get__IsObjectLicensed()

bool Aspose::Pdf::Facades::PdfExtractor::get__IsObjectLicensed ( )
protected

Gets the licensed state of the system. Returns true if the system works in licensed mode and false otherwise.

◆ get_EndPage()

int32_t Aspose::Pdf::Facades::PdfExtractor::get_EndPage ( )

Gets end page in the page range where extracting operation will be performed.

PdfExtractor ext = new PdfExtractor();
ext.BindPdf("sample.pdf");
ext.StartPage = 2;
ext.EndPage = 3;
ext.ExtractText();

◆ get_ExtractImageMode()

Aspose::Pdf::ExtractImageMode Aspose::Pdf::Facades::PdfExtractor::get_ExtractImageMode ( )

Gets the mode used by the image extraction process.

The default value, ExtractImageMode.DefinedInResources, extracts all images defined in the resources.

To extract only the images that are actually displayed, use ExtractImageMode.ActuallyUsed.
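In a PDF file, an image XObject is declared in the /XObject entry of a page's /Resources dictionary and is painted by the `Do` operator in the page's content stream. An image can therefore be defined without ever being drawn. The sketch below models the difference between the two modes with plain strings. `PageModel`, `ImagesDefined` and `ImagesPainted` are hypothetical names, and the code says nothing about how PdfExtractor implements either mode. Form XObjects, which can themselves contain images, are ignored here.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one page: image names declared in /Resources /XObject,
// plus the operand of every `Do` operator found in the content stream.
struct PageModel {
    std::set<std::string> resourceImages;  // e.g. {"Im1", "Im2"}
    std::vector<std::string> doOperands;   // names painted by "/Name Do"
};

// Analogue of DefinedInResources: every image declared in the resources.
std::set<std::string> ImagesDefined(const PageModel& page) {
    return page.resourceImages;
}

// Analogue of ActuallyUsed: only declared images the content stream paints.
std::set<std::string> ImagesPainted(const PageModel& page) {
    std::set<std::string> used;
    for (const auto& name : page.doOperands) {
        if (page.resourceImages.count(name) != 0) {
            used.insert(name);
        }
    }
    return used;
}
```

An image that is painted twice is still reported once, because `std::set` keeps unique names.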

◆ get_ExtractTextMode()

int32_t Aspose::Pdf::Facades::PdfExtractor::get_ExtractTextMode ( )

Gets the mode for the text extraction result.

0 is pure text mode and 1 is raw ordering mode. The default is 0.

The example demonstrates the usage of the ExtractTextMode property in a text extraction scenario.

PdfExtractor extractor = new PdfExtractor();
extractor.BindPdf(@"D:\Text\text.pdf");
extractor.ExtractTextMode = 1;
extractor.ExtractText();
extractor.GetText(@"D:\Text\text.txt");

◆ get_IsBidi()

bool Aspose::Pdf::Facades::PdfExtractor::get_IsBidi ( )

Is true when the text contains Hebrew or Arabic characters. This case needs special handling because string functions change their behavior and process text from right to left (except numbers and other non-text characters).
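A check of the kind IsBidi describes can be sketched by scanning decoded text for code points in the Hebrew (U+0590 to U+05FF) and Arabic (U+0600 to U+06FF) Unicode blocks. `ContainsHebrewOrArabic` is a hypothetical helper, not the library's implementation. It skips the supplementary Arabic blocks and presentation forms, which a production check would probably also need to cover.

```cpp
#include <string>

// Hypothetical helper: true if the UTF-32 text contains a character from the
// Hebrew (U+0590..U+05FF) or Arabic (U+0600..U+06FF) Unicode blocks.
// Supplementary blocks and presentation forms are deliberately not covered.
bool ContainsHebrewOrArabic(const std::u32string& text) {
    for (char32_t ch : text) {
        const bool hebrew = ch >= 0x0590 && ch <= 0x05FF;
        const bool arabic = ch >= 0x0600 && ch <= 0x06FF;
        if (hebrew || arabic) {
            return true;
        }
    }
    return false;
}
```

Working on `std::u32string` keeps the check simple, because each element is one code point. UTF-8 or UTF-16 input would need decoding first.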

◆ get_Password()

System::String Aspose::Pdf::Facades::PdfExtractor::get_Password ( )

Gets input file's password.

◆ get_Resolution()

int32_t Aspose::Pdf::Facades::PdfExtractor::get_Resolution ( )

Gets the resolution for extracted images. The default value is 150. Images with a higher resolution are clearer; however, increasing the resolution also increases the time and memory needed to extract images. A resolution of 150 or 300 is usually enough to get a clear image.
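The time and memory trade-off follows from raster arithmetic. Pixel dimensions grow linearly with resolution, so pixel count and buffer size grow with its square. The sketch below makes three assumptions:
- Resolution is measured in dots per inch (the unit is not stated here).
- The PDF default of 72 user-space units per inch applies.
- The buffer is uncompressed RGBA at 4 bytes per pixel.

`PixelsAt` and `BufferBytes` are hypothetical helpers.

```cpp
#include <cmath>
#include <cstdint>

struct RasterSize {
    std::int64_t width;
    std::int64_t height;
};

// PDF default user space: 72 units (points) per inch.
constexpr double kPointsPerInch = 72.0;

// Pixel dimensions of a widthPt x heightPt area rasterized at `dpi`.
RasterSize PixelsAt(double widthPt, double heightPt, int dpi) {
    return {
        static_cast<std::int64_t>(std::lround(widthPt / kPointsPerInch * dpi)),
        static_cast<std::int64_t>(std::lround(heightPt / kPointsPerInch * dpi)),
    };
}

// Uncompressed buffer size, assuming 4 bytes per pixel (e.g. 8-bit RGBA).
std::int64_t BufferBytes(const RasterSize& size, int bytesPerPixel = 4) {
    return size.width * size.height * bytesPerPixel;
}
```

For a 612 x 792 point area (US Letter), 150 gives 1275 x 1650 pixels (8,415,000 bytes) and 300 gives 2550 x 3300 pixels, four times as much memory.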

◆ get_StartPage()

int32_t Aspose::Pdf::Facades::PdfExtractor::get_StartPage ( )

Gets start page in the page range where extracting operation will be performed.

PdfExtractor ext = new PdfExtractor();
ext.BindPdf("sample.pdf");
ext.StartPage = 2;
ext.EndPage = 5;
ext.ExtractText();
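StartPage and EndPage restrict extraction to a page range. The helper below lists the pages such a range covers. It assumes 1-based, inclusive bounds, as the example suggests (pages 2 to 5). How the facade treats out-of-range values is not documented here, so the hypothetical `PagesInRange` simply clamps them to the document and returns an empty list for an inverted range.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: pages an extraction over [startPage, endPage] visits,
// assuming 1-based inclusive bounds clamped to [1, pageCount].
std::vector<int> PagesInRange(int startPage, int endPage, int pageCount) {
    std::vector<int> pages;
    const int first = std::max(1, startPage);
    const int last = std::min(endPage, pageCount);
    for (int page = first; page <= last; ++page) {
        pages.push_back(page);
    }
    return pages;  // empty when the range is inverted or outside the document
}
```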

◆ get_TextSearchOptions()

System::SharedPtr<Aspose::Pdf::Text::TextSearchOptions> Aspose::Pdf::Facades::PdfExtractor::get_TextSearchOptions ( )

Gets text search options.

◆ GetAttachment() [1/2]

void Aspose::Pdf::Facades::PdfExtractor::GetAttachment ( System::String  outputPath)

Stores attachment into file.

Parameters
outputPath: Directory path where the attachment(s) will be stored. A null or empty string means the attachment(s) will be placed in the application directory.

◆ GetAttachment() [2/2]

System::ArrayPtr<System::SharedPtr<System::IO::MemoryStream> > Aspose::Pdf::Facades::PdfExtractor::GetAttachment ( )

Saves all attachment files to streams.

Returns
An array of streams, one for each attachment file in the PDF document.
[C#]
PdfExtractor extractor = new PdfExtractor();
extractor.BindPdf(path + "Attach.pdf");
extractor.ExtractAttachment();
IList names = extractor.GetAttachNames();
MemoryStream[] tempStreams = extractor.GetAttachment();
for (int i=0; i<tempStreams.Length; i++)
{
string name = (string)names[i];
FileStream fs = new FileStream(path + name,System.IO.FileMode.Create);
byte[] tempBytes = new byte[4096];
tempStreams[i].Position = 0;
for (; ; )
{
int numOfBytes = tempStreams[i].Read(tempBytes, 0, 4096);
if (numOfBytes < 1)
break;
fs.Write(tempBytes, 0, numOfBytes);
}
fs.Close();
}
[Visual Basic]
Dim extractor As PdfExtractor = New PdfExtractor()
extractor.BindPdf(path + "Attach.pdf")
extractor.ExtractAttachment()
Dim names As IList = extractor.GetAttachNames()
Dim tempStreams() As MemoryStream = extractor.GetAttachment()
For i As Integer = 0 To tempStreams.Length - 1
    Dim name As String = CType(names(i), String)
    Dim fs As FileStream = New FileStream(path + name, System.IO.FileMode.Create)
    ' New Byte(4095) {} allocates 4096 bytes (the upper bound is inclusive).
    Dim tempBytes() As Byte = New Byte(4095) {}
    tempStreams(i).Position = 0
    Do
        Dim numOfBytes As Integer = tempStreams(i).Read(tempBytes, 0, 4096)
        If numOfBytes < 1 Then Exit Do
        fs.Write(tempBytes, 0, numOfBytes)
    Loop
    fs.Close()
Next
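Both examples drain each attachment stream through a 4096-byte buffer until Read returns fewer than one byte. The same loop can be written against standard C++ streams. `CopyInChunks` is a hypothetical helper that works on `std::istream`/`std::ostream`, not on Aspose stream types.

```cpp
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>  // std::istringstream / std::ostringstream for in-memory use

// Copies everything from `in` to `out` through a fixed 4096-byte buffer and
// returns the total number of bytes copied.
std::size_t CopyInChunks(std::istream& in, std::ostream& out) {
    char buffer[4096];
    std::size_t total = 0;
    for (;;) {
        in.read(buffer, sizeof buffer);
        const std::streamsize n = in.gcount();
        if (n <= 0) {
            break;  // nothing read: end of input, same as numOfBytes < 1
        }
        out.write(buffer, n);
        total += static_cast<std::size_t>(n);
        if (!in) {
            break;  // a short read means the input is exhausted (or failed)
        }
    }
    return total;
}
```

File streams (`std::ifstream` opened in binary mode, `std::ofstream`) can be passed the same way as the in-memory streams from `<sstream>`.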

◆ GetAttachmentInfo()

System::SharedPtr<System::Collections::Generic::List<System::SharedPtr<FileSpecification> > > Aspose::Pdf::Facades::PdfExtractor::GetAttachmentInfo ( )

Gets the list of attachments.

Returns
A List<FileSpecification> containing the attachments.

◆ GetAttachNames()

System::SharedPtr<System::Collections::Generic::IList<System::String> > Aspose::Pdf::Facades::PdfExtractor::GetAttachNames ( )

Returns the list of attachment names in the PDF file. Note: ExtractAttachment must be called before this method.

Returns
List of attachment names.

The example demonstrates how to extract attachment names from a PDF file.

PdfExtractor extractor = new PdfExtractor();
extractor.BindPdf(TestSettings.GetInputFile("sample.pdf"));
extractor.ExtractAttachment();
IList attachments = extractor.GetAttachNames();
foreach (string name in attachments)
Console.WriteLine(name);

◆ GetNextImage() [1/4]

bool Aspose::Pdf::Facades::PdfExtractor::GetNextImage ( System::String  outputFile)

Retrieves the next image from the PDF document. Note: ExtractImage must be called before this method.

Parameters
outputFile: File where the image will be stored.
Returns
True if the image is successfully extracted.
PdfExtractor extractor = new PdfExtractor();
extractor.BindPdf("sample.pdf");
extractor.ExtractImage();
int i = 1;
while (extractor.HasNextImage())
{
    extractor.GetNextImage("image-" + i + ".jpg");
    i++;
}

◆ GetNextImage() [2/4]

bool Aspose::Pdf::Facades::PdfExtractor::GetNextImage ( System::String  outputFile,
System::SharedPtr< System::Drawing::Imaging::ImageFormat > format 
)

Retrieves the next image from the PDF document in the given image format. Note: ExtractImage must be called before this method.

Parameters
outputFile: File where the image will be stored.
format: The format of the image.
Returns
True if the image is successfully extracted.

◆ GetNextImage() [3/4]

bool Aspose::Pdf::Facades::PdfExtractor::GetNextImage ( System::SharedPtr< System::IO::Stream > outputStream,
System::SharedPtr< System::Drawing::Imaging::ImageFormat > format 
)

Retrieves the next image from the PDF file and stores it in a stream in the given image format.

Parameters
outputStream: Stream where the image data will be saved.
format: The format of the image.
Returns
True if the image is successfully extracted.

◆ GetNextImage() [4/4]

bool Aspose::Pdf::Facades::PdfExtractor::GetNextImage ( System::SharedPtr< System::IO::Stream > outputStream)

Retrieves the next image from the PDF file and stores it in a stream.

Parameters
outputStream: Stream where the image data will be saved.
Returns
True if the image is successfully extracted.

◆ GetNextPageText() [1/2]

void Aspose::Pdf::Facades::PdfExtractor::GetNextPageText ( System::String  outputFile)

Saves one page's text to a file.

The example demonstrates the usage of the GetNextPageText method in a text extraction scenario.

[C#]
PdfExtractor extractor = new PdfExtractor();
extractor.BindPdf(TestPath + @"Aspose.Pdf.Kit.Pdf");
extractor.ExtractText(Encoding.Unicode);
String prefix = TestPath + @"Aspose.Pdf.Kit";
String suffix = ".txt";
int pageCount = 1;
while (extractor.HasNextPageText())
{
    extractor.GetNextPageText(prefix + pageCount + suffix);
    pageCount++;
}
[Visual Basic]
Dim extractor As PdfExtractor = New PdfExtractor()
extractor.BindPdf(TestPath + "Aspose.Pdf.Kit.Pdf")
extractor.ExtractText(Encoding.Unicode)
Dim prefix As String = TestPath + "Aspose.Pdf.Kit"
Dim suffix As String = ".txt"
Dim pageCount As Integer = 1
While extractor.HasNextPageText()
    ' Use & so the page number is concatenated rather than added numerically.
    extractor.GetNextPageText(prefix & pageCount & suffix)
    pageCount = pageCount + 1
End While
Parameters
outputFile: The file path and name to save the text.

◆ GetNextPageText() [2/2]

void Aspose::Pdf::Facades::PdfExtractor::GetNextPageText ( System::SharedPtr< System::IO::Stream > outputStream)

Saves one page's text to a stream.

The example demonstrates the usage of the GetNextPageText method in a text extraction scenario.

PdfExtractor extractor = new PdfExtractor();
extractor.BindPdf(TestPath + @"Aspose.Pdf.Kit.Pdf");
extractor.ExtractText(Encoding.Unicode);
String prefix = TestPath + @"Aspose.Pdf.Kit";
String suffix = ".txt";
int pageCount = 1;
while (extractor.HasNextPageText())
{
    FileStream fs = new FileStream(prefix + pageCount + suffix, FileMode.Create);
    // Pass the open stream itself, not the file path.
    extractor.GetNextPageText(fs);
    fs.Close();
    pageCount++;
}
Parameters
outputStream: The stream to save the text.

◆ GetText() [1/3]

void Aspose::Pdf::Facades::PdfExtractor::GetText ( System::String  outputFile)

Saves text to a file. See also: ExtractText.

Parameters
outputFile: The file path and name to save the text.

◆ GetText() [2/3]

void Aspose::Pdf::Facades::PdfExtractor::GetText ( System::SharedPtr< System::IO::Stream > outputStream)

Saves text to a stream. See also: ExtractText.

Parameters
outputStream: The stream to save the text.

◆ GetText() [3/3]

void Aspose::Pdf::Facades::PdfExtractor::GetText ( System::SharedPtr< System::IO::Stream > outputStream,
bool  filterNotAscii 
)

Saves text to a stream. See also: ExtractText.

Parameters
outputStream: The stream to save the text.
filterNotAscii: If true, all non-ASCII characters are removed.
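With filterNotAscii set to true, non-ASCII characters are dropped from the output. A minimal analogue of that filter keeps only U+0000 to U+007F. It assumes the text is already decoded to UTF-32, so each element is one code point. `FilterNotAscii` is a hypothetical helper. Whether the library also treats control characters specially is not stated here.

```cpp
#include <algorithm>
#include <string>

// Hypothetical analogue of filterNotAscii = true: removes every code point
// outside the 7-bit ASCII range (U+0000..U+007F).
std::u32string FilterNotAscii(std::u32string text) {
    text.erase(std::remove_if(text.begin(), text.end(),
                              [](char32_t ch) { return ch > 0x7F; }),
               text.end());
    return text;
}
```

The erase/remove idiom compacts the kept characters in place and then trims the tail, so the string is traversed only once.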

◆ GetVentureLicense()

System::SharedPtr<Aspose::Pdf::LicenseManagement::VentureLicense> Aspose::Pdf::Facades::PdfExtractor::GetVentureLicense ( )
protected

◆ HasNextImage()

bool Aspose::Pdf::Facades::PdfExtractor::HasNextImage ( )

Checks whether more images are available in the PDF document. Note: ExtractImage must be called before this method.

Returns
True if more images are available.
PdfExtractor extractor = new PdfExtractor();
extractor.BindPdf("sample.pdf");
extractor.ExtractImage();
int i = 1;
while (extractor.HasNextImage())
{
    extractor.GetNextImage("image-" + i + ".jpg");
    i++;
}

◆ HasNextPageText()

bool Aspose::Pdf::Facades::PdfExtractor::HasNextPageText ( )

Indicates whether more page text is available.

The example demonstrates the usage of the HasNextPageText method in a text extraction scenario.

[C#]
PdfExtractor extractor = new PdfExtractor();
extractor.BindPdf(TestPath + @"Aspose.Pdf.Kit.Pdf");
extractor.ExtractText(Encoding.Unicode);
String prefix = TestPath + @"Aspose.Pdf.Kit";
String suffix = ".txt";
int pageCount = 1;
while (extractor.HasNextPageText())
{
    extractor.GetNextPageText(prefix + pageCount + suffix);
    pageCount++;
}
[Visual Basic]
Dim extractor As PdfExtractor = New PdfExtractor()
extractor.BindPdf(TestPath + "Aspose.Pdf.Kit.Pdf")
extractor.ExtractText(Encoding.Unicode)
Dim prefix As String = TestPath + "Aspose.Pdf.Kit"
Dim suffix As String = ".txt"
Dim pageCount As Integer = 1
While extractor.HasNextPageText()
    ' Use & so the page number is concatenated rather than added numerically.
    extractor.GetNextPageText(prefix & pageCount & suffix)
    pageCount = pageCount + 1
End While
Returns
True if more page text is available; otherwise, false.

◆ InitPageImages()

void Aspose::Pdf::Facades::PdfExtractor::InitPageImages ( System::SharedPtr< System::Collections::Generic::List< System::SharedPtr< XImage >>>  images,
System::SharedPtr< Aspose::Pdf::Document > document,
int32_t  page,
int32_t  endPage 
)
protected

◆ InitPageXFormImages_ActuallyUsed()

static void Aspose::Pdf::Facades::PdfExtractor::InitPageXFormImages_ActuallyUsed ( System::SharedPtr< System::Collections::Generic::List< System::SharedPtr< XImage >>>  images,
System::SharedPtr< Aspose::Pdf::Document > document,
int32_t  page,
int32_t  endPage 
)
static protected

◆ InitPageXFormImages_DefinedInResources()

void Aspose::Pdf::Facades::PdfExtractor::InitPageXFormImages_DefinedInResources ( System::SharedPtr< System::Collections::Generic::List< System::SharedPtr< XImage >>>  images,
System::SharedPtr< Aspose::Pdf::Document > document,
int32_t  page,
int32_t  endPage 
)
protected

◆ InitXFormImages()

static void Aspose::Pdf::Facades::PdfExtractor::InitXFormImages ( System::SharedPtr< System::Collections::Generic::List< System::SharedPtr< XImage >>>  images,
System::SharedPtr< XForm > xform 
)
static protected

◆ set_EndPage()

void Aspose::Pdf::Facades::PdfExtractor::set_EndPage ( int32_t  value)

Sets end page in the page range where extracting operation will be performed.

PdfExtractor ext = new PdfExtractor();
ext.BindPdf("sample.pdf");
ext.StartPage = 2;
ext.EndPage = 3;
ext.ExtractText();

◆ set_ExtractImageMode()

void Aspose::Pdf::Facades::PdfExtractor::set_ExtractImageMode ( Aspose::Pdf::ExtractImageMode  value)

Sets the mode used by the image extraction process.

The default value, ExtractImageMode.DefinedInResources, extracts all images defined in the resources.

To extract only the images that are actually displayed, use ExtractImageMode.ActuallyUsed.

◆ set_ExtractTextMode()

void Aspose::Pdf::Facades::PdfExtractor::set_ExtractTextMode ( int32_t  value)

Sets the mode for the text extraction result.

0 is pure text mode and 1 is raw ordering mode. The default is 0.

The example demonstrates the usage of the ExtractTextMode property in a text extraction scenario.

PdfExtractor extractor = new PdfExtractor();
extractor.BindPdf(@"D:\Text\text.pdf");
extractor.ExtractTextMode = 1;
extractor.ExtractText();
extractor.GetText(@"D:\Text\text.txt");

◆ set_Password()

void Aspose::Pdf::Facades::PdfExtractor::set_Password ( System::String  value)

Sets input file's password.

◆ set_Resolution()

void Aspose::Pdf::Facades::PdfExtractor::set_Resolution ( int32_t  value)

Sets the resolution for extracted images. The default value is 150. Images with a higher resolution are clearer; however, increasing the resolution also increases the time and memory needed to extract images. A resolution of 150 or 300 is usually enough to get a clear image.

◆ set_StartPage()

void Aspose::Pdf::Facades::PdfExtractor::set_StartPage ( int32_t  value)

Sets start page in the page range where extracting operation will be performed.

PdfExtractor ext = new PdfExtractor();
ext.BindPdf("sample.pdf");
ext.StartPage = 2;
ext.EndPage = 5;
ext.ExtractText();

◆ set_TextSearchOptions()

void Aspose::Pdf::Facades::PdfExtractor::set_TextSearchOptions ( System::SharedPtr< Aspose::Pdf::Text::TextSearchOptions > value)

Sets text search options.

◆ SetVentureLicense()

void Aspose::Pdf::Facades::PdfExtractor::SetVentureLicense ( System::SharedPtr< Aspose::Pdf::LicenseManagement::VentureLicense >  license)
protected