com.aspose.ocr

Interface IRecognizedText



  • public interface IRecognizedText

    This interface provides access to recognized text. The result is available in several forms: plain text, an array of parts with details on each part, and the hOCR format.


     
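     OcrEngine ocr = new OcrEngine();
     // Load the language resources (argument is the resource file name).
     ocr.getLanguageContainer().addLanguage(LanguageFactory.load("Portuguese-RSC-HS-PB-ResourcesAllCharsNet.zip"));
     ocr.setImage(ImageStream.fromFile("image.tiff"));
     // Read the results only if recognition succeeded.
     if (ocr.process())
     {
          for (IRecognizedPartInfo recognizedPartInfo : ocr.getText().getPartsInfo())
          {
               // Only text parts carry text and style information.
               if (recognizedPartInfo instanceof IRecognizedTextPartInfo)
               {
                    IRecognizedTextPartInfo recognizedBlockInfo = (IRecognizedTextPartInfo) recognizedPartInfo;
                    String text = recognizedBlockInfo.getText();
                    // Assumes the style flags are exposed as getBold() / getItalic().
                    // The <b>/<i> markup is illustrative; substitute your own handling.
                    if (recognizedBlockInfo.getBold())
                         text = "<b>" + text + "</b>";
                    if (recognizedBlockInfo.getItalic())
                         text = "<i>" + text + "</i>";
                    System.out.println(text);
               }
          }
     }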
     

    • Method Detail

      • getPartsInfo

        IRecognizedPartInfo[] getPartsInfo()
        Gets the recognized text as an array of parts.
        Returns:
        An array of parts, each with its own style, font, text size, color, language and other attributes. If the text contains runs written in different fonts (or different languages, etc.), each run has its own element in the array. A long run of text with a uniform style is further divided into parts word by word. Parts appear in the array in the same order as in the original text.
      • toString

        String toString()
        Gets the whole recognized text without formatting. It is the concatenation of the text of all parts.
        Overrides:
        toString in class Object
        Returns:
        Whole recognized text without formatting
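
The relationship between the two methods above can be illustrated with a small self-contained sketch. It does not use the Aspose.OCR classes: `RecognizedTextSketch`, `Part`, `plainText` and `styledText` are hypothetical stand-ins invented for this illustration, and the sketch assumes that parts are joined directly, in order, with no separator added between them. It shows a per-part view that keeps style information next to a plain-text view that drops it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are NOT the Aspose.OCR types.
class RecognizedTextSketch {

    /** Stand-in for one recognized part: a run of text with a single style. */
    static final class Part {
        final String text;
        final boolean bold;
        final boolean italic;

        Part(String text, boolean bold, boolean italic) {
            this.text = text;
            this.bold = bold;
            this.italic = italic;
        }
    }

    /** Plain-text view: joins the parts in order and ignores their style. */
    static String plainText(List<Part> parts) {
        StringBuilder sb = new StringBuilder();
        for (Part part : parts) {
            sb.append(part.text);
        }
        return sb.toString();
    }

    /** Per-part view: wraps each part in HTML-like tags according to its style. */
    static String styledText(List<Part> parts) {
        StringBuilder sb = new StringBuilder();
        for (Part part : parts) {
            String text = part.text;
            if (part.bold) {
                text = "<b>" + text + "</b>";
            }
            if (part.italic) {
                text = "<i>" + text + "</i>";
            }
            sb.append(text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Part> parts = new ArrayList<>();
        parts.add(new Part("Hello ", false, false));
        parts.add(new Part("bold", true, false));
        parts.add(new Part(" world", false, true));

        System.out.println(plainText(parts));
        System.out.println(styledText(parts));
    }
}
```

With the real library, the per-part view corresponds to iterating over getPartsInfo() and the plain-text view to calling toString(); whether the library inserts any separators between parts is not stated here, so check the actual output before relying on it.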