How Data Extractor Works with Print Files

The Data Extractor reads complex text files of many kinds. The volume of computer data grows every year, and much of it is delivered as raw text.

Supported Formats

Some examples of the many source formats handled by the Data Extractor are:

  • Printouts from programs captured as disk files
  • Reports of any size or dimension
  • ASCII or any type of EBCDIC text files
  • Spooled print files
  • Fixed length sequential files
  • Complex multi-line files
  • Downloaded text files (e.g., news retrieval, financial, real estate...)
  • HTML and other structured documents
  • Internet text downloads
  • E-mail header and body
  • On-line textual databases
  • CD-ROM textbases
  • Files with tagged data fields
  • XML
  • HL7

Features

Using Data Extractor, you can extract the data fields you want from different lines of the text file and assemble them into a flat record. Whole records of structured data can then be presented in the conventional row and column tabular format, so that you can review them before converting the data to a popular target format.
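The idea of assembling fields from several lines into one flat record can be sketched in a few lines of Python. This is only an illustration of the general technique, not the Data Extractor's own mechanism or its CXL language. The sample report layout, the regular expressions, and the names REPORT, HEADER, DETAIL and extract_records are all assumptions made for the example.

```python
import re

# Hypothetical spooled report: one header line per customer,
# followed by indented detail lines.
REPORT = """\
CUSTOMER: 1001  ACME SUPPLY
  2024-01-05  INV-001     125.50
  2024-01-19  INV-007      80.00
CUSTOMER: 1002  BOLT BROTHERS
  2024-02-02  INV-012     310.25
"""

# Assumed recognition rules for the two line types.
HEADER = re.compile(r"^CUSTOMER:\s+(?P<cust_id>\d+)\s+(?P<cust_name>.+?)\s*$")
DETAIL = re.compile(
    r"^\s+(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<invoice>\S+)\s+(?P<amount>[\d.]+)\s*$"
)


def extract_records(text):
    """Return one flat dict per detail line, carrying the header fields down."""
    records = []
    current = {}  # fields from the most recent header line
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.groupdict()
            continue
        detail = DETAIL.match(line)
        if detail and current:
            row = dict(current)             # copy the header fields
            row.update(detail.groupdict())  # add the detail fields
            row["amount"] = float(row["amount"])
            records.append(row)
    return records


records = extract_records(REPORT)
for row in records:
    print(row)
```

Each resulting row combines the customer fields from the header line with the date, invoice and amount from a detail line, which is the row and column shape described above.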

Some of the features that make the Data Extractor so complete are:

  • No practical limits on file size
  • Reads almost any kind of report architecture - as long as the layout follows consistent rules
  • Support for large fields and records
  • Handles floating headers, footers and details
  • Can automatically detect and propose recognition patterns
  • Handles tagged data fields
  • Autoparses columnar and tagged data
  • Powerful debugging tools
  • Structured data browser to see results prior to export
  • Command line automation
  • Built on an extensible, extremely rich scripting language (CXL - Content Extraction Language)
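To make the tagged data fields item more concrete, here is a minimal Python sketch of parsing "TAG: value" lines into records and then laying them out as columns. It does not show how the Data Extractor or CXL does this. The tag names, the blank-line record separator, and the function names parse_tagged and to_table are assumptions made for the example.

```python
# Hypothetical tagged input: one "TAG: value" pair per line,
# with a blank line between records.
TAGGED = """\
TITLE: Quarterly results
DATE: 2024-03-31
SOURCE: Newswire

TITLE: Housing starts
DATE: 2024-04-02
"""


def parse_tagged(text):
    """Split tagged text into a list of dicts, one per record."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():            # a blank line closes the record
            if current:
                records.append(current)
                current = {}
            continue
        tag, sep, value = line.partition(":")
        if sep:                         # ignore lines without a tag
            current[tag.strip()] = value.strip()
    if current:                         # last record has no trailing blank line
        records.append(current)
    return records


def to_table(records):
    """Return (columns, rows), with columns in first-seen tag order."""
    columns = []
    for record in records:
        for tag in record:
            if tag not in columns:
                columns.append(tag)
    rows = [[record.get(col, "") for col in columns] for record in records]
    return columns, rows


columns, rows = to_table(parse_tagged(TAGGED))
print(columns)
for row in rows:
    print(row)
```

A record that lacks a tag simply gets an empty cell in that column, so every row has the same width.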

You extract the desired fields from the source text file by visually marking up the file in the Data Extractor user interface, using the mouse to select the fields from the lines displayed on the screen. Dialog boxes allow you to express a rich set of pattern recognition rules and actions that assist in extracting clean data.

Several techniques are available to view samples of extracted data. Apart from scrolling through the full text of the data, you can use a debug window to search for all lines that satisfy certain extraction criteria.
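Searching for every line that satisfies a rule can be pictured with a short Python sketch. It only illustrates the idea and is not the product's debug window. The sample text, the pattern, and the function name find_matching_lines are assumptions made for the example.

```python
import re

# Hypothetical report sample with page header, group lines and detail lines.
SAMPLE = """\
PAGE 1   SALES REPORT
REGION: EAST
  WIDGET      12    48.00
  GADGET       3    19.50
REGION: WEST
  WIDGET       7    28.00
"""


def find_matching_lines(text, pattern):
    """Return (line_number, line) for each line that matches the pattern."""
    rule = re.compile(pattern)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if rule.search(line)
    ]


# Assumed rule for a detail line: indent, item name, quantity, amount.
hits = find_matching_lines(SAMPLE, r"^\s+\S+\s+\d+\s+\d+\.\d{2}$")
for number, line in hits:
    print(number, line)
```

Listing the matching line numbers alongside the text makes it easy to check whether a rule catches too many or too few lines before it is used for extraction.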
