Pervasive DataTools - Data Parsers

The Pervasive Data Parsers are the tools to use for parsing flat data files that contain only partial internal metadata or none at all. These flat files originate from mainframes (iSeries, AS/400, z/OS, z/VSE, CICS, HP 8000, and others), minicomputers (DEC, HP 3000, Data General, Wang 2200, and others), and personal computers (MS-DOS, Windows, Linux, and UNIX), often in obscure proprietary formats for which there is limited or no direct access. They frequently contain EBCDIC-encoded data, which requires special handling to convert to ASCII.
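
As a rough illustration of the EBCDIC-to-ASCII step (not the Data Parsers' own implementation), the Python sketch below uses the standard library's `cp037` codec. Code page 037 (EBCDIC US/Canada) is an assumption; your files may use another EBCDIC variant, and only character fields should be converted this way, because packed and binary fields must stay as raw bytes.

```python
# Minimal sketch: EBCDIC <-> ASCII conversion with Python's built-in codecs.
# Assumption: the data uses code page 037 (EBCDIC US/Canada); other variants
# such as cp500 or cp1140 may apply to your files.


def ebcdic_to_ascii(raw: bytes, codepage: str = "cp037") -> str:
    """Decode EBCDIC text bytes, then force the result into 7-bit ASCII."""
    text = raw.decode(codepage)
    # Characters with no ASCII equivalent (e.g. accented letters) become '?'.
    return text.encode("ascii", errors="replace").decode("ascii")


def ascii_to_ebcdic(text: str, codepage: str = "cp037") -> bytes:
    """Encode ASCII text as EBCDIC bytes (the reverse direction)."""
    return text.encode(codepage)


# "HELLO" in EBCDIC code page 037
record = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])
print(ebcdic_to_ascii(record))  # HELLO
```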

Source Side

The Data Parsers read data from these semi-structured or unstructured flat files and provide a graphical interface in which you can manually define the record layout and field definitions. Alternatively, if you have access to a dictionary file that contains the record layout and data definitions, each Data Parser includes dictionary file support appropriate to its source data format. The Data Parser reads the external dictionary file (such as a COBOL copybook, an ASCII record structure, or a Btrieve DDF) and auto-parses the data file into records and fields. All of the Data Parsers support EBCDIC-to-ASCII and ASCII-to-EBCDIC conversion, as well as Unicode and other international encodings (UTF-8, UTF-16, double-byte, multi-byte, etc.).
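
To show what auto-parsing from a dictionary file involves, here is a hypothetical Python sketch that reads a tiny subset of COBOL copybook syntax (elementary `PIC X`, `PIC 9`, and `COMP-3` items) and computes each field's offset and length. The names `Field` and `parse_copybook` are illustrative only; real copybooks also use `OCCURS`, `REDEFINES`, `VALUE`, and other clauses that this sketch does not handle.

```python
import re
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    offset: int  # zero-based byte offset within the record
    length: int  # storage length in bytes
    usage: str   # "DISPLAY" or "COMP-3" in this sketch


# Elementary items only, e.g. "05 CUST-ID PIC 9(6)." or "05 BAL PIC S9(7)V99 COMP-3."
ITEM_RE = re.compile(
    r"^\s*\d{2}\s+([A-Z0-9-]+)\s+PIC\s+(\S+?)(?:\s+(COMP-3))?\s*\.\s*$",
    re.IGNORECASE,
)


def expand_pic(pic: str) -> str:
    """Expand repeat counts: 'S9(3)V99' -> 'S999V99'."""
    return re.sub(r"([XA9])\((\d+)\)", lambda m: m.group(1) * int(m.group(2)), pic.upper())


def storage_length(pic: str, usage: str) -> int:
    expanded = expand_pic(pic)
    if usage == "COMP-3":
        # Packed decimal: two digits per byte plus one sign nibble.
        return expanded.count("9") // 2 + 1
    # DISPLAY: one byte per X, A, or 9; 'S' (embedded sign) and 'V'
    # (implied decimal point) take no storage.
    return sum(expanded.count(c) for c in "XA9")


def parse_copybook(text: str) -> list[Field]:
    fields, offset = [], 0
    for line in text.splitlines():
        m = ITEM_RE.match(line)
        if not m:
            # Group items, comments, and items with other clauses are skipped.
            continue
        usage = (m.group(3) or "DISPLAY").upper()
        length = storage_length(m.group(2), usage)
        fields.append(Field(m.group(1).upper(), offset, length, usage))
        offset += length
    return fields


copybook = """
       01 CUSTOMER-REC.
          05 CUST-ID      PIC 9(6).
          05 CUST-NAME    PIC X(20).
          05 BALANCE      PIC S9(7)V99 COMP-3.
"""
for field in parse_copybook(copybook):
    print(field)
```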

Each Data Parser can parse and convert fixed-length data files, as follows:

Data Parser for Binary

The Data Parser for Binary can parse and convert mainframe COBOL, RMCOBOL, VSCOBOL, ISAM, D-ISAM, and VSAM flat files, BTREE files, and virtually any other flat binary data file. It supports dozens of data types, including 16-bit, 24-bit, 32-bit, and 64-bit binary, IEEE floating point, VAX floating point, Cray floating point, column binary, Comp, Comp-1, Comp-2, Comp-3, Comp-5, Comp-X, Display, Display Justified, Display Sign Leading, Display Sign Trailing, Packed Decimal, Pascal 48-bit real, Pascal string, Text, Zoned Decimal, Zoned-Leading Sign, Zoned-Trailing Sign, and many others. Options are also provided for big-endian, little-endian, and reversed byte order, as well as many other platform-specific data storage properties.
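
Comp-3 (packed decimal) is one of the types that must be unpacked before it can be written as text. The sketch below, a hypothetical helper rather than the product's code, decodes the common IBM layout: two decimal digits per byte, with the sign in the final nibble. The number of implied decimal places (`scale`) is assumed to come from the record layout.

```python
from decimal import Decimal


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a COMP-3 (packed decimal) field.

    Each byte holds two BCD digit nibbles, except the last nibble, which holds
    the sign: 0xD or 0xB means negative; 0xC, 0xF (unsigned), 0xA, or 0xE mean
    positive. `scale` is the number of implied decimal places (digits after V).
    """
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()  # the final nibble is the sign, not a digit
    if sign < 0xA or any(d > 9 for d in nibbles):
        raise ValueError(f"not a valid packed decimal field: {raw.hex()}")
    value = Decimal(int("".join(str(d) for d in nibbles))).scaleb(-scale)
    return -value if sign in (0xB, 0xD) else value


# PIC S9(3)V99 COMP-3 holding -123.45 is stored as the 3 bytes 12 34 5D.
print(unpack_comp3(bytes([0x12, 0x34, 0x5D]), scale=2))  # -123.45
```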

Data Parser for Btrieve

The Data Parser for Btrieve enables you to parse and convert many versions of Btrieve data files. The Data Parser for Btrieve supports dozens of Btrieve data types including 32-bit IEEE floating point, 64-bit IEEE floating point, Autoinc (2- & 4-byte), Bfloat (4- & 8-byte), Bit, Character, Comp, Comp-1, Comp-3, Comp-5, Comp-X, Date, Decimal, Float (4- & 8-byte), Integer (1-, 2-, 4-, & 8-byte), Logical, Logical (2 bytes), LString, LVar, Magic PC (Date, Extended, Number, Real, & Time), Microsoft BASIC Double, Microsoft BASIC Float, Money, Note, Numeric, NumericSA, NumericSTS, Packed Decimal, Sales Ally (Time-2, Date, Time-1, & Time), Unsigned (1-, 2-, 4-, & 8-byte), Zoned Decimal, and ZString.
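
Two of the Btrieve string types differ only in how the end of the text is marked. The sketch below assumes the usual definitions (LString: a leading length byte; ZString: NUL-terminated) and a Latin-1 character set; both decoder names are hypothetical.

```python
# Hypothetical decoders for two Btrieve string types, assuming:
#   LString - first byte is the length of the text that follows (Pascal-style)
#   ZString - text ends at the first NUL byte (C-style)
# Each field buffer is assumed to have been sliced from the record already.


def decode_lstring(field: bytes, encoding: str = "latin-1") -> str:
    length = field[0]
    # Bytes beyond the stated length are unused padding and are ignored.
    return field[1 : 1 + length].decode(encoding)


def decode_zstring(field: bytes, encoding: str = "latin-1") -> str:
    end = field.find(b"\x00")
    if end == -1:
        end = len(field)  # no terminator: the text fills the whole field
    return field[:end].decode(encoding)


print(decode_lstring(b"\x05HELLO\x00\x00\x00"))  # HELLO
print(decode_zstring(b"WORLD\x00\x00\x00"))      # WORLD
```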

Data Parser for C-ISAM

The Data Parser for C-ISAM is used to parse and convert C-ISAM data files. Support for the following data types is included: Character, Comp, Comp-1, Comp-2, Comp-3, Comp-5, Comp-X, Date, Decimal, Display, Display sign leading, Display sign leading separate, Display sign trailing, Display sign trailing separate, Float, Integer, Money, Null-terminated C string, Pascal string (1 byte), Pascal string (2 bytes), Serial, Smallfloat, Smallint, Zoned decimal, Zoned leading sign, and Zoned trailing sign.
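
Several of the C-ISAM types above, such as Display sign leading separate, store the sign as its own `+` or `-` character rather than inside a digit. A minimal decoding sketch, assuming the field has already been extracted as ASCII text and that `scale` gives the implied decimal places (the function name is hypothetical):

```python
from decimal import Decimal


def decode_sign_separate(text: str, scale: int = 0, leading: bool = True) -> Decimal:
    """Decode DISPLAY SIGN LEADING/TRAILING SEPARATE fields.

    The digits are ordinary characters and the sign occupies its own byte,
    '+' or '-', at the front (leading) or the end (trailing) of the field.
    """
    sign, digits = (text[0], text[1:]) if leading else (text[-1], text[:-1])
    if sign not in ("+", "-") or not digits.isdigit():
        raise ValueError(f"not a sign-separate numeric field: {text!r}")
    value = Decimal(int(digits)).scaleb(-scale)
    return -value if sign == "-" else value


print(decode_sign_separate("-00123", scale=2))        # -1.23
print(decode_sign_separate("00500+", leading=False))  # 500
```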

Data Parser for C-TREE

Use the Data Parser for C-TREE to parse and convert C-TREE and C-TREE PLUS files. The Data Parser for C-TREE supports the following data types: 1-byte & 2-byte LString, 8-, 16-, 24-, 32-, & 64-bit binary, 32-bit IEEE floating point, Character, Date, Display, Display sign leading, Display sign leading separate, Display sign trailing, Display sign trailing separate, Float, Money, Packed decimal, Time, Zoned decimal, Zoned leading sign, Zoned trailing sign, and ZString.
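
The Zoned decimal types store one digit per byte and fold the sign into the zone of the first or last byte. The sketch below assumes the EBCDIC convention (zone 0xD = negative, 0xC or 0xF = positive); ASCII-based files use different overpunch characters, so treat it as an illustration rather than a complete decoder.

```python
from decimal import Decimal


def unpack_zoned(raw: bytes, scale: int = 0, leading_sign: bool = False) -> Decimal:
    """Decode an EBCDIC zoned-decimal field.

    Each byte is a zone nibble (normally 0xF) plus a digit nibble. The zone of
    the last byte (trailing sign) or first byte (leading sign) carries the sign:
    0xD means negative; 0xC or 0xF means positive.
    """
    digits = [b & 0x0F for b in raw]
    if any(d > 9 for d in digits):
        raise ValueError(f"not a valid zoned decimal field: {raw.hex()}")
    sign_zone = (raw[0] if leading_sign else raw[-1]) >> 4
    value = Decimal(int("".join(str(d) for d in digits))).scaleb(-scale)
    return -value if sign_zone == 0xD else value


# -123.45 with a trailing sign: F1 F2 F3 F4 D5 (the last byte displays as 'N' in EBCDIC)
print(unpack_zoned(bytes([0xF1, 0xF2, 0xF3, 0xF4, 0xD5]), scale=2))  # -123.45
```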

Data Parser for Fixed Text

To parse fixed-length text files, use the Data Parser for Fixed Text. It supports Character, Date, and Numeric data types.
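
Parsing a fixed-length text record amounts to slicing each line at known column positions and converting each slice to its data type. In this minimal sketch the layout, field names, and the YYYYMMDD date format are all hypothetical; in the product, these come from the field definitions you create.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass
class FieldDef:
    name: str
    start: int  # zero-based starting column
    width: int  # number of characters
    kind: str   # "character", "date", or "numeric"


# Hypothetical layout for illustration only.
LAYOUT = [
    FieldDef("id", 0, 5, "numeric"),
    FieldDef("name", 5, 12, "character"),
    FieldDef("joined", 17, 8, "date"),  # assumed YYYYMMDD
]


def parse_fixed_line(line: str, layout: list[FieldDef]) -> dict:
    record = {}
    for f in layout:
        raw = line[f.start : f.start + f.width].strip()
        if not raw:
            record[f.name] = None  # blank field
        elif f.kind == "numeric":
            record[f.name] = Decimal(raw)
        elif f.kind == "date":
            record[f.name] = datetime.strptime(raw, "%Y%m%d").date()
        else:
            record[f.name] = raw
    return record


line = "00042" + "John Smith".ljust(12) + "20240115"
print(parse_fixed_line(line, LAYOUT))
```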

Data Parser for Micro Focus COBOL

Use the Data Parser for Micro Focus COBOL to parse and convert MFCOBOL data files. The Data Parser for Micro Focus COBOL supports these data types: Comp, Comp-1, Comp-2, Comp-3, Comp-5, Comp-X, Display, Display Justified, Display Sign Leading, Display Sign Trailing, Packed Decimal, Text, Zoned Decimal, Zoned-Leading Sign, and Zoned-Trailing Sign.
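
Comp-X and Comp-5 both hold binary integers but, assuming typical Micro Focus behavior (check your compiler settings), differ in byte order: Comp-X is unsigned big-endian, while Comp-5 uses the machine's native order, which is little-endian on x86. This sketch shows why the byte-order option matters; the function names are hypothetical.

```python
# Assumptions (typical Micro Focus behavior; verify against your compiler settings):
#   COMP-X - unsigned binary, big-endian; PIC X(n) COMP-X occupies n bytes
#   COMP-5 - binary in the machine's native byte order (little-endian on x86)


def decode_comp_x(raw: bytes) -> int:
    return int.from_bytes(raw, "big", signed=False)


def decode_comp5(raw: bytes, signed: bool = True, little_endian: bool = True) -> int:
    return int.from_bytes(raw, "little" if little_endian else "big", signed=signed)


data = bytes([0x01, 0x00])
print(decode_comp_x(data))  # 256: most significant byte first
print(decode_comp5(data))   # 1: least significant byte first
```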


Fixed-length source files are also known by several other names: SDF (Standard Data Format), Fixed Length, Fixed Width, and Fixed Format.

The graphical tool lets you view all of the source data in both its native format (including hexadecimal) and its unpacked format, to help you read, translate, and define packed, decimal, and integer data fields. You can also filter and sort records to narrow down the source data set.
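
A hex view like the one described above can be approximated with a few lines of Python. This sketch prints byte offsets, hex values, and decoded characters side by side; it assumes a single-byte code page (`cp037` for EBCDIC by default) and is not the tool's actual display.

```python
def hex_view(data: bytes, width: int = 16, codepage: str = "cp037") -> list[str]:
    """Side-by-side hex and character view, one line per `width` bytes.

    Assumes a single-byte code page (e.g. cp037 for EBCDIC, latin-1 for ASCII
    data), so each byte decodes to exactly one character.
    """
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset : offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        chars = chunk.decode(codepage, errors="replace")
        # Control characters (including packed-decimal bytes) are shown as '.'
        text_part = "".join(c if c.isprintable() else "." for c in chars)
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return lines


# An EBCDIC name followed by a packed-decimal amount (12 34 5D = -123.45)
record = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6, 0x12, 0x34, 0x5D])
for row in hex_view(record):
    print(row)
```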

Data Parser for Unstructured Text

Pervasive Data Parser for Unstructured Text is the tool that extracts and parses data from unstructured text and report files and converts the data to a structured data file format that can be imported into virtually any application, spreadsheet, or database.

This Data Parser enables you to extract data from log files, report files, HTML files, XML files, print files, text reports, ASCII or EBCDIC text files, data reports, and news retrieval, financial, or real estate downloads. The Data Parser for Unstructured Text supports virtually any report architecture, as long as the report follows consistent rules.

Here are just a few of the features of the Data Parser for Unstructured Text: auto-parsing of tagged data fields, columnar data, and mailing labels; support for large data fields and records; support for floating headers and floating footers; support for multi-line data fields; support for reports with multiple record types; and much more!
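
The rule-based approach can be sketched in Python with regular expressions: one rule recognizes a floating header and remembers its value, another recognizes columnar detail lines, and everything else is ignored. The report layout, rule patterns, and `extract` function below are hypothetical; the Data Parser for Unstructured Text defines its rules through its graphical interface rather than code.

```python
import re

# Hypothetical rules for one report layout.
REGION_RULE = re.compile(r"^REGION:\s+(?P<region>\S+)")  # floating header
DETAIL_RULE = re.compile(  # columnar detail line
    r"^(?P<invoice>\d{6})\s+(?P<customer>.+?)\s+(?P<amount>-?[\d,]+\.\d{2})\s*$"
)


def extract(report_text: str) -> list[dict]:
    rows = []
    region = None  # value from the most recent floating header
    for line in report_text.splitlines():
        if m := REGION_RULE.match(line):
            region = m["region"]
        elif m := DETAIL_RULE.match(line):
            rows.append({
                "region": region,
                "invoice": m["invoice"],
                "customer": m["customer"],
                "amount": m["amount"].replace(",", ""),
            })
        # Any other line (titles, column headings, page footers) is ignored.
    return rows


REPORT = """\
REGION: WEST                             Page 1
INVOICE  CUSTOMER                  AMOUNT
100234   Smith Hardware          1,250.00
100235   Jones & Sons              -75.50
REGION: EAST                             Page 2
100310   Lee Supply                300.00
"""

for row in extract(REPORT):
    print(row)
```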

The Problem

Reports are easy for people to read, but not for databases or data management systems. Much of the world's data is locked up in reports and unstructured text formats that most applications cannot consume. Extracting useful data from report and print files usually requires a custom script, and therefore the services of a programmer. The Pervasive Data Parser for Unstructured Text is the solution!

The Solution

The Pervasive Data Parser for Unstructured Text enables virtually anyone to create data extraction rules through a graphical interface and menu options. Simply define rules that identify the various lines and fields of useful data. Then, using the built-in rule debugger and data viewer, verify that the rules are working as needed.

Once your data displays correctly in the tabular row-and-column data browser, export it to a delimited text file (with Unicode support) that can be imported into most applications and databases. If further data cleansing, data manipulation, record filtering, or mapping is required, use whichever of the other Pervasive DataTools fits your specific needs.

Check out the online demo and training video to see how the Data Parser for Unstructured Text can parse and extract data for you!

Supported Source File Formats

A Pervasive Data Parser is available for each of the following source file formats:

Target Side

Each of the Pervasive Data Parsers exports the parsed and unpacked data to a CSV Text or Unicode CSV Text target data file format, including support for UTF-8, UTF-16, double-byte, multi-byte, and other international encodings.
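
For readers who want to see what the CSV target involves, this Python sketch writes parsed records with the standard `csv` module and then encodes the text as UTF-8 or UTF-16. The function names are hypothetical, and the output is ordinary CSV rather than a reproduction of the product's exact target format.

```python
import csv
import io


def rows_to_csv_bytes(rows: list[dict], fieldnames: list[str], encoding: str = "utf-8") -> bytes:
    """Serialize parsed records as CSV text, then encode them in the requested encoding."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # fields containing commas or quotes are quoted automatically
    return buffer.getvalue().encode(encoding)


def write_csv(path: str, rows: list[dict], fieldnames: list[str], encoding: str = "utf-8") -> None:
    with open(path, "wb") as f:
        f.write(rows_to_csv_bytes(rows, fieldnames, encoding))


rows = [{"name": "Müller", "city": "Zürich"}, {"name": "Smith, J.", "city": "Austin"}]
print(rows_to_csv_bytes(rows, ["name", "city"]).decode("utf-8"))
```

The `csv` module quotes fields that contain the delimiter (such as `Smith, J.`) automatically, and encoding with `utf-16` adds a byte-order mark to the start of the output.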

A Similar DataTool

Another DataTool, which "auto-parses" delimited ASCII text files, is also available. It supports Unicode text as well.
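
Delimiter detection of this kind can be approximated with Python's `csv.Sniffer`, which guesses the delimiter from a sample of the text. This is a sketch of the general technique, not how that DataTool works internally; the candidate delimiter list and the function name are assumptions.

```python
import csv
import io


def auto_parse_delimited(text: str, candidates: str = ",;\t|") -> list[list[str]]:
    """Guess the delimiter from a sample of the text, then split every line into fields."""
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=candidates)
    return list(csv.reader(io.StringIO(text), dialect))


print(auto_parse_delimited("id;name;city\n1;Ann;Oslo\n2;Bo;Malmö\n"))
```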