Data Extraction Basics
The Data Extractor is a tool for extracting data that would otherwise be inaccessible.
Consider these scenarios:
- Your company is attempting to migrate several years’ worth of data from a legacy application. The data files for this application are stored in an unknown proprietary format, possibly with compressed or encrypted fields. Although the data cannot be accessed directly, your legacy application can generate reports.
- Your agency needs to merge data from several disparate sources into a single, easily accessible format. For example, you receive listings of real-estate properties from several different electronic sources that you want to combine into one standard listing format for your web site.
- One of your clients needs to extract specific data from many large log files and aggregate that data into a database for statistical analysis.
In each of these scenarios, the Data Extractor can extract valuable data from standard formatted text files that also contain a lot of irrelevant information, such as headers and comments.
The Data Extractor exports the extracted data to CSV (comma-separated values) text file format. If you want to convert the data to another format, or manipulate it further after you have extracted it, the Data Loaders can accomplish this. The Data Loaders support over 100 different file types, allowing you to convert your data to the vast majority of databases used throughout the world.
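To make the idea concrete, here is a minimal Python sketch of pulling data lines out of a report-style text and writing them as CSV. It is not how the Data Extractor is implemented. The sample report, the regular expression DATA_LINE, and the function names extract_rows and to_csv are all hypothetical and exist only for illustration.

```python
import csv
import io
import re

# Hypothetical report text: a page title, a comment, a column header, and data lines.
SAMPLE_REPORT = """\
MONTHLY SALES REPORT            Page 1
# generated by legacy system
Item        Qty   Price
Widget        3    4.50
Gadget       10   12.00
"""

# Assumed shape of a data line: a name, an integer quantity, a two-decimal price.
DATA_LINE = re.compile(r"^(\w+)\s+(\d+)\s+(\d+\.\d{2})\s*$")


def extract_rows(report_text):
    """Return (item, qty, price) tuples for lines that match DATA_LINE."""
    rows = []
    for line in report_text.splitlines():
        match = DATA_LINE.match(line)
        if match:  # titles, comments, and column headers do not match and are skipped
            rows.append(match.groups())
    return rows


def to_csv(rows, header):
    """Serialize the rows to CSV text, preceded by a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(extract_rows(SAMPLE_REPORT), ["Item", "Qty", "Price"]))
```

Only the two data lines survive in this sketch; the page title, the comment, and the column header are discarded because they do not fit the pattern.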
To Use Data Extractor
First, you need a report file. Most applications on nearly every type of platform give you the option of creating and printing reports. Have the program print the report in a text-only format, either ASCII or any standard EBCDIC code page. For more information, see How to Create a Report File.
- Start a new script in the Data Extractor and select the report file.
- Look at the report in the Data Extractor. Notice the overall pattern of the report and where it repeats, the page layout, and the style used to organize information. Locate the data that you want to extract.
- Input the structural information. The Data Extractor needs patterns and structural rules to identify important data.
- Define line styles by marking which lines have important information and how they can be recognized.
- Define data fields by marking the data that you want collected, and where it can be found.
- Specify line actions. While you are defining line styles and data fields, select options that specify how you want the data to be assembled into records and fields. The default action is to collect the fields. You must find the end of the first record, or the beginning of the second, and change the action for that line to Accept Record. This stops the collection process for the first record and begins the collection process for the second, thus setting exactly which fields are included in the eventual output for that record. If you want to define more than one type of record in a single report file, you can do that by defining more than one Accept Record line style.
- Assign the fields to each record type, according to how you want the data to be exported.
- Browse your data. Once you have entered all the information Data Extractor needs to find your data, and specified how you want it structured, the Data Extractor automatically builds that structure internally. You can open the data browser and see it in a grid. If the fields or records are not structured the way you want them, go back and adjust the data field and/or line style definitions.
- Finally, save the script. By saving your script, you can use it again if you need to extract data from a report with the same style in the future.
Additional details about each of these steps are described in this documentation.
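The relationship between line styles, data fields, and the Accept Record action can be sketched in Python as follows. This is an illustrative model only, not the Data Extractor's script format or internals. The LineStyle class, the STYLES list, the sample report, and the extract_records function are hypothetical. The sketch also assumes that an Accept Record line contributes its own fields before the record is closed.

```python
import re
from dataclasses import dataclass

COLLECT = "collect"        # default action: gather fields into the current record
ACCEPT = "accept_record"   # close the current record and begin a new one


@dataclass
class LineStyle:
    name: str
    pattern: re.Pattern    # how the line is recognized; named groups act as data fields
    action: str = COLLECT


# Hypothetical line styles for a three-line customer record.
STYLES = [
    LineStyle("customer", re.compile(r"^Customer:\s*(?P<customer>.+)$")),
    LineStyle("phone", re.compile(r"^Phone:\s*(?P<phone>.+)$")),
    # The balance line ends each record, so its action is Accept Record.
    LineStyle("balance", re.compile(r"^Balance:\s*(?P<balance>[\d.]+)$"), ACCEPT),
]

# Hypothetical report: a page header followed by two records.
REPORT = """\
ACCOUNT LISTING        Page 1
Customer: Acme Corp
Phone: 555-0100
Balance: 120.50

Customer: Bolt Ltd
Phone: 555-0199
Balance: 75.00
"""


def extract_records(lines, styles):
    """Assemble records by matching each line against the line styles in order."""
    records, current = [], {}
    for line in lines:
        for style in styles:
            match = style.pattern.match(line.strip())
            if not match:
                continue
            current.update(match.groupdict())   # collect this line's fields
            if style.action == ACCEPT:          # end of record: emit it and start over
                records.append(current)
                current = {}
            break                               # first matching style wins
    return records


if __name__ == "__main__":
    for record in extract_records(REPORT.splitlines(), STYLES):
        print(record)
```

Lines that match no style, such as the page header and the blank line, are ignored. Moving the Accept Record action to a different line style changes where one record ends and the next begins, which is why choosing that line matters.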