Pervasive Software Inc.
12365 Riata Trace Parkway
Building B
Austin, TX 78727 USA
Web: http://pervasivedatatools.com
These manuals lead you through the operation of the various Pervasive DataTools user interfaces. The DataTools are a set of subscription-based, licensed software products designed to assist you with a variety of data management, data conversion, data migration, and data transformation tasks. You must have a non-expired DataTools product license to run each of the applications.
Refer to the license.txt file in the default installation directory for disclaimers and information about trademarks and credits.
The help topics are organized by the DataTools product categories; i.e., Loaders, Parsers, Extractors, etc. So whether you've already purchased a subscription or are still trying to determine how the tools work, peruse the help topics for any or all of the DataTools.
Each of the links below will take you to the Table of Contents for that product's complete online help.
Pervasive Software Inc.
12365 Riata Trace Parkway
Building B
Austin, TX 78727 USA
Web: http://pervasivedatatools.com
This manual leads you through the operation of the DataTools Launcher user interface. The Pervasive DataTools Launcher allows you to:
Refer to the license.txt file in the default installation directory for disclaimers and information about trademarks and credits.
DataTools Launcher is an application used to manage your Pervasive DataTools product licenses and to launch the variety of products available within the Pervasive DataTools suite.
Each DataTools license is a subscription to a specific product with a specific set of features.
DataTools Launcher consists of two tabs: one for managing active product licenses and one for viewing inactive licenses. Inactive licenses are either licenses that have expired or licenses that have been super-ceded. When you extend a product license, you receive a new license for the same product that has a new expiration date. This more recent, extended license supercedes the old one, and the old license file is archived and is moved to the Inactive tab.
DataTools Launcher looks for licenses within the Active License Directory. The default Active License Directory is located under the Windows shared application data folder. On Windows XP, the location is: C:\Documents and Settings\AllUsers\Application Data\Pervasive\DataTools\Common800\License. On Windows Vista (32-bit), the location is C:\ProgramData\Pervasive\DataTools\Common800\License. You can change the Active License Directory in the Preferences menu.
After installation and at first startup, the DataTools Launcher will prompt you to enter your user ID and password. Enter the same login credentials you used to order your license(s). The DataTools Launcher will then connect to the Pervasive license web service and download any new licenses or licenses missing from your Active License Directory. NOTE: It is important that you use the same user ID and password used to order your Pervasive DataTools licenses from the DataTools e-commerce web site.
After a successful log into the Pervasive web service, you will be notified if new or missing licenses exist. See below:
Check the box in the Add cell next to the license you would like to copy to your Active License Directory. After selecting the licenses you wish to add to your Active License Directory, press the OK button. The licenses will be added and the Active tab will display.
Under the preference menu, you have the ability to change the Active License Directory, the web service connection information, and the display font.
Changing the connection information includes changing the login information (User ID and Password) and changing some connection default behavior. Every time the application starts up, it will log into the Pervasive license web service to look for new or missing licenses. If you wish to turn this off, you can uncheck that preference. You can then select the Refresh command any time you want to look for new or missing licenses. You can also select to automatically download new or missing licenses any time they are found. The default is set to tell you how many were found and allow you to view and manually download the ones you want. Check the preference box to automatically download these if you want that behavior.
For each task, we have provided a quick link to the section of the DataTools Launcher documentation that describes how to perform the specific task.
| Task | Help Topic |
|---|---|
| Find your DataTools licenses | Go To Active Tab > Find |
| View licensed features of an Active license |
Go To Active Tab > View |
| Purchase additional licenses or extend an existing license |
Go To Active Tab > Purchase |
| Refresh the active license list | Go To Active Tab > Refresh |
| Launch one of the DataTools products |
Go To Active Tab > Launch |
| View licensed features of an Inactive license |
Go To Inactive Tab > View |
| Uninstall DataTools | Go To Uninstall DataTools |
The Active tab displays a list of the Pervasive DataTools licenses in your Active License Directory. DataTools licenses are files that have an .SLC file extension. First time users will be prompted for login information so that DataTools Launcher can download your active licenses from the Pervasive license web service. If the list is empty, it means no licenses exist under the login credentials provided. Use the Find command to locate and copy your license files into the Active License Directory if you have license files in other file storage locations.
To purchase additional licenses, use the Purchase command to access the e-commerce website where you can buy additional licenses or extend the subscription for a license that is about to expire.
The display grid on this tab and the Inactive tab shows some identifying information about each license.
Use the View command to view the license feature details.
All licenses that have expired will be automatically archived and display on the Expired tab. Licenses that are two weeks (or less) from expiration have a red expiration date.
Only one license within the display grid can be selected for further operations. Once a license is selected, you can use the Launch command to open the licensed product.
You can click any of the column headings within the grid to sort the grid by that column. The sorted column will display an up or down arrow depending on the sort sequence: Ascending or Descending.
Use the View command to open a selected license and view the licensed features. Any features that were licensed but have expired will have an expiration date in red. If the entire license has expired, all features that were licensed have an expiration date in red and the Licensing tab in the view is also red.
Use the Find command to locate and add licenses to your Active License Directory. DataTools Launcher automatically logs into the Pervasive license web service at start up to download any new licenses, so the find command is for locating licenses you have on your local disk or network and copying these license files to your Active License Directory.
Adding licenses to the Active License Directory copies the license to the Active License Directory, leaving the original license in place. To search for licenses, type the path (including drive) you wish to search and press the Search button. You may cancel your search at any time if you already see the license you want to add in the found license list.
Select the licenses you wish to add, and click the Add button. As you add licenses the grid on the Active tab is updated to reflect these additions. Close the dialog when you are finished searching for and adding licenses.
Use the Purchase command to access the Pervasive e-commerce site and purchase additional licenses or extend a license that is about to expire. After purchasing additional software, you’ll need to click the Refresh button to download your newly purchased or extended licenses.
See also: Refresh
Before the e-commerce site is launched, you will see the following message:
Note: No additional software installations are required after purchasing more licenses unless additional features have been added to the Pervasive DataTools product family since your last installation.
To extend a license, you simply go to the e-commerce site and purchase more time on a product you already own. You will be sent a new license file with the extended expiration date. You will need to download this new license into your Active License Directory. If you have 2 or more licenses for the same product in the Active License Directory, the one with the farthest out expiration date will be displayed and all others will be moved to the Inactive directory.
Use the Refresh command to access the Pervasive license web service to check for new or missing license files. Your stored login userid and password will be used to see if new or missing license files exist. If any exist, you will be notified how many and be given the opportunity to add them to your Active License Directory.
If you click No, DataTools Launcher will load and display all non-expired .slc files in the Active License Directory. If you click Yes, you will be given the opportunity to add any of the new or missing files to your Active License Directory.
Check the box in the Add cell next to the license you would like to copy to your Active License Directory. After selecting the licenses you wish to add to your Active License Directory, press the OK button. The licenses will be added and the Active tab will display.
Use the Launch command to launch the graphical user interface application associated with the selected license. DataTools Launcher understands which application to launch based on the product name and serial number of the selected license. Each launched product has its own set of documentation. Besides documentation, there are samples and tutorials for learning how to use Data Tool applications. You are able to run only the applications licensed within the file selected within DataTools Launcher. If you are unsure as to which license you are using while in one of the launched applications, select Help > About within the launched application.
Note: It is recommended that you launch and work within only one application at a time. The suite of DataTools products uses a single copy of a configuration file (dj800.ini) needed for Pervasive products and engines to run. The license file being used is a part of this configuration file. Launching multiple applications could cause licensing errors.
The Inactive tab displays a list of the Pervasive DataTools licenses that have expired or have been super-ceded. These licenses are in an archived state. DataTools Launcher archives expired and super-ceded licenses automatically at startup. Licenses in an archived state have an .archive file extension.
The display grid on this tab shows some identifying information about each license. Expired licenses have an expiration date in red. On this tab, you can choose to view license feature details. You can click any of the column headings within the grid to sort the grid by that column. The sorted column will display an up or down arrow depending on the sort sequence: ascending or descending.
Use the View command to open an archived license and view the licensed features. Any features that were licensed but have expired will have an expiration date in red. If the entire license has expired, all features that were licensed have an expiration date in red and the Licensing tab in the view is also red.
To uninstall Pervasive DataTools, open the Windows Control Panel. Go to Add/Remove Programs and double click on Pervasive DataTools to remove it.
Note: If you plan to upgrade your version of Pervasive DataTools, you must uninstall it first, then install the new version.
12365 Riata Trace Parkway
Building B
Austin, TX 78727 USA
Web: http://pervasivedatatools.com
This manual is currently a work in progress and therefore is incomplete. Documentation for the Data Parser for Unstructured Text is also available by clicking on the Help button on the right end of the button bar whenever the product is running.
This manual leads you through the operation of the Data Parser for Unstructured Text user interface. The Pervasive Data Parser for Unstructured Text allows you to extract useful data from report files and convert that data to a CSV Text file. You must have a non-expired Data Parser for Unstructured Text license to run this application.
Refer to the license.txt file in the default installation directory for disclaimers and information about trademarks and credits.
The Data Parser for Unstructured Text is a software product with the ability to read complex text files of many kinds. The amount of computer data grows vastly each year, and much of it is provided in raw text formats. Some examples of the many sources handled by the Data Parser for Unstructured Text follow:
Using Data Parser for Unstructured Text, you can extract the desired data fields from various lines in the text file, and assemble those fields into a flat record of data. Thus, whole records of structured data can be extracted and presented in a conventional tabular (row and column) format that is needed before mapping and converting the data to a popular target format. Some of the features that make the Data Parser for Unstructured Text so complete are:
The extraction of desired fields from the source text file is accomplished by visually marking up the file in the Data Parser for Unstructured Text user interface. The mouse is employed to select the desired fields from various lines displayed on the screen. Dialog boxes on the screen allow you to express a rich set of pattern recognition rules and actions to assist in the extraction of clean data.
Several techniques are available to view samples of extracted data. Apart from scrolling the full text of the data, a debug window can be used to search for all lines satisfying certain extraction criteria. For details, see **Debug Extract Design Window**. In addition, users can pop up a data browser that assembles all the fields and records in a grid format to give the user an idea of how the data will export. For details, see **Record Browser Window**.
The Data Extractor is a tool for extracting data that would otherwise be inaccessible.
Consider these scenarios:
In each of these scenarios, the Data Extractor can extract valuable data from standard formated text files with lots of irrelevant information, such as headers and comments.
The Data Extractor exports the extracted data to CSV (Comma Separated Values) Text file format. If you want to convert the data to another format, or you want to manipulate the data further after you have extracted it, the Data Loaders can accomplish this. The Data Loaders support over 100 different file types, allowing you to convert your data to the vast majority of databases used throughout the world.
First you need to have a report file. Most applications on nearly every type of platform give you the option of creating and printing reports. Have the program print the report in a text only format, either ASCII or any standard EBCDIC code page. For more information, see How to Create a Report File.
Additional details about each of these steps are described in this documentation.
The following list presents some of the features available in the Data Extractor:
Some additional automatic menu options:
Seven step-by-step tutorials are available to help you learn how to use the Data Extractor. We recommend that you complete the tutorials in the order in which they appear, as each tutorial builds upon the concepts covered in the previous tutorial.
The Data Extractor tutorials all have several tasks in common. Those tasks are described here, and you may refer to them as needed.
Before starting each tutorial, you must select the matching tutorial file and set some basic options and file properties.
Use the following Data Extractor tutorial files with the tutorials:
Browse the data when you want to determine how your design choices have affected the data.
If the fields in the Data Browser window are not in the order you want them to appear in the export data file, change the export field order.
After you have completed your Extractor script, save and close it for later use.
The following is a list and brief description of each of the Data Extractor tutorials.
Tutorial 1 guides you through the basic steps to create and save a script file in Data Parser for Unstructured Text. Later tutorials are more detailed.
This tutorial presents the fundamental concepts for using the Data Parser for Unstructured Text. It is recommended that you do this tutorial first. The example file is a tagged list, but the procedure is useful regardless of the type of report. The best way to use this tutorial is to print a hard copy so you can follow the sequential steps.
In this tutorial, you will learn:
This tutorial is divided into three sections that should be completed in the order shown.
After selecting the tutorial file and setting up basic options, the first step in defining many extract scripts is to determine the line of data that marks the end of a record. In this case, the line with the string "Category:" is the last line of the first record.
After you identify the end of the record, define the line style for that line by marking the information that makes that line unique. In every record in this data file, the last line contains the string "Category:".
In the TUTOR1 file, the first line of text that contains pertinent data is the line with the report date "13-Jul-95" (10th line). The dashes (and their positions) in this line make it unique and are likely to remain consistent even if the date changes in later reports.
After defining line styles for 14 lines of the first record, define the Data Fields. You have given Data Parser the pattern information it needs to identify the lines in the report, now define what part of each line you consider to be useful data.
| Data Field | Starting Column | Ending Column |
|---|---|---|
| Report_Date | 11 | 19 |
| Problem_No | 25 | 30 |
| Techie | 25 | 52 |
| Status | 25 | 52 |
| MMDDYY | 25 | 32 |
| Time | 25 | 32 |
| Serial_No | 25 | 39 |
| Version | 25 | 52 |
| Customer_Name | 25 | 52 |
| Company_Name | 25 | 52 |
| Phone_No | 25 | 52 |
| Source_Type | 25 | 52 |
| Target_Type | 25 | 52 |
| Category | 25 | 52 |
Tutorial 2 guides you through the steps to create and save a script file using Data Parser’s automatic processes. The source file for this tutorial is the same tagged-list used in Data Parser for Unstructured Text Tutorial 1.
This tutorial introduces some of the useful timesaving features of Data Parser that read and flatten a data file that contains tagged data. It is useful to anyone ready to learn about more advanced Data Parser features. Tutorial 2 examines some quicker, more automatic ways to parse the same tagged-data used in Data Parser Tutorial 1.
In this tutorial, you will learn:
These steps should be completed in the order shown.
After selecting the tutorial file and setting up basic options, the first step in defining most extract scripts is to determine the line of data that marks the end of a record. In the TUTOR1 data file, the line of text that contains "Category:" marks the end of each record.
Since the Category Line Style is the last line of the record, the Line Action should be Accept Record. When Data Parser creates a line style automatically, it makes the line style Collect Fields, so the line action needs to be changed.
For this exercise, assume that the first line of the report contains information you want.
The purpose of this exercise is to update the automatically generated "Jul95" Line Style to make it more generic for different report dates.
In this exercise, you will Define Data Fields and Line Styles for the Techie, Status, MM/DD/YY, Time, Ser #, Version, Customer Name, Company Name, Phone #, Source Type, and Target Type Tagged Data Fields.
Note:Data Parser named the MM/DD/YY, Ser #, and Phone # Data Fields and corresponding Line Styles MMDDYY, Ser, and Phone. Also, Data Fields with embedded spaces are named with the spaces removed. This was done because Field Names can only contain letters, digits and underscores. Scroll down in the Data panel and see how the rest of the data is being defined.
Tip: This file can be parsed even more automatically. If you wish to try it, follow these steps:
Tutorial 3 guides you through the steps to create and save a script file in Data Parser for Unstructured Text that reads and flattens a report containing columnar data.
In Tutorial 3, you convert the data in a columnar report file, to a flattened format, using the more automatic features of Data Parser.
This tutorial introduces more of the time-saving features of Extract Schema Designer. Since a great many report formats contain columnar data of some kind, it is highly useful to anyone who wants to use Data Parser.
By following the steps outlined below, you become familiar with both the process of creating an extract script and the terms used throughout the documentation.
Unlike the previous tutorials, this file has multiple Accept Record line styles in a single page of the report. The primary data record information is in the table detail lines. Each line is essentially a record. Each of these is an Accept Record line.
In this tutorial, you will learn:
The following steps should be completed in the order shown:
After selecting the tutorial file and setting up basic options, define line styles and data fields. Data Parser does the following when you complete this task:
The Browse Data Record uses the Data Field names as column headings for the Data Fields, so it is a good idea to change the Data Field names for SALES/MARKETING_1 through SALES/MARKETING_7 to more descriptive field names.
| Default Name | Suggested Name |
|---|---|
| SALES/MARKETING_1 | Department |
| SALES/MARKETING_2 | Team1 |
| SALES/MARKETING_3 | Team2 |
| SALES/MARKETING_4 | Team3 |
| SALES/MARKETING_5 | Team4 |
| SALES/MARKETING_6 | Team5 |
| SALES/MARKETING_7 | DepartmentTotal |
To view the new Line Style SALESMARKETING, double-click on the name SALESMARKETING in the Line Style column (the yellow column on the left of your screen). The Line Style Definition window appears.
Notice the SALESMARKETING Line Style is recognized by a pattern where columns 1 to 16 contain the string SALES/MARKETING.
The Detail Line Style only matches lines that have SALES/MARKETING in columns 2 through 16. That is the recognition pattern that Data Parser created automatically, but it is not the pattern that is needed in this case. The pattern needs to be general enough to match all of the detail lines in the text, but specific enough to match ONLY the detail lines. Update the Line Pattern so that the Line Style match all of the detail lines excluding the TEAM TOTALS line.
Analyze the detail lines to find what makes them unique in comparison to other lines in the text. Things to look for are position of the Data Fields, contents of the Data Fields, anything that is consistent for each of the detail lines but not contained in non-detail lines. For example, the detail lines contain:
Of all of the above observations, creating a pattern to look for uppercase letters in the first five positions is the best way to go. Here are some reasons why:
In this part of the exercise, you will define the rest of the data in the record, starting with the report title.
Tutorial 4 guides you through the steps to create and save a script in Data Parser for Unstructured Text that reads and flattens a data file that containing floating tag data in a variable-length ASCII report.
This tutorial is useful to anyone likely to be working with floating tag data. By following the steps outlined below, you become familiar with both the process of creating an extract script and the terms used throughout the documentation.
In this tutorial, you will learn:
The steps in this tutorial should be completed in the order shown.
After selecting the tutorial file and setting up basic options, find the patterns in this file and build recognition patterns (Line Style Definitions) so that Data Parser can identify the lines with data.
The first characteristic of this report to notice is that each data record uses two lines of text. Another important characteristic is that several characters on each line are repeated consistently in the same position. These consistent patterns make it easy for you to build Line Style Definitions.
Data Parser automatically creates a Line Style using ATTDOC in columns 19 through 24 as the Recognition Pattern and ATTDOC as the Line Style Name when you complete this task. Each line of text that matches this Line Style now displays the Line Style Name ATTDOC and a green arrow in the Line Style Column to the left of the text line.
Notice that only the first couple of Data Fields in each of the TRN lines falls within the same columns from record to record. Define these fields first:
Because the patient and doctor names are different lengths, you cannot use Fixed Position to define the remainder of the Data Fields on the TRN lines. But because all of the fields other than the names have field labels with colons and spaces, Field Tags, you can define those fields as Floating Tag. "Floating" means that the Field Tags are not in the same position on the line in every record. If there were no Field Tags, you could still define the fields using Relative Word Position.
The fourth field starts in the same column in each of the TRN lines so you can define Start Rule as Fixed Position for this field.
The RATE field at the end of the TRN line starts with a Floating Tag, but ends at the end of the line of text. Define this Data Field accordingly.
Look at the ATTDOC line of text in the records. Notice that the Data Fields in this line are also Floating Tag data. Follow these steps to define all the Data Fields except the last field.
The BY field at the end of the ATTDOC line starts with a Floating Tag, but ends at the end of the line of text just like the RATE field in the first line. So, use the same steps as before, except use End of Line as the End Rule for that field.
Tutorial 5 guides you through the steps to create and save a script file in Data Parser for Unstructured Text that reads and flattens a data file containing both detail lines and a footer line with data to extract.
This tutorial is useful to anyone likely to be working with columnar data with footer. Before doing this tutorial, it is recommended that you do Data Parser Tutorial 3 - Columnar Data first.
By following the steps outlined below, you become familiar with both the process of creating an extract script and the terms used throughout the documentation.
In this tutorial, you will learn:
The steps in this tutorial should be completed in the order shown.
The primary data record information is in the table detail lines. This data is highly structured in neat consistent columns. Data Parser can build recognition patterns for Line Styles and Data Fields with this type of data automatically, saving you a lot of time and effort.
After selecting the tutorial file and setting up basic options, define line styles and data fields.Data Parser automatically creates a Line Style for the line and gives it a default Line Name of SALESMARKETING when you complete this task. Data Parser also automatically parses the line into 7 Data Fields using spaces as a column separator. The Data Fields are given default names of SALESMARKETING_1 through SALESMARKETING_7.
Since the Data Field names are used in the Browse Data Record as column headings for the Data Fields, change the Data Field names for SALES/MARKETING_1 through SALES/MARKETING_7 to more descriptive field names.
See the new, and more descriptive, names for the Data Fields in Table 3-4 below.
| Default Name | Suggested Name |
|---|---|
| SALES/MARKETING_1 | Department |
| SALES/MARKETING_2 | Team1 |
| SALES/MARKETING_3 | Team2 |
| SALES/MARKETING_4 | Team3 |
| SALES/MARKETING_5 | Team4 |
| SALES/MARKETING_6 | Team5 |
| SALES/MARKETING_7 | DepartmentTotal |
The SALESMARKETING Line Style is recognized by a pattern where columns 2 to 16 contain the text SALES/MARKETING. This pattern matches only the first detail line. It needs be general enough to match all of the detail lines in the text, but specific enough to match only the detail lines, not the TEAM TOTALS line.
Analyze the detail lines to find what makes them unique in comparison to other lines in the text. Things to look for are position of the Data Fields, contents of the Data Fields, anything that is consistent for each of the detail lines but not contained in non-detail lines. For example, the detail lines contain:
Of all of the above observations, creating a pattern to look for uppercase letters in the first 5 positions is the best way to go. Here are some reasons why:
So, the best pattern to use is one that looks for capital letters in columns 2 through 6.
Tutorial 6 guides you through the steps to create a script that reads and flattens a data file containing data that extends across multiple lines of text and where the end of each record varies.
This tutorial is useful to anyone who has a report with fields that extend across more than one line, or has no consistent end of record line.
By following the steps outlined below, you become familiar with both the process of creating an extract script and the terms used throughout the documentation.
In this tutorial, you will learn:
The steps in this tutorial should be completed in the order shown
Scroll through the data to get an idea of this file’s structure. Notice that there are eight or nine left-aligned Field Tags in each record. These tags can be used to easily identify and define the Line Styles in this report.
After selecting the tutorial file and setting up basic options, define your line styles.
| Field Tag | Beginning Column |
Ending Column |
|---|---|---|
| RECORDATION | 1 | 12 |
| CONSIDERATION | 1 | 14 |
| SITE DIMENSIONS | 1 | 16 |
| SITE AREA | 1 | 10 |
| ZONING | 1 | 7 |
| REMARKS | 1 | 8 |
Notice that some of the data you want to extract resides within a single line of text in one record but continues across multiple lines of text in other records. For example, the data in the CONSIDERATION field in the first record is on a single line of text, but in the second record, the data in the CONSIDERATION field continues across nine lines of text. This is easily defined using the Data Parser feature called Continuation Rule.
Tutorial 7 guides you through the steps to create and save an extract script file in Data Parser that uses multiple Accept Records.
You parse the data in a report file, TUTOR7 (supplied during installation), into a format suitable for exporting. By following the steps outlined below, you become familiar with both the process of creating an extract script and the terms used throughout the documentation.
In this tutorial, you will learn:
The steps in this tutorial should be completed in the order shown.
After selecting the tutorial file and setting up basic options, begin creating line styles for the first Accept Record.
Notice the entry under the Look For? column. Its default is in position 5. While this catches all pertinent lines in our example, it might not catch all instances in a larger record example.
Since this is the end of the first record (before other information becomes a subset of this information), make this line an Accept Record.
You must define a line style that recognizes each of the pet information lines. Analyze each of these records, to look for a common pattern to define the Line Style.
The Pet Type, Pet Name, Sex, and Color are all different in each record. The only thing that remains the same is the placement of Age, in positions 42 and 43.
When working with report or text files in the Data Extractor, there are three basic elements to help you accomplish your task. Those basic elements are line style, data field, and field content.
A line style is what you define that tells the Data Extractor how to identify a particular line of text in the report. You want to define each line style in such a way that the Data Extractor can identify that same line of text throughout the report. The trick to defining a good line style is to make the recognition rules specific enough to not include any lines you do not want recognized, yet broad enough to include all lines you do want recognized. See Defining Line Styles.
A data field is what you define that tells the Data Extractor which specific portions of the report you want to extract and assemble into data records. There are options in the Data Extractor that let you determine which data fields are collected and assembled as part of each output data record. For details, see Define Data Fields.
The field content is the data that occupies each defined data field. Any defined data field may contain data or may be blank in any given record. There are options in the Data Extractor that let you determine whether or not the field contents of one data field are carried forward to subsequent data fields that are blank. For details on the Flush Field Contents, Propagate Field Contents, and Flush Field Contents on Accept Default options, see Define Data Fields and Source Options Window.
When using Data Extractor to extract data from a text or report file, you follow some general procedures. It is important to understand that the goal is to define line styles and data fields in such a manner that the Data Extractor is able to assemble records of data out of the information contained in the report.
These sections offer some helpful hints and tips that may make your task easier. Since every report or text file is different, these subjects are more general in nature. There are more specific examples offered in other sections of the documentation.
Your goal is to extract useful data out of your report or text file and then assemble that data into a field-and-record-oriented format. Therefore, one of your initial steps should be to examine the report and find the logical record breaks. When you locate a logical record break, you define a line of text as the ACCEPT Record to mark where the Data Extractor should stop collecting data fields and assemble a data record.
Some types of reports are formatted in such a way that logical record breaks are easy to locate and the ACCEPT Record is easy to define. Some examples follow:
Other types of reports are not so easy. Some examples follow:
When your report is formatted in such a way that defining the end of a record is difficult, sometimes the only way to handle this is to define the beginning of a record. You can use the ACCEPT Record option that tells the Data Extractor to assemble a record, but collect the data fields Before Collecting this line’s fields. This ends the field collection action on the last defined line of text that falls BEFORE the ACCEPT Record line, and places any data fields on the ACCEPT Record line in the next record.
These are the basic steps required for creating an Extract script.
The following sections explain the details of these procedures.
Follow these steps to open a text file, report file, or URI in the Data Extractor.
When the file or URI is selected, the text displays in the Data Panel section of the Data Extractor main window.
All file types that the Data Extractor can open appear in the Files of Type box in the Select Report/Text File window. Because the Data Extractor supports so many different types, not all extensions are visible within the Files of Type box. However, all available file types appear in the selection window, including those files with extensions not viewable in the Files of Type box.
Note: A minimum of v1.4.0 of Sun Java Runtime must be installed for the URI support to work. Without this component, you will receive the error message: "Unable to load Java virtual machine."
Note: Extract scripts (.CXL files) derived from a URI source connection are now stored in the default workspace directory called Extracts. For more information, see URI Support.
Also, study the overall formatting of the information you want to extract, looking for any tagged or columnar data in the report. Pay special attention to the tag separators, field separators, and column separators. Again, in the Source Options Window, make appropriate selections from the list of available options.
You are now ready to begin defining line styles and data fields.
If you need to quit the Data Extractor and come back to your work at another time, save the extract. This also saves your work in a database file called extractor800.mdb in your \Common800 directory.
If you have already defined some line styles and data fields in a report and saved that work, you can open the extract and report in another session and continue your work or make modifications to it.
If you have created and saved a complete extract, you can open the saved Extract and run an export.
To open an existing extract and report, double-click it with the mouse in the Extract Manager, or select it then select File > Open Extract.
Note: Extract scripts (.CXL files) derived from a URI source connection are stored in the default workspace directory called Extracts.
You can open a previously-designed script with a different report or text file to check on script compatibility.
Because the Data Extractor reads and evaluates each data line, anytime you have many Line Styles (usually more than five), consider these tips to speed up the performance of your extract scripts.
Note: Do not alter the Line Style order when it is used to control which of multiple Line Styles could satisfy a line "hit". If you do, you might find your lines incorrectly marked.
Each line in your text or report file that contains information to be extracted must be defined with a line style. In addition, other lines may need to be defined as reference lines. Each line style consists of a recognition rule and a line action.
A line style definition is what the Data Extractor uses to identify particular lines of information within your report. For each set of information that you want to assemble into a record of data there may be one or more line styles. The number of line styles needed is determined by how many lines of text in the report contain that set of information. For example:
In most cases, it is best if you do not define any lines of text from which you are not extracting data. This keeps the script file smaller and more efficient. Lines for which there is no line style defined are ignored. The exception to this rule is that in some cases performance can be improved by defining certain large repetitive sections as Skip lines.
Note: If your source file contains tabs and you want to use the Auto Line Style feature, you must first update your tab expansion setting. To do this, select Source > Options from the menu. Then, select the Printer Emulation tab and change the Tab Expansion setting to 0.
The recognition rule portion of a line style definition contains criteria by which the Data Extractor identifies a line of text. In other words, you tell the Data Extractor how to recognize a line or lines of the report by defining a set of criteria. After you define a line style in one section of your report, the Data Extractor compares each line of text in your entire report file with that recognition rule. For each line of text that matches the recognition rule, the Data Extractor assigns that line style. The line style name displays in the Line Style column to the left of the Data Panel for each matching line of text in the report.
The trick to defining a good recognition rule is to make it specific enough to NOT include any lines you do NOT want recognized, and broad enough to include ALL the lines you DO want recognized.
You may define line styles manually or let the Data Extractor automatically define them, depending upon whether or not your data can be handled by the Data Extractor's automatic features. For details about this option, see File Menu and Pop-up Menus.
Note: You may want to utilize the more advanced features after becoming familiar with the basic procedures. Tutorial 1 will help you get acquainted with the basics of the Data Extractor, and Tutorials 2 and 3 will introduce you to some of the time saving advantages of the Advanced features.
If you are defining a line style manually, the Data Extractor suggests a recognition rule based on a pattern that displays in the Line Style Definition window when it opens. If you have highlighted a particular portion of the line, this portion is automatically suggested to create the recognition rule. You may modify the suggested recognition rule before adding it to the Data Extractor database and script. Details about different ways to define line styles are found in this documentation.
The manner in which you define line styles depends on a number of factors, including your own personal approach to a task. The other major factor is the type of text or report file with which you are working. The sections below should help you determine the best approach.
Remember the selections in the Source Options window should be examined and possibly modified prior to defining line styles. For details about the available options, see Source Options Window.
When you define a line style, you are specifying a recognition rule that the Data Extractor uses to identify any line of text in your file that matches that rule. Recognition rules are built in the Line Style Definition window. Each line style recognition rule is based on one of seven basic recognition styles. The available styles are described below.
Each style consists of an expression that specifies a search criterion. To see all the available options for building recognition rules, see Line Style Definition Window.
The following are some common, but brief, examples of where and how to use each of the seven basic Recognition styles:
If you see that there is a unique string of text on one type of line that does not appear on any other lines, but always appears on that type of line, highlight that text and it becomes the pattern that the Data Extractor uses to identify that line in each record. In some cases, the unique string of text may even be a single character in a consistent position.
This is the most common style to use for a recognition rule. It offers the most flexibility, but can also be the most difficult to define. Patterns are defined by either single-row or multiple-row expressions. Recognition patterns are described in more detail later in this documentation.
If the line of text you are defining appears in the same relative position to some other line, a Base Line, you might define the recognition rule by its relative position to that other line. The Relative Position option allows you to specify one or more lines above or below the Base Line. The line of text you want to use as a Base Line must be defined before you define a line style that refers to it.
If the line of text you are defining appears on only one line of text in the report, you might define the recognition rule by its Exact Line Number. This option is very rarely used because it can only be used to recognize lines that occur only once in a report.
If you need to define the lines of the report that do not contain any text, you can define the recognition rule for them by Blank Line.
It is not necessary to define the blank lines in your report except when you need to use them as Base Lines, Accept lines, or markers of some kind.
After you have defined all the necessary lines of text in your report, you may want to use this recognition rule to define all of the remaining undefined lines. Alternately, if all but a few of the lines of your report need to be recognized in the same way, you may want to define the lines you do not want, and then use All Undefined Lines to define the lines that you do want. Another reason you might want to define all undefined lines is for use in the Debug Extract Design Window.
The default line action for this option is COLLECT Fields, but you may select any Action that fits your needs.
Note: It is not necessary to define the undefined lines.
Select this option when you want the Data Extractor to define a line style based on its relative position to another line AND by some specific characters or types of characters. Before selecting this option you must have already defined another line to use as your Base Line.
The recognition rule options for Pattern and for Relative Position, as well as special options for using both, are described in detail in this documentation.
The default line action for this option is COLLECT Fields, but you may select any Action that fits your needs.
Select this option when you want the Data Extractor to use this line style to define all lines in the report that contain anything other than spaces and an end of line character.
There is no recognition rule for this option. Its behavior is automatic.
Once you have defined a line style and added it to the script, the line style name displays in the Line Style Column to the left of all lines that match that recognition rule.
If a line style name appears on any line that should not be included, select Edit Line Style and make the recognition rule more specific so the unwanted lines of text do not meet the recognition rule's criteria.
If a line style name does not appear on any line that should be included, select Edit Line Style and make the recognition rule broader so the needed lines of text match the recognition rule's criteria.
If the name of the line style you just defined does not appear on any line at all in the Line Style Column, then the recognition rule does not match any line in the report. Place the mouse in the Line Style Column of any blank line and double-click, or select Line > Edit Line Style. When the Line Style Definition window opens, select the line style name you want to modify from the Line Name list box, and edit the recognition rule so the expected lines of text meet the recognition rule's criteria.
If you need to modify the recognition rule of a line style, highlight any part of a line. Then select Line Style Column4 > Edit Line Style, or place the mouse pointer on a line style name in the Line Style Column and double-click. When the Line Style Definition window opens, modify the recognition rule and Update it.
The following sections describe how the Data Extractor builds recognition patterns.
When you select New Line Style from the menu, the Data Extractor creates a suggested line style recognition pattern that displays when the Line Style Definition window opens. For details, see Recognition Rules. If you highlighted a piece of the data on the line before making the selection, the Data Extractor uses the data you highlighted to form the recognition rule.
If the pattern meets your needs and the default line style name is acceptable, select a line action and then click Add to accept the recognition pattern. See Line Action. The Line Style Definition window closes unless you have turned Close Definition Dialogs on Add/Update OFF in the Preferences menu.
If you need to change the pattern or want to change the line style name, you may enter a new name or modify the pattern, then select a line action, and click Add. The Line Style Definition window closes unless you have turned the Close Definition Dialogs on Add/Update OFF in the Preferences menu.
When you select Auto New Line Style from the menu, another menu prompts you to select a line action. See Line Action. After you select a line action, the Data Extractor creates a line style recognition pattern automatically.
The default line style name displays in the Line Style Column for each line of text in your report file that matches the recognition pattern.
If you need to modify the line style recognition pattern or the line style name, double-click the line style name in the Line Style Column to open the Line Style Definition window and edit the line style recognition pattern and/or name. After making the desired modifications, click Update. The Line Style Definition window closes unless you have turned Close Definition Dialogs on Add/Update OFF in the Preferences menu.
After opening the report in the Data Extractor, scroll through your report file and choose a section of the file with which to work. Identify a section that is most representative of the entire file. Then open the Source Options window to specify the sample of the file with which you want to work. For details about how to set the sample size, see Source Options Window. You can move from one section of the report to another as you go along, if your report does not have any one section with all of the data you need to define.
Examine your text or report file before starting to define line styles to determine if some of the default options need to be changed in the Source Options window. For details about the available options, see Source Options Window. After making the necessary selections in the Source Options window, follow the steps below.
In most cases, it is best if you do not define any lines of text from which you are not extracting data.
Tip: If you know that the pattern you have highlighted will work for this line style, select Auto New Line Style from the second level pop-up menu rather than simply New Line Style. This will cause a third level pop-up menu to appear with a list of line actions. Choose the line action you need for this type of line. The line style will be created without even opening the Line Style Definition window. You will see it appear in the Line Style Column next to every line with the highlighted pattern. For more information, see Understanding Line Style Behavior below.
Each of the following sections describes methods you may use to highlight various types of data in your text or report file, and how the Data Extractor creates line style recognition rules based on what you highlighted.
When the Data Extractor searches a line of text to build a recognition rule, by default, it is based on a pattern. The Data Extractor searches for the following in the order listed here.
These are mutually exclusive, meaning if not field tags, then special characters; if not special characters, then first field, etc. The Data Extractor does not use field tags AND special characters AND first field.
If you use the mouse to highlight one or more characters on a line of text, and select New Line Style or Auto New Line Style from the menu, the Data Extractor uses the highlighted text as the line style recognition pattern. This is usually a good way to begin defining a line style. Additions or modifications can be made in the Line Style Definition window. See Line Style Definition Window.
Highlight some selected text in the Data Panel, anything less than the total line. Zero in on the exact text, even down to one character, if necessary, that you want the Data Extractor to use as the recognition pattern. Select Define Line Style > Auto New Line Style, and then select a line action.
The Data Extractor uses the exact highlighted text as the recognition pattern for that line style and then assigns a unique line style name.
With the Append Line Pattern option at the second level pop-up menu, you can easily add to the recognition pattern. Simply highlight another piece of text that you want added to the recognition rules for that line and select Append Line Pattern from the menu. For details about Append Line Pattern, see Pop-up Menus.
Highlight an entire line of text by clicking in the yellow Line Style Column to the left of the line. Select Define Line Style > Auto New Line Style, and then select a line action. The Data Extractor searches the entire line for:
Many reports contain field tags, and the Data Extractor was developed to make use of these tags. Field tags may be used as line style recognition patterns or as field names for data fields.
Field tags are usually descriptive words that identify the data that follows the tag. In report files, there is usually a character that separates the field tag from its data, such as a colon or a hyphen. You may specify the tag separator on a line-by-line basis or for the whole report file in the Source Options Window.
A valid field tag must contain three words or less to the left of the tag separator. The Data Extractor finds field tags by searching everything to the left of the specified tag separator until it finds two spaces or the left margin of the report. When identifying field tags, the Data Extractor always uses a space to distinguish one word from the next.
Some examples of how the word "Name" may be used as a field tag:
Some examples of multi-word field tags with a colon and a space as the separator:
For the purpose of extracting data out of text or report files, there are different types of printable information. For easier identification here, they are categorized into three groups, as follows:
First field is the first section of data on the line of text before the first fields separator (a space by default, but this can be reset in the Source Options window). More detailed information about how the Data Extractor automatically defines line styles can be found in the "How the Data Extractor Builds Recognition Patterns" section below. An explanation of each of the above search methods is described here.
Highlighting a block of text in the Data Panel that includes more than one line of text is typically done with columnar reports, for detail lines within a report, or for lists of tagged data as in the TUTOR1.REP. Examples of each can be found in the Data Extractor tutorials.
As you define the line styles in your report, you may find there are some lines of text that require special handling. This section examples those special kinds of lines and how you might define them with the Data Extractor. These include the following:
The Data Extractor automatically and invisibly, defines all blank lines in a text or report file and sets the line action to REJECT Line. No line style name displays in the Line Style Column. This keeps the Line Style Column less cluttered and enhances the Data Extractor performance. By identifying and rejecting blank lines, the Data Extractor reads the lines that contain information more quickly.
If you need to use a blank line as a reference line or Accept line, define it appropriately.
When header lines are present in your report file, one or more of those lines might contain information you want to extract and assemble as part of a data record. For example, there may be a report title, report date, or account number that you want as a data field in each record in your target file. Treat these lines as you would any other ordinary line that contains data that you need in your output file. First, define a line style for any header line that contains data you want to extract. The line action is usually going to be COLLECT Fields. Then, in each of those lines, define each data field you want to extract.
When footer lines are present in your report file, one or more of those lines might contain information that you want to extract and assemble as part of a data record. The Data Extractor can extract data out of footer lines that follow columnar data.
Define the line styles of any previous lines in the report. (Where you would normally select ACCEPT Record as the line action for the detail lines, select COLLECT Fields as the line action.) When you define the data fields within the detail lines, go to the Data Collection/Output tab in the Field Definition Window and turn Array Field ON for each field.
Proceed to the footer section and define the line or lines that contain the desired data. Select ACCEPT Record as the line action for the last defined line style. When you define the data fields on the footer lines, do not turn Array Field ON for these fields.
When you are defining line styles, the Data Extractor assigns line style names as follows:
You may rename any line style in the Line Style Definition window, but it must meet requirements. Line style names must be unique and are limited to 20 characters. They may include upper and/or lower case letters (A - Z, a - z), numbers (0 - 9), and underscores ( _ ), but may not begin with a number. Line style names may not include spaces.
Line style names do not appear in the output file in any way. They exist in the Data Extractor only to help you distinguish between the different lines for which you have defined recognition rules and line actions.
The second important part of a line style is the line action.
When defining a line style, you must select an Action that determines how the Data Extractor processes the lines of text or the data within the lines of text that matches the line style.
When you have made all the needed choices within the Line Style Definition Window, complete one of the following steps:
Each data field that you define in the Data Extractor holds a particular piece of information that you want to extract from your report or text file. Each "set" of data fields that you define in the Data Extractor is exported as a record. A combination of the Line Style Definitions and Data Field Definitions determine how the data is extracted and assembled into data records.
This topic provides information about the various ways to define data fields using the Data Extractor.
Select the Define Data Field option after you have highlighted some text in the Data Panel and want to define one or more data fields. A second level menu appears. The information in each section below describes the menu options that are available in the second level menu.
If the data formats and/or techniques in this topic do not seem to fit your specific needs, please refer to Data Fields - Advanced Options.
Select the Parse Columnar Data option if you want the Data Extractor to analyze the highlighted text, define the line style automatically, and parse the line of text based on the selected column separator. For details about Column Separator options, see Source Options Window.
For example, if you have a line of text that looks something like the following, you might want the Data Extractor to parse it for you. First, select # of Spaces + as the column separator. Then select Parse Columnar Data from the menu.
| 09/15/95 | Phone Bill | Debit | $120.45 |
The Data Extractor creates a line style that looks for the decimal point in the last column and the two slashes in the first column, and names it Phone Bill. It automatically parses this line into four data fields.
image\prscoldatfldex.gifThis option is only available if the highlighted line of text contains the column separator specified in Source Options between some text within the line. If there is no obvious pattern for the Data Extractor to automatically create a line style, a message appears letting you know that the Data Extractor cannot create the line style for you. You can then create a line style yourself, and use the Parse Columnar Data menu option to automatically parse the data fields on that line.
Select this option if you want the Data Extractor to analyze a block of highlighted text, define the line style automatically, parse the lines based on the selected column separator, and allow you to specify the location of column headings whose names become the names of the data fields. For details about column separator options, see Source Options Window.
An example of columnar data with a heading:
| Date | Description | TransType | Amount |
|---|---|---|---|
| 09/15/95 | Phone Bill | Debit | $120.45 |
After this option is selected a dialog box appears asking how many lines the heading is, and how many lines to skip. The number of lines of data is automatically calculated from there. Click OK when the numbers are correct. This option is useful, even if you have several places in the report with this same pattern of columnar data. Only one section of data needs to be highlighted. Then all of the data with this pattern in the report is automatically defined.
Select this option if you want the Data Extractor to define fields in the highlighted area based on the column/field separator character.
Examples:
If the field separator is set to Space (the default value), Mark Jones Jr. would export with "Mark" in the first field, "Jones" in the second field and "Jr." in the third field.
If the field separator is set to Pipe ( | ), 123A|San Francisco|Mark Jones Jr. would export with "123A" in the first field, "San Francisco" in the second field, and "Mark Jones Jr." in the third field.
Note: Because fields parsed on field separators are generally not fixed length or fixed position, they do not change color on the Data Panel when defined.
For details about field separator options, see Source Options Window.
Choose this option if you want the Data Extractor to analyze the highlighted text and parse it based on the selected tag separator. For details about field separator options, see Source Options Window.
For example, if you have selected some text that looks something like the following, you might want the Data Extractor to parse it for you:
Telephone: 512-555-1212
The Data Extractor defines the line style using the text "Telephone:" in column 1 through 10 as the recognition pattern, assigns a line style name of "Telephone", defines one data field that contains "512-555-1212", and assigns a field name of "Telephone".
image\taggedex.gifThe Parse Tagged Data option is only available if the highlighted text contains the selected tag separator.
Select this option if you want the Data Extractor to analyze the highlighted text and parse it based on the selected tag separator. For details about tag separator options, see Source Options Window.
HTML data is a platform independent text format commonly used on the Internet. Most people are familiar with its
Note: If Parse HTML Tagged Data is chosen with a selection over multiple lines, fields are parsed and defined only if the line was just added or there are no fields defined on the line. If Parse HTML Tagged Data is chosen with a selection in a single line, fields are always defined.
Select this option to have the Data Extractor automatically and invisibly define a data field based on text that has been highlighted in the Data Panel. The Field Definition window does not open when this option is selected. The field name defaults to LINESTYLENAME_1, LINESTYLENAME_2, etc.
This option is only available if you have highlighted some text prior to right-clicking.
Select this option if you want to define a data field manually or view the work that the Data Extractor has done, based on the text you highlighted. It opens the Field Definition window where you may make selections and enter information to name or define the data field.
This option is available if you have highlighted either an entire line or some selected text within a line.
Choose this option if you want to view or modify a data field. It opens the Field Definition window where you may make selections and enter information to more specifically define the selected data field.
This option is only available if the cursor is positioned in a data field that has already been defined.
Choose this option to delete the selected data field definition. You will be prompted to confirm the deletion. The Delete Data Field option is only available if the cursor is positioned in a data field that has already been defined.
If the data formats and/or techniques in this topic do not seem to fit your specific needs, please refer to Data Fields - Advanced Options.
The following sections describe how to define data fields when your report/text file contains more complex formatting than those described in the Defining Data Fields topic. To use the features described in this topic, select Define Data Field from the right-click menu in the Data Panel.
Since tagged data occurs frequently in report files, we have included some special ways to handle it in the Data Extractor.
Tagged data can be considered any information in a report or text file where there is a field tag, or field name, followed by the data on one line of text. There is usually some kind of separator between the field tag and the data, such as a colon or a dash. In the Data Extractor, there is a list of tag separators from which to choose in the Source Options Window.
This section explains fixed-position tagged data where the field tag and its data occupy the same consistent horizontal position in a line from record to record within the report. Fixed position tagged data may appear in your text or report file in one of two ways:
The procedures in this section are applicable to either of these two kinds of tagged data.
Note: It is not necessary to define the line style before using this procedure. The Data Extractor will use the field tag as a recognition pattern to create one automatically.
In the Data Panel, use the mouse to highlight a field tag, the field separator, and its data on a single line of text. Remember to extend the highlighted portion far enough to the right so wider data in subsequent records is picked up.
Example:
image\tagdata1.gifWith the mouse positioned anywhere in the Data Panel (the large white area), right-click. Select Define Data Field 4 Parse Tagged Data.
The Data Extractor first defines the line style, using the field tag as the recognition pattern and line style name. It then defines the data field by fixed position, starting with the column immediately after the selected tag separator and ending with the last highlighted column. The field tag is also used as the name for this data field.
Use the procedures above to define each piece of tagged data in one record. The matching lines and data fields in every other record of your report should automatically be defined. If they are not, you should edit the line style and/or the data field definition.
For a good example of how to work with tagged data using automatic features, see Data Extractor Tutorial 2 - Tagged Data and Automatic Features.
This section deals with various report data formats that can be described as tagged list data. A tagged list consists of two or more lines of text that contain tagged data in a fixed position with one field tag and data field per line.
Tagged data can be considered any information in a report or text file where there is a field tag or field name followed by the data on one line of text. There is usually some kind of separator between the field tag and the data, such as a colon, a dash, or spaces. In the Data Extractor, there is a list of tag separators from which to choose in the Source Options Window.
Tagged list data may appear in your text or report file in one of three basic ways:
Each is described below along with the procedure for defining it in the Data Extractor.
This type of tagged list contains field tags and data where the field tags are left justified and the data is right justified within the text file.
The first step in defining this type of tagged list is to go to the Script Design Choices tab of the Source Options Window and select the correct tag separator. For this type of format select # of Spaces+ from the Tag Separator list box. Another list box, the Separator Spaces box, displays below the Tag Separator box. Select With Data from this list box to indicate that the spaces between the tag and the data should be collected with the data. This causes the Data Extractor to extend the width of the data fields out to the left to within two spaces of the end of the longest field tag.
Here is an example of this type of a tagged list:
image\parstag3.gifNotice that an entire block of text was highlighted. This highlighted block of data consists of one record of data.
After the block of data is highlighted, right-click in the Data Panel and select Define Data Field > Parse Tagged Data from the menu.
The Data Extractor defines each line style using the field tag found on each line of text within the highlighted block of text. Each line style is given a unique line style name corresponding to the field tag. The Data Extractor also defines each data field on each line, and the field names also corresponds to the field tag.
This type of tagged list contains field tags and data where the field tags are right justified and the data is left justified within the text file.
The first step in defining this type of tagged list is to go to the Source Options Window and select the correct tag separator. For the example below, select ColonSpace(: ) from the Tag Separator list box.
Here is an example of this type of tagged list:
image\parstag2.gifNotice that an entire block of text is highlighted. This highlighted block of data consists of one record of data.
After the block of data is highlighted, right-click in the Data Panel and select Define Data Field > Parse Tagged Data.
The Data Extractor defines each line style using the field tag found on each line of text within the highlighted block of text. Each line style is given a unique line style name corresponding to the field tag. The Data Extractor also defines each data field on each line, and the field names will correspond to the field tag.
For a good example of this kind of data with instructions to parse it, see Data Extractor Tutorial 2 - Tagged Data and Automatic Features.
This type of tagged list contains field tags and data where both the field tags and the data are left justified within the text file.
The first step in defining this type of tagged list is to go to the Source Options Window and select the correct tag separator. For this type of format select # of Spaces + from the tag separator list box. Another list box, the Separator Spaces box, will display below the Tag Separator box. Select With Tag from this list box to indicate that the spaces between the tag and the data should be collected with the tag. This will cause the Data Extractor to extend the width of the field tags out to the right to within two spaces of the beginning of the data fields.
Here is an example of this type of tagged list:
image\parstag1.gifNotice that an entire block of text was highlighted. This highlighted block of data consists of one record of data.
After the block of data is highlighted, right click in the Data Panel and select Define Data Field 4 Parse Tagged Data.
The Data Extractor defines each line style using the field tag found on each line of text within the highlighted block of text. Each line style is given a unique line style name corresponding to the field tag. The Data Extractor also defines each data field on each line, and that the field names correspond to the field tag.
This section deals with tagged data where the field tag and its data occupy different horizontal positions in a line from one record to the next within the report. In other words the tag and data "float" within the line of text.
Tagged data can be considered any information in a report or text file where there is a field tag or field name followed by the data on one line of text. There is usually some kind of separator between the field tag and the data, such as a colon or a dash.
The first step in defining this type of tagged data is to go to the Source Options Window and select the correct tag separator. For the example below, select ColonSpace(: ) from the Tag Separator list box.
Floating tagged data usually appears in your text or report file with multiple fields per line. It is the variable length of the data in these fields that causes them to float.
image\TAGDATA2.gifNotice how the person’s name in the second record is longer than the name in the first record. This makes the end of that field a variable, not fixed, position. The tag for the third data field determines where the second data field ends. Also, the variable size of the second data field forces the third data field (Status) into a different position within the line. Thus, it has a variable beginning. The second data field on the line has a floating end tag, and the third data field has a floating start tag.
To define the AcctNo field:
You can use the Parse Tagged Data option from the right-click menu for the AcctNo: field in this report. But use the following procedure for the "Name:" and "Status:" fields.
To define the Name field:
It is common for fields to have both a floating tag start rule and end rule.
Note: When you use floating tags to define a data field, the text for those fields do not change to any color except black in the Data Panel. Floating Tag data fields also do not underline in the Data Panel even if you have that option turned ON. To verify the Data Extractor is reading the field contents correctly, open the Data Record Browser Window. To edit the field, highlight a piece of the line, right click and choose Define Data Field > Edit Data Field.
Since columnar data occurs frequently in report files, we have included special ways to handle various types of columnar data in the Data Extractor.
Columnar data can be considered any information in a report or text file where there are two or more columns of data. It looks like a spreadsheet without the grid lines. There is usually some kind of separator between the columns, such as one or more spaces or a tab. In the Data Extractor, there is a list of column separators in the Source Options Window from which to choose. A single column of data is discussed below in the "Other Types of Data" section.
Columnar data often has column headings above the actual data. The Data Extractor allows you to use the column headings automatically as field names for the data fields. See the two Highlight Columnar Data sections below.
In the Data Panel, use the mouse to highlight across two or more columns of data within a single line of text.
Example:
image\colsingl.gifWith the mouse positioned anywhere in the Data Panel, right-click. Select Define Data Field 4 Parse Columnar Data. One of two things happens:
Note: Columnar fields are automatically set to Flush Field Contents.
To highlight a specific block of the text, place the mouse pointer immediately to the left of the first character of data to be included in the highlighted selection. Click and drag to the right until the last character of data is included in the highlighted selection. All lines of text between the two corners are highlighted.
Example:
image\colwohdr.gifIn this example, please note that the upper left corner of the block starts immediately to the left of the word "SALES/MARKETING", so the blank space left of the "S" is not a part of the highlighted text. The lower right corner of the block ends immediately after an empty data field in the last column.
Procedure
Note: Columnar fields are automatically set to Flush Field Contents.
To highlight a specific block of the text, place the mouse pointer immediately to the left of the first character of data to be included in the highlighted selection. Click and drag to the right until the last character of data is included in the highlighted selection. All lines of text between the two corners are highlighted. image\colhsing.gif
Procedure
To highlight a specific block of the text, including headers, place the mouse pointer immediately to the left of the first character of a header to be included in the highlighted selection. Click and drag the down and to the right until the last character of data in a detail line is included in the highlighted selection. It is not necessary to highlight all of the detail lines that will be parsed, but you want to be sure to highlight far enough to the right so data in the last column is not truncated in lines you may not be able to see in the display area. All lines of text between the two corners are highlighted.
Example:
image\colwhdr.gifProcedure
For a good example of how to work with columnar data with the automatic features, see Data Extractor Tutorial 5 - Columnar Data with a Footer.
If the data in your report or text file is not formatted as tagged or columnar data, or if you need to extract data out of header and/or footer lines, read the information in this section.
For reports that contain data in a paragraph or other long multi-line format, it is necessary to define a continuation rule. This causes all the data to be gathered in a single data field.
The steps in this section are based on the assumption that you have already defined the line style for the line of text on which you now want to define data fields. For information on how to define line styles, see Defining Line Styles.
Define a line style that recognizes the first line of the data only. Highlight the data just on that one line.
image\tutor6example.gifProcedure
For details about each Continuation option, see Field Definition Window.
The field is not fixed length. The first line is fixed length and position and is colored and underlined. The rest of the field is not.
When header lines of text are present in your report file, some of them may contain information you want to extract and define as a data field in each record. For example, there may be a report title or report date that you want as a data field in each record in your target file.
You should first define a line style for any header line that contains data you want to extract. The COLLECT Fields line action is usually what is needed. Then in each of those lines, define each data field you want to extract. Continue defining the rest of the lines and fields.
You might also want to reposition the data fields prior to exporting the data. Open the Export Field Order Window by selecting Field > Export Field Layout and moving the data fields to the desired position.
When footer lines of text are present in your report file, some of them may contain information you want to extract and define as a data field in each record.
If your data does not contain detail lines that would normally be ACCEPT Record lines, make the footer the ACCEPT Record line and proceed normally.
If the previous lines are detail lines, select COLLECT Fields as the line action rather than ACCEPT Record, as usual. For each data field within the detail lines, go to the Data Collection tab in the Field Definition Window and turn Array Field ON for each field.
Proceed to the footer section and define the line or lines that contain the desired data. Select ACCEPT Record as the line action for the last defined line style. When you define the data fields on the footer lines, do not turn Array Field ON for these fields. For an example of arrayed fields with a footer line, see Data Extractor Tutorial 5 - Columnar Data with a Footer.
You may want to reposition the data fields prior to exporting the data. Open the Export Field Order Window and move the data fields to the desired position.
The Internal Data Browser allows you to view the extracted data in a tabular (row and column) format. This helps you know if your extraction rules are working as expected, and gives you an idea of how the data will look when exported to a CSV text file.
image\srcbrzic.gifTo open the Internal Data Browser, click the Browse Data Record button in the Tool Bar, or select Browse Records from the Tools menu. You must have done at least ONE of EACH of the following prior to opening the Record Browser window:
When the Internal Data Browser window opens it will look something like this:
image\browse3mixed.gifThis window shows a preview of the results of the data extraction rules you have defined in your script. It is not an actual file yet. You must run the export in order to create an output file.
You may view the entire extracted contents of your source data file, not just the part that displays in the Data Panel, by using the vertical and horizontal scroll bars to the right and below the data display. The entire window can be moved, sized, or maximized.
Tip: We suggest you do not use PAGE UP, PAGE DOWN, CTRL+END, or CTRL+HOME to move around in the Internal Data Browser. Due to the potential size of the data, it would be best if you use the arrow keys on your keyboard, or click the arrows in the horizontal and vertical scroll bars, thus moving only one column or record at a time. Also note that scrolling down is generally faster than scrolling up in the Browser.
The name of your source report/text file, including drive and directory path, is displayed above the data display in the window's title bar.
The field names you assigned in the Field Definition window will display as column headings above each column of data. If you did not assign field names to your data fields, the column headings will default to LineStyleName_1, LineStyleName_2, etc. The columns can be sized by dragging the right margin of the column in the header box to the right or left.
Here are descriptions of some of the other features of the Internal Data Browser:
To the left of each record (row) is a record number. This is not part of the actual data, but is a method of identifying records. If the cursor is positioned within a record (row) in the browser, the record (row) number will display in this box at the bottom of the window.
This box allows you to see your extracted data in several different formats. Formatted Text, Masked Text, Hex values, and Hex/Text mix, which shows printable characters normally and hex values of non-printing characters. For more information on hexadecimal character representation, see Hex and Decimal Values.
To close the Internal Data Browser, click the X in the upper right hand corner.
If you have defined only one ACCEPT Record, each row in the Browser will contain one data record that the Data Extractor has extracted and assembled from your report or text file, based on the line style and data field definitions you have created. This is where you can examine the records to determine if you have defined the line styles and data fields correctly to achieve the results you desire. This is also where you can view the order in which the data fields will be exported to your output data file. If you would like to change the field order prior to exporting the data, close the browser and choose Export Field Layout from the Field menu.
If you have defined more than one ACCEPT Record, a multi-record type browser will appear. This browser will show the data differently, with the first column indicating which record type is displayed. The ACCEPT Record line style name will be used to label the various record layouts. The data field name and the contents of the fields will appear in subsequent columns.
After successfully defining the data extraction rules in the Data Extractor, the next step is to export the data to a file on your workstation or any network drive/directory to which you have access. This help topic describes the export procedures and options.
The Data Extractor offers two export file types, as follows:
The data export features are available in the Export window. To open the Export window click on the Run Extractor button in the button bar. The following steps and options are found on five tabs in this window.
On the Connect Info tab, select the output file format and the drive, directory, and filename of the output data file.
Note: The Run Export button at the bottom of this window will remain grayed out until you have entered and connected to the Target File/URI. However, you should (at a minimum) go to the Properties and Records tabs and make any needed selections prior to exporting the data.
On the Sample Extraction tab, you may specify a subset of the extracted data records to be exported. This is convenient if you need a random sample of the records, or in cases where you simply want to export a few records for testing. The default selection is All Records. To select one of the other options, click the radio button to the left of the desired option. The options are:
| A Range of Records | Select this option when you want to specify the starting and ending record numbers to export. Then enter the desired starting and ending record numbers in their respective boxes. |
| Every Nth Record | Select this option when you want to select a random sample of records to export. Then enter the desired value in the "N" box. |
On the Filter Extraction tab, you may specify a subset of the extracted data records to be exported, based on record-filtering criteria. This is convenient if you need to export only the data records that contain specific information in a data field. The options are:
| Source Field | From the picklist, select the source data field that contains the information on which you want to filter the data records. |
| Operator | From the picklist, select the operator for your filter. The options are: < (less than), <= (less than or equal to), = (equal to), <> (not equal to), >= (greater than or equal to), and > (greater than). |
| Value | In this box, enter the information on which you want to filter the data records. Remember, that the selected operator may determine exactly what you enter here. |
The options on the Properties tab are properties that you may need to specify for your output data file. After you have selected all the correct choices from the list of options below, click the Apply button.
Delimited ASCII data has special characters between fields and records. Another name for delimited ASCII files is CSV (comma separated variable) text.
When delimited ASCII is your export file format, the default field separators are commas (,) and the field delimiters are quotation marks ("). The default record separator is a carriage return-line feed (CR-LF). If different separators and/or delimiters are needed, you can specify them by selecting from the options in the Properties tab and choosing the desired separators and delimiters for your target file. When you are creating a delimited ASCII file that will be imported into another application, you should set each of the Properties options as required by application before exporting the data.
The selected CodePage Translation Table determines which code table to use when writing out the target data. The default is ANSI, which is the standard in the US. You may select any of the other code page options from the picklist.
A delimited ASCII file is presumed to have a carriage return-line feed (CR-LF) between records. To specify a different a record separator for your output data file, click in the RecordSeparator cell and click once. Then click the down arrow to the right of the box and click the desired record separator in the list box. The list box choices are: carriage return-line feed (default), line feed, carriage return, line feed-carriage return, form feed, empty line, ctrl-E and no record separator. If you have or need an alternate record separator other than one from the list box, you can type it here.
Delimited ASCII files are presumed to have a comma between each field. To specify a different field separator, click in the FieldSeparator Current Value box and click once. Then click the down arrow to the right of the box to display the list of options. The list box options are: comma (default), tab, space, carriage return-line feed, line feed, carriage return, line feed-carriage return, ctrl-R, a pipe (|) and no field separator. If you need an alternate field separator other than one from the list box, you can type it here.
Delimited ASCII files are presumed to have start-of-field and end-of-field delimiters. The default delimiter is a quotation mark ( " ) because it is the most common. However, some files do not contain field delimiters, so this option is available for you to choose. To create an output data file with no delimiters, click in the FieldStartDelimiter Current Value box and click once. Then click the down arrow to the right of the box and select None.
Delimited ASCII files are presumed to have beginning-of-field and end-of-field delimiters. The default delimiter is a quote ( " ) because it is the most common. However, some files do not contain field delimiters, so this option is available for you to choose, if needed. To create an output file with no delimiters, click in the FieldEndDelimiter Current Value box and click once. Then click the down arrow and select None.
This option uses the field names you created in your Data Field Definitions to automatically create a header record in your output CSV data file. To create a header record, click in the Header Current Value box, click once and then click True. The default setting is False.
Note: If you are appending data to an existing file (Output Mode is Append), leave the setting for Header as False.
This option determines whether the specified FieldStartDelimiter and FieldEndDelimiter is used for all fields, only fields containing a separator, or only text fields, as follows:
| All | Places the delimiters specified in FieldStartDelimiter and FieldEndDelimiter before and after every field. "All" is the default setting. For example: "Smith","12345","Houston" |
| Partial | Places the specified delimiters before and after fields only where necessary. A field that contains a character that is the same as the field separator would have the field delimiters placed around it. A common occurrence of this is where there is a "memo" field that contains quotes within the text of data. For example: "Customer responded with "No thank you" to my offer". |
| Text | Places delimiters before and after Text and Name fields (non-numeric fields). Numeric and Date fields have no FieldStartDelimiter or FieldEndDelimiter. For example: "Smith", 12345,"Houston", 11/13/04 |
| NonNumeric | Places delimiters before and after all non-numeric types, such as Date fields. An important difference between Non-Numeric and Text: Non-Numeric delimits Date fields, while Text does not delimit Date fields. |
For an ASCII Target file, leading blanks are stripped by default. If you want to leave the leading blanks, click in the StripLeadingBlanks Current Value box and click once. Then click the down arrow to the right of the box and click False.
For an ASCII Target file, trailing blanks are stripped by default. If you want to leave the trailing blanks, click in the StripTrailingBlanks Current Value box and click once. Then click the down arrow to the right of the box and click False.
Allows you to specify a character, or a set of characters, to be filtered out of the Source data. For any character in Transliterate In, the corresponding character from the Transliterate Out property is substituted. If there is no corresponding character, the Source character is filtered out completely. Transliterate In supports C-style escape sequences such as: \n (new line), \r (carriage return) and \t (tab).
Allows you to specify a character to be substituted for another character from the Source data. For any character in Transliterate In, the corresponding character from the Transliterate Out property is substituted. If you wish the Source character to be filtered out completely, leave this field blank. If there are no characters to be transliterated, this field should be left blank. Transliterate Out supports C-style escape sequences such as: \n (new line), \r (carriage return) and \t (tab).
This option allows you to specify the maximum number of characters to write to a field. If this value is set to 0 (the default), the number of characters that are written to a field are determined by the field length. If you set this value to any value OTHER THAN zero, data may be truncated.
This option allows you to specify whether "blank" target data fields will contain an empty string (nothing) or a Null character (00). The default for most CSV text files is None.
On the Record tab, the picklist contains the name(s) of one or more record types, depending on the number of ACCEPT records you defined when defining the data extraction rules, as follows:
Tip: If your source report/text file contains parent and child records for which you need to retain the relationships, you should include a key field in each of the record types (ACCEPT record line styles) when specifying the fields to include in each. Using this methodology, your task of joining data across multiple files is possible.
At the bottom of the Export window, click on the Run Export button after making all the desired selections in the tabs. Your output data file will be written to the drive, directory path, and filename you specified on the Connect Info tab.
After you have defined your data extraction rules (line styles and data fields definitions), save your extract script for reuse.
Before saving an extract script, notice that new extract scripts show as "Extract: Extract1" on the title bar of the main window.
image\Savescr.gifYou might want to save your script for a variety of reasons, including:
To save an extract script, click Save Extract in the toolbar or select File > Save Extract in the menu. Both of the options display the Save Extract window.
Save your extract script in the same path as your report, and have it share the name of your report, with a CXL extension. This is consistent with the standard naming for Data Extractor scripts. The Data Extractor automatically saves your file with a .CXL extension.
The Author name is defaulted for you but can be changed.
Enter a description if desired and click OK. Your new extract script is saved.
Alternately, use the Save Extract As option to move an extract script file to a new location or save it as just a name, without a path.
You can open a previously-designed script with a different report or text file as follows:
Tip: If you modify the extract script to work with the new report/text file, you may want to use the Save As option to save the modified script with a new script filename, rather than overwriting the original script.
There is a toolbar located just below the main menu toward the top of the Extract Schema Designer window. The buttons in the toolbar provide quick access to the most commonly used commands and windows.
Each of the buttons on the toolbar is described below:
| Button Label | Description |
|---|---|
| New Extract | Selects a representative text file or HTTP address and create a new extract script file. |
| Open Extract | Opens a script file and report file that already exist. |
| Close Extract | Closes the current report and script file. If you have made changes to the open script, you will be prompted to save the changes. |
| Save Extract | Saves the extraction script on which you are working. If you are creating a new script, a dialog box will open in which you may enter or change the name of the script file, the author name, and a description. If you are modifying an existing script, clicking on this button will overwrite the script. To save another script with a different script name, select Save As from the File menu. |
| Validate | Verifies that at least one accept record is defined and all data fields are assigned to at least one accept record. If these conditions are met, a dialog box confirms "Validation Successful". Otherwise, a dialog box warns you of these errors in your script. |
| Source Options | Opens a dialog box that gives you some information about the current report. This is also where you can make selections that affect the way the Extract Schema Designer works with your report. See Source Options Window. |
| Source Font | Changes the screen font in the Extract Schema Designer window. |
| Underline Fields | Turns underlining of data fields ON or OFF. When underlining is turned ON, a broken line will display in the Data Panel under each defined fixed position data field. To turn Underline Fields OFF, click this button again. |
| Delete All Line Styles | Deletes all defined line styles in the current script. |
| Delete All Fields | Deletes all defined data fields in the current script. |
| Clear All Accept Record Fields | Clears All ACCEPT Records Fields. This allows you to completely scrap your entire export mapping and start over. When you choose this option, you will be asked if you want to Clear Fields in All ACCEPT Records. If you choose Yes, all of your fields become unassigned again (note the icon change in the Line Style column). If you choose No, you will return to the Extract Schema Designer window. For more details, see ACCEPT Record Definition Window. |
| Search Text | Searches for some specific text string in your report. A dialog box in which you enter the search string will open. See Find Text. |
| Debug Extract | Opens a dialog box that offers some methods of helping you determine how well your script is reading your text or report file. This allows you to verify that your line styles and data field definitions are either correct or need some additional work. For more details see Debug Extract Design Window. |
| External Source Data Viewer | Displays your report in another application, the application it was created in, the application that Windows associates with files of this type, or any other application of your choice. You can set a default application to use for viewing all of your files in the Source Options window. See Source Options Window. |
| Browse Data Records | Opens the browser after defining line styles and data fields. The Browser will display data from your report in a "flattened" column and row tabular format. This allows you to verify that your line style and data field definitions are either correct or need some additional work. For more details, see Record Browser Window. |
| Vertical Spacer | Shows or hides a red vertical positioning bar in the Data Panel in the main window. The positioning bar is helpful in determining whether or not particular text characters start and/or stop in the same position throughout the report. After turning the Positioning Bar ON, click the mouse to position the bar in the Data Panel. To turn the Positioning Bar OFF, click this button again. |
| Toggle Space/Tab Symbol | Shows or hides the space and tab symbols. Default symbols are a small gray dot to show where spaces exist in the source file and a double right pointing arrow for tabs. In Source > Options > Display Choices, you can choose different symbols if you wish. For details, see Source Options Window. Space and tab symbols can also be toggled on and off by selecting Preferences > Show Space Symbol from the menu. |
| Help | Accesses the documentation contents screen. |
The Extract Script Manager is the first window the user sees after initiating a new session of Data Parser for Unstructured Text. This is where saved scripts are listed. However, when you start the Data Parser for the first time, the list will be empty.
To create a new extraction script, click the New Extract button in the toolbar or select File > New Extract. For details, see Open a Text File, Report File or URI.
After one or more scripts have been created and saved, the following information about each appears: Extract Script Name, File Name, Author, Description and Last Updated.
The columns in the Extract Script Manager can be sized larger or smaller so the user can view the entire contents of each field. To size a column, place the mouse pointer directly over the vertical line between two of the column headings. The pointer will become a bold cross with arrows pointing left and right. Press and hold down the left mouse button and drag the pointer in either direction until you have sized the column to the desired width. Then release the mouse button.
The horizontal scroll bar just below the table allows the user to scroll left and right within Extract Script Manager to view additional columns of information. You may also use the mouse pointer to highlight the script name, and then move left or right using the left or right arrow keys.
If you have more Extracts than will display on one screen, you can use the vertical scroll bar to view additional scripts.
To highlight an extraction script (for viewing, modifying, or deleting), place the mouse pointer just to the left of the extraction script number. The pointer will change to a right-pointing arrow. Click once to highlight the entire line. The extraction script name appears in the Extract box centered just below the Extract Manager table.
To open a saved extraction script and load the original report, highlight the extraction script as described above and double-click the extract name or select File > Open Extract.
To open a saved extraction script and load a different report, open the extraction script as described above. Then select Source > Options. Choose the File Properties tab. Click the down arrow next to text file, browse to select the new file and click Open. Click OK.
To delete a saved extraction script, highlight the extraction script as described above, then press Delete.
The Pattern Builder window is where you specify the exact criteria for which the Data Extractor is searching when it is determining whether or not a line of text in the report should be defined as a particular line style.
The Pattern Builder Window is accessed from the Look For? cell in the Line Style Definition Window when you click the down arrow.
Available options in the Pattern Builder window are:
The menu in the upper left corner allows you to paste text from the white pane of the data display into the Value field. The paste function is only available when the Type is literal.
There are five types of criteria the Data Extractor looks for. The five options are literal, character class, negated character class, mask, and regular expression.
Select this option to have the Data Extractor search for some specific string of text. The string will be entered in the Value column.
Select this option to have the Data Extractor search for some classification of characters, digits, or other type of criteria. A character class may be any of the classes or types of information listed below in the Value section, or a user may enter any valid string to specify their own character class.
Select this option to have the Data Extractor search for anything EXCEPT some classification of characters, digits, or other type of criteria. A character class may be any of the classes or types of information listed below in the Value section, or a user may enter any valid string to specify their own character class.
Select this option if you need to enter a special expression to define the search criteria. There are three special symbols that you may use along with any printable character to build the mask. The special symbols are @, #, and *. Each is explained below.
| Symbol | Description |
|---|---|
| @ | Use the @ (at sign) when you want the Data Extractor to search for any alpha character. For each @ entered in the mask, the Data Extractor searches for [A - Za - z]. |
| # | Use the # (pound sign) when you want the Data Extractor to search for any digit. For each # entered in the mask, the Data Extractor searches for [0 - 9]. |
| * | Use the * (asterisk) when you want the Data Extractor to search for any alphanumeric character or digit. For each * entered in the mask, the Data Extractor searches for [A - Za - z0 - 9]. |
Any printable ASCII character (except the three special characters shown above) and spaces can be used as a literal in combination with the three special characters to build the mask.
Some mask examples:
To set up the criteria for any Canadian postal code, enter a mask of @#@ #@#. This mask looks for [A - Za - z][0 - 9] [A - Za - z] (space) [0 - 9] [A - Za - z][0 - 9]
To set up the criteria for any social security number, enter a mask of ###-##-####. This mask looks for [0 - 9] [0 - 9] [0 - 9] - [0 - 9] [0 - 9] - [0 - 9] [0 - 9] [0 - 9] [0 - 9]
Select this option if you want to use a language that allows you to specify a string of characters that defines a set of rules for matching character strings. A regular expression will match a field whether it matches the whole field or just a small sequence within the field.
The following are the special characters that can be used in Data Extractor regular expressions:
| (and) * + ? [and] - . \ ^ $
To use the literal value of a special character within a regular expression, you must precede the special character with a backslash ( \ ). For example, to enter a literal backslash, you must type it twice ( \\ ); to enter a literal dollar sign, you must type backslash and then dollar sign ( \$ ).
Example:
To set up the criteria for a four-column line position test, enter (0[0-9])(1[1-5]) OR 0[0-9]1[1-5]. This expression would find a line number with the first position of 0, the second position of 0-9, the third position of 1, and the fourth position of 1-5. Yielding up to 50 different hits.
If you selected literal in the Type column, the value entered here is the exact string for which you want the Data Extractor to search. When entering the value, type only the exact string. This is case sensitive. The Data Extractor automatically encloses the value in quotation marks. Also see "Set Value = Selected Report Text".
If you highlighted a section of the text before opening the Line Style Definition window, that value is generally entered here automatically.
When you select character class, negated character class or regular expression in the Type column you must either make a selection from the list box in the Value column or enter your own string. Remember, negated character class means anything EXCEPT what you specify in the Value column.
The options in the Value column list box follow:
| Option | Description |
|---|---|
| any character | Select this option to have the Data Extractor search for any printable or non-printable character. |
| digits | Select this option to have the Data Extractor search for any of the digits zero (0) through nine (9). |
| digits/ | Select this option to have the Data Extractor search for any of the digits zero (0) through nine (9) or a forward slash (/). |
| digits- | Select this option to have the Data Extractor search for any of the digits zero (0) through nine (9) or a hyphen (-). |
| digits/- | Select this option to have the Data Extractor search for any of the digits zero (0) through nine (9) or a forward slash (/) or a hyphen (-). |
| letters | Select this option to have the Data Extractor search for any printable alphabetical character (a - z or A - Z). |
| upper case letters | Select this option to have the Data Extractor search for any printable upper case alphabetical character (A - Z). |
| lower case letters | Select this option to have the Data Extractor search for any printable lower case alphabetical character (a - z). |
| alphanumeric characters | Select this option to have the Data Extractor search for any printable alphabetical character (a - z or A - Z) or any digit from zero (0) through nine (9). |
| letters and white space | Select this option to have the Data Extractor search for any printable upper or lower case letter (A-Z or a-z) and white space. |
| upper case letters and white space | Select this option to have the Data Extractor search for any printable upper case letter (A-Z or a-z) and white space. |
| lower case letters and white space | Select this option to have the Data Extractor search for any printable lower case letter (A-Z or a-z) and white space. |
| white space | Select this option to have the Data Extractor search for a Tab (hex 09 or ANSI 009) or some designated number of spaces. |
| space | Select this option to have the Data Extractor search for a single space. |
| tab | Select this option to have the Data Extractor search for a tab character (hex 09 or ANSI 009). Tab expansion must be set to 0 on the Printer Emulation tab in the Source Options window in order for the Data Extractor to detect Tab characters in the file. |
| carriage return | Select this option to have the Data Extractor search for a carriage return (hex 0D or ANSI 013). |
The count determines how many of the specified string(s) the Data Extractor searches for.
It is important to note that Count searches for consecutive values. For example, if you search on the string MM/DD/YY using the following settings in the Pattern Builder, no matching results display:
There is no match because MM/DD/YY does not contain 3 consecutive alphanumeric characters.
Here are the Count options:
| Option | Description |
|---|---|
| 0 - many | Select this option if you want the Data Extractor to search for "None" to "Many" of the specified string(s). |
| 1 - many | Select this option if you want the Data Extractor to search for "One" to "Many" of the specified string(s). |
| 0 - 1 | Select this option if you want the Data Extractor to search for "None" to "One" of the specified string(s). |
| 1 thru 9 | Select any of these options when you want the Data Extractor to search for a specific number of the specified string(s). If you want the Data Extractor to search for more than 9 of a specified value, type the number desired in this cell. |
This is the column number in which the Data Extractor should start searching for the specified string. If you modify the string in the Value column, you may need to enter a new column number here. Values only appear in the Start cell when Column is selected in Search What?. For more information about the Begin option, see Begin (line or column).
This is the column number in which the Data Extractor should stop searching for the specific string. If you modify the string in the Value column, you may need to enter a new column number here. Values only appear in the End cell when Column is selected in Search What?. For more information about the End option, see End (line or column).
Extract Script Designer is the working area where you will use the mouse pointer, shortcut menus, and dialog boxes to define specific parts of the report text that you want to convert to another format.
The main window may be sized, minimized, or maximized to suit your needs.
A bold Vertical Splitter Bar splits this window into two panes. The left pane is the Line Style Column. The right pane is the Data Panel. Details about this window are discussed below.
This is the lightly tinted yellow pane to the left of the Vertical Splitter Bar. When you first open a new report file, the Line Style Column will be empty. As you define lines of text, the line style names will display in the Line Style Column. Along with the line style names, one or more symbols may also display in the Line Style Column to designate the line action and/or other information about a particular line in your report. See Line Action.
The main purpose of the Line Style Column is to display the line style name and line action symbols of each defined line of text. The line style names give you a visual way to verify that each line of text in your report file matches the appropriate line style recognition pattern. The line action icons give you a visual way to identify how the Data Parser for Unstructured Text is going to process the line of text and any data fields you have defined within that line.
The mouse behavior is slightly different in the Line Style Column as opposed to the Data Panel. If you click the mouse in the Line Style Column, the entire line will be selected. When you highlight some of the data and right click in the Line Style Column, a different shortcut menu will display from the one that would display if you highlighted some data and right clicked in the Data Panel. Details about the shortcut menus are described in Shortcut Menus. For details about how to define line styles, see Defining Line Styles.
After a line of text has been defined, you can double-click a line style name in the Line Style Column to open the Line Style Definition window for viewing and/or modifying the recognition rule. For details, see Line Style Definition Window.
There is a horizontal scroll bar at the bottom of the Line Style Column. If the width of the Line Style Column is less than the longest line style name, you may scroll to the right to view the portion of the line style name that does not display. You may also drag the Vertical Splitter Bar to the right to make the Line Style Column wider.
This is the large white pane to the right of the bold Vertical Splitter Bar. The Data Panel is the main work area where you highlight selected text that will be used to define line styles and/or data fields.
To indicate that you want to define a line style or data field, highlight a particular selection from a line of text and click the right mouse button to bring up a series of shortcut menus from which to work. For details, see Shortcut Menus.
As data fields are defined, the text that is included in the various fields will change from black to red, green, blue, or magenta if the fields are fixed length and fixed position. Where there is more than one data field on a line of text, the colors will alternate so you can distinguish one field from another on that line. The individual colors have no particular significance. You may also have the Data Parser for Unstructured Text display a colored, dashed line under each defined data field by turning Underline Fields ON in the Preferences menu. For details about other display options, see Source Options Window.
For details about defining lines of text, see Defining Line Styles.
Details about defining data fields are discussed in the Define Data Fields.
There is a horizontal scroll bar at the bottom of the Data Panel. If the width of the Data Panel is less than the longest line of text in your report file, you may scroll to the right to view the portion of the text that does not display. You may also adjust the width of the main Data Parser for Unstructured Text window by dragging its right border to the right, or by maximizing it.
There is a vertical scroll bar on the right side of the Data Panel when the report is longer than will display in the window. You may scroll up and down in the Data Panel to view the portions of the report that do not display. The Source Sample setting in Source Options will determine how much of your report file actually shows in the Data Panel. For details, see Source Options Window.
The bold vertical splitter bar between the Line Style Column and Data Panel may be dragged left and right to adjust the sizes of the two panes of the main window. Place the mouse pointer directly on the splitter bar. The pointer will change to a bold cross with left- and right-pointing arrows. Click and hold down the left mouse button and drag the splitter bar to the desired position. Then release the mouse button.
This bold black Vertical Splitter Bar between the two panes of the Script Designer window should not be confused with the thin red Vertical Positioning Bar that can be placed in the Data Panel by clicking the Vertical Positioning Bar button on the toolbar and then clicking in the Data Panel.
In the lower left of this window, below the Line Style Column, there are two boxes that indicate at which text line and column the blinking cursor is positioned. A text or report file must be opened in the Data Parser for Unstructured Text for this to be active.
When you move the mouse pointer over a defined data field (three exceptions discussed below) in the Data Panel, the mouse pointer will change to a hand. In the lower left of this window, below the Data Panel and Line Style Column, there is a Field Position Indicator that will display the name of the data field over which the mouse pointer is located.
There are three exceptions:
These fields are non-fixed length and non-fixed position, and therefore will not be colored on the Data Panel.
In the lower right of this window, below the Data Panel, there is a box that indicates at which text line and column the mouse is positioned. The number to the left of the comma indicates the text line. The number to the right of the comma indicates the column. A text or report file must be opened in the Data Parser for Unstructured Text for this to be active. And when you move the mouse pointer beyond the End of Line marker, the mouse position boxes will remain blank.
In the lower right corner of this window, below the Data Panel, there is a box that indicates the hex and decimal value for any character that the mouse pointer is positioned over. See for a table of values and their associated characters. This box will not display a value for end of line characters. See Decimal and Hexadecimal Values.
The Source Options window opens each time you begin a new extract script, unless you go to the Display Choices tab and clear the Display Source Options with New Extract check box. Otherwise, to open this window, click the Source Options button in the Tool Bar, or select Options from the Source menu. The purpose of the Source Options window is to allow you to change the way the Data Parser for Unstructured Text reads your text file.
If you are familiar with the text or report file with which you are working, the Source Options window can be opened and some selections made before opening the file. Other options can be changed to meet the requirements of the report file as you are parsing it.
If you make changes to the settings in this window after opening the report file, the Data Parser for Unstructured Text may reread and reload your file. This may take a few seconds.
The window is divided into seven tabs: Extract Design Choices, Display Choices, File Properties, Printer Emulation, Character Set, Character Filters and External Viewer. The options in each tab are discussed below:
This topic covers the settings under Extract Design Choices.
The tag separator selected here tells the Data Parser for Unstructured Text how to distinguish a field tag from the data field when analyzing a line of text. This is only relevant when you are making use of the Parse Tagged Data shortcut menu option.
The tag separator choices are:
| Tag Separator | Description |
|---|---|
| ColonSpace (: ) | Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a Colon and a Space. Example: Name: John M. Smith |
| Colon (:) | This is the default setting. This option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a colon only. Example: Name: John M. Smith |
| SpaceColon ( :) | Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a Space and a Colon. Example: Name: John M. Smith |
| Dash (-) | Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a dash. Example: Name-John M. Smith |
| Comma (,) | Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a comma. Example: Name, John M. Smith |
| # of Spaces | Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a specified number of spaces. When you select this option another box appears to the right of the tag separator box in which you type the desired value to specify how many spaces. Example with 3 spaces specified: Name John M. Smith |
| # of Spaces + | Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by two or more spaces. Example: Name John M. Smith |
| Vertical Bar ( | ) | Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a vertical bar or pipe ( | ). Example: Name|John M. Smith |
| Other | If your report file has a character that is not on the list of choices separating the field tags from the field data, you can highlight the character shown and type in any single printable character. Example: Name*John M. Smith |
For details on tagged report data, see Define Data Fields.
The Column Separator selected here tells the Data Parser for Unstructured Text how to distinguish one column of data from the next when analyzing a line or block of text. This setting is only relevant when you are using the Parse Columnar Data or Parse Columnar w/ Heading shortcut menu options. Each column within a line of text will become a data field.
The Column Separator choices are:
| Column Separator | Description |
|---|---|
| (2+) Spaces | This is the default setting. This option tells the Data Parser for Unstructured Text to distinguish between two columns of data by two or more spaces. Example: 10019 John M. Smith. The account number 10019 is one column, or data field, and the person's full name, John M. Smith, which contains single spaces, is another column, or data field. |
| (1) Space | Selecting this option tells the Data Parser for Unstructured Text to distinguish between two columns of data by a single space. |
| Tab | Selecting this option tells the Data Parser for Unstructured Text to distinguish between two columns of data by a tab. |
| Vertical Bar ( | ) | Selecting this option tells the Data Parser for Unstructured Text to distinguish between two columns of data by a vertical bar or pipe ( | ). |
| Other | If your report file has a character that is not on the list of choices separating columns of data, you can highlight the character shown and type in any single printable character. Example: 10019 - John M. Smith |
The Data Parser for Unstructured Text outputs each record type as a fixed length, consistent structure record. This means that even if a field or line does not exist in a particular section of a report, that field will still exist in the output record. Propagate and Flush are opposite ways of handling these fields that contain no data or do not exist.
Propagate Field Contents is the normal default. It causes the Data Parser for Unstructured Text to carry the field contents forward from the last section that contained data in the data field when the report contains no data in the section being processed. This is the default and is particularly useful for header information that you want repeated until a new header is encountered.
Flush field contents is the opposite of propagate field contents. With the Flush Field Contents on Accept default check box checked, fields that do not exist or are blank in the report will be blank in the output by default. This only holds true for any fields defined after this setting is made. Previously defined fields will not have their Data Collection/Output options changed.
Individual fields can still be set to Propagate Field Contents on the Data Collection/Output tab of the Field Definition window.
This option is necessary when you have defined your ACCEPT Record line as Accept Before Collecting. This is usually done when there is no line that can be consistently identified as the end of a record, but there is a consistent beginning line. Since the Data Parser for Unstructured Text writes out a record as soon as it encounters the ACCEPT Record line, the very last record in a report with Accept Before Collecting will not be written out unless this option is checked. For more information, see Line Action.
The settings in the Source Option window for Do Accept at End are the default settings for future Accept records. Select the Do Accept at End Default checkbox (optional) the line styles will do an Accept at End. You can override the default settings, by selecting the Do Accept at End checkbox in the Line Action tab of the Line Style Definition window. Values can be set individually for each Accept record.
When your report does not contain a consistent last line, you may need to define the ACCEPT Record as the first line of a header or some other consistent line of text that appears at the beginning of each section of the report. This line would have the ACCEPT Record Before Collecting line action. When this line action is chosen, the resulting first record will contain no data. It will be blank. To avoid a blank first record in your output file, turn Skip First Accept ON by checking the box. See Line Action.
The setting in the Source Options window for Skip First Accept Default is the default setting for future Accept records. If you select the Skip First Accept Default checkbox, the Accept line styles you define will skip the first blank record in the output file. You can override the default settings by selecting the Skip First Accept checkbox in the Line Action tab of the Line Style Definition window. Values can be set individually for each Accept record.
When this option is checked, the Data Parser for Unstructured Text will strip your report fields of all leading and trailing spaces. This is especially useful for parsed columnar fields where you have a set field length, but have no real idea how large each piece of data will be. For details, see Shortcut Menu - Line Style Column.
This setting affects how the greater than, less than, and other comparison operators in the Line Style Definition window Operator column work. With the Numeric Comparison radio button selected, any input data will be checked to be certain it is numeric and then compared as a number. With the String Comparison radio button checked, the input data would be compared as a string, regardless of what sort of data it is.
The settings on this tab tell the Data Parser for Unstructured Text:
Each of the options is described below.
The purpose of these settings is to allow you to select a representative sub-sample of a large text file when defining the line styles and data fields. Using a sub-sample will greatly improve the speed at which you can work.
Remember that this is only for display purposes. After you have defined your line styles and data fields in the selected sub-sample, the Data Parser for Unstructured Text still extracts the data from the entire report file. Also, the Record Browser Window previews the extract from the entire file.
When selecting a subsample for Data Parser for Unstructured Text to read, be sure that it is representative of the most complete records in the file. The available options are:
By default the Data Parser for Unstructured Text starts displaying the text file at the beginning of the first line in the file (line 1). However, when selecting a representative subsample of the file, you may want the Data Parser for Unstructured Text to start reading the file on some other line.
You may highlight the default value (1) and type a new value, representing the line number from which you want Data Parser for Unstructured Text to start displaying data. Use the Cursor Position Box to help you determine the line number. For example, if you want the Data Parser for Unstructured Text to start displaying a report file at line 9, rather than line 1, simply highlight the 1, type in a 9.
By default the Data Parser for Unstructured Text displays the first 500 lines of the text file, beginning at line 1 and ending at line 500. The Data Parser for Unstructured Text can display many more lines of text. However, the more lines displayed in the data panel the more slowly the Data Parser for Unstructured Text will function. Also, when selecting a representative sub-sample of the file, you may want the Data Parser for Unstructured Text to display a smaller subset of lines in the file.
You may highlight the default value (500) and type a new value, representing the line number at which you want the Data Parser for Unstructured Text to stop reading. For example, if you want the Data Parser for Unstructured Text to stop reading a report file at line 35, rather than line 500, simply highlight the 500, and type 35.
Click OK in the Source Options window when you are finished making your selections.
As you change the settings in Starting Line or Ending Line, the Sample Size value will change to reflect the total number of lines of text that the Data Parser for Unstructured Text displays. Performance will be affected as the Sample Size increases.
This option allows you to choose whether the Data Parser for Unstructured Text pads each line of text in your report out to a fixed right margin, or reads each line only to the last character on that line. The default setting is OFF. You can turn it ON by clicking in the white square to the left of Pad Lines. This removes the check in the box.
When a report has a ragged right margin, this option will assist you in defining data fields to a width that does not cause the data in wider fields to be truncated. It can be misleading, however, if you are defining line styles. The spaces used to pad the lines in the display do not actually exist in the text file. Do not use them in recognition patterns for line styles.
When the Pad Lines option is turned ON, the Data Parser for Unstructured Text will pad each line of text in the selected sample out to the longest length necessary to accommodate the data. That value is displayed in Pad Length. For example, in TUTOR1.REP, the default Pad Line Length is 52.
If you want the Data Parser for Unstructured Text to pad with spaces past the last character in all the lines of text, you may change the value of Pad Line Length. Highlight the default value and type a larger value. Performance will be affected as the value increases.
This is a toggle that allows you to choose whether or not to display a symbol at the end of each line of text. The default setting is ON. You may turn it OFF by clicking on the box that contains the check mark.
If you want the Data Parser for Unstructured Text to display an end line symbol, this is a list of symbols from which to choose. The options are a paragraph mark (¶) or {EOL}. Select the desired choice from the list box.
This is a toggle that allows you to choose whether or not to display a symbol where each space character exists in the display. This includes both space characters that exist in the source text file and also any characters that have been added to the Data Panel by the Pad Lines option. The default setting is ON. You may turn it OFF by clicking on the box that contains the check mark.
If you want the Data Parser for Unstructured Text to display a space symbol, this is a list of symbols from which to choose. The options are a small dot ( × ) or a period ( . ). Select the desired choice from the list box.
This is a toggle that allows you to choose whether or not to display a symbol where a tab character exists in the source report file. The default setting is ON. You may turn it OFF by clicking on the box that contains the check mark. The symbol will only appear if tab expansion is set to 0 on the Printer Emulation tab of the Source Options window.
If you want the Data Parser for Unstructured Text to display a tab symbol, this is a list of symbols from which to choose. The options are a small double right angle bracket ( ) or a single right angle bracket ( > ). Select the desired choice from the list box.
This is a toggle that allows you to choose whether or not to display horizontal lines under each line of data in the Data Panel. The default setting is OFF. You may turn it ON by clicking on the box that contains the check mark.
This is a toggle that allows you to choose whether or not to display vertical lines in between each line of data in the Data Panel. The default setting is OFF. You may turn it ON by clicking on the box that contains the check mark.
This tells the Data Parser for Unstructured Text to automatically show the Source Options dialog box each time a new extract script is created. The default is ON. If you do not want the Source Options dialog box to open when you create a new extract script, turn it OFF click the box containing the check mark.
The options listed under File Properties are settings that apply globally to the current text or report file. The settings also are written to the script and are stored in the Data Parser for Unstructured Text database.
These selections should be made prior to opening the text or report file, if you know which options to choose. When you change these options after the report or text file is loaded, the Data Parser for Unstructured Text will reread and reload the report. This may take a few seconds.
The exception is the Text File choice. This will always open with the text file that was used originally to define the script. If you want to change it, you must open the script first.
This is the report file that you are using, or URI to which you are connecting. If you want to check or run the script that you have designed against a different report file, you can click the down arrow and browse to the new report file. The Data Parser for Unstructured Text will load the new file and apply the script that is presently open to that file.
A text file is presumed to have a carriage return-line feed (CR-LF) at the end of each line. However, some files have different characters at the end of the line.
To specify some other line separator, or to specify the number of bytes of each line of a fixed length report, place the mouse pointer in the Line Separator box and click once. Then click the down arrow to the right of the box and click the desired Line Separator in the list. The list box choices are: carriage return-line feed (default), line feed, carriage return, line feed-carriage return, form feed, empty line, and number of bytes.
If you select number of bytes, enter the length of each data record in the Byte Count box that appears just below the Line Separator box. The Data Parser for Unstructured Text data display supports record lengths up to 32,000 bytes.
Caution: If your source file contains Carriage Returns, Line Feeds or Form Feeds, use the Character Filters tab to replace each of these with blanks before using the number of bytes Line Separator. For information on using character filters, see Character Filters.
If the line separator is not one of the choices from the list box and is a printable character, highlight the CR-LF and then type the correct character. For example, if the separator is a pipe ( | ), type the pipe character on the keyboard.
If the line separator is not one of the choices from the list box and is not a printable character, highlight the CR-LF and then enter a backslash ( \ ), an X, and then the hex value for the correct separator. For example: Enter \X0C to specify a form feed. See Decimal and Hexadecimal Values.
The line separator can be more than one character. If so, type the characters, or type a backslash X and the hex values for the characters.
Example 1: EOL (for the three capitol letters E, O, and L together)
Example 2: \X0C \X0D (for form feed, carriage return)
Note: Because a majority of files containing CR-LF, CR or LF line separators often include a combination of those three separators, if you select CR-LF, CR, or LF as your line separator, Data Parser for Unstructured Text uses a loose definition of line separator to include solo LFs, CRs, or CR-LFs as well. All other line separators look only for the line separator you specify.
The field separator selected here tells the Data Parser for Unstructured Text how to automatically distinguish fields when analyzing a line of text. The field separator applies to the entire text file on which you are working. If you need to define a data field based on Relative Field Position, select the correct field separator before defining those fields.
The field separator choices follow:
| Field Separator | Description |
|---|---|
| Asterisk ( * ) | Selecting this option tells the Data Parser for Unstructured Text to distinguish individual words when they are separated by an asterisk. |
| Colon ( : ) | Selecting this option tells the Data Parser for Unstructured Text to distinguish individual words when they are separated by a colon. |
| Comma ( , ) | Selecting this option tells Data Parser for Unstructured Text to distinguish individual words when they are separated by a comma. |
| Dash ( - ) | Selecting this option tells the Data Parser for Unstructured Text to distinguish individual fields when they are separated by a dash. |
| Caret ( ^ ) | Selecting this option tells the Data Parser for Unstructured Text to distinguish individual fields when they are separated by a caret. |
| Semicolon ( ; ) | Selecting this option tells the Data Parser for Unstructured Text to distinguish individual fields when a semicolon separates them. |
| Slash ( / ) | Selecting this option tells the Data Parser for Unstructured Text to distinguish individual fields when they are separated by a slash. |
| Space ( ) | A space is the default setting. This option tells the Data Parser for Unstructured Text to distinguish individual fields when they are separated by a single space (hex 20). |
| Tab | Selecting this option tells the Data Parser for Unstructured Text to distinguish individual fields when a tab (hex 09) separates them. |
| Tilde ( ~) | Selecting this option tells the Data Parser for Unstructured Text to distinguish individual words when they are separated by a tilde. |
| Vertical Bar ( | ) | Selecting this option tells the Data Parser for Unstructured Text to distinguish individual fields when a vertical bar separates them. |
| Other | If the word separator in your report is not on the list of selections, you can highlight the current value and type in any single character. If the character cannot be typed, you can enter the hex value for that character with \X in front of it. For example: \X0A will enter a Line Feed as the word separator. |
The options listed under Printer Emulation are settings that apply globally to the current text or report file. The settings also are written to the script and are stored in the Data Parser for Unstructured Text database.
These selections should be made prior to opening the text or report file, if you know which options to choose. When you change these options after the report or text file is loaded, the Data Parser for Unstructured Text will reread and reload the report. This may take a few seconds.
Printer Emulation gives the Data Parser for Unstructured Text the ability to read print control characters. This allows the Data Parser for Unstructured Text to display the report with the same appearance it would have had if printed.
None is the default setting. The Data Parser for Unstructured Text reads and interprets the ANSI characters between and including 32 - 126.
Printers that the Data Parser for Unstructured Text can presently emulate include AS400 and 1403. Other printers will be added to the list in future releases.
If your text file has embedded tab characters representing white space, you can expand those tabs to a set number of spaces. The default value is eight (8). To change the value, highlight the default and then type in the desired value.
If the Tab Expansion is set to 0, then tab characters in the source file will not be replaced with spaces. The tab characters, themselves, can then be used as field separators, column separators, begin or end tags, or other uses. If Show Tab Symbol is selected in the Display Choices Tab of the Source Options window, a symbol will display everywhere a tab character exists in the source text or report file.
The options listed under Character Set are settings that apply globally to the current text or report file. The settings also are written to the script and are stored in the Data Parser for Unstructured Text database.
These selections should be made before opening the text or report file, if you know which options to choose. When you change these options after the report or text file is loaded, the Data Parser for Unstructured Text will reread and reload the report. This may take a few seconds.
Select the Code Page Translation Table that the Data Parser for Unstructured Text will use when reading an EBCDIC report file.
Tip: If you change ANSI to EBCDIC and then back to ANSI, the data window appear to lose data, but this is a display issue resulting from use of different default line separators in the two character sets. To return the display to readable characters, in the File Properties tab pane, change the line separator CR-LF to CR, click OK, then reopen the settings and change the line separator back to CR-LF.
The options listed under Character Filters are settings that apply globally to the current text or report file. The settings also are written to the script and are stored in the Data Parser for Unstructured Text database.
These selections should be made prior to opening the text or report file, if you know which options to choose. When you change these options after the report or text file is loaded, the Data Parser for Unstructured Text will reread and reload the report. This may take a few seconds.
You can choose individual characters to filter by clicking in the cell next to that character and selecting delete, leave alone or replace with space from the list.
Note: The character with decimal value 159 affects the line display. Filter this character while designing your script.
If you have selected some characters to filter and want to set those characters back to the original settings, click Reset Defaults.
Note: All but two characters default to leave alone. 00 (NUL) characters default to replace with space and 0C (FF) characters default to delete.
If you want to have all nonprintable characters other than carriage return-line feed replaced with spaces in the display, click the Filter Non-Print button. Filtering can be useful if you have unusual characters that interfere with reading the file correctly or with the ability of the script to recognize certain lines.
When you have finished making your selections on the tabs in the Source Options window, click the OK button to save your selections.
In order to read a report into the Data Extractor, it must exist as a disk file on your PC or network drive to which you have access/permissions. The following section includes suggestions for accomplishing this.
If the report was generated on a mainframe, it must be spooled to a file and downloaded to your PC or network using a communications package. The report may be in ASCII or any standard EBCDIC format.
If the report was generated in a PC application, you may do either of the following:
You can now print to a disk file from any Windows application by selecting the Generic/Text printer.
The Data Extractor offers dramatic source flexibility using URI support. The full addressing scheme must be present before the Data Extractor can connect to the source file.
Note: A minimum of v1.4.0 of Sun Java Runtime Environment (JRE) must be installed for the URI to function properly. Without this component, you will receive the error message: "Unable to load Java virtual machine."
For more information on URIs within the Data Extractor, see Dealing with URI Limitations below.
URIs (Uniform Resource Identifiers) are a modern subset of URLs, that were originally nothing more than HTTP and FTP locations on the Internet. A URI typically describes the following:
For example, this URI...
http://www.yahoo.com
...identifies a file that can be accessed using the Web protocol application, Hypertext Transfer Protocol ("http://"), that is housed on a computer named "www.yahoo.com" (which can be mapped to a unique Internet address). In the computer's directory structure, the file is located at "/support.asp".
File Transfer Protocol addresses and e-mail addresses are also URIs (and are also called a URL).
The common syntax for scheme-specific data is:
file://
Some or all of the parts "
| Component | Description |
|---|---|
| user | An optional user name. Some schemes (e.g., ftp) allow the specification of a user name. |
| password | An optional password. If present, it follows the user name separated from it by a colon. The user name and password, if present, are followed by a commercial "at" sign ( @ ). Within the user and password field, any colon ( : ), "at" sign ( @ ), or slash ( / ) must be encoded. |
| host | The fully qualified domain name of a network host, or its IP address as a set of four decimal digit groups separated by periods ( . ). |
| port | The port number to connect. Most schemes designate protocols that have a default port number. Another port number may be supplied, in decimal, separated from the host by a colon. If the port is omitted, the colon is as well. |
| url-path | The remainder of the locator consists of data specific to the scheme, and is known as the "url-path". It supplies the details of how the specified resource can be accessed. Note: The slash ( / ) between the host (or port) and the url-path is not part of the url-path. |
URI support is currently limited to "public" WWW sites, where the navigation to the page containing web content you want to extract is directly addressable with one URL hyperlink (e.g., cnn.com). You cannot connect to interactive session resources like TELNET or RLOGIN.
While this URI support is a huge benefit, and reaches hundreds of millions of pages, there are still many URLs that are not directly addressable. You must first go through some painstaking navigation, authentication, looping, etc., before you arrive at the page containing the data to be extracted.
Solutions for overcoming this limitation at design time include:
Pervasive Software Inc.
12365 Riata Trace Parkway
Building B
Austin, TX 78727 USA
Web: http://pervasivedatatools.com
This manual leads you through the operation of the Data Inspectors user interface. The Pervasive Data Inspectors allow you to analyze data in dozens of databases, applications, or file formats. You must have a non-expired Data Inspector license to run each product.
Refer to the license.txt file in the default installation directory for disclaimers and information about trademarks and credits.
The installation and operation of the Pervasive Data Inspectors requires the installation of Pervasive DataTools v9. After you order a trial or a license subscription for one or more of the Data Inspectors, follow the steps below, and in the order written, to install DataTools v9 and Data Inspector.
Follow ALL steps in the System Requirements and Installation Pre-requisites document. Do not install DataTools until/unless all of the specified requirements and pre-requisites are performed and/or verified.
The DataTools v9 installation EXE file can be downloaded from the Product Downloads page of the DataTools web site.
Download the installation file to your workstation's desktop.
Click on the DataTools v9 installation EXE file to start the installation wizard. Follow the prompts until the installation is finished. The installation may take several minutes.
Launch Pervasive DataTools v9. When prompted, enter your login credentials that you used on the Pervasive DataTools web site to order your DataTools products.
This step is a good way to verify that you have installed DataTools v9 properly before installing Data Inspectors. If DataTools v9 will not start and run, do not proceed to the Data Inspector installation until the DataTools v9 installation is completed and DataTools v9 runs properly on your workstation.
The DataTools v9 installation EXE file can be downloaded from the Product Downloads page of the DataTools web site.
Download the installation file to your workstation's desktop.
Click on the Data Inspector installation EXE file to start the installation wizard. Follow the prompts until the installation is finished. The installation may take several minutes.
After installing Data Inspector, launch DataTools v9. The DataTools Launcher should find your Data Inspector license file via our web service and prompt you to download the file to your workstation.
Once the Data Inspector license file has been downloaded to your workstation, highlight the Data Inspector product in the DataTools Launcher's grid and then click the Launch button located under the grid. The Data Inspector should launch.
The Data Inspectors are simple data quality analysis tools. Each Data Inspector provides an easy-to-use method of analyzing a data source.
Data Inspectors perform the following actions:
Each Data Inspector includes the following features:
Simple three-step workflow:
Predefined metrics return the results in tabular and graphic formats.
The Data Inspector Design Preferences dialog box enables you to change some of the design time preferences for the following items:
The Autogen Defaults tab enables you to customize the Row Sample Size and Distinct Value Field limit.
By default, Row Sample Size is set to 10000 and Distinct Value Field limit is set to 15. This default set up is to limit the processing time and memory required to set up and run an analysis.
The Fonts tab enables you to customize the fonts used in the Data Inspector user interface.
The Logging tab enables you to rename the log file name. By default, the log file name is Data_Investigator.log.
Data Inspectors can analyze data from many sources. Each Data Inspector supports a specific database, application, or data file format. Before you can analyze data from a particular source, you must identify the source connection type and connect to a database table, application entity, or data file.
To connect to the source data, you must specify the following:
The next few pages of this manual provide details for each.
The only source connections that are available in the Data Inspectors are those whose structure is pre-defined by the existence of complete metadata. In other words, if you wish to analyze the data from a Microsoft SQL Server database table, there is a Data Inspector available. However, if you wish to analyze data from a report file or from a fixed ASCII, Btrieve or C-ISAM data file, you must first use one of the Pervasive Data Parsers to parse, unpack, and export the data to a CSV text file format.
Before you design your data analysis specification, you must connect to a data source. The following sections will guide you through the steps and the available options.
The Select Connection dialog box displays the following three tabs:
The selected connection tab provides a list of available connections in the specific Data Inspector for which you have a license.
Alternatively, double-click the correct source connection to apply the current source connection. Data Inspector returns to the previous screen with the Source Connection tab open and the correct source connection listed in the Source Connection box.
The fields that are displayed depend on the type of connection you select. However, regardless of the connection type, you should always see the following fields:
The label may vary based on the connection name; however, it is usually labeled Source File/URI.
Click the drop-down list to navigate to the source file.
To Connect to the Source Data
Data Inspector enables you to provide the source file / URI in any of the following ways:
The steps required for these tasks varies according to the combination of source connection, data source, and profiling needs.
For more information on editing the connection properties, see Managing the Source Connection.
In addition to the connect information on the left side of the Source Connection tab, there is a list of connector properties in the right panel. Not all files with the same connection have all of the same connection properties, and it may be easier to write a profiling specification if some of the properties are changed.
For instance, some ASCII-delimited files have a header row giving field names, while others do not. The default properties for the ASCII (Delimited) connector assume that there is no header row. If the file does have a field name header, changing the Header property so that Data Inspector will pick field names up makes defining a profiling specification easier. For the case of the Header property, double-clicking False changes the value to True. This lets Data Inspector read the field names from the header, so it uses actual field names (Name, Address, and so on) instead of the default field names (Field1, Field2, and so on) in setting up profiling metrics.
For some properties, a drop-down list arrow appears in the value cell. Other properties may require you to specify a file or enter a text value.
NOTE: After making a change to one or more of the connection property options, if you do not click the Apply button, the settings revert to the previous value before your change was made.
You can reuse the saved connection by selecting it in the User Defined Connection window.
By default, the field list will offer only the first 15 data fields from your data source.
To change the number of visible fields, follow these steps:
The Data Inspector will run an analysis on the selected data field(s) and will return a table and a pie chart of Distinct Values on the field or fields.
By default, the Data Inspector will run the Distinct Values analysis on the selected fields of the first 10,000 data records.
To change the number of data records on which to run the analysis, follow these steps:
Note: The Distinct Values analysis requires significant amounts of processing. The selection of a large number of data fields in a large data set may significantly increase the time to completion.
After selecting the desired data fields and data records on which to run the Distinct Values analysis, click on the Run Profile button in the button bar.
After connecting to the data source and selecting the desired data fields and data records on which to run the Distinct Values analysis, click on the Run Profile button in the button bar.
Depending on the number of data fields and data records you chose to analyze, the analysis will take a few seconds to a few minutes to complete. Note: The Distinct Values analysis requires significant amounts of processing. The selection of a large number of data fields in a large data set may significantly increase the time to completion.
When the analysis is complete, a table and a pie chart showing the results will display on the Results tab.
After the analysis is complete, the Results tab will display two tables (on the left) and a pie chart (on the right). The following explains how to read the results.
The information displayed in the upper table of the Results tab is the result of the Distinct Value analysis of the selected data source.
You will see at least two rows of information listed in the upper table. The first row contains the number of records that were analyzed in the data source. The second row contains the field name of one of the first source data field that was analyzed and the Distinct Count for that field.
The Distinct Count value reflects the number of distinct values that occur in the data records for each selected data field.
Note: Both the "Field / Column" and "Distinct Count" columns can be sorted in ascending and descending order by clicking on the yellow column heading in the upper table.
The lower table of the Results tab displays a list of the Distinct Values for each data field as you click on each row in the upper table.
Note: Each of the three columns can be sorted in ascending and descending order by clicking on the yellow column heading in the lower table.
Along with the tables describing the results, a pie chart is also displayed on the Results tab. This pie chart provides a visual representation of the statistics as you click each row in the upper table.
Note: If the Distinct Values results display as very narrow sections in the pie chart, you may want to maximize the size of the Data Inspector window for better viewing. The splitter bar between the tables and the pie chart can also be adjusted left and right to re-size each side of the Results tab.
After running one analysis, if you want to "start over" and run another analysis, follow these steps:
After designing and running a Distinct Values analysis in the Data Inspector, you may save the analysis by following these steps:
To open a saved analysis, follow these steps:
Pervasive Software Inc.
12365 Riata Trace Parkway
Building B
Austin, TX 78727 USA
Web: http://pervasivedatatools.com
This manual leads you through the operation of the Data Joiner user interface. The Pervasive Data Joiner allows you to join two heterogeneous data sets and export the joined results set to a CSV Text file. You must have a non-expired Data Joiner license to run this application.
Refer to the license.txt file in the default installation directory for disclaimers and information about trademarks and credits.
Data Joiner is an application used to join heterogeneous data. It allows you to do a full (or outer) join, an inner join, a left join or a right join. The Data Joiner user interface consists of three tabs for specifying source information and a run button for exporting the joined result set to an ASCII or Unicode delimited text file.
This application allows you to join only two source data sets. However, each source can be data from a file, a table, a view, or a SQL statement. When a source data set is described using a view or a SQL statement, the view or SQL statement itself could be joining multiple tables from the same database.
To successfully export joined data, you will first have to connect to Source 1, connect to Source 2, specify the keys to join on, and which fields from each source you want in the resulting data set. The Data Joiner allows you to browse each of your source data sets, test the join results and browse the exported result set.
You can access Join Designer preferences by selecting the View > Preference menu item. The preferences dialog allows you to set the application font and set a description for a join before you save the join design.
For each task, we have provided a quick link to the section of the Data Joiner documentation that describes how to perform the specific task.
| Task | Help Topic |
|---|---|
| Define a Join Source | Go To Source 1 Connection |
| Browse Source Data | Go To Browsing Source Data |
| Set Join Properties | Go To Join Settings |
| Test a Join Design | Go To Testing the Join Design |
| Export Joined Data | Go To Exporting Joined Data |
| Browse Exported Data | Go To Browsing Exported Data |
| Build a Query Statement | Go To Using the Query Builder |
| Create a Reusable Source Connection | Go To Creating a ReUsable Source Connection |
| Save a Join Design | Go To Saving the Join Design |
The first Data Joiner tab is for specifying one of your two sources. You can think of it as the left side of a join.
Choose a source connector type by clicking on the down arrow to the right of the Source1 Connection textbox. The second tab of the connection selection dialog that pops up will display a list of Factory Connections. Once you have created a couple of designs, the first tab will display connections that were recently selected or used. The third tab of this dialog allows you to choose previously created User Defined Connections. See Also: Creating ReUsable Source Connections.
Once you have selected a connector type, you’ll need to fill in the remaining source connection information. The connection information fields on Source1 Connection tab will vary depending on the type of connector selected. If your source connector is a SQL type source, you have the options of specifying a database table, an SQL statement, or an SQL file which is a file containing an SQL statement. For information on using the Query Builder to create an SQL statement, see Using the Query Builder.
You may also need to change the default connector properties in the grid to the right based on the particular source data you are trying to connect to. After changing any of the connector properties, press the Apply button to apply those changes.
After you enter enough information about your source connection, the application may automatically connect to the source. If it does not, press the connect button. You will know you are connected to the source data when the connect button is disabled and the connect arrow on the Source1 Connection turns solid.
Data Joiner tab two is used to specify the connection information for your second source. You can think of this as the right side of a join. Select a connector type and fill in all appropriate information that allows a successful connection to the data.
Refer to Source 1 Connection (Tab 1) for details.
Data Joiner tab three is used to specify additional join information about both sources. In particular, the join engine needs to know if your source data is already sorted. If it is not, the join engine will sort the data first. This will add additional processing time, especially for large data sets.
You can specify which fields in both sources you want in your joined results data set. You do not have to include all fields in the results. If source 2 contains a field with the same name as a source 1 field, the application will automatically enter an alias name for that field. You can change the alias name. The result set will contain the source 1 field name and the alias name for the source 2 field to avoid duplicate field names in the joined result set.
Finally, you will need to specify the key(s) for both sources. The keys will be used to join the data on the specified key fields.
Once you have successfully connected to a source data set, you can browse the data by selecting the View->Source1 Data or View->Source2 Data menu items or by clicking on the browse toolbar button drop down and selecting Browse Source1 Data or Browse Source2 Data. These menu items will only be enabled when a source connection has been made.
At any time in the join design process, you can validate the join by pressing the validate toolbar button to see if you have included enough information to successfully join the two source data sets.
Upon successful validation of a join design, you can test the join by selecting the View>Joined Data (Test) file menu item or the browse toolbar>Browse Joined Data (Test) submenu item. Testing the join will do the following:
Depending on the size of the source data sets as well as whether or not the source data sets need to be sorted, this could take some time.
Click the Export Joined Data toolbar button or select the File>Export Joined Data menu option to export the joined result set.
An export dialog will be displayed allowing you to choose the export target connector type and file location as well as set any target connector properties. In particular, if you want the first record in your target file to be a list of the result set field names, you’ll want to set the Header property to True. You can also choose to export a subset of the joined record set either by choosing to export a sample number of joined records or by setting an export filter for exporting only certain joined records.
When you have finished entering target export information, press the Export Joined Data button to export the data. A run status dialog will be presented. Depending on your source data set size and whether or not they are sorted, running the join and exporting the data can take some time.
Once the export is complete, you can browse the exported data file by selecting the View->Exported Data menu item or the Browse toolbar button > Browse Exported Data sub menu.
Exporting joined data performs a data transformation from the joined source to the export target. Statistics about this transformation will be logged to a file. Statistics include number of records read and number of records written. It will also include any errors that may have prevented the transformation from occurring. You can view the log file by selecting the View->Log File (DataJoiner.log) menu item or by clicking on the browse toolbar button and selecting the Browse Export Log File (Data Joiner.log) sub menu item.
If you find yourself using the same source in multiple join designs, you can enter the source connection information once, create a user defined connection, then select that user defined connection in future designs.
To create a user defined connection out of Source 1, fill in all the appropriate connection information on the first tab. Set whatever properties are needed for the connection as well. After browsing the source data to verify that the connection information is correct, press the save toolbar button that’s to the right of the source connection textbox (top right).
A dialog will appear asking you to name the user defined source connection. Connection files have a .sc.xml extension.
Once you have saved a user defined connection, you can use it in any future Join Design by clicking on the down arrow to the right of the Source1 Connection textbox, going to the third tab of the Select Connection dialog and choosing your saved connection. Tab 1 will automatically be populated with all connection information that was stored as a part of your user defined connection.
When creating a SQL statement to use for a source connection, you can use the Query Builder. You can access the Query Builder by selecting the Query Statement option and pressing the Build Query button.
For documentation on how to use the Query Builder, see Appendix A.
Click the Save Join toolbar button or select the File->Save menu item to save a join design.
Saving a join design saves a number of files needed by the join engine. Your saved join will consist of a JoinName.jd.xml file, JoinName_Source1.sv.xml and .svf.xml files, and JoinName_Source2.sv.xml and .svf.xml files. All of these will be saved a default workspace and repository created by Data Joiner.
The Integration Query Builder consists of following five panes:
In all cases when you have specified a valid connection and entered the required information, a new connection node is added in the upper-right Connections pane and you can to click this in future to open this connection.
The following SQL functions are supported in the Integration Querybuilder:
Highlight multiple rows in the Columns pane, and select the appropriate function from the menu. The selected fields are converted to string data type and concatenated as defined in the following table.>p>
Highlight a row from the Columns pane and select the appropriate function from the menu. The selected field is converted to string data type and formatted as described below:
Aggregate functions return a single value based on a set of other values, such as the sum of these values, including the maximum and minimum. The aggregate function can be applied to all the values in a particular field in the query, or to groups of these values by applying the Group By clause.
The aggregation functions available depend on the SQL variant and syntax used by the data source. For a full description of each function, refer to the vendor's documentation.
Other than COUNT, most aggregation functions are designed to work with numeric fields and unexpected results can occur if you try to aggregate strings.
String functions operate on single text strings and are used to obtain and manipulate characters or character information, such as extracting a substring of one or more characters or trimming blank spaces from a field.
The functions available from this menu depend on the SQL variant and syntax used by the data source. Refer to the vendor's help or documentation for a full description of these functions.
Numeric functions operate on single numeric fields and are used to obtain simple mathematical information, such as the sign of a number, the next highest integer or its absolute value.
The functions available from this menu depend on the SQL variant and syntax used by the data source. Refer to the vendor's help or documentation for a full description of these functions.
Date Part functions operate on date fields and are used to extract and convert a portion of a date into a character string such as day of week, day of year, quarter, hour and minute.
The functions available from this menu depend on the SQL variant and syntax used by the data source. Those shown may only be the most commonly used of those available. Refer to the vendor's help or documentation that was provided for a full description of these functions.
Date Name functions operate on date fields and are used to extract and convert a portion of a date into a named date element such as month name or weekday name.
The functions available from this menu depend on the SQL variant and syntax used by the data source. Refer to the vendor's help or documentation for a full description of these functions.
Conversion to Character functions convert numeric fields to text fields, then apply the selected formatting to enable different formats for currency and dates and times.
The functions available from this menu depend on the SQL variant and syntax used by the data source. Refer to the vendor's help or documentation for a full description of these functions.
Conversion to Number functions convert numbers held as text strings to their numeric values. Strings containing non-numeric values will be converted to 0.
The functions available from this menu depend on the SQL variant and syntax used by the data source. Refer to the vendor's help or documentation for a full description of these functions.
Conversion to Date functions convert numeric fields to their date equivalent in the selected format.
The functions available from this menu depend on the SQL variant and syntax used by the data source. Refer to the vendor's help or documentation for a full description of these functions.
This button inserts code that will deal with a field containing a NULL value. Different SQL data sources will handle this differently, but should allow you to substitute some other value into the field if a NULL value is encountered. For example, in Access it might be:
IIf(IsNull(Fld), vNullCaseValue, Fld)
Whereas in Oracle it might be:
NVL(Fld, vNullCaseValue)
In both cases you replace vNullCaseValue with the value to be displayed (or further processing to be performed) if NULL is returned.
This button inserts code that deals with a simple either/or condition. Different SQL data sources handle this differently. For example, in Access it might be:
IIf(Fld = when_expression_1, result_expression_1, else_expression)
Whereas in Oracle it might be:
DECODE(Fld, when_expression_1, result_expression_1, else_expression)
Once you have connected to a data source, your connection is displayed in the upper-right pane. You can set up and save as many data source connections as you need. Integration Querybuilder stores all connections you create unless you explicitly delete them.
To create new queries on the Relations pane:
When two tables each contain a column that shares a common set of values, they can be linked on this column by a join. For example, a table of orders and a table of customers could be linked if both contain a Customer ID column.
To create a join between two tables in the Relations pane:
You can change the join to any of the following:
You can also create joins that are based on other logical conditions than the two selected columns being equal. Do this by right-clicking the diamond and selecting Properties. This is not recommended unless there is no other way to create a query and you know exactly what you are doing, as it can produce an extremely large number of records.
The Columns pane allows you to select the fields you want to use in your query. If field names are more than one word long they are displayed in the column cells surrounded by square brackets.
In the column list, fields are listed by the table they appear in and names are not bracketed.
An asterisk (*) is used to indicate all fields, so Employees.* means all fields in the Employees table.
To select a field in the Columns pane:
The value you use in a Criteria cell can be preceded by one of the following operators: "=","<", ">", "<="," >=", "!<", "!>", "!=", "<>", "IS", "IS NOT", "NOT", "IN", "LIKE", "BETWEEN". If no operator is entered, the default is "=".
Values following the LIKE operator will be put into quotes unless already so. The LIKE operator enables you to use regular expressions.
The wildcards and syntax available for use in regular expressions will depend upon your source database, so you should refer to the documentation supplied for further information. They should not be confused with those used in RIFL regular expressions.
Each time you define a criterion, another criterion cell opens, enabling you to set up a string of OR conditions for this column. For example, to select records where the Employee ID is less than 10000, between 10500 and 10750 or greater than 20000:
Which will result in the following SQL clause:
OR criteria are created horizontally, whereas AND criteria vertically. For example, to select records where the Employee ID is greater than 10000 and the Last Name is not Smith:
Which will result in the following SQL clause:
To set up an OR criteria that tests two separate fields, do the following:
Which will result in the following SQL clause:
To combine OR and AND criteria, remember that criteria are grouped with the criteria in the same column. As in the following example:
[Employee ID] > 10000 is only used with [Last Name] = "Smith". Records will be returned when the [Last Name] ="Jones" regardless of employee ID.
To return all records where the customer ID is greater than 20000 and the company name is Smith, Jones or Featherstone you will need to set up the following criteria:
Integration Querybuilder's SQL pane displays the SQL statement as you create it using the Relations and Columns panes. However, it can also be used by more experienced users as a SQL editor to create queries that cannot be put together simply using the Relations and Columns panes.
Caution: If you are not familiar with writing SQL by hand, do not edit the code Querybuilder generates, as changes made in the SQL pane override the settings you created in the Relations and Columns panes, and you might have to begin the query again.
To use SQL functions:
If the syntax is valid, a new line containing this function is created in the Columns pane. Enter a short, meaningful name for it in the Alias cell. If the syntax is invalid an error message displays. The Relations and Columns pane is grayed-out until the SQL code is corrected and validated again.To manually define SQL:
If you are an experienced SQL user, type query statements directly into Integration Querybuilder using this pane as an editor and the Check SQL Syntax button to validate the code as you build it. You can also paste text into the SQL pane from external programs, online help and e-mails.
If you copy existing material into the SQL pane, you must remove any comments, as these will cause an error when the query is exported to Map Designer.
If you set up a join conditions in the SQL pane rather than in the Relations pane and the join condition has an expression instead of a field on one side, the join line (in the Relations pane) will is drawn from a field of one object to the title bar of the other. The diamond icon will display "f*". If both sides of the join condition are expressions rather than fields the join line is drawn between the title bars of both objects.
Querybuilder does not validate or run any query containing DJX. Test and validate your statement using dummy information and substitute the DJX code in Map Designer's Query Statement box afterwards.
To run the query:
The Result tab displays showing the result of this query. If the query takes a noticeable time to execute, the Stop button becomes active so that the query can be canceled if required.