Source Options Window

The Source Options window opens each time you begin a new extract script, unless you go to the Display Choices tab and clear the Display Source Options with New Extract check box. Otherwise, to open this window, click the Source Options button in the Tool Bar, or select Options from the Source menu. The purpose of the Source Options window is to allow you to change the way the Data Parser for Unstructured Text reads your text file.

If you are familiar with the text or report file with which you are working, the Source Options window can be opened and some selections made before opening the file. Other options can be changed to meet the requirements of the report file as you are parsing it.

If you make changes to the settings in this window after opening the report file, the Data Parser for Unstructured Text may reread and reload your file. This may take a few seconds.

The window is divided into seven tabs: Extract Design Choices, Display Choices, File Properties, Printer Emulation, Character Set, Character Filters and External Viewer. The options in each tab are discussed below:

  • Extract Design Choices
  • Display Choices
  • File Properties
  • Printer Emulation
  • Character Set
  • Character Filters
  • External Viewer

Extract Design Choices

This topic covers the settings under Extract Design Choices.

Tag Separator

The tag separator selected here tells the Data Parser for Unstructured Text how to distinguish a field tag from the data field when analyzing a line of text. This is only relevant when you are making use of the Parse Tagged Data shortcut menu option.

The tag separator choices are:

Tag Separator Description
ColonSpace (: ) Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a Colon and a Space. Example: Name: John M. Smith
Colon (:) This is the default setting. This option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a colon only. Example: Name:John M. Smith
SpaceColon ( :) Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a Space and a Colon. Example: Name :John M. Smith
Dash (-) Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a dash. Example: Name-John M. Smith
Comma (,) Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a comma. Example: Name, John M. Smith
# of Spaces Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a specified number of spaces. When you select this option, another box appears to the right of the tag separator box, in which you type the number of spaces. Example with 3 spaces specified: Name   John M. Smith
# of Spaces + Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by two or more spaces. Example: Name  John M. Smith
Vertical Bar ( | ) Selecting this option tells the Data Parser for Unstructured Text to distinguish a field tag from the data field by a vertical bar or pipe ( | ). Example: Name|John M. Smith
Other If your report file has a character that is not on the list of choices separating the field tags from the field data, you can highlight the character shown and type in any single printable character. Example: Name*John M. Smith

For details on tagged report data, see Define Data Fields.
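
As a rough illustration of what a tag separator does, the following Python sketch splits a line into a tag and a data value at the first occurrence of the separator. The function name split_tagged_line and the TAG_SEPARATORS mapping are hypothetical and are not part of the Data Parser for Unstructured Text; the product's actual parsing rules may differ.

```python
# Hypothetical mapping from option names in the table above to literal separators.
TAG_SEPARATORS = {
    "ColonSpace": ": ",
    "Colon": ":",
    "SpaceColon": " :",
    "Dash": "-",
    "Comma": ",",
    "Vertical Bar": "|",
}


def split_tagged_line(line, separator=":"):
    """Split a line into (tag, data) at the first occurrence of the separator."""
    tag, found, data = line.partition(separator)
    if not found:
        return None  # no separator on this line, so no tag/data pair
    return tag.strip(), data.strip()


print(split_tagged_line("Name: John M. Smith", TAG_SEPARATORS["ColonSpace"]))
print(split_tagged_line("Name|John M. Smith", TAG_SEPARATORS["Vertical Bar"]))
print(split_tagged_line("Name*John M. Smith", "*"))  # the "Other" choice
```

Splitting only at the first occurrence keeps any later separator characters inside the data value.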

Column Separator

The Column Separator selected here tells the Data Parser for Unstructured Text how to distinguish one column of data from the next when analyzing a line or block of text. This setting is only relevant when you are using the Parse Columnar Data or Parse Columnar w/ Heading shortcut menu options. Each column within a line of text will become a data field.

The Column Separator choices are:

Column Separator Description
(2+) Spaces This is the default setting. This option tells the Data Parser for Unstructured Text to distinguish between two columns of data by two or more spaces. Example: 10019  John M. Smith. The account number 10019 is one column, or data field, and the person's full name, John M. Smith, which contains single spaces, is another column, or data field.
(1) Space Selecting this option tells the Data Parser for Unstructured Text to distinguish between two columns of data by a single space.
Tab Selecting this option tells the Data Parser for Unstructured Text to distinguish between two columns of data by a tab.
Vertical Bar ( | ) Selecting this option tells the Data Parser for Unstructured Text to distinguish between two columns of data by a vertical bar or pipe ( | ).
Other If your report file has a character that is not on the list of choices separating columns of data, you can highlight the character shown and type in any single printable character. Example: 10019 - John M. Smith
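
The difference between the default (2+) Spaces setting and a single-character separator can be sketched in Python as follows. This is only an approximation of the idea; split_columns is a hypothetical helper, not a function of the product.

```python
import re


def split_columns(line, separator=None):
    """Split a line into columns.

    separator=None mimics the (2+) Spaces choice: two or more spaces end a
    column, while single spaces stay inside the column value.
    """
    if separator is None:
        return [col for col in re.split(r" {2,}", line.strip()) if col]
    return [col.strip() for col in line.split(separator)]


print(split_columns("10019  John M. Smith"))
print(split_columns("10019|John M. Smith", "|"))
print(split_columns("10019 - John M. Smith", "-"))
```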

Flush Field Contents on Accept default

The Data Parser for Unstructured Text outputs each record type as a fixed length, consistent structure record. This means that even if a field or line does not exist in a particular section of a report, that field will still exist in the output record. Propagate and Flush are opposite ways of handling these fields that contain no data or do not exist.

Propagate Field Contents is the normal default. It causes the Data Parser for Unstructured Text to carry the field contents forward from the last section that contained data in that field when the section being processed contains no data for it. This is particularly useful for header information that you want repeated until a new header is encountered.

Flush field contents is the opposite of propagate field contents. With the Flush Field Contents on Accept default check box checked, fields that do not exist or are blank in the report will be blank in the output by default. This only holds true for any fields defined after this setting is made. Previously defined fields will not have their Data Collection/Output options changed.

Individual fields can still be set to Propagate Field Contents on the Data Collection/Output tab of the Field Definition window.
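
The contrast between Propagate and Flush can be sketched in Python. This is a simplified model under stated assumptions: each report section is represented as a dictionary, the function build_records is hypothetical, and a single flag applies to every field, whereas the product lets you set the behavior per field.

```python
def build_records(sections, field_names, flush=False):
    """Build fixed-structure records from report sections.

    flush=False (propagate): a missing field repeats its last known value.
    flush=True: a missing field is left blank.
    """
    records = []
    last_seen = {name: "" for name in field_names}
    for section in sections:
        record = {}
        for name in field_names:
            value = section.get(name, "")
            if value:
                last_seen[name] = value      # remember the newest real value
            elif not flush:
                value = last_seen[name]      # propagate the earlier value
            record[name] = value
        records.append(record)
    return records


sections = [{"Region": "East", "Amount": "100"}, {"Amount": "250"}]
print(build_records(sections, ["Region", "Amount"]))              # propagate
print(build_records(sections, ["Region", "Amount"], flush=True))  # flush
```

In both cases every record contains every field; only the content of the missing field differs.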

Do Accept at End

This option is necessary when you have defined your ACCEPT Record line as Accept Before Collecting. This is usually done when there is no line that can be consistently identified as the end of a record, but there is a consistent beginning line. Since the Data Parser for Unstructured Text writes out a record as soon as it encounters the ACCEPT Record line, the very last record in a report with Accept Before Collecting will not be written out unless this option is checked. For more information, see Line Action.

The setting in the Source Options window for Do Accept at End Default is the default setting for future Accept records. If you select the Do Accept at End Default checkbox, the Accept line styles you define will do an Accept at End. You can override the default setting by selecting the Do Accept at End checkbox in the Line Action tab of the Line Style Definition window. Values can be set individually for each Accept record.

Skip First Accept

When your report does not contain a consistent last line, you may need to define the ACCEPT Record as the first line of a header or some other consistent line of text that appears at the beginning of each section of the report. This line would have the ACCEPT Record Before Collecting line action. When this line action is chosen, the resulting first record will contain no data. It will be blank. To avoid a blank first record in your output file, turn Skip First Accept ON by checking the box. See Line Action.

The setting in the Source Options window for Skip First Accept Default is the default setting for future Accept records. If you select the Skip First Accept Default checkbox, the Accept line styles you define will skip the first blank record in the output file. You can override the default settings by selecting the Skip First Accept checkbox in the Line Action tab of the Line Style Definition window. Values can be set individually for each Accept record.
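
To see why Do Accept at End and Skip First Accept matter with Accept Before Collecting, consider this minimal Python sketch. It assumes a record is simply the list of lines collected since the previous accept line; extract_records and its parameters are hypothetical names, not the product's API.

```python
def extract_records(lines, is_accept_line, do_accept_at_end=True,
                    skip_first_accept=True):
    """Model of Accept Before Collecting: the accept line starts a record."""
    records = []
    current = []
    accepts_seen = 0
    for line in lines:
        if is_accept_line(line):
            accepts_seen += 1
            # Write out what was collected so far, before collecting this line.
            if not (skip_first_accept and accepts_seen == 1):
                records.append(current)
            current = []
        current.append(line)
    # Without this final step the last record would never be written.
    if do_accept_at_end and current:
        records.append(current)
    return records


report = ["HDR A", "a1", "HDR B", "b1", "b2"]
starts_record = lambda line: line.startswith("HDR")

print(extract_records(report, starts_record))
print(extract_records(report, starts_record, skip_first_accept=False))
print(extract_records(report, starts_record, do_accept_at_end=False))
```

With skip_first_accept=False the first record is empty, and with do_accept_at_end=False the record that begins at "HDR B" is lost.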

Trim Leading and Trailing Spaces

When this option is checked, the Data Parser for Unstructured Text will strip your report fields of all leading and trailing spaces. This is especially useful for parsed columnar fields where you have a set field length, but have no real idea how large each piece of data will be. For details, see Shortcut Menu - Line Style Column.

Comparisons with Numbers

This setting affects how the greater than, less than, and other comparison operators in the Operator column of the Line Style Definition window work. With the Numeric Comparison radio button selected, input data is first checked to confirm that it is numeric and is then compared as a number. With the String Comparison radio button selected, input data is compared as a string, regardless of what sort of data it is.
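
The practical difference between the two modes shows up with values such as 9 and 10. The Python sketch below is illustrative only; compare_greater is a hypothetical function, and treating non-numeric input as a failed numeric comparison is an assumption, since the text does not say how the product handles that case.

```python
def compare_greater(left, right, numeric=True):
    """Return True if left > right under the chosen comparison mode."""
    if numeric:
        try:
            return float(left) > float(right)
        except ValueError:
            return False  # assumption: non-numeric data fails a numeric test
    return left > right  # character-by-character string comparison


print(compare_greater("10", "9", numeric=True))   # compared as numbers
print(compare_greater("10", "9", numeric=False))  # compared as strings
```

As numbers, 10 is greater than 9. As strings, "10" sorts before "9" because the character "1" precedes "9".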

Display Choices

The settings on this tab tell the Data Parser for Unstructured Text:

  • What portion of your report file to display
  • How to display certain characters
  • Whether or not to "pad" each line of text with spaces to its maximum width
  • Whether or not to add graph lines to the display

Each of the options is described below.

Source Sample

The purpose of these settings is to allow you to select a representative sub-sample of a large text file when defining the line styles and data fields. Using a sub-sample will greatly improve the speed at which you can work.

Remember that this is only for display purposes. After you have defined your line styles and data fields in the selected sub-sample, the Data Parser for Unstructured Text still extracts the data from the entire report file. Also, the Record Browser Window previews the extract from the entire file.

When selecting a sub-sample for the Data Parser for Unstructured Text to read, be sure that it is representative of the most complete records in the file. The available options are:

Starting Line

By default the Data Parser for Unstructured Text starts displaying the text file at the beginning of the first line in the file (line 1). However, when selecting a representative subsample of the file, you may want the Data Parser for Unstructured Text to start reading the file on some other line.

You may highlight the default value (1) and type a new value, representing the line number from which you want the Data Parser for Unstructured Text to start displaying data. Use the Cursor Position Box to help you determine the line number. For example, if you want the Data Parser for Unstructured Text to start displaying a report file at line 9, rather than line 1, highlight the 1 and type 9.

Ending Line

By default the Data Parser for Unstructured Text displays the first 500 lines of the text file, beginning at line 1 and ending at line 500. The Data Parser for Unstructured Text can display many more lines of text. However, the more lines displayed in the data panel the more slowly the Data Parser for Unstructured Text will function. Also, when selecting a representative sub-sample of the file, you may want the Data Parser for Unstructured Text to display a smaller subset of lines in the file.

You may highlight the default value (500) and type a new value, representing the line number at which you want the Data Parser for Unstructured Text to stop reading. For example, if you want the Data Parser for Unstructured Text to stop reading a report file at line 35, rather than line 500, simply highlight the 500, and type 35.

Click OK in the Source Options window when you are finished making your selections.

Sample Size

As you change the settings in Starting Line or Ending Line, the Sample Size value will change to reflect the total number of lines of text that the Data Parser for Unstructured Text displays. Performance will be affected as the Sample Size increases.
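
The relationship between Starting Line, Ending Line, and Sample Size can be sketched as a simple slice. This Python example assumes that line numbers start at 1 and that the ending line is included in the sample; select_sample is a hypothetical helper.

```python
def select_sample(lines, starting_line=1, ending_line=500):
    """Return the displayed sub-sample and its size (1-based, inclusive)."""
    sample = lines[starting_line - 1:ending_line]
    return sample, len(sample)


report = [f"line {n}" for n in range(1, 1001)]
sample, sample_size = select_sample(report, starting_line=9, ending_line=35)
print(sample[0], sample[-1], sample_size)
```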

Padding

Pad Lines

This option allows you to choose whether the Data Parser for Unstructured Text pads each line of text in your report out to a fixed right margin, or reads each line only to the last character on that line. The default setting is OFF. You can turn it ON by clicking in the white square to the left of Pad Lines. This places a check in the box.

When a report has a ragged right margin, this option will assist you in defining data fields to a width that does not cause the data in wider fields to be truncated. It can be misleading, however, if you are defining line styles. The spaces used to pad the lines in the display do not actually exist in the text file. Do not use them in recognition patterns for line styles.

Pad Line Length

When the Pad Lines option is turned ON, the Data Parser for Unstructured Text will pad each line of text in the selected sample out to the longest length necessary to accommodate the data. That value is displayed in Pad Line Length. For example, in TUTOR1.REP, the default Pad Line Length is 52.

If you want the Data Parser for Unstructured Text to pad with spaces past the last character in all the lines of text, you may change the value of Pad Line Length. Highlight the default value and type a larger value. Performance will be affected as the value increases.
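
Padding to the longest line, or to a larger value you supply, can be pictured with the following Python sketch. The function pad_lines is hypothetical and models only the display behavior described above; as noted earlier, the padding spaces do not exist in the text file.

```python
def pad_lines(lines, pad_line_length=None):
    """Pad each line with trailing spaces to a common width."""
    if pad_line_length is None:
        # Default: the longest length needed to accommodate the data.
        pad_line_length = max((len(line) for line in lines), default=0)
    return [line.ljust(pad_line_length) for line in lines]


ragged = ["10019  John M. Smith", "10020  Ann Lee"]
for padded in pad_lines(ragged):
    print(repr(padded))
```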

Symbols

Show EndLine Symbol

This is a toggle that allows you to choose whether or not to display a symbol at the end of each line of text. The default setting is ON. You may turn it OFF by clicking on the box that contains the check mark.

Symbol

If you want the Data Parser for Unstructured Text to display an end line symbol, this is a list of symbols from which to choose. The options are a paragraph mark (¶) or {EOL}. Select the desired choice from the list box.

Show Space Symbol

This is a toggle that allows you to choose whether or not to display a symbol where each space character exists in the display. This includes both space characters that exist in the source text file and also any characters that have been added to the Data Panel by the Pad Lines option. The default setting is ON. You may turn it OFF by clicking on the box that contains the check mark.

Symbol

If you want the Data Parser for Unstructured Text to display a space symbol, this is a list of symbols from which to choose. The options are a small dot ( · ) or a period ( . ). Select the desired choice from the list box.

Show Tab Symbol

This is a toggle that allows you to choose whether or not to display a symbol where a tab character exists in the source report file. The default setting is ON. You may turn it OFF by clicking on the box that contains the check mark. The symbol will only appear if tab expansion is set to 0 on the Printer Emulation tab of the Source Options window.

Symbol

If you want the Data Parser for Unstructured Text to display a tab symbol, this is a list of symbols from which to choose. The options are a small double right angle bracket ( » ) or a single right angle bracket ( > ). Select the desired choice from the list box.

Graph Paper

Show Horizontal Lines

This is a toggle that allows you to choose whether or not to display horizontal lines under each line of data in the Data Panel. The default setting is OFF. You may turn it ON by clicking the check box.

Show Vertical Lines

This is a toggle that allows you to choose whether or not to display vertical lines in between each line of data in the Data Panel. The default setting is OFF. You may turn it ON by clicking the check box.

Options

Display Source Options with New Extract

This tells the Data Parser for Unstructured Text to automatically show the Source Options dialog box each time a new extract script is created. The default is ON. If you do not want the Source Options dialog box to open when you create a new extract script, turn it OFF by clicking the box containing the check mark.

File Properties

The options listed under File Properties are settings that apply globally to the current text or report file. The settings also are written to the script and are stored in the Data Parser for Unstructured Text database.

These selections should be made prior to opening the text or report file, if you know which options to choose. When you change these options after the report or text file is loaded, the Data Parser for Unstructured Text will reread and reload the report. This may take a few seconds.

The exception is the Text File choice. This will always open with the text file that was used originally to define the script. If you want to change it, you must open the script first.

Text File/URI

This is the report file that you are using, or URI to which you are connecting. If you want to check or run the script that you have designed against a different report file, you can click the down arrow and browse to the new report file. The Data Parser for Unstructured Text will load the new file and apply the script that is presently open to that file.

Line Separator

A text file is presumed to have a carriage return-line feed (CR-LF) at the end of each line. However, some files have different characters at the end of the line.

To specify some other line separator, or to specify the number of bytes of each line of a fixed length report, place the mouse pointer in the Line Separator box and click once. Then click the down arrow to the right of the box and click the desired Line Separator in the list. The list box choices are: carriage return-line feed (default), line feed, carriage return, line feed-carriage return, form feed, empty line, and number of bytes.

If you select number of bytes, enter the length of each data record in the Byte Count box that appears just below the Line Separator box. The Data Parser for Unstructured Text data display supports record lengths up to 32,000 bytes.

Caution: If your source file contains Carriage Returns, Line Feeds or Form Feeds, use the Character Filters tab to replace each of these with blanks before using the number of bytes Line Separator. For information on using character filters, see Character Filters.
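
The reason for the caution can be seen in a small Python sketch of fixed-length splitting. It assumes that replacing CR, LF, and FF bytes with blanks preserves the byte positions of every record; split_fixed_records is a hypothetical helper, not the product's implementation.

```python
def split_fixed_records(data: bytes, byte_count: int):
    """Split raw bytes into records of byte_count bytes each."""
    # Replace CR, LF and FF with blanks so record positions are unchanged.
    cleaned = data.translate(bytes.maketrans(b"\r\n\x0c", b"   "))
    return [cleaned[i:i + byte_count]
            for i in range(0, len(cleaned), byte_count)]


raw = b"0001John\r\n0002Mary\r\n"
for record in split_fixed_records(raw, 10):
    print(record)
```

Because the control characters are replaced rather than deleted, each 10-byte record still starts at the expected offset.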

If the line separator is not one of the choices from the list box and is a printable character, highlight the CR-LF and then type the correct character. For example, if the separator is a pipe ( | ), type the pipe character on the keyboard.

If the line separator is not one of the choices from the list box and is not a printable character, highlight the CR-LF and then enter a backslash ( \ ), an X, and then the hex value for the correct separator. For example: Enter \X0C to specify a form feed. See Decimal and Hexadecimal Values.

The line separator can be more than one character. If so, type the characters, or type a backslash X and the hex values for the characters.

Example 1: EOL (for the three capital letters E, O, and L together)

Example 2: \X0C \X0D (for form feed, carriage return)
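
The \X notation can be read as "backslash, X, two hex digits". The Python sketch below converts such text into the characters it stands for. It is an assumption-laden illustration: parse_separator is a hypothetical function, exactly two hex digits are assumed per escape, and every other character, including any space typed between escapes, is kept literally.

```python
import re


def parse_separator(text):
    """Convert \\Xhh escapes in a separator string to the actual characters."""
    def to_char(match):
        return chr(int(match.group(1), 16))
    return re.sub(r"\\X([0-9A-Fa-f]{2})", to_char, text)


print(repr(parse_separator("EOL")))          # printable characters stay as typed
print(repr(parse_separator(r"\X0C")))        # form feed
print(repr(parse_separator(r"\X0C\X0D")))    # form feed, carriage return
```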

Note: Files that use CR-LF, CR, or LF line separators often contain a mixture of the three. For this reason, if you select CR-LF, CR, or LF as your line separator, the Data Parser for Unstructured Text uses a loose definition of line separator that also recognizes solo LFs, solo CRs, and CR-LFs. For any other line separator, it looks only for the separator you specify.
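
One way to picture the loose definition is a split that accepts any of the three separators whenever one of them is selected. The following Python sketch is an approximation, and split_lines is a hypothetical name.

```python
import re


def split_lines(text, separator="\r\n"):
    """Split text into lines, loosely for CR-LF, CR and LF, strictly otherwise."""
    if separator in ("\r\n", "\r", "\n"):
        # Try CR-LF first so that it counts as one separator, not two.
        return re.split(r"\r\n|\r|\n", text)
    return text.split(separator)


print(split_lines("a\r\nb\nc\rd"))
print(split_lines("a|b\nc", separator="|"))
```

In the second call the LF is not treated as a separator, because only the specified pipe character is searched for.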

Field Separator

The field separator selected here tells the Data Parser for Unstructured Text how to automatically distinguish fields when analyzing a line of text. The field separator applies to the entire text file on which you are working. If you need to define a data field based on Relative Field Position, select the correct field separator before defining those fields.

The field separator choices follow:

Field Separator Description
Asterisk ( * ) Selecting this option tells the Data Parser for Unstructured Text to distinguish individual fields when they are separated by an asterisk.
Colon ( : ) Selecting this option tells the Data Parser for Unstructured Text to distinguish individual fields when they are separated by a colon.
Comma ( , ) Selecting this option tells the Data Parser for Unstructured Text to distinguish individual fields when they are separated by a comma.
Dash ( - ) Selecting this option tells the Data Parser for Unstructured Text to distinguish individual fields when they are separated by a dash.
Caret ( ^ ) Selecting this option tells the Data Parser for Unstructured Text to distinguish individual fields when they are separated by a caret.
Semicolon ( ; ) Selecting this option tells the Data Parser for Unstructured Text to distinguish individual fields when a semicolon separates them.
Slash ( / ) Selecting this option tells the Data Parser for Unstructured Text to distinguish individual fields when they are separated by a slash.
Space ( ) A space is the default setting. This option tells the Data Parser for Unstructured Text to distinguish individual fields when they are separated by a single space (hex 20).
Tab Selecting this option tells the Data Parser for Unstructured Text to distinguish individual fields when a tab (hex 09) separates them.
Tilde ( ~ ) Selecting this option tells the Data Parser for Unstructured Text to distinguish individual fields when they are separated by a tilde.
Vertical Bar ( | ) Selecting this option tells the Data Parser for Unstructured Text to distinguish individual fields when a vertical bar separates them.
Other If the field separator in your report is not on the list of selections, you can highlight the current value and type in any single character. If the character cannot be typed, you can enter the hex value for that character with \X in front of it. For example: \X0A will enter a Line Feed as the field separator.
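
To illustrate why the field separator must be chosen before defining fields by position, this Python sketch picks the Nth field of a line for a given separator. The function field_at and its 1-based position argument are hypothetical; how the product counts Relative Field Position is not specified here.

```python
def field_at(line, position, separator=" "):
    """Return the field at a 1-based position, or "" if it does not exist."""
    fields = line.split(separator)
    if 1 <= position <= len(fields):
        return fields[position - 1]
    return ""


print(field_at("10019 John M. Smith", 2))                # default: single space
print(field_at("10019~John M. Smith", 2, separator="~"))
```

The same position yields different data depending on the separator, which is why changing the separator later can invalidate fields already defined.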

Printer Emulation

The options listed under Printer Emulation are settings that apply globally to the current text or report file. The settings also are written to the script and are stored in the Data Parser for Unstructured Text database.

These selections should be made prior to opening the text or report file, if you know which options to choose. When you change these options after the report or text file is loaded, the Data Parser for Unstructured Text will reread and reload the report. This may take a few seconds.

Printer Emulation

Printer Emulation gives the Data Parser for Unstructured Text the ability to read print control characters. This allows the Data Parser for Unstructured Text to display the report with the same appearance it would have had if printed.

None is the default setting. The Data Parser for Unstructured Text reads and interprets the ANSI characters from 32 through 126, inclusive.

Printers that the Data Parser for Unstructured Text can presently emulate include AS400 and 1403. Other printers will be added to the list in future releases.

Tab Expansion

If your text file has embedded tab characters representing white space, you can expand those tabs to a set number of spaces. The default value is eight (8). To change the value, highlight the default and then type in the desired value.

If the Tab Expansion is set to 0, then tab characters in the source file will not be replaced with spaces. The tab characters, themselves, can then be used as field separators, column separators, begin or end tags, or other uses. If Show Tab Symbol is selected in the Display Choices Tab of the Source Options window, a symbol will display everywhere a tab character exists in the source text or report file.
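
The effect of the Tab Expansion value can be sketched in Python. This example assumes that each tab is replaced by a fixed number of spaces, as the description above suggests; the product may instead align text to tab stops, which Python's str.expandtabs models. The function expand_tabs is a hypothetical name.

```python
def expand_tabs(line, tab_expansion=8):
    """Replace each tab with tab_expansion spaces; 0 leaves tabs untouched."""
    if tab_expansion == 0:
        return line  # tabs remain available as separators or tags
    return line.replace("\t", " " * tab_expansion)


print(repr(expand_tabs("10019\tJohn M. Smith")))
print(repr(expand_tabs("10019\tJohn M. Smith", tab_expansion=0)))
```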

Character Set

The options listed under Character Set are settings that apply globally to the current text or report file. The settings also are written to the script and are stored in the Data Parser for Unstructured Text database.

These selections should be made before opening the text or report file, if you know which options to choose. When you change these options after the report or text file is loaded, the Data Parser for Unstructured Text will reread and reload the report. This may take a few seconds.

Code Page

Select the Code Page Translation Table that the Data Parser for Unstructured Text will use when reading an EBCDIC report file.
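
As a general illustration of what a code page translation does, the Python sketch below decodes EBCDIC bytes with the cp037 codec from the Python standard library. The choice of cp037 is an assumption made for the example; it is not necessarily one of the translation tables offered by the Data Parser for Unstructured Text.

```python
EBCDIC_CODE_PAGE = "cp037"  # assumption: one common EBCDIC code page


def decode_report(raw: bytes, code_page: str = EBCDIC_CODE_PAGE) -> str:
    """Translate raw report bytes to text using the given code page."""
    return raw.decode(code_page)


ebcdic_bytes = "HELLO".encode(EBCDIC_CODE_PAGE)
print(ebcdic_bytes)                 # the bytes differ from their ASCII values
print(decode_report(ebcdic_bytes))  # readable again after translation
```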

Tip: If you change ANSI to EBCDIC and then back to ANSI, the data window may appear to lose data, but this is a display issue resulting from the use of different default line separators in the two character sets. To return the display to readable characters, on the File Properties tab, change the line separator from CR-LF to CR, click OK, then reopen the settings and change the line separator back to CR-LF.

Character Filters

The options listed under Character Filters are settings that apply globally to the current text or report file. The settings also are written to the script and are stored in the Data Parser for Unstructured Text database.

These selections should be made prior to opening the text or report file, if you know which options to choose. When you change these options after the report or text file is loaded, the Data Parser for Unstructured Text will reread and reload the report. This may take a few seconds.

You can choose individual characters to filter by clicking in the cell next to that character and selecting delete, leave alone or replace with space from the list.

Note: The character with decimal value 159 affects the line display. Filter this character while designing your script.

Reset Defaults

If you have selected some characters to filter and want to set those characters back to the original settings, click Reset Defaults.

Note: All but two characters default to leave alone. 00 (NUL) characters default to replace with space and 0C (FF) characters default to delete.

Filter Non-Print

If you want to have all nonprintable characters other than carriage return-line feed replaced with spaces in the display, click the Filter Non-Print button. Filtering can be useful if you have unusual characters that interfere with reading the file correctly or with the ability of the script to recognize certain lines.
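
The three filter actions and the Filter Non-Print button can be modeled with a small lookup table. This Python sketch is a simplified illustration: apply_filters and non_print_filters are hypothetical names, and treating byte values below 32 and the value 127 as nonprintable is an assumption.

```python
# Defaults described above: NUL is replaced with a space, FF is deleted.
DEFAULT_FILTERS = {0x00: "replace with space", 0x0C: "delete"}


def apply_filters(data: bytes, filters=None) -> bytes:
    """Apply delete / leave alone / replace with space to each byte."""
    if filters is None:
        filters = DEFAULT_FILTERS
    out = bytearray()
    for byte in data:
        action = filters.get(byte, "leave alone")
        if action == "delete":
            continue
        out.append(0x20 if action == "replace with space" else byte)
    return bytes(out)


def non_print_filters():
    """Filters that blank out nonprintable bytes except CR and LF."""
    return {b: "replace with space"
            for b in range(256)
            if (b < 32 or b == 127) and b not in (0x0D, 0x0A)}


print(apply_filters(b"A\x00B\x0cC"))
print(apply_filters(b"A\x07B\r\n", non_print_filters()))
```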


When you have finished making your selections on the tabs in the Source Options window, click the OK button to save your selections.