Data Parser for Unstructured Text - Tutorial 1 - The Basics

Tutorial 1 guides you through the basic steps to create and save a script file in Data Parser for Unstructured Text. Later tutorials are more detailed.

This tutorial presents the fundamental concepts for using the Data Parser for Unstructured Text. It is recommended that you do this tutorial first. The example file is a tagged list, but the procedure is useful regardless of the type of report. The best way to use this tutorial is to print a hard copy so you can follow the sequential steps.

Tutorial Goals

In this tutorial, you will learn:

  • The basic process of creating an extract script
  • How to save the script design
  • New terms used throughout the documentation

Procedure

This tutorial is divided into three sections that should be completed in the order shown.

Define the Line Style - Accept Record

After selecting the tutorial file and setting up basic options, the first step in defining many extract scripts is to determine the line of data that marks the end of a record. In this case, the line with the string "Category:" is the last line of the first record.

After you identify the end of the record, define the line style for that line by marking the information that makes that line unique. In every record in this data file, the last line contains the string "Category:".

  1. Highlight the string "Category:" (including the colon following it).
  2. Right-click anywhere in the Data Panel (the large white area of the screen) and select Define Line Style > New Line Style.
  3. The Line Style Definition window appears. Notice Data Parser has already formed line recognition rules based on the information you highlighted. It searches for all lines that contain the string "Category:" in columns 15 through 23.
  4. To indicate that Data Parser should accept the record at this point, ending one record and beginning the next, click the Line Action tab.
  5. Select ACCEPT Record.
  6. Click Add. The line style name, Category, now appears in the Line Style Column (the yellow column on the left of your screen), marking that line as matching the Category line style pattern. A bold green arrow designates it as the Accept Record line. Scroll down in the Data Panel and notice that every line matching the pattern you defined was automatically marked with the "Category" Line Style.
  7. Proceed to Define the Line Style - Collect Fields.
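
The recognition rule from step 3 (a given string in a given range of columns) can be pictured with a few lines of Python. This is only a conceptual sketch, not how Data Parser is implemented; the function name `matches_line_style` and the sample line are made up for illustration, and columns are assumed to be counted from 1.

```python
def matches_line_style(line: str, text: str, start_col: int) -> bool:
    """Return True if `text` occupies the columns starting at `start_col` (1-based)."""
    begin = start_col - 1  # convert the 1-based column to a 0-based index
    return line[begin:begin + len(text)] == text


# Hypothetical report line: "Category:" sits in columns 15 through 23.
sample = " " * 14 + "Category: Hardware"

print(matches_line_style(sample, "Category:", 15))              # same columns
print(matches_line_style("Category: Hardware", "Category:", 15))  # wrong columns
```

The second call fails to match because the string is present but not in columns 15 through 23, which is why the position of the highlighted text matters as much as the text itself.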

Define the Line Style - Collect Fields

In the TUTOR1 file, the first line of text that contains pertinent data is the line with the report date "13-Jul-95" (10th line). The dashes (and their positions) in this line make it unique and are likely to remain consistent even if the date changes in later reports.

  1. Highlight the first dash.
  2. Right-click in the Data Panel and select Define Line Style > New Line Style.
  3. The Line Style Definition window appears. Notice that a pattern was created based on what you highlighted. Data Parser looks for any line that contains a dash in column 13.
  4. Type a more descriptive Line Style Name, such as "Report_Date".
  5. Click the Line Action tab and leave the option set to COLLECT Field Contents.
  6. The COLLECT Field Contents option causes any fields defined on this line to be included in the final output. COLLECT Field Contents is the action you want for the majority of the lines in this type of report.
  7. Click Add.
  8. Locate and highlight the string "Problem No:".
  9. Right-click in the Data Panel and select Define Line Style > New Line Style.
  10. The Line Style Definition window appears. Notice that Data Parser generated a Line Style recognition pattern based on the highlighted string "Problem No:". Data Parser also used that string to name the Line Style "Problem_No". Data Parser automatically selects COLLECT Field Contents as the line action. Since this is the option you want on most of the lines in this report, accept the default.
  11. If you want a different line style name, type it now. Otherwise, keep the default.
  12. Click Add.
  13. Repeat steps 8 through 12 for each of the remaining lines in the first record.
  14. Remember, the "Category" Line Style has already been defined, and it is the Accept Record line.
  15. Proceed to Define Data Fields.
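
To see how COLLECT and ACCEPT line actions work together, the following sketch groups report lines into records. It is an illustration under stated assumptions, not Data Parser's actual logic: the `LineStyle` class, the `split_records` function, and the sample lines (including the problem numbers and category values) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LineStyle:
    name: str
    text: str       # string that identifies the line
    start_col: int  # 1-based column where the string must begin
    action: str     # "COLLECT" or "ACCEPT"

    def matches(self, line: str) -> bool:
        begin = self.start_col - 1
        return line[begin:begin + len(self.text)] == self.text


def split_records(lines, styles):
    """Group recognized lines into records; an ACCEPT line closes the record."""
    records, current = [], []
    for line in lines:
        for style in styles:
            if style.matches(line):
                current.append((style.name, line))
                if style.action == "ACCEPT":
                    records.append(current)
                    current = []
                break  # the first matching style wins; unmatched lines are skipped
    return records


styles = [
    LineStyle("Report_Date", "-", 13, "COLLECT"),
    LineStyle("Problem_No", "Problem No:", 13, "COLLECT"),
    LineStyle("Category", "Category:", 15, "ACCEPT"),
]

# Made-up lines laid out like the tutorial file.
lines = [
    " " * 10 + "13-Jul-95",
    "",
    " " * 12 + "Problem No: 1001",
    " " * 14 + "Category: Hardware",
    " " * 12 + "Problem No: 1002",
    " " * 14 + "Category: Software",
]

records = split_records(lines, styles)
for record in records:
    print([name for name, _ in record])
```

Each "Category" line ends one record and the next recognized line starts another, which is the behavior the ACCEPT Record action describes.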

Define Data Fields

After defining line styles for all 14 lines of the first record, define the Data Fields. You have given Data Parser the pattern information it needs to identify the lines in the report. Now define which part of each line you consider to be useful data.

  1. Locate the line containing the date of the report.
  2. The line shows only the report date, so all of the text on that line is important.
  3. Highlight the entire date.
  4. The highlighted text is 1 row by 9 columns. The column and row numbers show at the bottom right part of the screen. Columns 11 through 19 on row 10 contain the date.
  5. Right-click in the Data Panel and select Define Data Field > New Data Field. The Field Definition window appears.
  6. Notice the Field Definition option is set to Fixed Column in both the Start Rule and End Rule tabs. The Data Field starts in column 11 and ends in column 19, exactly where you highlighted.
  7. The Field Name defaults to "Report_Date_1" indicating that this is the first field on the Report_Date line. Change the default name to a more descriptive name, Report_Date, by typing it in the Field Name box.
  8. Click Add.
  9. Define the remaining Data Fields:
    1. On the Problem_No line, highlight from column 25 to 30.
    2. This grabs enough space to include any larger numbers that might occur in later records.
    3. Right-click in the Data Panel and select Define Data Field > New Data Field.
    4. The default Field Name is Problem_No_1. Problem_No is a descriptive name, but there is only one field on this line so the "_1" is unnecessary.
    5. Click in the Field Name box and backspace twice to delete the number and underscore.
    6. Notice the Field Definition defaults to Fixed Column in both the Start Rule and End Rule tabs, starting in column 25 and ending in column 30.
    7. Click Add.
  10. Repeat step 9 for each remaining line of text on page 1 in TUTOR1.REP containing tagged data. See Table 3-2 below.
  11. Proceed to Browse Data Record in order to see how your data has changed.
  12. Rearrange Data Fields as needed.
  13. Save and Close your script.

Table 3-2: Tutorial 1 - Data Field Start and End Rules

Data Field       Starting Column   Ending Column
Report_Date      11                19
Problem_No       25                30
Techie           25                52
Status           25                52
MMDDYY           25                32
Time             25                32
Serial_No        25                39
Version          25                52
Customer_Name    25                52
Company_Name     25                52
Phone_No         25                52
Source_Type      25                52
Target_Type      25                52
Category         25                52
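
The Fixed Column start and end rules in the table amount to cutting a fixed range of columns out of each line. The sketch below shows that idea for three of the fields. It is a rough illustration rather than Data Parser's implementation: `extract_field` and the sample lines (including the problem number and category value) are hypothetical, and columns are assumed to be 1-based and inclusive.

```python
# Start and end columns for three of the fields in the table above.
FIELD_COLUMNS = {
    "Report_Date": (11, 19),
    "Problem_No": (25, 30),
    "Category": (25, 52),
}


def extract_field(line: str, start_col: int, end_col: int) -> str:
    """Return the text in columns start_col..end_col (1-based, inclusive), trimmed."""
    return line[start_col - 1:end_col].strip()


# Made-up lines laid out like the tutorial file.
sample_lines = {
    "Report_Date": " " * 10 + "13-Jul-95",
    "Problem_No": " " * 12 + "Problem No: 1001",
    "Category": " " * 14 + "Category: Hardware",
}

fields = {
    name: extract_field(sample_lines[name], *FIELD_COLUMNS[name])
    for name in FIELD_COLUMNS
}
print(fields)
```

Because the end column is wider than the sample values, a longer problem number or category in a later record would still fall inside the field, which is the reason step 9 highlights extra space.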