Data Extractor Tutorial 2 - Tagged Data and Automatic Features
Tutorial 2 guides you through the steps to create and save a script file using Data Extractor’s automatic processes. The source file for this tutorial is the same tagged list used in Data Extractor Tutorial 1.
This tutorial introduces timesaving Data Extractor features that read and flatten a data file containing tagged data, showing quicker, more automatic ways to parse the tagged data from Tutorial 1. It is intended for anyone ready to learn more advanced Data Extractor features.
Things to remember when defining Data Fields and Line Styles in tagged data:
- When Data Extractor automatically creates Data Fields, it uses the positions you have highlighted to determine the length of each Data Field. Be sure to allocate enough space for data in later records that may be wider than the text you are currently selecting. For example, the Techie Name in the first record is "John", but in a later record it could be "Alexander Graham Bell IV".
- For tagged data, everything in the selection to the left of the Tag Separator is the Field Tag and everything to the right of the Tag Separator is the Data Field.
- When a Line Style is created, it applies not just to the line you are working on but to every line that matches the Line Style definition. For example, if you create a Line Style that looks for "Techie:" in columns 17 to 24 and define a Data Field for it in columns 26 to 55, then every line that has "Techie:" in columns 17 to 24 has a Data Field in columns 26 to 55.
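Data Extractor is an interactive tool, so this tutorial involves no code, but the column rules above can be modeled in a few lines. The following is a minimal Python sketch, not Data Extractor's implementation: the function names `column_slice`, `line_style_matches`, and `split_tagged` are hypothetical, columns are assumed to be 1-based and inclusive, and a Line Style is modeled as a "column range contains this text" test.

```python
from typing import Tuple


def column_slice(line: str, begin: int, end: int) -> str:
    """Return columns begin..end of a line (1-based, inclusive), padded if the line is short."""
    return line[begin - 1:end].ljust(end - begin + 1)


def line_style_matches(line: str, look_for: str, begin: int, end: int) -> bool:
    """Hypothetical Line Style test: does the column range contain the pattern text?"""
    return look_for in column_slice(line, begin, end)


def split_tagged(selection: str, separator: str = ":") -> Tuple[str, str]:
    """Split a selection at the first Tag Separator into (Field Tag, data)."""
    tag, _, data = selection.partition(separator)
    return tag.strip(), data.strip()


# Hypothetical sample line: "Techie:" occupies columns 17-23 and the data starts at column 26.
line = " " * 16 + "Techie:" + "  " + "John"

if line_style_matches(line, "Techie:", 17, 24):
    print(repr(column_slice(line, 26, 55).rstrip()))  # 'John'
print(split_tagged("Techie:  John"))  # ('Techie', 'John')
```

Splitting at the first separator, as `str.partition` does here, keeps any later colons in the data (for example in a time value) intact.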
Tutorial Goals
In this tutorial, you will learn:
- How to create an extract script using automatic processes
- How to save the extract design as a script file
- New terms used throughout the Data Extractor documentation
Procedure
These steps should be completed in the order shown.
Define Data Fields
After selecting the tutorial file and setting up basic options, the first step in defining most extract scripts is to determine the line of data that marks the end of a record. In the TUTOR1 data file, the line of text that contains "Category:" marks the end of each record.
- Highlight the line that contains the string "Category:", up to column 45. Check the indicator in the lower right corner of the screen for column locations.
- Right-click anywhere in the Data Panel (the large white area of the screen) and select Define Data Field > Parse Tagged Data. Data Extractor automatically defines a Line Style named "Category" whose recognition pattern is the string "Category:" in columns 15 through 23 and whose Line Action is Collect Fields. It also creates a Data Field named "Category" that collects any data on that line from column 24 (immediately after the colon) through column 45. The field is now defined, and its text turns red on the screen.
- If you wish to check the Data Field definition, you can double-click on the field itself (the red text) and the Field Definition window opens. Make any necessary changes, then click Update.
- Proceed to Define the Line Style - Accept Record.
Define the Line Style - Accept Record
Since the Category line is the last line of the record, its Line Action should be Accept Record. When Data Extractor creates a Line Style automatically, it assigns the Line Action Collect Fields, so you need to change it.
- Double-click the Line Style name ("Category" in this case) in the Line Style Column, the yellow column on the left side of the screen. The Line Style Definition window appears.
- Click on the Line Action tab and select ACCEPT Record [Including] This Line’s Fields from the list of choices.
- Click Update.
- View the data record by clicking on the Browse Data Record button in the button bar.
- Proceed to Adjust Data Field Definition.
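To see why the Line Action matters, here is a hypothetical Python model of record assembly: lines whose Line Style uses Collect Fields add a field to the pending record, and the Category line (Accept Record) closes it. The Line Style table, the shared data columns 24 to 45, and the sample values are assumptions for illustration only; the Problem_No and Techie Line Styles are defined later in this tutorial.

```python
from typing import Dict, List

# Hypothetical Line Styles: (name, text to look for, begin col, end col, Line Action).
LINE_STYLES = [
    ("Problem_No", "Problem No:", 13, 23, "collect"),
    ("Techie", "Techie:", 17, 23, "collect"),
    ("Category", "Category:", 15, 23, "accept"),
]
FIELD_BEGIN, FIELD_END = 24, 45  # assumed data columns, as for the Category field


def cols(line: str, begin: int, end: int) -> str:
    """1-based, inclusive column slice."""
    return line[begin - 1:end]


def extract_records(lines: List[str]) -> List[Dict[str, str]]:
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in lines:
        for name, look_for, begin, end, action in LINE_STYLES:
            if look_for in cols(line, begin, end):
                # Collect Fields: store this line's field in the pending record.
                current[name] = cols(line, FIELD_BEGIN, FIELD_END).strip()
                if action == "accept":
                    # Accept Record including this line's fields: emit it and start a new one.
                    records.append(current)
                    current = {}
                break  # stop at the first matching Line Style (a simplification in this sketch)
    return records


def tagged(tag: str, value: str) -> str:
    """Build a sample line whose tag ends in column 23, like the tutorial's tags."""
    return tag.rjust(23) + " " + value


sample = [
    tagged("Problem No:", "1001"),
    tagged("Techie:", "John"),
    tagged("Category:", "Printing"),
    tagged("Problem No:", "1002"),
    tagged("Techie:", "Alexander Graham Bell IV"),
    tagged("Category:", "Network"),
]
for record in extract_records(sample):
    print(record)
```

In this sketch the second Techie name comes out as "Alexander Graham Bell" because it runs past column 45, which is exactly the truncation the earlier reminder about allocating enough space is meant to prevent.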
Adjust Data Field Definition
- Select the entire Problem No line by left-clicking that line in the Line Style Column (the yellow column on the left).
- Right-click in the Data Panel (the large white area on the right) and select Define Data Field > Parse Tagged Data. The Line Style pattern that Data Extractor creates automatically looks for "Problem No:" in columns 13 through 23.
- If you want to check the Line Style, double-click Problem_No in the Line Style Column to open the Line Style Definition window.
- Click Close to close the Line Style Definition window.
- To view the automatically generated Problem_No Data Field, double-click anywhere in the red text of the field. The Field Definition window opens.
- Click the End Rule tab. Notice that the end rule is 52. Because you selected the entire line, the Data Field extends all the way to the right margin of the report, which is wider than the Problem_No field needs to be.
- Change the end rule of the Problem_No field to 30.
- Click Update. Notice that the selected area in the Data Panel for the Problem_No Data Field is much smaller after the update.
- Proceed to Define the Header Information.
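The end rule change is simple column arithmetic: a field spanning columns begin through end holds end - begin + 1 characters. The Python sketch below assumes, for illustration only, that the Problem_No data starts at column 24, just after the colon in column 23; the helper names and the sample problem number are hypothetical.

```python
def field_width(begin: int, end: int) -> int:
    """Characters in a field spanning 1-based columns begin..end, inclusive."""
    return end - begin + 1


def fits(value: str, begin: int, end: int) -> bool:
    """True if value fits in the field without being cut off."""
    return len(value) <= field_width(begin, end)


BEGIN = 24  # assumed first data column of Problem_No (hypothetical)

print(field_width(BEGIN, 52))  # 29: the automatic field runs to the right margin
print(field_width(BEGIN, 30))  # 7: after changing the end rule to 30
print(fits("1001", BEGIN, 30))  # True for this hypothetical problem number
```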
Define the Header Information
For this exercise, assume that the header at the top of the report contains information you want to extract.
- Highlight the report name WINTECH on line 8 in positions 11 through 17.
- Right-click in the Data panel, and select Define Data Field > New Data Field. The Field Definition window appears.
- The default Data Field name is highlighted. Because there is no tag on this line, Data Extractor uses the data itself as the Line Style name and the Data Field name. Change the field name to ReportName by typing it in the Field Name box.
- Click Add.
- To define the report date Data Field, repeat the previous four steps, but highlight the date (13-Jul-95) in columns 11 through 19 and name the field ReportDate.
- Proceed to Update Line Style.
Update Line Style
The purpose of this exercise is to update the automatically generated "Jul95" Line Style to make it more generic for different report dates.
- To edit the "Jul95" Line Style, double-click Jul95 in the Line Style Column. The Line Style Definition window appears. Notice that the pattern for this Line Style looks for 13-Jul-95 in columns 11 through 19.
- Optionally, resize the grid cells to make the information easier to read:
  - Position the mouse pointer over a column border in the header row of the grid, where the column headings are. The pointer becomes a bold vertical bar with arrows pointing left and right.
  - Hold down the mouse button and drag the edge of the column to the left or right.
  - Release the mouse button when the column is the desired size.
  - If desired, adjust the row height the same way, using the gray border on the left where the triangle and asterisk appear.
- To change the pattern to look for a line with any date with the dd-mmm-yy format, click once in the Look For? cell on the first row of the grid where 13-Jul-95 is currently displayed. A down arrow appears on the right side of that cell.
- Click on that arrow and the Pattern Builder window appears.
- Press TAB to move to the Value cell, delete the original value, and type a dash (-).
- Press TAB to move to the Begin and End cells and change both values to 13.
- Click OK. Notice that the Look For?, Begin, and End values have changed in the Line Style Definition window to reflect the changes made in the Pattern Builder window.
- Add a new row to the Line Style Definition grid by clicking in the And/Or cell of the second row. Accept the default value, And.
- Click in the Search What? cell of the second row and click the down arrow.
- Select Column Range (m-n) from the displayed list.
- Select Contains from the list displayed in the Operator cell of the second row.
- Click on the arrow in the Look For? cell of the second row to display the Pattern Builder window again.
- Press TAB to move to the Value cell, delete the original value, and type a dash (-). Be careful to enter only a dash in the Value cell, with no spaces around it.
- Change the Begin and End values to 17.
- Click OK. The Line Style definition now matches any line with a dash in columns 13 and 17.
- Click Update to save the changes to the Jul95 Line Style.
- Proceed to Define Remaining Data Fields and Line Styles.
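The edited pattern amounts to two character tests combined with And. This hypothetical Python sketch contrasts the original literal pattern with the edited one, assuming 1-based columns and sample report lines in which the date starts at column 11; the function names are illustrative, not part of Data Extractor.

```python
def char_at(line: str, col: int) -> str:
    """Character in 1-based column col, or '' if the line is shorter."""
    return line[col - 1] if 0 < col <= len(line) else ""


def old_pattern(line: str) -> bool:
    """Original automatic pattern: the literal 13-Jul-95 in columns 11-19."""
    return line[10:19] == "13-Jul-95"


def date_pattern(line: str) -> bool:
    """Edited pattern: a dash in column 13 AND a dash in column 17 (dd-mmm-yy from column 11)."""
    return char_at(line, 13) == "-" and char_at(line, 17) == "-"


july = " " * 10 + "13-Jul-95"
august = " " * 10 + "02-Aug-95"
print(old_pattern(july), old_pattern(august))    # True False
print(date_pattern(july), date_pattern(august))  # True True
```

In this sketch the edited pattern accepts any line with dashes in both columns, not only dates, so it relies on no other line in the report having that shape.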
Define Remaining Data Fields and Line Styles
In this exercise, you will define Data Fields and Line Styles for the Techie, Status, MM/DD/YY, Time, Ser #, Version, Customer Name, Company Name, Phone #, Source Type, and Target Type Tagged Data Fields.
- Highlight the Field Tag, the Tag Separator, and the data by dragging with the left mouse button held down from the beginning of the tag to the end of the Data Field. Remember to extend the selection to the right to catch wider data in later records.
- Right-click in the Data Panel and select Define Data Field > Parse Tagged Data. Data Extractor creates a Line Style definition and a Data Field definition for you.
- Alternatively, define the field from the whole line and adjust it afterward:
  - Click the line in the Line Style Column to select it.
  - Right-click in the Data Panel and select Define Data Field > Parse Tagged Data.
  - Double-click the red field text to open the Field Definition window.
  - Adjust the settings, such as the end rule, as needed.
  - Click Update, then Close.
- Browse the data records to see how your data has changed.
- If necessary, rearrange the Data Fields to meet your export file requirements.
- Save and close your script.
Note: Data Extractor named the MM/DD/YY, Ser #, and Phone # Data Fields and their corresponding Line Styles MMDDYY, Ser, and Phone. Similarly, names with embedded spaces have the spaces removed. This is because Field Names can contain only letters, digits, and underscores. Scroll down in the Data Panel to see how the rest of the data is defined.
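The naming rule in the note can be approximated by deleting every character that is not a letter, digit, or underscore. This is a hypothetical Python sketch (`to_field_name` is not a Data Extractor function, and it assumes ASCII letters); the tutorial's own Problem_No name suggests the product may substitute an underscore in some cases, so treat the sketch as an approximation.

```python
import re


def to_field_name(tag: str) -> str:
    """Approximate the rule: keep only ASCII letters, digits, and underscores."""
    return re.sub(r"[^A-Za-z0-9_]", "", tag)


for tag in ["MM/DD/YY", "Ser #", "Phone #", "Customer Name", "Source Type"]:
    print(f"{tag!r} -> {to_field_name(tag)!r}")
```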
Tip: This file can be parsed even more automatically. If you wish to try it, follow these steps:
- Click the Clear Line Styles icon in the button bar.
- Highlight all the tagged data lines in the entire first record, beginning with the Problem No line and highlighting all the way down and including the Category line. Be sure to catch all the field tags and data plus some extra space to the right.
- Right-click in the Data Panel and select Define Data Field > Parse Tagged Data. Data Extractor creates several new Line Styles and Data Fields at once. This method works only with highly structured, consistent data, but when conditions are right it can save a great deal of time.
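The multi-line Parse Tagged Data step can be pictured as applying the single-line rule to each highlighted line: the text up to the separator becomes the Line Style's recognition pattern, and the columns from just after the separator to the right edge of the selection become the Data Field. This Python sketch is hypothetical (`parse_tagged_block` and `TaggedDefinition` are not part of Data Extractor) and assumes a colon separator, 1-based columns, and the right-aligned tags seen in this report.

```python
from typing import List, NamedTuple


class TaggedDefinition(NamedTuple):
    tag: str          # Field Tag text, e.g. "Problem No"
    tag_begin: int    # 1-based column where the tag starts
    tag_end: int      # 1-based column of the Tag Separator
    field_begin: int  # first data column, just after the separator
    field_end: int    # right edge of the highlighted block


def parse_tagged_block(lines: List[str], right_edge: int, separator: str = ":") -> List[TaggedDefinition]:
    """Derive one Line Style pattern and Data Field per tagged line (hypothetical model)."""
    definitions: List[TaggedDefinition] = []
    for line in lines:
        sep_index = line.find(separator)  # 0-based index of the first separator
        if sep_index < 0:
            continue  # no tag on this line; this sketch simply skips it
        tag_start = len(line) - len(line.lstrip())  # 0-based index of the first non-blank character
        definitions.append(TaggedDefinition(
            tag=line[tag_start:sep_index].strip(),
            tag_begin=tag_start + 1,
            tag_end=sep_index + 1,
            field_begin=sep_index + 2,
            field_end=right_edge,
        ))
    return definitions


def tagged(tag: str, value: str) -> str:
    """Build a sample line whose tag ends in column 23, like the tutorial's tags."""
    return tag.rjust(23) + " " + value


first_record = [
    tagged("Problem No:", "1001"),
    tagged("Techie:", "John"),
    tagged("Category:", "Printing"),
]
for d in parse_tagged_block(first_record, right_edge=45):
    print(d)
```

For the Category line this reproduces the positions described earlier in the tutorial: the tag in columns 15 through 23 and the field from column 24 to the right edge of the selection.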