DataTools HowTo: Manually Parse a Binary Data File
Mon, 15 Jun 2009, 4:45pm
Question:
I have a source file that contains packed and binary data. How can I parse and define the structure of the file and export the unpacked data to a CSV text file format?
Answer:
Using the Data Parser for Binary, you can define the structure of the data records in the visual parser window. Once the data is parsed and unpacked, you can export it to a CSV text file.
Objective
There are two objectives in this DataTools HowTo, as follows:
- Use the Data Parser for Binary to define the schema/structure of a fixed-length data file
- Export the unpacked data to a CSV text file format
Skill Level
- Basic to Intermediate
Skill Set
- Data Parser
- Built-in visual parsing interface
- Basic understanding of binary and packed data storage formats
Design Considerations
The principal design consideration is to follow the procedural steps below in the order given. A secondary consideration is to follow the same steps, in the same order, when you define the structure/schema of your own data file.
The DataTools product used in this DataTools HowTo is the Data Parser for Binary.
The sample binary data file used in this HowTo is a simple fixed-length data file that contains binary and packed data. The sample data file is named TUTOR4.BIN. It was copied to the following folder when DataTools was installed on your workstation, assuming the product was installed in the default location:
C:\Program Files\Pervasive\DataTools9\Common
Procedure
This tutorial is divided into the following tasks, which you should complete in the order shown:
- Set Starting Offset and Record Length
- Define Field Length, Data Type, and Set Data Field Properties
- Use the Export Functionality
Set Starting Offset and Record Length
In this section you will define the length of the file header and the length of the data records.
- When the Data Parser is launched you will see the Source Connection window.
- At the top of the window, click the Source Connection box arrow and select Binary as the Connector type on the Factory Connections tab.
- Back in the Source Connection window, click the Source File/URI arrow and select the TUTOR4.BIN file from its installed location (see above).
- Click Open.
- Find StartOffset in the Connector Properties grid on the Source Connection window, enter 35, and click Apply.
  The sample file has a small header of 35 bytes of data; it is not part of the records and must be ignored.
- Click OK.
  You should now see the first character/byte of the first data record in the Data Parser window. The visual parser defaults the record length to 100 bytes, so the records are not lined up yet.
- The data records are 67 bytes in length, so set the known record length in the Length box by typing 67 and pressing ENTER.
The data records should now display in a uniform manner, although there are no data field definitions yet. Go to the next section.
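Outside the Data Parser, the same two settings can be pictured as a slice operation. The following is a minimal Python sketch, not part of DataTools: the function name `split_records` is hypothetical, and the input is a synthetic stand-in for TUTOR4.BIN with a 35-byte header followed by 67-byte records.

```python
HEADER_LEN = 35  # StartOffset used in this HowTo
RECORD_LEN = 67  # record length used in this HowTo


def split_records(data: bytes, start_offset: int = HEADER_LEN,
                  record_len: int = RECORD_LEN) -> list[bytes]:
    """Skip the header, then cut the remainder into fixed-length records."""
    body = data[start_offset:]
    # Ignore a trailing partial record rather than guessing its contents.
    full_records = len(body) // record_len
    return [body[i * record_len:(i + 1) * record_len]
            for i in range(full_records)]


# Synthetic stand-in for TUTOR4.BIN: a 35-byte header plus three 67-byte records.
sample = b"H" * HEADER_LEN + b"".join(
    bytes([ord("A") + n]) * RECORD_LEN for n in range(3)
)
records = split_records(sample)
print(len(records), [r[:1] for r in records])
```

With the correct offset and length, every record starts on its own first byte and none of the header bytes leak into the first record.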
Define Field Length, Data Type, and Set Data Field Properties
In this section you will define the length, data type, and field properties for each field in the data records.
It is important to define the fields from left to right. If you define a field size incorrectly, it throws off the definitions for every field thereafter.
- Click the last character of the first field in the top row of data.
  This is the pale blue row beneath the yellow ruler. In this example, the last character is the final "s" in "Us" of Conversions R Us. A field marker (a double-ended arrow) displays at this position in the yellow area beneath the ruler.
  In this case, it is easy to see where the first field ends because it is an ordinary text field and ends right where an incremental numeric data field starts.
  Note: To delete a marker, click the ruler hash mark and the marker disappears. Then, create a new marker in the correct position.
- Beneath the data parsing area is a Field Name box. Highlight the default field name, Field1, and rename the field Company.
- Click the yellow area directly above the field. The Contents box shows the data for that field formatted according to the Length, Data Type and Properties you set up.
  Remember NOT to click the blue area until you are ready to set the length of the second field.
- For Field 2, count four positions on the ruler and click the mouse to mark the end of the second data field. A marker displays at the 20th position and the Property Size value is 4. Rename the field RecordNumber.
  The Offset box in the lower left displays the starting position of each field.
- Click the Data Type arrow and select Zoned Decimal from the list of data types.
- Enter the Property values for the second field. Click the Signed arrow and select No.
- You do not want to display figures after the decimal point for this field, so in the box labeled Decimals, highlight the default value and type 0 (zero).
- Press ENTER.
- Count four hash marks to the right of the last field's arrow and click your mouse on the ruler. Rename Field3 to DegreeOfVar.
- Select Type as 32-bit IEEE Floating-Point.
- Under Property, scroll down to Decimals and type 3.
  Note: The Contents box is the best visual cue when trying to set the data type and properties of a field correctly. If you know what the field should contain, you are better able to judge whether you have set the field length, data type, and data properties correctly. When you parse a file, you might need to go through the process of choosing each field's length, type, and properties by trial and error. You must click the hash mark in the ruler each time you want to show what it contains.
- Use the information in the table below to continue defining the remainder of the fields, starting with Field 4.
  The table provides the field type, size, and other data type properties needed to define each of the fields in the records.
- Do not try to mark the end of the last field, as this is also the end of the record. Instead, in the Field Name list, select Field9, rename it SerialNumber, and make sure the Size is 9.
- Click the Incremental Save button when you are satisfied with the results of each field.
- Click OK to close the visual parsing window.
- Click Save Map.
- Type Tutor4 and click Save. Your schema work is saved.
| Field No. | Name | Type | Size | Content of First Record | Other Properties |
|---|---|---|---|---|---|
| 1 | Company | Text | 16 | Conversions R Us | (None) |
| 2 | RecordNumber | Zoned Decimal | 4 | 1 | Precision: 4; Signed: No; Sign Pos: Trailing; Style: MicroFocus COBOL; Scale: 0; Decimals: 0 |
| 3 | DegreeOfVar | 32-bit IEEE Floating-Point | 4 | 35.957 | Precision: 7; Signed: Yes; MSB First: No; Decimals: 3 |
| 4 | Bytes | Zoned Decimal | 9 | 3456680 | Precision: 9; Signed: No; Sign Pos: Trailing; Style: MicroFocus COBOL; Scale: 0; Decimals: 0 |
| 5 | ErrOccur | Boolean | 1 | True | Picture: True.False |
| 6 | Reversal | Display Sign Trailing | 10 | -99756 | Precision: 10; Signed: Yes; Sign Pos: Trailing; Style: MicroFocus COBOL; Scale: 0; Decimals: 0 |
| 7 | Cost | Packed Decimal | 4 | 20763.25 | Precision: 7; Signed: No; Sign Pos: Trailing Separate; Style: MicroFocus COBOL; Scale: 2; Decimals: 2 |
| 8 | FileNumber | Text | 10 | file 7668 | (None) |
| 9 | SerialNumber | Text | 9 | DB9785500 | (None) |
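To see how the sizes in the table translate into byte positions, the sketch below accumulates field offsets from left to right and decodes a few fields of a synthetic record. It is an illustration in Python, not the Data Parser's own logic. The byte encodings are assumptions: unsigned zoned decimals are taken to be plain ASCII digits, "MSB First: No" is read as little-endian, and packed decimals are taken to hold two BCD digits per byte with a trailing sign nibble. The real TUTOR4.BIN file may differ, and all function names are hypothetical.

```python
import struct
from decimal import Decimal

# (name, size in bytes) copied from the record layout table.
LAYOUT = [
    ("Company", 16), ("RecordNumber", 4), ("DegreeOfVar", 4),
    ("Bytes", 9), ("ErrOccur", 1), ("Reversal", 10),
    ("Cost", 4), ("FileNumber", 10), ("SerialNumber", 9),
]


def field_offsets(layout):
    """Accumulate sizes left to right; return ({name: (offset, size)}, total)."""
    offsets, pos = {}, 0
    for name, size in layout:
        offsets[name] = (pos, size)
        pos += size
    return offsets, pos


def unpack_zoned_unsigned(raw: bytes) -> int:
    # Assumption: unsigned zoned decimal stored as plain ASCII digits.
    return int(raw.decode("ascii"))


def unpack_float32_le(raw: bytes) -> float:
    # Assumption: "MSB First: No" means little-endian byte order.
    return struct.unpack("<f", raw)[0]


def unpack_packed(raw: bytes, scale: int) -> Decimal:
    # Assumption: valid BCD, two digits per byte, final nibble is the sign
    # (0xD negative, anything else treated as positive).
    nibbles = [n for byte in raw for n in (byte >> 4, byte & 0x0F)]
    sign = nibbles.pop()
    value = int("".join(str(n) for n in nibbles))
    return Decimal(-value if sign == 0x0D else value).scaleb(-scale)


OFFSETS, RECORD_LEN = field_offsets(LAYOUT)


def field(record: bytes, name: str) -> bytes:
    """Slice one field out of a record by name."""
    offset, size = OFFSETS[name]
    return record[offset:offset + size]


# Synthetic record based on the "Content of First Record" column. Fields that
# are not decoded in this sketch are left as placeholder spaces.
buf = bytearray(b" " * RECORD_LEN)
for name, raw in [
    ("Company", b"Conversions R Us"),
    ("RecordNumber", b"0001"),
    ("DegreeOfVar", struct.pack("<f", 35.957)),
    ("Cost", bytes([0x20, 0x76, 0x32, 0x5F])),
]:
    offset, size = OFFSETS[name]
    buf[offset:offset + size] = raw
record = bytes(buf)

print(RECORD_LEN)
print(field(record, "Company").decode("ascii"))
print(unpack_zoned_unsigned(field(record, "RecordNumber")))
print(round(unpack_float32_le(field(record, "DegreeOfVar")), 3))
print(unpack_packed(field(record, "Cost"), scale=2))
```

Because each offset is the running total of the sizes before it, a single wrong size shifts every later field, which is why the fields are defined from left to right. The sizes in the table add up to the 67-byte record length set earlier.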
Caution: When you are finished parsing the data, you must save your changes, because they are not automatically saved as you go along. If you close the Data Parser without saving, you must start over. In the toolbar, click Incremental Save.
View Parsed Data
To view a representation of the unpacked source data, click on the Source Data Browser button in the button bar. If the schema is accurate you will see a visual representation of the unpacked data in a tabular (row and column) format.
If the data appears to be incorrect or inconsistent, go back to the beginning and locate the setting(s) and/or properties that need to be changed.
Use the Export Functionality
In this section you will export the unpacked data to a CSV text data file.
- In the main button bar, locate the Run Map button and click on it.
- On the Connect Info tab, click the Export File Type arrow.
- Select ASCII (Delimited) from the list.
- In the Target File/URI box, enter the folder path and desired filename for the CSV text file that will be created when you export the unpacked data from the Data Parser.
- On the Properties tab, find the Header value, and change the default from False to True.
- Click on the Run Export button below the Properties grid.
- Click the Save Map button in the button bar. Enter a document name and click Save.
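The effect of the ASCII (Delimited) export with Header set to True can be approximated in a few lines of Python. This is a rough sketch using the standard `csv` module, not a description of what the Data Parser writes byte for byte. The function name `export_csv` is hypothetical, and only three of the nine fields are shown, with values taken from the first record in the table.

```python
import csv
import io


def export_csv(rows, field_names, out, header=True):
    """Write already-unpacked rows as comma-delimited text."""
    writer = csv.writer(out, lineterminator="\n")
    if header:
        # Equivalent in spirit to setting Header to True: field names come first.
        writer.writerow(field_names)
    writer.writerows(rows)


field_names = ["Company", "RecordNumber", "Cost"]
rows = [("Conversions R Us", 1, "20763.25")]

buffer = io.StringIO()  # stands in for the target file
export_csv(rows, field_names, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to an in-memory buffer keeps the example self-contained; opening a real file with `open(path, "w", newline="")` and passing it as `out` would work the same way.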
View Your Target Data
If all worked as intended, you just created a target data file that contains the unpacked source data in a CSV data file format. To view the contents of the target data file without having to exit the Data Parser, click on the Target Data Browser button in the button bar. You can verify that the data is accurate and complete by also opening the Source Data Browser and placing the source and target browser windows side by side on your workstation's desktop.
Data Parser Troubleshooting Notes
- If the Data Parser arrows do not exactly mark where a field's data begins, this is not necessarily a problem. Look at the field's Size value and check the Contents box to make sure the data in that field is unpacked and is displaying correctly.
- If there are extra characters or if characters are truncated at the start of records, the StartOffset value is incorrect. Extra characters indicate that the StartOffset value needs to be increased. Truncation indicates that the StartOffset value needs to be decreased.
- If patterns in the data slant to the left or right, the record length is incorrect. If the patterns slant to the left, decrease the record length. If they slant to the right, increase the record length.
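The slanting effect described in the last note can be reproduced with synthetic data. The Python sketch below is an illustration only: the 10-byte records and the `#` marker are made up for the demonstration, and the helper names are hypothetical. It cuts the same bytes into rows using the correct length, a length that is one byte too long, and one that is one byte too short, then reports the column where the marker lands in each row.

```python
def rows_with_length(data: bytes, record_len: int, count: int = 4) -> list[str]:
    """Cut data into rows of record_len bytes, as a grid display would."""
    return [data[i * record_len:(i + 1) * record_len].decode("ascii")
            for i in range(count)]


def marker_columns(rows: list[str], marker: str = "#") -> list[int]:
    """Column at which the marker first appears in each row."""
    return [row.index(marker) for row in rows]


TRUE_LEN = 10
# Six synthetic 10-byte records with a marker in column 5 of each.
data = b"-----#----" * 6

correct = marker_columns(rows_with_length(data, TRUE_LEN))
too_long = marker_columns(rows_with_length(data, TRUE_LEN + 1))
too_short = marker_columns(rows_with_length(data, TRUE_LEN - 1))

print("correct:  ", correct)    # marker stays in one column
print("too long: ", too_long)   # marker moves left row by row
print("too short:", too_short)  # marker moves right row by row
```

A record length that is too long makes the pattern drift left, so the length should be decreased. A length that is too short makes it drift right, so the length should be increased.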

