Extract text from RTF binary field in MS Access?

Hi,

I have a legacy MS Access database. It has a table called "ContactNotes"
that stores contact notes as RTF binary in a BLOB field. It also has a lookup field with the UID of the corresponding contact from the contact table called "Contact". I want to either:

a) Get the contact and the RTF contact notes into Salesforce (if they have a field type that will display RTF), or
b) Get the contact and just the text contained in the RTF into any standard portable form (Excel table, Salesforce, or whatever) - like a table with first name, last name, notes

Do you know if Pervasive can do this? How about the new binary tool?

I am not a coder, don't speak SQL, but can point and click my way around.

Thanks, James.

RE: Extract text from RTF binary field in MS Access?

Hi James,

Your data project includes two common challenges, as follows:

  • The need to retain the parent/child relationship between data records from two tables in the source database via a key field.
  • The need to convert marked-up data from the source to the target.

I will address the marked-up data challenge first.

Embedded data in RTF, HTML, and XML format are a fairly common occurrence inside database tables, particularly for "note" fields, and can present a challenge when trying to migrate or convert data to another database or application.

The first consideration is whether or not the TARGET database or application supports the storage of marked-up text. That is, does it offer a data type that supports mark-up languages in a data field? To answer this question, you should consult the documentation for the TARGET database or application. This type of information is usually found in the "developer" or "administrator" documentation, rather than the "end user" documentation.

If the TARGET application DOES offer a data type that supports mark-up languages, be sure that the target data field to which you wish to map the marked-up data is configured properly for the type/format of the data you will be inserting. If this is the case, the Data UpLoaders and Data DownLoaders will convert the contents of your source data "as is"; i.e., the mark-up code will be retained when the data is written to the target database or application.

If the TARGET application does NOT offer a data type that supports mark-up languages, you will need to convert the data into the target as "plain text". See the Summary below for a solution.

Here are some options for how to handle the parent/child relationship issue.

Create a view in the source database that includes all the data records from the "parent table" AND the "child table", and save the view. In the Data UpLoader or Data DownLoader, connect to the view on the Source Connection tab. Your source schema will include all the data fields you need to map to the target. And the source view will also include the key field in each data record, thus making it easy to map that key to both the parent records and the child records in your target database or application. This option requires that you have the originating database or application installed and working.

Create a view in the source database (same as above) and then create a report that includes all the data fields in the view. Print the report to a text file (rather than to a printer). Use the Data Extractor for Unstructured Text to extract NON-marked-up (plain text) data, with relationships retained, out of the report file, and export the data records to a CSV text file. The CSV text file can then be mapped into the target database using one of the Data Loaders that supports your target. This option requires that you have the originating database or application installed and working.

Write a SQL statement on the Source Connection tab in the Data UpLoader or Data DownLoader to join the records from the "parent table" and the "child table" in your source database. This is similar to creating the view (see above), but it is temporary and used only when you run the conversion. This option does not require that you have the originating database or application.

Use the DataTools Data Joiner to join the two source data tables in a user-friendly (no SQL knowledge needed) interface and export the data to a CSV text file. The CSV text file can then be mapped into the target database using one of the Data Loaders that supports your target. This option does not require that you have the originating database or application.

Summary

Here are the two best options if the target database or application DOES support mark-up languages:

If you have the source database/application, create a view in the source database, connect to that view, and map the data directly to the target database using one of the Data Loaders.

If you do not have the source database/application, use the Data Joiner to join the source tables and export the data to CSV text. Then map the data directly to the target database using one of the Data Loaders.

Here is the best option if the target database or application does NOT support mark-up languages:

If you have the source database/application, create a view and a report in the source database that includes all the data fields in the view. Print the report to a text file (rather than to a printer). Use the Data Extractor for Unstructured Text to extract NON-marked-up (plain text) data, with relationships retained, out of the report file, and export the data records to a CSV text file. The CSV text file can then be mapped into the target database using one of the Data Loaders that supports your target. Details about how to print a report to a file can be found in the online documentation for the Data Extractor.

Please let us know if you have any questions.

- Jo

RE: Extract text from RTF binary field in MS Access?

Hi Jo,

Thank you for such a comprehensive reply.

The desktop windows application that drives the database (MS Access or Sybase ASA 8 - I have copies of database in both forms) doesn't make the notes field available in it's File>Export wizard, but fortunately it does in it's File>Print wizard, so I experimented with that.

First I Googled "Print to Text" and learned that by going to Start>Settings>Printer>Add Printer I could set up a printer called "Generic/Text", which I did. Then I tested printing to this including the notes field. The other fields came out fine, but unfortunately the notes field came as text randomly broken up by a contol character (a square) followed by a random number of spaces.

Next I thought of printing to PDF instead and then saving the PDF to text. This resulted in the correct notes text and depending on the capabilities of the Unstructured Processor, it appears if I included the field names, that a processor could probably read and parse out this text file. Indeed, a processing tool may even get better results processing the PDF directly, since the PDF could provide the processor visual placement cues as well as field names.

I was about to pursue this further when it occured to me that your solution that doesn't require the source application - using the Data Joiner - would actually have a big advantage: it would allow me to get out the UID field - which is not available in the File>Print or File>Export wizards in the application. Having the UID would allow me to export other data - for example, activities and email. When these are printed or exported, they only give you the name of the contact they relate to. Linking these to the correct contact based on a name look up would not be as reliable as a UID, so I am now going to experiment using the Data Joiner as prescribed.

So I downloaded the test version of the application from the website (PVDataTools-8.17.0.2.10-setup.exe) so that I can test this. I installed it on my Windows 2K machine and it reported complete with no error messages. However when I ran the application, it gave me this error message: "Component 'Vsflex7.ocx' or one of its dependencies not correctly registered: a file is missing or invalid". Restarted machine. Launches fine now. My ASA 8 server that used to launch automatically doesn't seem to do so anymore for some reason, but that's not a big deal.

So my next questions are:

1. Will the Data Joiner be able to read the text within the binary RTF?
2. In your summary you give recommendations for 3 of the 4 possible permutations:
a. Target DOES support mark-up languages + Have source application
b. Target DOES support mark-up languages + Do NOT have source application
a. Target does NOT support mark-up languages + Have source application

For the last permutation (Target does NOT support mark-up languages + Do NOT have source application), what is your recommendation? If the anwer to 2. is Yes, then I assume it is to use the Data Joiner.

UPDATE!

I have answered my own question to #1 above. Yes! It does read the RTF! At least, it decodes the binary and show the mark-up version. Here's an example:

Date: Thu, 27 Jan 2005 11:45:53 -0700
MIME-Version: 1.0
X-Mailer: CustomApp 3.00
Content-Type: text/x-strtf
Content-Transfer-Encoding: 7bit

{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 Arial;}}
{\colortbl ;\red0\green0\blue0;}
\viewkind4\uc1\pard\cf1\fs18 He's waiting to hear from Bob on Press Release date\par
\par
Says that demo I give will provide most impact to ABC if it happens at the time of the press release. Target date end of Feb. He wil book - or Joe will.\par
\par
Gave me Michelle's cell number.\par
\par
I asked who works on ACME and Amco. He asked me what was going on. He said he works on both. He said he will email them an introduction to me and copy me, with contact information.\par
\par
He said that they already have an internal note.\par
\par
\par
}

This is great progress. I wonder if I could use a formula in excel that would convert the RFT markup to text? Or perhaps run through MS Word somehow..

Thanks,
James

RE: Extract text from RTF binary field in MS Access?

Hi Rabbit,

Yes, the first best option is to try the Data Joiner if you do not have the original database and the target does not support mark-up languages.

The Vsflex7.ocx error is usually the result of an incomplete installation. Be sure you are logged onto your workstation with "admin" privileges. Follow the instructions in the Solving Installation Problems knowledge base article to cleanly un-install and re-install DataTools while logged in with full administration privileges.

Please let us know if this solves the vsflex7.ocx problem, or not.

- Jo

RE: Extract text from RTF binary field in MS Access?

Thanks Jo.

vsflex problem vanished after reboot. Just did a good test with DJ - see results above. Have the RTF visible in export file as mark up. Now looking for a way to clean it of tags and get the plain text. I'll probably google it -- maybe there's some kind of RTF to Plain text processor out there?

Or I suppose I could just store the RTF in Salesforce and then figure out a way to display it. Maybe the layout field that reads that data field could be set to open it in a browser pop-up window and use an MS Word plug-in to decode it?

Thanks,
Rabbit

RE: Extract text from RTF binary field in MS Access?

Hi Rabbit,

Great! Glad you got the Data Joiner to work...and with your source data!

Now, depending on how simple or complex the RTF formatting is, you can possibly use the RIFL expression language in the Data UpLoader to MS Access to strip out the RTF coding. Or another possibility is to use the Data Extractor. There is no detailed documentation for handling RTF with the Data Extractor, but let us know if you want to try this option, and we will try to assist as best as possible.

- Jo

RE: Extract text from RTF binary field in MS Access?

Hi Jo,

Yes! I would love to know how to use both of those methods (RIFL/Data Uploader and the Data Extractor). Should I play around with them first, and then come back with questions, or would you like to give me some pointers?

Rabbit

RE: Extract text from RTF binary field in MS Access?

Hi Rabbit,

Of course, we always recommend that you give it a try before asking questions! :)

The documentation in DataTools is quite extensive. After launching the Data UpLoader to MS Access, click on the Help icon (it's a question mark) at the far right end of the button bar. When the Help window opens, click on the Show icon to open the Contents, Index, and Search tabs. Click on the Contents tab and type "RIFL" and expand the tree. You will see an entry labeled "Rapid Integration Flow Language". Expand it. Now locate the entry labeled "The RIFL Script Editor". Expand it. Locate the entry labeled "RIFL Script Editor" and double-click on it. This help topic is a good place to start and gives you an overview of how the RIFL Script Editor works. The RIFL Script Editor is great for those of us who are not "programmers" but need to use RIFL to clean, parse, concatenate, or otherwise manipulate the data before it gets written to the target.

Since you need to strip out RTF code, you will want to use one or more of the RIFL Text Functions. In the Help Contents list, locate the topic labeled "Functions" and click on it. This is a list of all the functions that are included in the RIFL Script Editor. Each is a hyperlink that opens the detailed help topic for that function. Scroll down in the list and check out the following functions that you are likely to need:

  • Chr, Chr$ Functions
  • Clean Function
  • GSub Function

Give that a try and let us know if you run into a snag. We can discuss Data Extractor options, if needed.

- Jo

RE: Extract text from RTF binary field in MS Access?

Hi Jo,

Thanks for the pointers. I read about the RIFL functions and it looks like they will work. Very flexible and powerful by the looks of it. But before I move onto that I want to do the following:

After having successfully connected to a 1997 MS Access database using Data Joiner, I'd thought I'd be ambitious and try connecting to Sybase, which it so happens has a more recent version of my data.

If I launch the application "Sybase Central" is tells me I have Adaptive Server Anywhere 8 running. Right click on My Computer > Manage > Services shows it running with the executable path: C:\Program Files\ASA 8\ASA\Win32\dbsrv8.exe -hvASANYs_mydata

"mydata" is the name of my database and has a .db extension and is located: C:\Database\mydata.db

In the connection settings, I tried many combinations of the following:
1. Factory connections: Anything with Sybase in the name (since nothing was listed explicitly with v8)
2. Server: the executable path with and without the "-hvASANYs_mydata" modifier

I can't promise I tried every permutation. For Source Database, I always used C:\Database\mydata.db, and I always entered the User ID and Password. Hitting the Connect button never resulted in the tables showing up in the Source Table/View drop down.

Questions:

1. Would it help if I tried copying a "Connection String" from here?:
http://www.connectionstrings.com/?carrier=sybase
- Or is the Data Joiner wizard actually building it's own connection string?
2. What is a connection string anyway, and when do you need one?

In case it has any diagnostic value, when I right click on the sybase icon in the running task bar and restore it, it show the text below in a log window, that also has a "Shutdown" button.

Also, on a more minor note, I couldn't find a way to tell Data Joiner to include field names as the first record in the export file. Is there a way?

Thanks,
Rabbit

Sybase Adaptive Server Anywhere Network Server Version 8.0.2.4251Copyright © 1989-2002 Sybase, Inc.
Portions Copyright © 2002, iAnywhere Solutions, Inc.
All rights reserved. All unpublished rights reserved.

This software contains confidential and trade secret information of
iAnywhere Solutions, Inc.
Use, duplication or disclosure of the software and documentation
by the U.S. Government is subject to restrictions set forth in a license
agreement between the Government and iAnywhere Solutions, Inc. or
other written agreement specifying the Government's rights to use the
software and any applicable FAR provisions, for example, FAR 52.227-19.

Sybase, Inc., One Sybase Drive, Dublin, CA 94568, USA
Concurrent Seat model. Access to the server is limited to 300 concurrent seat(s).
This server is licensed to:
CustomApp Administrator
CustomApp Workgroup
196496K of memory used for caching
Minimum cache size: 2048K, maximum cache size: 262144K
Using a maximum page size of 8192 bytes
Starting database "mydata" (C:\Database\mydata.db) at Tue May 27 2008 19:46
Database file "C:\Database\mydata.db" consists of 4449 disk fragments
Transaction log: mydata.log
Starting checkpoint of "mydata" (mydata.db) at Tue May 27 2008 19:46
Finished checkpoint of "mydata" (mydata.db) at Tue May 27 2008 19:46
Database "mydata" (mydata.db) started at Tue May 27 2008 19:46
Database server started at Tue May 27 2008 19:46
Trying to start SharedMemory link ... SharedMemory link started successfullyTrying to start TCPIP link ...Starting on port 3562
TCPIP link started successfullyNow accepting requests
Cache size adjusted to 245736K
Starting checkpoint of "mydata" (mydata.db) at Fri Feb 01 2008 20:16
Finished checkpoint of "mydata" (mydata.db) at Fri Feb 01 2008 20:16

RE: Extract text from RTF binary field in MS Access?

Hi Rabbit,

Thank you for providing detailed information in your posts! It really helps us help you better and faster!

Assuming DataTools can read the Sybase v8 database, although there is no specific connector for that version, here are some things to try:

The path to where Sybase is installed needs to be in your Environment Variables PATH statement in order for DataTools to "see" it. If it is not, please add it.

On the DataTools Source Connection tab, try these entries in the boxes. DataTools will build the connection string using what you enter.

  1. Click the Source Connection down arrow and select Sybase Adaptive Server 11.x from the list.
  2. In Server Name, type in the name of your server. NOTE: If Sybase is installed locally, you can leave this box blank.
  3. In Source Database, type in the database name (mydata).
  4. In User ID, type in your username for the selected database.
  5. In Password, type in your password for the selected database.
  6. In Table/View (default), click the Down Arrow and choose the Table or View from the list.

If this does not work, you may want to also try selecting Sybase SQL Anywhere 6 and repeat the steps.

If neither of those works, you may need to connect to your Sybase database via an ODBC connection. And, this requires that you have an ODBC driver for Sybase installed and configured on your workstation.

Information about how to include source field names as the first (header) record in the exported CSV data file can be found in the online documentation for the Data Joiner in a help topic entitled Exporting Joined Data.

Please let us know the results of these suggestions.

- Jo

RE: Extract text from RTF binary field in MS Access?

Jo,

Thanks, I'll try this later. But first, when you say:
"The path to where Sybase is installed needs to be in your Environment Variables PATH statement in order for DataTools to "see" it. If it is not, please add it."
- do you mean that I should do this in addition to the steps you listed?
If so, how?

Rabbit.

Hi Rabbit, Yes, if the path

Hi Rabbit,

Yes, if the path to the Sybase database is not in your PATH statement, DataTools will not be able to detect the database engine when you try to connect. So, it is necessary to do the following steps first.

To view, edit, or add an application to the PATH statement in your workstation's Environment Variables, follow these steps:

CAUTION: Changing the values in the PATH statement on your workstation may disable the functionality of programs. If you are unfamiliar with the steps below, we strongly recommend that you contact your IT Administrator or a qualified computer professional.

  1. Right-click on the My Computer icon to open the System Properties dialog box.
  2. Click on the Advanced tab.
  3. In the Startup and Recovery section, locate the Environment Variables button and click on it.
  4. In the System Variables section, scroll down until you can see the Path variable and click to highlight it.
  5. Click on the Edit button below the System Variables box.
  6. Press the HOME button on the keyboard to move the cursor to the left end of the Variable Value string. Then use the Right Arrow on the keyboard to scroll through the values that are already entered.
  7. If the value that needs to be entered for the application is already entered, verify that it is typed correctly and matches the drive and folder path to the correct folder or sub-folder. This information is available in the documentation for your specific application or database.
  8. If the value that needs to be entered is either not there or is incorrect, CAREFULLY enter or edit the value per the application or database documentation.
  9. When finished, click the OK button to close the Edit System Variable dialog box.
  10. Click OK to close the Environment Variables dialog box.
  11. Click OK to close the System Properties dialog box.

Please let us know if this solves the connectivity issue, or not.

- Jo

Hi Jo, The variable value

Hi Jo,

The variable value for Path reads:
%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;C:\Program Files\QuickTime\QTSystem\;C:\Program Files\Smart Projects\SmartProject

It looks like a list of alternative places it can look. Can I just add another semi-colon plus the path to the Sybase executable? So that I leave the others in place in case they are important..Like this:
%SystemRoot%\system32;%SystemRoot%;%SystemRoot%\System32\Wbem;C:\Program Files\QuickTime\QTSystem\;C:\Program Files\Smart Projects\SmartProject;C:\Program Files\ASA 8\ASA\Win32\dbsrv8.exe

I don't know for sure if dbsrv8.exe is the executable. There are a bunch of .exe files in the \ASA folder and in the \ASA\Win32 folder. Does this sound right to you? Maybe I should only add on the folder path: C:\Program Files\ASA 8\ASA\Win32\ ?

-Rabbit

RE: Hi Jo, The variable value

Hi Rabbit,

Yes, add a semi-colon at the "right end" of the existing PATH statement and then add the drive and folder path for Sybase.

I don't have Sybase installed on my workstation, so I'm not sure of the Sybase PATH string. That information should be included in the Sybase documentation. Sybase has a comprehensive set of product manuals called Sybooks Online at this URL: http://sybooks.sybase.com/nav/base.do

- Jo

RE: Extract text from RTF binary field in MS Access?

Hi Jo,

Thanks for the pointers. Give me a few days. I'll experiment and report back.

Rabbit.