URI Support
The Data Extractor can read from a wide range of source locations through URI support. The full addressing scheme must be present in the URI before the Data Extractor can connect to the source file.
Note: Sun Java Runtime Environment (JRE) v1.4.0 or later must be installed for URI support to function properly. Without this component, you will receive the error message: "Unable to load Java virtual machine."
For more information on URIs within the Data Extractor, see Dealing with URI Limitations below.
What Are URIs?
URIs (Uniform Resource Identifiers) are a generalization of URLs (Uniform Resource Locators), which were originally nothing more than HTTP and FTP locations on the Internet. A URI typically describes the following:
- The mechanism used to access the resource
- The specific computer in which the resource is housed
- The specific name of the resource (a file name) on the computer
For example, this URI...
http://www.yahoo.com/support.asp
...identifies a file that can be accessed using the Web protocol, Hypertext Transfer Protocol ("http://"), and that is housed on a computer named "www.yahoo.com" (which can be mapped to a unique Internet address). In the computer's directory structure, the file is located at "/support.asp".
File Transfer Protocol (FTP) addresses and e-mail addresses are also URIs (and are also called URLs).
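The common syntax for scheme-specific data is:
file://user:password@host:port/url-path
Some or all of the parts "user:password@", ":password", ":port", and "/url-path" may be omitted, depending on the scheme. The components are: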
| Component | Description |
|---|---|
| user | An optional user name. Some schemes (e.g., ftp) allow the specification of a user name. |
| password | An optional password. If present, it follows the user name separated from it by a colon. The user name and password, if present, are followed by a commercial "at" sign ( @ ). Within the user and password field, any colon ( : ), "at" sign ( @ ), or slash ( / ) must be encoded. |
| host | The fully qualified domain name of a network host, or its IP address as a set of four decimal digit groups separated by periods ( . ). |
| port | The port number to connect to. Most schemes designate protocols that have a default port number. Another port number may be supplied, in decimal, separated from the host by a colon. If the port is omitted, the colon is omitted as well. |
| url-path | The remainder of the locator consists of data specific to the scheme, and is known as the "url-path". It supplies the details of how the specified resource can be accessed. Note: The slash ( / ) between the host (or port) and the url-path is not part of the url-path. |
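The components in the table above can also be pulled apart programmatically. The following is a minimal sketch, not part of the Data Extractor itself, that assumes a current JRE (Java 10 or later) and the standard java.net.URI class. The class name UriParts and the host ftp.example.com are hypothetical and used only for illustration.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/** Illustrative helper (hypothetical name) that splits a URI into the table's components. */
final class UriParts {
    final String scheme;
    final String user;      // null when the URI carries no user name
    final String password;  // null when the URI carries no password
    final String host;
    final int port;         // -1 when omitted, meaning the scheme's default port applies
    final String urlPath;   // excludes the slash that separates it from the host (or port)

    UriParts(String text) {
        URI uri = URI.create(text); // throws IllegalArgumentException on malformed input
        scheme = uri.getScheme();
        host = uri.getHost();
        port = uri.getPort();

        // Split the still-encoded user info first, so that an encoded colon (%3A)
        // inside the user name or password is not mistaken for the separator.
        String rawUserInfo = uri.getRawUserInfo();
        if (rawUserInfo == null) {
            user = null;
            password = null;
        } else {
            int colon = rawUserInfo.indexOf(':');
            user = decode(colon < 0 ? rawUserInfo : rawUserInfo.substring(0, colon));
            password = colon < 0 ? null : decode(rawUserInfo.substring(colon + 1));
        }

        // getPath() includes the leading slash; the url-path does not.
        String path = uri.getPath();
        urlPath = (path != null && path.startsWith("/")) ? path.substring(1) : path;
    }

    private static String decode(String raw) {
        // URLDecoder follows form encoding and would turn '+' into a space,
        // so a literal '+' is protected before decoding the percent escapes.
        return URLDecoder.decode(raw.replace("+", "%2B"), StandardCharsets.UTF_8);
    }
}
```

For example, new UriParts("ftp://jane:p%40ss@ftp.example.com:2121/pub/data.csv") yields the user "jane", the decoded password "p@ss", port 2121, and the url-path "pub/data.csv". A URI with no explicit port reports -1, which corresponds to the scheme's default port.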
Dealing with URI Limitations
URI support is currently limited to "public" WWW sites, where the navigation to the page containing web content you want to extract is directly addressable with one URL hyperlink (e.g., cnn.com). You cannot connect to interactive session resources like TELNET or RLOGIN.
Although this URI support reaches hundreds of millions of pages, many URLs are still not directly addressable. To reach them, you must first go through navigation, authentication, looping, etc., before you arrive at the page containing the data to be extracted.
Solutions for overcoming this limitation at design time include:
- Manual navigation: a person navigates to the desired web page (entering the requisite passwords, etc.) and then uses Save As to store the page in a local file. That file can then be brought in and processed using the Data Extractor.
- Automated navigation: the same navigation (including entry of the requisite passwords, etc.) is performed automatically, and the page is then saved to a local file. That file can then be brought in and processed using the Data Extractor.
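The automated option above could be approximated with a small helper that fetches a page and saves it locally for later processing. This is only a sketch under stated assumptions: it requires a JRE of version 8 or later, it assumes the site accepts HTTP Basic authentication (many sites use login forms instead, which this does not handle), and the class name PageSaver and its parameters are hypothetical rather than part of the Data Extractor.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Base64;

/** Illustrative helper (hypothetical name) that saves a remote resource to a local file. */
final class PageSaver {

    /**
     * Copies the resource at {@code source} into {@code target}, replacing any existing file.
     *
     * @param credentials "user:password" for HTTP Basic authentication, or null for none
     * @return the number of bytes written
     */
    static long save(URI source, String credentials, Path target) throws IOException {
        URLConnection connection = source.toURL().openConnection();
        if (credentials != null) {
            // Basic authentication sends the credentials base64-encoded, not encrypted,
            // so it should only be used with https:// sources.
            String token = Base64.getEncoder()
                    .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
            connection.setRequestProperty("Authorization", "Basic " + token);
        }
        try (InputStream in = connection.getInputStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

A call such as PageSaver.save(pageUri, "user:secret", localPath), where pageUri and localPath are placeholder variables, would leave a local copy that can then be opened as an ordinary source file. Pages that require multi-step navigation would need a more capable tool than this sketch.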