Most people will want to use the online forms available from the PURL Validation page to validate a few PURLs once in a while and immediately view the results. Those people do not need to read the rest of this document. The online forms should contain sufficient information to validate PURLs and view the results.
The rest of this document provides an overview of PURL validation steps for PURL administrators so they can control the validation process at their site, run batch validations, and possibly create their own validation processes.
Note to administrators: The directory WebRoot/docs/maint/validate under your PURL installation's top-level directory contains code and examples demonstrating the use of PURL metadata to validate PURLs. The code and form files referenced below are located in this directory unless stated otherwise.
Every validation begins with identifying a list of PURLs to validate. The PURL Validation page provides three methods of identifying that list, and each method has a corresponding input form.
The input from the forms for methods 1 and 2 is sent to the Perl script select.pl.cgi for processing. select.pl.cgi finds the list of PURLs that match the form input and presents that list to the user for further editing in a subsequent form. select.pl.cgi is also used to accomplish method 3 by outputting an empty list into which the user can enter the PURLs directly.
Each attempted PURL validation results in the creation of a "validation record". Validation records contain the following fields.
| PURL: | The PURL that was validated. |
| TIME: | The time validation occurred. |
| HOPS: | The number of redirections encountered during validation. A value of one means the PURL redirected to its associated URL and that URL did not redirect any further. A larger value is the total number of redirects encountered. A value of zero means the PURL resolver served the document directly, without returning a redirect to the validation process. |
| STATUS: | The HTTP status returned for the final non-redirect response received from the HTTP server responsible for the URL-identified item. Some problems can occur during validation that do not have standard HTTP response numbers. Note that 200 is the standard successful HTTP response. However, it is up to you to decide which codes are "valid" and which codes require some further action. See status-codes.html for an explanation of these codes. |
| URL: | The ultimate URL checked. If the validation process encountered redirects, this field contains the last URL checked. Note that when the URL associated with the PURL itself redirects, this field will not be the same as the URL directly associated with the PURL in the PURL system. |
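The relationship between the HOPS, STATUS, and URL fields can be illustrated with a minimal Python sketch. This is not the code shipped in the validate directory. The `ValidationRecord` class, the `validate` function, the injected `fetch` callable, the `max_hops` limit, and the set of status codes treated as redirects are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin

# Hypothetical fetcher: takes a URL and returns (HTTP status, Location header or None).
Fetch = Callable[[str], Tuple[int, Optional[str]]]

# Assumption: these status codes are the ones treated as redirects.
REDIRECT_CODES = {301, 302, 303, 307, 308}


@dataclass
class ValidationRecord:
    purl: str    # the PURL that was validated
    time: str    # when validation occurred (format is an assumption)
    hops: int    # number of redirects followed
    status: int  # status of the final response received
    url: str     # last URL checked


def validate(purl: str, fetch: Fetch, now: str, max_hops: int = 10) -> ValidationRecord:
    """Follow redirects starting at the PURL and build one validation record."""
    url = purl
    hops = 0
    status, location = fetch(url)
    # Keep following redirects until a non-redirect response arrives.
    # If max_hops is reached, the record keeps the last redirect status.
    while status in REDIRECT_CODES and location and hops < max_hops:
        url = urljoin(url, location)  # resolve relative Location headers
        hops += 1
        status, location = fetch(url)
    return ValidationRecord(purl, now, hops, status, url)
```

Passing `fetch` as a parameter keeps the sketch independent of any particular HTTP library, so it can be exercised with a dictionary of canned responses. With a PURL that redirects once to a URL returning 200, the record has HOPS equal to one and URL equal to the associated URL.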
While the PURL validation records could naturally be placed in a relational database management system (RDBMS), we did not want to include an RDBMS in the PURL release. Therefore, we have made the information available in a format that others can easily load into their own RDBMS, manipulate, and use to generate their own reports.
By default, the validation records are written to a flat file log with the fields delimited by whitespace. This information can easily be loaded into your own RDBMS or another tool for more extensive processing. As an example, the PURL Resolver at OCLC uses PostgreSQL and the RDBMS scripts distributed in the general release (but not hooked up by default) to do its validation. This way, we can provide scripts that work against the flat file version (the default) and also give real, working code as examples of how to process the validation information out of a relational database.
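As a rough illustration of loading the flat file into a database, the following Python sketch parses whitespace-delimited records and inserts them into SQLite. It is not one of the distributed RDBMS scripts. It assumes that each line holds exactly five fields in the order PURL, TIME, HOPS, STATUS, URL, and that the TIME value contains no whitespace. Neither assumption is confirmed by this document. The table name `validation` and the function names are hypothetical.

```python
import sqlite3
from typing import Iterable, Iterator, Tuple

# Assumed field order: PURL, TIME, HOPS, STATUS, URL.
Row = Tuple[str, str, int, str, str]


def parse_records(lines: Iterable[str]) -> Iterator[Row]:
    """Yield one tuple per well-formed line, skipping anything else."""
    for line in lines:
        fields = line.split()  # fields are delimited by whitespace
        if len(fields) != 5:
            continue  # blank or malformed line
        purl, time, hops, status, url = fields
        if not hops.isdigit():
            continue  # HOPS must be a non-negative integer
        # STATUS stays text because some validation problems have no
        # standard HTTP response number.
        yield purl, time, int(hops), status, url


def load_records(conn: sqlite3.Connection, lines: Iterable[str]) -> int:
    """Create the table if needed, insert all parsed rows, return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS validation "
        "(purl TEXT, time TEXT, hops INTEGER, status TEXT, url TEXT)"
    )
    rows = list(parse_records(lines))
    conn.executemany("INSERT INTO validation VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

To try it, open a connection with `sqlite3.connect(":memory:")` and pass an open log file (or any iterable of lines) to `load_records`. The same parsing step would apply before inserting into PostgreSQL or another RDBMS.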
As with validation, the first step in reporting validation information is to identify a list of PURLs for which validation information is required. Identifying a list of PURLs for viewing results uses forms nearly identical to those listed above in Section II. To find the corresponding form, replace "check" with "view" in the filename (e.g., search_view.html instead of search_check.html). The main difference is that the titles, button texts, and hints have been slightly modified to reflect that the resulting list of PURLs will not be validated, but used to look up validation results. The forms are still processed by select.pl.cgi.
Once a list of PURLs of interest has been identified, the validation records are used to generate reports which are then sent to the appropriate individuals, or presented immediately online.
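A report of this kind can be sketched in Python as follows. This is an illustration only, not the reporting code in the distribution. It assumes records are tuples in the order PURL, TIME, HOPS, STATUS, URL, that TIME values sort correctly as strings (for example ISO 8601 timestamps), and that only status "200" counts as valid. Which codes are acceptable is a site decision, so `ok_statuses` is a parameter. All function names are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

# Assumed record layout: (PURL, TIME, HOPS, STATUS, URL).
Record = Tuple[str, str, int, str, str]


def latest_per_purl(records: Iterable[Record], wanted: Set[str]) -> Dict[str, Record]:
    """Keep only the most recent record for each PURL of interest."""
    latest: Dict[str, Record] = {}
    for rec in records:
        purl, time = rec[0], rec[1]
        if purl not in wanted:
            continue
        # Assumption: timestamps compare correctly as plain strings.
        if purl not in latest or time > latest[purl][1]:
            latest[purl] = rec
    return latest


def format_report(
    records: Iterable[Record],
    wanted: Set[str],
    ok_statuses: FrozenSet[str] = frozenset({"200"}),
) -> str:
    """Return one report line per requested PURL, sorted by PURL."""
    latest = latest_per_purl(records, wanted)
    lines = []
    for purl in sorted(wanted):
        rec = latest.get(purl)
        if rec is None:
            lines.append(f"{purl}: no validation record")
            continue
        _, time, hops, status, url = rec
        verdict = "OK" if status in ok_statuses else "CHECK"
        lines.append(f"{purl}: {verdict} status={status} hops={hops} url={url} at {time}")
    return "\n".join(lines)
```

The returned text could be displayed online or mailed to the maintainer of each PURL. Delivery is left out of the sketch.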