PAV - Provenance, Authoring and Versioning

IRI:
http://purl.org/pav/
Version IRI:
http://purl.org/pav/2.1
Current version:
2.1.0
Previous version:
http://purl.org/pav/2.0/
Backward compatible with:
http://purl.org/pav/2.0/
http://purl.org/pav/authoring/2.0/
http://purl.org/pav/provenance/2.0/
http://purl.org/pav/versioning/2.0/
Incompatible with:
http://swan.mindinformatics.org/ontologies/1.2/pav.owl
Authors:
Paolo Ciccarese
http://www.hcklab.org/foaf.rdf#me
http://www.paolociccarese.info/
Contributors:
Marco Ocana
Stian Soiland-Reyes
http://soiland-reyes.com/stian/#me
Publisher:
http://swan.mindinformatics.org/
Imported Ontologies:
http://pav-ontology.googlecode.com/svn/trunk/1.2/pav.owl
http://www.w3.org/ns/prov#

Abstract

PAV is a lightweight ontology for tracking Provenance, Authoring and Versioning. PAV specializes the W3C provenance ontology PROV-O in order to describe authorship, curation and digital creation of online resources.

This ontology describes the defined PAV properties and their usage. Note that PAV does not define any explicit classes or domain/ranges, as every property is meant to be used directly on the described online resource.

Table of Contents

  1. Introduction
  2. Object Properties
  3. Data Properties
  4. Namespace Declarations

Introduction

PAV supplies terms for distinguishing between the different roles of the agents contributing content in current web based systems: contributors, authors, curators and digital artifact creators. The ontology also provides terms for tracking provenance of digital entities that are published on the web and then accessed, transformed and consumed. In order to support broader interoperability, PAV specializes the general purpose W3C PROV provenance model (PROV-O).

PAV distinguishes between the data related to the digital artifact, named Provenance, and the data related to the actual knowledge creation and therefore to the intellectual property aspects, named Authoring. The Versioning axis describes the evolution of digital entities in time.

Using PAV, descriptions can define the authors who originated or gave existence to the work that is expressed in the digital resource (pav:authoredBy); curators (pav:curatedBy), who are content specialists responsible for shaping the expression in an appropriate format; and contributors (super-property pav:contributedBy), who provided some help in conceiving the resource or in the expressed knowledge creation/extraction.

These provenance aspects can be detailed with dates using pav:curatedOn, pav:authoredOn, etc. Further details about the creation activities, such as different authors contributing specific parts of the resource at different dates, are out of scope for PAV and should be defined using vocabularies like PROV-O and additional intermediate entities to describe the different states.

For resources based on other resources, PAV allows specification of direct retrieval (pav:retrievedFrom), import through transformations (pav:importedFrom) and sources that were merely consulted (pav:sourceAccessedAt). These aspects can also define the agents responsible using pav:retrievedBy, pav:importedBy and pav:sourceAccessedBy. Version information can be specified using pav:previousVersion and pav:version.

The creation of the digital representation, for instance an RDF graph, can in many cases be different from the authorship of the knowledge, and in PAV this digital creation is specified using pav:createdBy, pav:createdWith and pav:createdOn.

PAV 2.1 updates PAV 2.0 with PROV-O specializations and more detailed descriptions of the defined terms. Note that PROV-O is not imported directly by this ontology, as PAV can be used independently of PROV. PAV 2 is based on PAV 1.2 but in a new namespace ( http://purl.org/pav/ ). Terms compatible with 1.2 are indicated in this ontology using owl:equivalentProperty.
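
The owl:equivalentProperty links make it possible to rewrite PAV 1.2 property IRIs into the PAV 2 namespace. The following Python sketch shows one way this might be done. The function name upgrade_pav1_iri and the two lookup tables are hypothetical; the term names and the pav1 namespace are taken from the "has equivalent properties" entries and the namespace declarations in this document.

```python
# Namespaces as listed in this document's Namespace Declarations.
PAV1 = "http://swan.mindinformatics.org/ontologies/1.2/pav/"
PAV2 = "http://purl.org/pav/"

# PAV 1.2 local names whose PAV 2 equivalent uses a different local name.
RENAMED = {
    "importedFromSource": "importedFrom",
    "importedLastOn": "lastRefreshedOn",
    "sourceFirstAccessedOn": "sourceAccessedOn",
    "versionNumber": "version",
}

# PAV 1.2 local names that keep the same local name in PAV 2.
UNCHANGED = {
    "authoredBy", "contributedBy", "createdBy", "curatedBy", "importedBy",
    "previousVersion", "createdOn", "importedOn", "lastUpdateOn",
    "sourceAccessedOn", "sourceLastAccessedOn",
}


def upgrade_pav1_iri(iri):
    """Return the PAV 2 IRI for a PAV 1.2 property IRI, else the input unchanged."""
    if not iri.startswith(PAV1):
        return iri
    local = iri[len(PAV1):]
    if local in RENAMED:
        return PAV2 + RENAMED[local]
    if local in UNCHANGED:
        return PAV2 + local
    # No equivalence is listed in this document for the term; leave it as-is.
    return iri
```

Terms without a listed equivalence are deliberately left untouched rather than guessed.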

The ontology IRI http://purl.org/pav/ always resolves to the latest version of PAV 2. Particular versionIRIs such as http://purl.org/pav/2.1 can be used by clients to force imports of a particular version - note however that all terms are defined directly in the http://purl.org/pav/ namespace.

The goal of PAV is to provide a lightweight, straightforward way to give the essential information about authorship, provenance and versioning, and therefore these properties are described directly on the published resource. As such, PAV does not define any classes or restrict domain/ranges, as all properties are applicable to any online resource.

Object Properties

Authored by

IRI: http://purl.org/pav/authoredBy

An agent that originated or gave existence to the work that is expressed by the digital resource.

The author of the content of a resource may be different from the creator of the resource representation (although they are often the same). See pav:createdBy for a discussion.

The date of authoring can be expressed using pav:authoredOn - note however in the case of multiple authors that there is no relationship in PAV identifying which agent contributed when or what. If capturing such lineage is desired, it should be additionally expressed using activity-centric provenance vocabularies, for instance with prov:wasGeneratedBy and prov:qualifiedAssociation.

has equivalent properties
  • pav1:authoredBy
has super-properties
  • pav:contributedBy

Contributed by

IRI: http://purl.org/pav/contributedBy

The resource was contributed to by the given agent.

The agent provided any sort of help in conceiving the work that is expressed by the digital artifact. Superproperty of pav:authoredBy and pav:curatedBy.

Note that pav:contributedBy identifies only agents that contributed to the work, knowledge or intellectual property, and not agents that made the digital artifact or representation (pav:createdBy). This property can therefore be considered more precise than dct:contributor. See pav:createdBy for a discussion.

The date of contribution can be expressed using pav:contributedOn - note however in the case of multiple contributors that there is no relationship in PAV identifying which agent contributed when or what. If capturing such lineage is desired, it should be additionally expressed using activity-centric provenance vocabularies, for instance with prov:wasGeneratedBy and prov:qualifiedAssociation.

has equivalent properties
  • pav1:contributedBy
has super-properties
  • prov:wasAttributedTo
has sub-properties
  • pav:authoredBy
  • pav:curatedBy
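
As a rough illustration of what these property relations entail, the Python sketch below derives the statements implied by the sub-property hierarchy described above (pav:authoredBy and pav:curatedBy under pav:contributedBy, which is itself under prov:wasAttributedTo). It is not a full RDFS reasoner: the SUPER table and the function name infer_super_properties are hypothetical, and triples are modelled as plain tuples of IRI strings.

```python
PAV = "http://purl.org/pav/"
PROV = "http://www.w3.org/ns/prov#"

# Direct super-properties, as described for pav:contributedBy in this section.
SUPER = {
    PAV + "authoredBy": [PAV + "contributedBy"],
    PAV + "curatedBy": [PAV + "contributedBy"],
    PAV + "contributedBy": [PROV + "wasAttributedTo"],
}


def infer_super_properties(triples):
    """Return the input triples plus every triple entailed by the SUPER table."""
    result = set(triples)
    pending = list(result)
    while pending:
        s, p, o = pending.pop()
        for parent in SUPER.get(p, []):
            entailed = (s, parent, o)
            if entailed not in result:
                result.add(entailed)
                pending.append(entailed)  # follow the chain upwards
    return result
```

For example, a single pav:authoredBy statement yields three statements in total: the original one, a pav:contributedBy statement and a prov:wasAttributedTo statement.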

Created at

IRI: http://purl.org/pav/createdAt

The geo-location of the agents when creating the resource (pav:createdBy). For instance a photographer takes a picture of the Eiffel Tower while standing in front of it.

Created by

IRI: http://purl.org/pav/createdBy

An agent primarily responsible for making the digital artifact or resource representation.

This property is distinct from pav:authoredBy, which identifies who authored the knowledge expressed by this resource; and pav:curatedBy, which identifies who curated the knowledge into its current form.

pav:createdBy is more specific than dct:creator, which might or might not be interpreted to cover this creator.

For instance, the author wrote 'this species has bigger wings than normal' in his log book. The curator, going through the log book and identifying important knowledge, formalizes this as 'locus perculus has wingspan > 0.5m'. The creator enters this knowledge as a digital resource in the knowledge system, thus creating the digital artifact (say as JSON, RDF, XML or HTML).

A different example is a news article. pav:authoredBy indicates the journalist who wrote the article. pav:contributedBy can indicate the artist who added an illustration. pav:curatedBy can indicate the editor who made the article conform to the newspaper's style. pav:createdBy can indicate who put the article on the web site.

The software tool used by the creator to make the digital resource (say Protege, Wordpress or OpenOffice) can be indicated with pav:createdWith.

The date the digital resource was created can be indicated with pav:createdOn.

The location the agent was at when creating the digital resource can be indicated using pav:createdAt.

has equivalent properties
  • pav1:createdBy
has super-properties
  • prov:wasAttributedTo
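
The news article example above can be sketched as a small set of statements. The Python below is only an illustration: describe_article and to_ntriples are hypothetical helper names, any IRIs passed in are placeholders chosen by the caller, and the serializer handles IRI-only triples, not literals.

```python
PAV = "http://purl.org/pav/"


def describe_article(article, journalist, illustrator, editor, publisher_agent):
    """Build PAV statements for the four roles in the news article example."""
    return [
        (article, PAV + "authoredBy", journalist),        # wrote the article
        (article, PAV + "contributedBy", illustrator),    # added an illustration
        (article, PAV + "curatedBy", editor),             # applied the house style
        (article, PAV + "createdBy", publisher_agent),    # put it on the web site
    ]


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)
```

The same agent may appear in several roles; for a self-published blog post, all four arguments after the article could be the same IRI.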

Created with

IRI: http://purl.org/pav/createdWith

The software/tool used by the creator when making the digital resource.

For instance: Protege, Wordpress, LibreOffice.

has super-properties
  • prov:wasAttributedTo

Curated by

IRI: http://purl.org/pav/curatedBy

A specialist agent responsible for shaping the expression in an appropriate format. Often the primary agent responsible for ensuring the quality of the representation.

The curator may be different from the creator of the digital resource (although they are often the same). See pav:createdBy for a discussion.

The date of curating can be expressed using pav:curatedOn - note however in the case of multiple curators that there is no relationship in PAV identifying which agent contributed when or what. If capturing such lineage is desired, it should be additionally expressed using activity-centric provenance vocabularies, for instance with prov:wasGeneratedBy and prov:qualifiedAssociation.

has equivalent properties
  • pav1:curatedBy
has super-properties
  • pav:contributedBy
is inverse of
  • pav:curates

Curates

IRI: http://purl.org/pav/curates

Provided for backwards compatibility with PAV 1.2 only. Use the inverse pav:curatedBy instead.

is inverse of
  • pav:curatedBy
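
Since pav:curates is kept only for backwards compatibility, data that still uses it could be normalized to the preferred pav:curatedBy by swapping subject and object. The Python sketch below shows the idea; normalize_curates is a hypothetical helper and triples are modelled as tuples of IRI strings.

```python
PAV = "http://purl.org/pav/"


def normalize_curates(triples):
    """Rewrite (agent, pav:curates, resource) as (resource, pav:curatedBy, agent)."""
    normalized = []
    for s, p, o in triples:
        if p == PAV + "curates":
            normalized.append((o, PAV + "curatedBy", s))  # inverse direction
        else:
            normalized.append((s, p, o))                   # leave other triples alone
    return normalized
```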

Derived from

IRI: http://purl.org/pav/derivedFrom

Derived from a different resource. Derivation concerns itself with derived knowledge. If this resource has the same content as the other resource, but has simply been transcribed to fit a different model (like XML -> RDF or SQL -> CSV), use pav:importedFrom. If the content has been further refined or modified, use pav:derivedFrom.

has super-properties
  • prov:wasDerivedFrom

Imported by

IRI: http://purl.org/pav/importedBy

An entity responsible for importing the data.

The importer is usually a software entity which has done the transcription from the original source.

See pav:importedFrom for a discussion of import vs. retrieve vs. derived.

has equivalent properties
  • pav1:importedBy
has super-properties
  • prov:wasAttributedTo

Imported from

IRI: http://purl.org/pav/importedFrom

The original source of imported information.

Import means that the content has been preserved, but transcribed somehow, for instance to fit a different representation model. Examples of import are when the original was JSON and the current resource is RDF, or where the original was a document scan, and this resource is the plain text found through OCR.

The imported resource should somehow convey the same knowledge/content as the original source. If additional knowledge has been contributed, pav:derivedFrom would be more appropriate.

If the resource has been copied verbatim from the original representation (e.g. downloaded), use pav:retrievedFrom.

To indicate which agent(s) performed the import, use pav:importedBy. Use pav:importedOn to indicate when it happened.

has equivalent properties
  • pav1:importedFromSource
has super-properties
  • prov:wasDerivedFrom
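
The distinction between retrieval, import and derivation described above can be summarized as a small decision rule. The Python sketch below is one possible reading of that guidance, not part of PAV itself; the function name source_property and its two boolean parameters are hypothetical.

```python
PAV = "http://purl.org/pav/"


def source_property(same_representation, same_content):
    """Pick the PAV property relating a resource to the source it was based on."""
    if same_representation:
        # Copied verbatim from the original representation, e.g. downloaded.
        return PAV + "retrievedFrom"
    if same_content:
        # Content preserved but transcribed, e.g. JSON to RDF, or a scan to OCR text.
        return PAV + "importedFrom"
    # Knowledge was added, refined or modified.
    return PAV + "derivedFrom"
```

For instance, source_property(False, True) selects pav:importedFrom, which fits a dataset converted from a CSV file to RDF without changing what it says.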

Previous version

IRI: http://purl.org/pav/previousVersion

The previous version of a resource in a lineage. For instance a news article updated to correct factual information would point to the previous version of the article with pav:previousVersion. If however the content has significantly changed so that the two resources no longer share lineage (say a new news article that talks about the same facts), they should be related using pav:derivedFrom.

A version number of this resource can be provided using the data property pav:version.

has equivalent properties
  • pav1:previousVersion
has super-properties
  • prov:wasRevisionOf
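
A lineage expressed with pav:previousVersion can be followed from the newest resource back to the oldest. The Python sketch below assumes, for simplicity, that each resource has at most one pav:previousVersion statement; the function name version_lineage is hypothetical and triples are tuples of IRI strings.

```python
PAV = "http://purl.org/pav/"


def version_lineage(triples, resource):
    """Return the resource followed by its earlier versions, newest first."""
    previous = {s: o for s, p, o in triples if p == PAV + "previousVersion"}
    lineage = [resource]
    seen = {resource}
    while lineage[-1] in previous:
        older = previous[lineage[-1]]
        if older in seen:  # guard against accidental cycles in the data
            break
        lineage.append(older)
        seen.add(older)
    return lineage
```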

Provided by

IRI: http://purl.org/pav/providedBy

The original provider of the encoded information (e.g. PubMed, UniProt, Science Commons), especially when it was retrieved, imported or derived from a resource published by the original provider. This new resource might therefore have a current dct:publisher which differs from its pav:providedBy.

Retrieved by

IRI: http://purl.org/pav/retrievedBy

An entity responsible for retrieving the data from an external source.

The retriever is usually a software entity which has done the retrieval from the original source without performing any transcription.

See pav:importedFrom for a discussion of import vs. retrieve vs. derived.

has super-properties
  • prov:wasAttributedTo

Retrieved from

IRI: http://purl.org/pav/retrievedFrom

The URI where a resource has been retrieved from.

Retrieval indicates that this resource has the same representation as the original resource. If the resource has been somewhat transformed, use pav:importedFrom instead.

The time of the retrieval should be indicated using pav:retrievedOn. The agent may be indicated with pav:retrievedBy.

has super-properties
  • prov:wasDerivedFrom

Source accessed at

IRI: http://purl.org/pav/sourceAccessedAt

The resource is related to a given source which was accessed or consulted (but not retrieved, imported or derived from). This access can be detailed with pav:sourceAccessedBy and pav:sourceAccessedOn.

For instance, a curator (pav:curatedBy) might have consulted figures in a published paper to confirm that a dataset was correctly pav:importedFrom the paper's supplementary CSV file.

has super-properties
  • prov:wasInfluencedBy

Source accessed by

IRI: http://purl.org/pav/sourceAccessedBy

The resource is related to a source which was accessed or consulted by the given agent. The source(s) should be specified using pav:sourceAccessedAt.

For instance, the given agent could be a curator (also pav:curatedBy) who consulted figures in a published paper to confirm that a dataset was correctly pav:importedFrom the paper's supplementary CSV file.

Data Properties

Authored on

IRI: http://purl.org/pav/authoredOn

The date this resource was authored.

pav:authoredBy gives the authoring agent.

Note that pav:authoredOn is different from pav:createdOn, although they are often the same. See pav:createdBy for a discussion.

has super-properties
  • pav:contributedOn

Contributed on

IRI: http://purl.org/pav/contributedOn

The date this resource was contributed to.

pav:contributedBy provides the agent that contributed.

has sub-properties
  • pav:authoredOn
  • pav:curatedOn
has range
  • xsd:dateTime

Created on

IRI: http://purl.org/pav/createdOn

The date of creation of the resource.

has equivalent properties
  • pav1:createdOn
has range
  • xsd:dateTime

Curated on

IRI: http://purl.org/pav/curatedOn

The date this resource was curated.

pav:curatedBy gives the agents that performed the curation.

has super-properties
  • pav:contributedOn

Imported on

IRI: http://purl.org/pav/importedOn

The date this resource was imported.

See pav:importedFrom for a discussion about import vs. retrieval.

has equivalent properties
  • pav1:importedOn
has range
  • xsd:dateTime

Last refreshed on

IRI: http://purl.org/pav/lastRefreshedOn

The date of the last import of the resource. This property is used if this version has been updated due to a re-import, rather than the import creating new resources related using pav:previousVersion.

has equivalent properties
  • pav1:importedLastOn
has range
  • xsd:dateTime

Last updated on

IRI: http://purl.org/pav/lastUpdateOn

The date of the last update of the resource. An update is a change which did not warrant making a new resource related using pav:previousVersion, for instance correcting a spelling mistake.

has equivalent properties
  • pav1:lastUpdateOn
has range
  • xsd:dateTime

Retrieved on

IRI: http://purl.org/pav/retrievedOn

The date this resource was retrieved.

has range
  • xsd:dateTime

Source accessed on

IRI: http://purl.org/pav/sourceAccessedOn

The resource is related to a source which was originally accessed or consulted on the given date as part of creating or authoring the resource. The source(s) should be specified using pav:sourceAccessedAt. If the source is subsequently checked again (say to verify validity), this should be indicated with pav:sourceLastAccessedOn.

In the case of multiple sources being accessed at different times or by different agents, PAV does not distinguish who accessed what and when. If such details are required, they may be provided by additionally using prov:qualifiedInfluence.

has equivalent properties
  • pav1:sourceAccessedOn
  • pav1:sourceFirstAccessedOn
has range
  • xsd:dateTime

Source last accessed on

IRI: http://purl.org/pav/sourceLastAccessedOn

The resource is related to a source which was last accessed or consulted on the given date. The source(s) should be specified using pav:sourceAccessedAt. Usage of this property indicates that the source has been checked previously; the time of that initial access should be indicated with pav:sourceAccessedOn.

This property can be useful together with pav:lastRefreshedOn or pav:lastUpdateOn in order to indicate a re-import or update, but could also be used alone, for instance when a source was simply verified and no further action was taken for the resource.

has equivalent properties
  • pav1:sourceLastAccessedOn
has range
  • xsd:dateTime
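
The interplay of pav:sourceAccessedOn (initial access) and pav:sourceLastAccessedOn (later checks) can be sketched as follows. This Python example is an illustration only: record_source_access is a hypothetical helper, and a resource description is modelled as a dictionary from property IRI to a (lexical value, datatype IRI) pair.

```python
from datetime import datetime, timezone

PAV = "http://purl.org/pav/"
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def record_source_access(description, when):
    """Record one access to the source at `when` (a timezone-aware datetime)."""
    literal = (when.isoformat(), XSD_DATETIME)
    if PAV + "sourceAccessedOn" not in description:
        # First recorded access: this is the original consultation.
        description[PAV + "sourceAccessedOn"] = literal
    else:
        # Later check: keep the original date, update the last-accessed date.
        description[PAV + "sourceLastAccessedOn"] = literal
    return description


# Usage: an initial consultation followed by a later verification.
description = {}
record_source_access(description, datetime(2014, 1, 1, 12, 0, tzinfo=timezone.utc))
record_source_access(description, datetime(2014, 6, 1, 9, 30, tzinfo=timezone.utc))
```

After the two calls, the description holds the January date under pav:sourceAccessedOn and the June date under pav:sourceLastAccessedOn.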

Version

IRI: http://purl.org/pav/version

The version number of a resource. This is a free-text string; typical values are "1.5" or "21". The URI identifying the previous version can be provided using pav:previousVersion.

has equivalent properties
  • pav1:versionNumber
has range
  • xsd:string

Namespace Declarations

default namespace
http://purl.org/pav/
dc
http://purl.org/dc/elements/1.1/
dct
http://purl.org/dc/terms/
foaf
http://xmlns.com/foaf/0.1/
owl
http://www.w3.org/2002/07/owl#
pav
http://purl.org/pav/
pav1
http://swan.mindinformatics.org/ontologies/1.2/pav/
prov
http://www.w3.org/ns/prov#
rdf
http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs
http://www.w3.org/2000/01/rdf-schema#
xsd
http://www.w3.org/2001/XMLSchema#
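
The prefixed names used throughout this document, such as pav:authoredBy, can be expanded to full IRIs with the declarations above. The Python sketch below shows one simple way to do so; the PREFIXES table copies the declarations listed here, while the function name expand is hypothetical.

```python
# Prefix declarations copied from the list above.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "owl": "http://www.w3.org/2002/07/owl#",
    "pav": "http://purl.org/pav/",
    "pav1": "http://swan.mindinformatics.org/ontologies/1.2/pav/",
    "prov": "http://www.w3.org/ns/prov#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand(prefixed_name):
    """Expand a prefixed name such as 'pav:authoredBy' to a full IRI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local
```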

This HTML document was obtained by processing the OWL ontology source code through LODE, Live OWL Documentation Environment, developed by Silvio Peroni.