PAV is a lightweight ontology for tracking Provenance, Authoring and Versioning. PAV specializes the W3C provenance ontology PROV-O in order to describe authorship, curation and digital creation of online resources.
This ontology describes the defined PAV properties and their usage. Note that PAV does not define any explicit classes or domain/ranges, as every property is meant to be used directly on the described online resource.
PAV supplies terms for distinguishing between the different roles of the agents contributing content in current web based systems: contributors, authors, curators and digital artifact creators. The ontology also provides terms for tracking provenance of digital entities that are published on the web and then accessed, transformed and consumed. In order to support broader interoperability, PAV specializes the general purpose W3C PROV provenance model (PROV-O).
PAV distinguishes between the data related to the digital artifact - named Provenance - and those related to the actual knowledge creation and therefore to the intellectual property aspects - named Authoring. The Versioning axis describes the evolution of digital entities in time.
Using PAV, descriptions can define the authors that originated or gave existence to the work that is expressed in the digital resource (pav:authoredBy); curators (pav:curatedBy), who are content specialists responsible for shaping the expression in an appropriate format; and contributors (super-property pav:contributedBy) that provided some help in conceiving the resource or in the expressed knowledge creation/extraction.
These provenance aspects can be detailed with dates using pav:curatedOn, pav:authoredOn, etc. Further details about the creation activities, such as different authors contributing specific parts of the resource at different dates, are out of scope for PAV and should be defined using vocabularies like PROV-O and additional intermediate entities to describe the different states.
For resources based on other resources, PAV allows specification of direct retrieval (pav:retrievedFrom), import through transformations (pav:importedFrom) and sources that were merely consulted (pav:sourceAccessedAt). These aspects can also define the agents responsible using pav:retrievedBy, pav:importedBy and pav:sourceAccessedBy. Version information can be specified using pav:previousVersion and pav:version.
The creation of the digital representation, for instance an RDF graph, can in many cases be different from the authorship of the knowledge, and in PAV this digital creation is specified using pav:createdBy, pav:createdWith and pav:createdOn.
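The split between authoring, curation and digital creation can be sketched as a handful of statements about one resource. The following is a minimal illustration in Python using only the standard library; the example.org IRIs, the helper names and the xsd:dateTime datatype chosen for pav:createdOn are assumptions made for the example.

```python
PAV = "http://purl.org/pav/"
# Assumed datatype for the date literal in this sketch.
XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def iri(value: str) -> str:
    """Wrap an IRI in N-Triples angle brackets."""
    return f"<{value}>"


def literal(value: str, datatype: str) -> str:
    """Format a typed literal in N-Triples syntax."""
    return f'"{value}"^^<{datatype}>'


def to_ntriples(rows) -> str:
    """Serialize (subject IRI, predicate IRI, formatted object) rows."""
    return "\n".join(f"{iri(s)} {iri(p)} {o} ." for s, p, o in rows)


# Hypothetical resource, agents and tool, used only for illustration.
resource = "http://example.org/dataset/1"
triples = [
    # Who originated the knowledge, and who shaped its expression.
    (resource, PAV + "authoredBy", iri("http://example.org/people/alice")),
    (resource, PAV + "curatedBy", iri("http://example.org/people/bob")),
    # Who made the digital representation, with what, and when.
    (resource, PAV + "createdBy", iri("http://example.org/people/carol")),
    (resource, PAV + "createdWith", iri("http://example.org/tools/converter")),
    (resource, PAV + "createdOn", literal("2013-03-01T12:00:00Z", XSD_DATETIME)),
]

print(to_ntriples(triples))
```

All five statements have the described resource as their subject, which reflects the intent that PAV properties are used directly on the resource rather than on intermediate entities.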
PAV 2.1 updates PAV 2.0 with PROV-O specializations and more detailed descriptions of the defined terms. Note that PROV-O is not imported directly by this ontology, as PAV can be used independently of PROV. PAV 2 is based on PAV 1.2 but in a new namespace (http://purl.org/pav/). Terms compatible with 1.2 are indicated in this ontology using owl:equivalentProperty.
The ontology IRI http://purl.org/pav/ always resolves to the latest version of PAV 2. Particular versionIRIs such as http://purl.org/pav/2.1 can be used by clients to force imports of a particular version - note however that all terms are defined directly in the http://purl.org/pav/ namespace.
The goal of PAV is to provide a lightweight, straightforward way to give the essential information about authorship, provenance and versioning, and therefore these properties are described directly on the published resource. As such, PAV does not define any classes or restrict domain/ranges, as all properties are applicable to any online resource.
IRI: http://purl.org/pav/authoredBy
IRI: http://purl.org/pav/contributedBy
The resource was contributed to by the given agent.
The agent provided any sort of help in conceiving the work that is expressed by the digital artifact. Superproperty of pav:authoredBy and pav:curatedBy.
Note that pav:contributedBy identifies only agents that contributed to the work, knowledge or intellectual property, and not agents that made the digital artifact or representation (pav:createdBy); this property can therefore be considered more precise than dct:contributor. See pav:createdBy for a discussion.
The date of contribution can be expressed using pav:contributedOn - note however in the case of multiple contributors that there is no relationship in PAV identifying which agent contributed when or what. If capturing such lineage is desired, it should be additionally expressed using activity-centric provenance vocabularies, for instance with prov:wasGeneratedBy and prov:qualifiedAssociation.
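Because pav:contributedBy is the superproperty of pav:authoredBy and pav:curatedBy, every author or curator is also a contributor, while a pav:createdBy agent is not. A minimal sketch of that inference over plain tuples follows; the function name and the example.org IRIs are hypothetical, and a real system would more likely delegate this to an RDFS or OWL reasoner.

```python
PAV = "http://purl.org/pav/"

# Sub-properties of pav:contributedBy named in the description above.
CONTRIBUTION_SUBPROPERTIES = {PAV + "authoredBy", PAV + "curatedBy"}


def with_inferred_contributors(triples):
    """Return the triples plus a pav:contributedBy statement for each
    pav:authoredBy or pav:curatedBy statement."""
    result = set(triples)
    for subject, predicate, obj in triples:
        if predicate in CONTRIBUTION_SUBPROPERTIES:
            result.add((subject, PAV + "contributedBy", obj))
    return result


# Hypothetical article and agents, for illustration only.
article = "http://example.org/article/42"
stated = {
    (article, PAV + "authoredBy", "http://example.org/people/alice"),
    (article, PAV + "curatedBy", "http://example.org/people/bob"),
    (article, PAV + "createdBy", "http://example.org/people/carol"),
}

inferred = with_inferred_contributors(stated)
contributors = sorted(
    o for s, p, o in inferred if p == PAV + "contributedBy"
)
print(contributors)
```

In this sketch alice and bob become contributors, while carol, who only made the digital artifact, does not.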
IRI: http://purl.org/pav/createdAt
The geo-location of the agents when creating the resource (pav:createdBy). For instance a photographer takes a picture of the Eiffel Tower while standing in front of it.
IRI: http://purl.org/pav/createdBy
An agent primarily responsible for making the digital artifact or resource representation.
This property is distinct from pav:authoredBy, which identifies who authored the knowledge expressed by this resource; and pav:curatedBy, which identifies who curated the knowledge into its current form.
pav:createdBy is more specific than dct:creator - which might or might not be interpreted to cover this creator.
For instance, the author wrote 'this species has bigger wings than normal' in his log book. The curator, going through the log book and identifying important knowledge, formalizes this as 'locus perculus has wingspan > 0.5m'. The creator enters this knowledge as a digital resource in the knowledge system, thus creating the digital artifact (say as JSON, RDF, XML or HTML).
A different example is a news article. pav:authoredBy indicates the journalist who wrote the article. pav:contributedBy can indicate the artist who added an illustration. pav:curatedBy can indicate the editor who made the article conform to the newspaper's style. pav:createdBy can indicate who put the article on the web site.
The software tool used by the creator to make the digital resource (say Protege, Wordpress or OpenOffice) can be indicated with pav:createdWith.
The date the digital resource was created can be indicated with pav:createdOn.
The location the agent was at when creating the digital resource can be indicated with pav:createdAt.
IRI: http://purl.org/pav/createdWith
The software/tool used by the creator when making the digital resource.
For instance: Protege, Wordpress, LibreOffice.
IRI: http://purl.org/pav/curatedBy
An agent specialist responsible for shaping the expression in an appropriate format. Often the primary agent responsible for ensuring the quality of the representation.
The curator is different from the creator of the digital resource (although they are often the same), see pav:createdBy for a discussion.
The date of curating can be expressed using pav:curatedOn - note however in the case of multiple curators that there is no relationship in PAV identifying which agent contributed when or what. If capturing such lineage is desired, it should be additionally expressed using activity-centric provenance vocabularies, for instance with prov:wasGeneratedBy and prov:qualifiedAssociation.
IRI: http://purl.org/pav/curates
Provided for backwards compatibility with PAV 1.2 only. Use instead the inverse pav:curatedBy.
IRI: http://purl.org/pav/derivedFrom
Derived from a different resource. Derivation concerns itself with derived knowledge. If this resource has the same content as the other resource, but has simply been transcribed to fit a different model (like XML -> RDF or SQL -> CSV), use pav:importedFrom. If the content has been further refined or modified, use pav:derivedFrom.
IRI: http://purl.org/pav/importedBy
An entity responsible for importing the data.
The importer is usually a software entity which has done the transcription from the original source.
See pav:importedFrom for a discussion of import vs. retrieve vs. derived.
IRI: http://purl.org/pav/importedFrom
The original source of imported information.
Import means that the content has been preserved, but transcribed somehow, for instance to fit a different representation model. Examples of import are when the original was JSON and the current resource is RDF, or where the original was a document scan, and this resource is the plain text found through OCR.
The imported resource should somehow convey the same knowledge/content as the original source. If additional knowledge has been contributed, pav:derivedFrom would be more appropriate.
If the resource has been copied verbatim from the original representation (e.g. downloaded), use pav:retrievedFrom.
To indicate which agent(s) performed the import, use pav:importedBy. Use pav:importedOn to indicate when it happened.
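The choice between retrieval, import and derivation described above comes down to two questions: was the representation copied verbatim, and does the resource still convey the same knowledge as its source? A small sketch of that decision follows; the function and parameter names are hypothetical, and answering the two questions for a real resource remains a judgment call.

```python
PAV = "http://purl.org/pav/"


def source_property(same_representation: bool, same_knowledge: bool) -> str:
    """Pick the PAV property linking a resource to the source it is based on.

    same_representation: the resource was copied verbatim (e.g. downloaded).
    same_knowledge: the resource conveys the same content as the source.
    """
    if not same_knowledge:
        # Content was refined or modified, or knowledge was added.
        return PAV + "derivedFrom"
    if same_representation:
        # Verbatim copy of the original representation.
        return PAV + "retrievedFrom"
    # Same content, transcribed to another model (e.g. JSON to RDF, OCR).
    return PAV + "importedFrom"


print(source_property(same_representation=True, same_knowledge=True))
print(source_property(same_representation=False, same_knowledge=True))
print(source_property(same_representation=False, same_knowledge=False))
```

The sketch checks for changed knowledge first, so a resource whose content was modified is treated as derived regardless of how it was obtained.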
IRI: http://purl.org/pav/previousVersion
The previous version of a resource in a lineage. For instance a news article updated to correct factual information would point to the previous version of the article with pav:previousVersion. If however the content has significantly changed so that the two resources no longer share lineage (say a new news article that talks about the same facts), they should be related using pav:derivedFrom.
A version number of this resource can be provided using the data property pav:version.
IRI: http://purl.org/pav/providedBy
The original provider of the encoded information (e.g. PubMed, UniProt, Science Commons), especially when it was retrieved, imported or derived from a resource published by the original provider. This new resource might therefore have a current dct:publisher which differs from its pav:providedBy.
IRI: http://purl.org/pav/retrievedBy
An entity responsible for retrieving the data from an external source.
The retriever is usually a software entity which has done the retrieval from the original source without performing any transcription.
See pav:importedFrom for a discussion of import vs. retrieve vs. derived.
IRI: http://purl.org/pav/retrievedFrom
The URI where a resource has been retrieved from.
Retrieval indicates that this resource has the same representation as the original resource. If the resource has been somewhat transformed, use pav:importedFrom instead.
The time of the retrieval should be indicated using pav:retrievedOn. The agent may be indicated with pav:retrievedBy.
IRI: http://purl.org/pav/sourceAccessedAt
The resource is related to a given source which was accessed or consulted (but not retrieved, imported or derived from). This access can be detailed with pav:sourceAccessedBy and pav:sourceAccessedOn.
For instance, a curator (pav:curatedBy) might have consulted figures in a published paper to confirm that a dataset was correctly pav:importedFrom the paper's supplementary CSV file.
IRI: http://purl.org/pav/sourceAccessedBy
The resource is related to a source which was accessed or consulted by the given agent. The source(s) should be specified using pav:sourceAccessedAt.
For instance, the given agent could be a curator (also pav:curatedBy) who consulted figures in a published paper to confirm that a dataset was correctly pav:importedFrom the paper's supplementary CSV file.
IRI: http://purl.org/pav/authoredOn
The date this resource was authored.
pav:authoredBy gives the authoring agent.
Note that pav:authoredOn is different from pav:createdOn, although they are often the same. See pav:createdBy for a discussion.
IRI: http://purl.org/pav/contributedOn
The date this resource was contributed to.
pav:contributedBy provides the agent that contributed.
IRI: http://purl.org/pav/createdOn
The date of creation of the resource.
IRI: http://purl.org/pav/curatedOn
The date this resource was curated.
pav:curatedBy gives the agents that performed the curation.
IRI: http://purl.org/pav/importedOn
The date this resource was imported.
See pav:importedFrom for a discussion about import vs. retrieval.
IRI: http://purl.org/pav/lastRefreshedOn
The date of the last import of the resource. This property is used if this version has been updated due to a re-import, rather than the import creating new resources related using pav:previousVersion.
IRI: http://purl.org/pav/lastUpdateOn
The date of the last update of the resource. An update is a change which did not warrant making a new resource related using pav:previousVersion, for instance correcting a spelling mistake.
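Taken together with pav:previousVersion and pav:derivedFrom, the two "last" dates give four ways to record a change to a resource. The mapping below is a sketch of that choice; the Change categories and their names are assumptions introduced for this example, and deciding which category a given change falls into is left to the publisher.

```python
from enum import Enum

PAV = "http://purl.org/pav/"


class Change(Enum):
    """Hypothetical categories of change, named for this sketch only."""
    MINOR_FIX = "minor fix"          # e.g. correcting a spelling mistake
    REIMPORT_IN_PLACE = "re-import"  # same resource refreshed by a re-import
    NEW_VERSION = "new version"      # new resource in the same lineage
    NEW_LINEAGE = "new lineage"      # content changed significantly


def versioning_property(change: Change) -> str:
    """Return the PAV property that records the given kind of change."""
    return {
        Change.MINOR_FIX: PAV + "lastUpdateOn",
        Change.REIMPORT_IN_PLACE: PAV + "lastRefreshedOn",
        Change.NEW_VERSION: PAV + "previousVersion",
        Change.NEW_LINEAGE: PAV + "derivedFrom",
    }[change]


for change in Change:
    print(change.value, "->", versioning_property(change))
```

The first two properties take a date and are stated on the existing resource; the last two point from a new resource to the older one.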
IRI: http://purl.org/pav/retrievedOn
The date this resource was retrieved.
IRI: http://purl.org/pav/sourceAccessedOn
The resource is related to a source which was originally accessed or consulted on the given date as part of creating or authoring the resource. The source(s) should be specified using pav:sourceAccessedAt. If the source is subsequently checked again (say to verify validity), this should be indicated with pav:sourceLastAccessedOn.
In the case of multiple sources being accessed at different times or by different agents, PAV does not distinguish who accessed what when. If such details are required, they may be provided by additionally using prov:qualifiedInfluence.
IRI: http://purl.org/pav/sourceLastAccessedOn
The resource is related to a source which was last accessed or consulted on the given date. The source(s) should be specified using pav:sourceAccessedAt. Usage of this property indicates that the source has been checked previously; the time of that initial access should be indicated with pav:sourceAccessedOn.
This property can be useful together with pav:lastRefreshedOn or pav:lastUpdateOn in order to indicate a re-import or update, but could also be used alone, for instance when a source was simply verified and no further action was taken for the resource.
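The relationship between pav:sourceAccessedOn (first access) and pav:sourceLastAccessedOn (most recent re-check) can be sketched as a small update rule. The dictionary-based description, the function name and the example.org IRI are assumptions for illustration, and the sketch handles a single source per resource only.

```python
PAV = "http://purl.org/pav/"


def record_source_access(description: dict, source: str, accessed_on: str) -> dict:
    """Return a copy of a property-to-value description of one resource,
    updated to reflect that `source` was consulted at `accessed_on`."""
    updated = dict(description)
    updated[PAV + "sourceAccessedAt"] = source
    if PAV + "sourceAccessedOn" not in updated:
        # First consultation: record the original access date.
        updated[PAV + "sourceAccessedOn"] = accessed_on
    else:
        # Later check: keep the original date, track the latest one.
        updated[PAV + "sourceLastAccessedOn"] = accessed_on
    return updated


# Hypothetical source and timestamps.
paper = "http://example.org/paper/7"
first = record_source_access({}, paper, "2013-01-10T09:00:00Z")
second = record_source_access(first, paper, "2013-06-02T14:30:00Z")

print(second[PAV + "sourceAccessedOn"])
print(second[PAV + "sourceLastAccessedOn"])
```

After the second call the original access date is unchanged and only pav:sourceLastAccessedOn moves forward, matching the case where a source is simply verified again.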
IRI: http://purl.org/pav/version
The version number of a resource. This is a freetext string; typical values are "1.5" or "21". The URI identifying the previous version can be provided using pav:previousVersion.
This HTML document was obtained by processing the OWL ontology source code through LODE, Live OWL Documentation Environment, developed by Silvio Peroni.
An agent that originated or gave existence to the work that is expressed by the digital resource.
The author of the content of a resource may be different from the creator of the resource representation (although they are often the same). See pav:createdBy for a discussion.
The date of authoring can be expressed using pav:authoredOn - note however in the case of multiple authors that there is no relationship in PAV identifying which agent contributed when or what. If capturing such lineage is desired, it should be additionally expressed using activity-centric provenance vocabularies, for instance with prov:wasGeneratedBy and prov:qualifiedAssociation.