Wf4Ever Research Object Bundle

Abstract

This specification defines a file format for storing and distributing Research Objects as a ZIP archive, called a Research Object Bundle (RO Bundle). An RO Bundle captures a Research Object in a single file or byte stream by including its manifest, annotations and some or all of its aggregated resources, for the purposes of exporting, archiving, publishing and transferring research objects.

2. Container

A Research Object Bundle is a structured [ZIP] archive, specializing the Adobe Universal Container Format [UCF]. UCF is based on the EPUB [OCF] format, but generalized to be any kind of container. The following section gives an informal introduction to the UCF format. For the complete, normative details, see the [UCF] specification.

2.1 Universal Container Format (UCF)

This section is non-normative.

A UCF container is based on the ZIP compression file format [ZIP], with additional restrictions. The most important restrictions are:

Reserved filenames in the root directory: mimetype and META-INF
Filenames must be encoded in UTF-8
Compression must be Uncompressed or Flate
MAY use Zip64 extensions, but SHOULD only do so when required
The first file MUST be the uncompressed mimetype and without any extra attributes

UCF says about mimetype:

The first file in the Zip container MUST be a file with the ASCII name of mimetype, which holds the MIME type for the Zip container (application/epub+zip as an ASCII string; no padding, white-space, or case change).

The actual media type to include in mimetype depends on the specific container type (the above quote uses ePub as an example). See section 2.2 RO bundle container.

Best Practice 1: Use zip -0 -X

To add the mimetype file correctly on a UNIX/Linux installation with InfoZip, use echo -n and zip -0 -X. Below is an example which adds mimetype correctly as the first, uncompressed file, then the remaining files (excluding mimetype) with the default compression:

Example 1

stain@ahtissuntu:~/test$ echo -n application/vnd.wf4ever.robundle+zip > mimetype 

stain@ahtissuntu:~/test$ zip -0 -X ../example.robundle mimetype
  adding: mimetype (stored 0%)

stain@ahtissuntu:~/test$ zip -X -r ../example.robundle . -x mimetype
  adding: META-INF/ (stored 0%)
  adding: META-INF/container.xml (stored 0%)
  adding: .ro/ (stored 0%)
  adding: .ro/manifest.json (stored 0%)
  adding: helloworld.txt (stored 0%)
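
The same layout can be produced programmatically. Below is a minimal sketch using Python's standard zipfile module; the function name write_bundle and the dict-based input are assumptions made for illustration and are not part of this specification. It writes mimetype first and uncompressed, then the remaining entries with Deflate compression:

```python
import zipfile

MIMETYPE = "application/vnd.wf4ever.robundle+zip"

def write_bundle(path, files):
    """Write `files` (a dict of archive name -> bytes) as an RO bundle.

    Hypothetical helper: the mimetype entry is written first and
    stored without compression, as required by UCF.
    """
    with zipfile.ZipFile(path, "w") as bundle:
        # A fresh ZipInfo carries no extra field; ZIP_STORED disables compression.
        bundle.writestr(zipfile.ZipInfo("mimetype"), MIMETYPE,
                        compress_type=zipfile.ZIP_STORED)
        for name, data in files.items():
            if name == "mimetype":
                continue  # already written as the first entry
            bundle.writestr(name, data, compress_type=zipfile.ZIP_DEFLATED)
```

A caller might use write_bundle("example.robundle", {".ro/manifest.json": b"{}"}); directory entries and Zip64 handling are left out of this sketch.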

UCF says about META-INF/container.xml and rootfiles:

A UCF Container MAY include a file named container.xml in the META-INF directory at the root level of the container file system. If present, the container.xml file MAY identify the MIME type of, and path to, the root file for the container and any OPTIONAL alternative renditions included in the container.

An example of META-INF/container.xml which defines the rootfile as .ro/manifest.json:

Example 2

<?xml version="1.0"?>
<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path=".ro/manifest.json" media-type="application/ld+json" />
    </rootfiles>
</container>
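
As an illustration of how an application might locate the root file, the sketch below parses a container.xml document with Python's standard xml.etree.ElementTree module. The function name rootfiles is hypothetical, and the sketch assumes the namespace shown in Example 2:

```python
import xml.etree.ElementTree as ET

# Namespace used by the container element in Example 2.
NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}

def rootfiles(container_xml):
    """Return (full-path, media-type) pairs, one per rootfile element."""
    root = ET.fromstring(container_xml)
    return [(rf.get("full-path"), rf.get("media-type"))
            for rf in root.findall("c:rootfiles/c:rootfile", NS)]

EXAMPLE = """<?xml version="1.0"?>
<container version="1.0"
    xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
    <rootfiles>
        <rootfile full-path=".ro/manifest.json" media-type="application/ld+json" />
    </rootfiles>
</container>
"""

print(rootfiles(EXAMPLE))
```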

2.2 RO bundle container

The RO Bundle container is a specialization of a [UCF] container, with the following additions:

Additional reserved filename in the root directory: .ro
The mimetype SHOULD be application/vnd.wf4ever.robundle+zip (see below)
The META-INF/container.xml, if present, SHOULD contain a rootfile entry equivalent to:
<rootfile full-path=".ro/manifest.json" media-type="application/ld+json" />
The file .ro/manifest.json SHOULD be present, and MUST describe the RO according to section 3. Manifest.

Applications that specialize RO Bundles MAY specify a different mimetype, for instance because the bundle is used to distribute application-specific data. It is RECOMMENDED for such extensions that their media type end with +zip according to [RFC6839], unless it is not considered meaningful for a user to treat such bundles as a general ZIP archive.

2.2.1 Resource media type

Beyond rootfiles, the UCF specification does not specify how to find the media type when resolving individual resources in a bundle. If an application requires a media type for a resource in the RO bundle, it MAY use the defaults below, based on a case-insensitive comparison of the file extension. In the absence of a resolved media type, the media type application/octet-stream MAY be assumed.

Extension	Media type
`.txt`	`text/plain; charset="utf-8"`
`.ttl`	`text/turtle; charset="utf-8"`
`.rdf`	`application/rdf+xml`
`.json`	`application/json`
`.xml`	`application/xml`
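
The extension defaults in this table could be applied as in the following sketch. The names DEFAULT_MEDIA_TYPES and guess_media_type are hypothetical; the lookup lowercases the extension to obtain the case-insensitive comparison and falls back to application/octet-stream:

```python
import posixpath

# Defaults copied from the table above.
DEFAULT_MEDIA_TYPES = {
    ".txt": 'text/plain; charset="utf-8"',
    ".ttl": 'text/turtle; charset="utf-8"',
    ".rdf": "application/rdf+xml",
    ".json": "application/json",
    ".xml": "application/xml",
}

def guess_media_type(path):
    """Guess a media type from the file extension of a bundle path."""
    extension = posixpath.splitext(path)[1].lower()
    return DEFAULT_MEDIA_TYPES.get(extension, "application/octet-stream")
```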

Applications MAY use the file META-INF/manifest.xml, if present, to resolve media types for resources in the RO bundle, using the manifest:media-type of the corresponding manifest:file-entry as defined by the ODF Package specification [ODF]. See, however, the warnings in section 2.2.2 META-INF/manifest.xml below.

2.2.2 META-INF/manifest.xml

To avoid confusion with the somewhat overlapping RO manifest it is NOT RECOMMENDED to include META-INF/manifest.xml in RO Bundles. Applications and specializations of this specification MAY however include META-INF/manifest.xml, for instance to provide media types as specified in section 2.2.1 Resource media type.

If META-INF/manifest.xml is present, it MUST follow the ODF Package specification [ODF]. That means that, if present, the META-INF/manifest.xml file MUST list all resources in the RO bundle, including the folder .ro and its content, but excluding mimetype and META-INF and its content.

3. Manifest

The research object SHOULD be described in the file .ro/manifest.json as specified below.

3.1 .ro/manifest.json

The file .ro/manifest.json, if present, MUST contain the [ORE] manifest for the research object according to this section. The file MUST be in JSON format [RFC4627], and SHOULD be valid [JSON-LD].

Identifiers used below are either:

Meta-resources: paths relative to the .ro/ folder, which SHOULD NOT contain the : character, for instance manifest.json or annotations/ann2. Depending on how meta-resources are used, the ZIP might or might not include a corresponding entry for the given path.
Bundled resources: the path SHOULD start with bundle: to indicate the root of the bundle, for instance bundle:hello.txt or bundle:folder2/. Folders SHOULD have a path terminating with /. The resource identified by the path SHOULD be included as a corresponding file or folder in the ZIP file.
Absolute URIs (containing :), external to the bundle, for instance http://example.com/external

The structure of the JSON manifest is given by a JSON object with the keys:

@context

JSON-LD context. MUST be present. SHOULD have value "http://purl.org/wf4ever/ro-bundle/context.json", but MAY be a list, which SHOULD have this value as the last item.

id

RO identifier. SHOULD be present, in which case it SHOULD have the fixed value "bundle:" indicating the relative top-level folder as the identifier. Note that this means the absolute URI identifying the research object depends on the base URI this Research Object Bundle is considered to be accessed at, for instance file:///Users/alice/ro13.robundle/ (See section 4. Identifiers.)

manifest

ORE manifests describing this RO, relative to the .ro/ folder. SHOULD be the literal "manifest.json", but MAY be a list, in which case the list MUST contain "manifest.json".

createdOn

The time the RO was serialized as this RO bundle. SHOULD be present, in which case it MUST be an xsd:dateTime formatted timestamp (ISO 8601).

createdBy

The creator of the RO bundle. This MAY be different from the person forming the research object, which SHOULD be indicated with authoredBy. The creator SHOULD be an object with the following keys:

uri: A URI identifying the agent. The URI SHOULD be present, and SHOULD be a valid WebID, for instance http://example.com/fred#fred
orcid: An ORCID identifier (as a URI), for instance http://orcid.org/0000-0001-9842-9718. An ORCID MAY be present if known.
name: The full name of the agent. The name SHOULD be present, for instance "John Doe" or "University of Manchester"

Additional foaf: properties MAY be added to the top-level @graph according to section 3.1.2 Custom JSON-LD by using an @id equal to the creator uri.

authoredOn

The time the Research Object was conceptually formed. SHOULD be present if different from createdOn. The value MUST be an xsd:dateTime formatted timestamp (ISO 8601).

authoredBy

The author of the Research Object, the agent(s) that conceptually formed the RO. SHOULD be present if different from createdBy. SHOULD be an object with the same keys and requirements as for createdBy, but MAY be a list to indicate multiple authors.

Additional authorship information (curation, contribution, etc) MAY be added using the pav: namespace within the top-level @graph key according to section 3.1.2 Custom JSON-LD by using an @id value equal to the bundle id, e.g. "bundle:".

history

Provenance trace of the life of this RO, relative to the .ro/ folder. This property MAY be present, in which case it SHOULD be "evolution.ttl", indicating that the file .ro/evolution.ttl contains the provenance trace. This value MAY be a URI. The property MAY give a list if several provenance traces are known, in which case the list SHOULD include "evolution.ttl".

The file .ro/evolution.ttl, if present, SHOULD include a provenance trace of this research object according to the roevo ontology.

aggregates

This property SHOULD be present, in which case it MUST be a list of the resources aggregated by this RO. The values in a list MUST be either:

A path relative to the root of the bundle, prefixed with bundle:
An absolute URI
An object, which SHOULD be uniquely identified by either file or uri. Its members are:

file

A path relative to the root of the bundle. The path SHOULD be prefixed with bundle:

uri

An absolute URI. The key uri MUST NOT be provided at the same time as file.

folder

A folder this resource (typically identified by uri) belongs to, relative to the root of the bundle. The path SHOULD be prefixed with bundle: and SHOULD end with /, for instance bundle:folder2/.

mediatype

The IANA media type of the resource (typically one identified by file). This SHOULD be specified for a resource identified by file, unless its media type is correctly identified according to section 2.2.1 Resource media type.

createdOn

createdBy

File creation date and creator, as specified above.

proxy

The identifier for an ORE proxy [ORE] for the resource as aggregated by this RO. This property is intended for the purposes of referring to "resource X as aggregated in research object Y" within annotations and in external documents. This property SHOULD be given for external resources aggregated by "uri" references, as they could be aggregated in multiple ROs, and MAY be given for other resources.
The proxy identifier SHOULD consist of the prefix proxy: and a lowercased UUID string [RFC4122]. For example: proxy:d4f09040-272e-467f-9250-59593bd4ac8f

Additional metadata about a resource, if present, SHOULD be added as an annotation (see below).

The order of the values in the aggregates list is insignificant; however, the list MUST NOT contain duplicate entries. Entries are considered duplicates if their literal values, or their file and uri members, are equal when compared uniformly as URIs [URI].
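
A duplicate check over the aggregates list might look like the sketch below. The helper names are hypothetical, and as a simplifying assumption the identifiers are compared as exact strings; a fuller implementation would normalize them as URIs before comparing:

```python
def entry_identifier(entry):
    """Return the identifying string of an aggregates entry, or None."""
    if isinstance(entry, str):
        return entry
    # Objects are identified by either "file" or "uri".
    return entry.get("file") or entry.get("uri")

def find_duplicates(aggregates):
    """Return identifiers that occur more than once in `aggregates`.

    Simplification: plain string equality stands in for URI comparison.
    """
    seen = set()
    duplicates = []
    for entry in aggregates:
        identifier = entry_identifier(entry)
        if identifier is None:
            continue
        if identifier in seen:
            duplicates.append(identifier)
        seen.add(identifier)
    return duplicates
```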

annotations

Annotations of the research object. This property MAY be present, in which case it MUST be a list. An annotation provides additional metadata or descriptions which are somewhat about or related to the research object or some of its aggregated resources.

An annotation is specified as an object, which has the following members:

annotation

The identifier for this annotation. The identifier SHOULD be present, and SHOULD consist of the prefix annotation: and a lowercased UUID string [RFC4122]. For example: annotation:1a876f9e-4ffe-4c99-a05d-cd9d0cbd4cbb

about

The identifier for the annotated resource. MUST be present. This is considered the target of the annotation, that is, the resource the annotation content is "somewhat about". The "about" identifier SHOULD be one of these types:

The research object itself, which SHOULD match the value of its id, e.g. bundle:
A bundled resource, starting with bundle:, which SHOULD be listed under aggregates if that key is present
A proxy for an aggregated resource, starting with proxy:, which MUST be defined under aggregates with a matching value for proxy
Another annotation, starting with annotation:, which MUST be defined under annotations
An absolute URI, which may or may not be aggregated by the RO
A list, containing any of the above. This indicates that the annotation is about each of the listed resources, for instance because the annotation content is describing their relationships

content

The identifier for a resource that contains the body of the annotation. SHOULD be present. The content identifier SHOULD be one of these types:

A bundled resource, starting with bundle:, which SHOULD be listed under aggregates if that key is present
An absolute URI, which may or may not be aggregated by the RO

Additional properties describing the annotation using the oa: namespace MAY be added to the top-level @graph according to section 3.1.2 Custom JSON-LD by using an @id matching the annotation identifier.
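
The proxy: and annotation: identifiers described above share the same shape, a prefix followed by a lowercased UUID string. A minimal sketch for minting them follows; the function name mint_identifier is hypothetical, and the use of a random (version 4) UUID is an assumption, since the text only asks for a UUID string [RFC4122]:

```python
import uuid

def mint_identifier(prefix):
    """Mint an identifier such as proxy:<uuid> or annotation:<uuid>."""
    if prefix not in ("proxy", "annotation"):
        raise ValueError(f"unsupported prefix: {prefix}")
    # str() of a uuid.UUID is the lowercase, hyphenated form.
    return f"{prefix}:{uuid.uuid4()}"
```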

@graph

A list of additional [JSON-LD] statements according to section 3.1.2 Custom JSON-LD.

An example of a manifest which is valid JSON-LD is included below:

Example 3

{
  "@context":  [
    { "@base": "widget://129b8efe-a692-48a0-85d4-ebc6c0a9b057/.ro/" },
    "http://purl.org/wf4ever/ro-bundle/context.json"
  ],
  "id": "/",
  "manifest":  "manifest.json",
  "createdOn": "2013-03-05T17:29:03Z",
  "createdBy": {
      "uri":     "http://example.com/foaf#alice",
      "orcid":   "http://orcid.org/0000-0002-1825-0097",
      "name":    "Alice W. Land" },
  "history":   "evolution.ttl",
  "aggregates": [
     "/folder/soup.jpeg",
     "http://example.com/blog/",

     { "file":      "/README.txt",
       "mediatype": "text/plain",
       "createdBy": {
           "uri":     "http://example.com/foaf#bob",
           "name":    "Bob Builder" },
       "createdOn": "2013-02-12T19:37:32.939Z" },

     { "uri":    "http://example.com/external.txt",
       "folder": "/folder/",
       "proxy":  "uuid:a0cf8616-bee4-4a71-b21e-c60e6499a644" }
  ],
  "annotations": [
    { "annotation": "uuid:d67466b4-3aeb-4855-8203-90febe71abdf",
      "about":      "/folder/soup.jpeg",
      "content":    "annotations/soup-properties.ttl" },

    { "about":   "uuid:a0cf8616-bee4-4a71-b21e-c60e6499a644",
      "content": "http://example.com/blog/they-aggregated-our-file" },

    { "about":   [ "/", "uuid:d67466b4-3aeb-4855-8203-90febe71abdf" ],
      "content": "annotations/a-meta-annotation-in-this-ro.txt" }
  ]
}
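
To illustrate how the MUST-level requirements of this section could be checked on an already parsed manifest, here is a non-exhaustive sketch. The function name check_manifest and the wording of the messages are hypothetical, and SHOULD-level recommendations are deliberately not checked:

```python
def check_manifest(manifest):
    """Return a list of problems found in a parsed manifest (a dict)."""
    problems = []
    if "@context" not in manifest:
        problems.append("@context MUST be present")

    listed = manifest.get("manifest")
    if isinstance(listed, list) and "manifest.json" not in listed:
        problems.append('manifest list MUST contain "manifest.json"')

    aggregates = manifest.get("aggregates", [])
    if not isinstance(aggregates, list):
        problems.append("aggregates MUST be a list")
        aggregates = []
    for entry in aggregates:
        if isinstance(entry, dict) and "file" in entry and "uri" in entry:
            problems.append("uri MUST NOT be provided at the same time as file")

    annotations = manifest.get("annotations", [])
    if not isinstance(annotations, list):
        problems.append("annotations MUST be a list")
        annotations = []
    for annotation in annotations:
        if isinstance(annotation, dict) and "about" not in annotation:
            problems.append("annotation MUST have an about member")
    return problems
```

An empty result only means that none of the checked rules were violated; it does not establish that the manifest is valid JSON-LD.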

3.1.1 JSON-LD and mapping to RO model

Manifests following the JSON structure defined in section 3.1 .ro/manifest.json with a "@context": "http://purl.org/wf4ever/ro-bundle/context.json" are intended to be valid [JSON-LD] without any additional modifications. Mapping .ro/manifest.json to the ORE and [RO] models in RDF SHOULD be performed according to the algorithm for conversion from JSON to RDF, as specified in the JSON-LD API [JSON-LD].

Describe JSON-LD context

Example 4

{
  "@context": {
    "ao": "http://purl.org/ao/",
    "oa": "http://www.w3.org/ns/oa#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
    "ore": "http://www.openarchives.org/ore/terms/",
    "ro": "http://purl.org/wf4ever/ro#",
    "roterms": "http://purl.org/wf4ever/roterms#",
    "robundle": "http://purl.org/wf4ever/robundle#",
    "prov": "http://www.w3.org/ns/prov#",
    "pav": "http://purl.org/pav/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "uuid": "urn:uuid:",

    "id": "@id",
    "file": "@id",
    "uri": "@id",
    "annotation": "@id",

    "manifest": {
        "@id": "ore:isDescribedBy",
        "@type": "@id"
    },

    "createdOn": {
        "@id": "pav:createdOn",
        "@type": "xsd:dateTime"
    },
    "createdBy": {
        "@id": "pav:createdBy",
        "@type": "@id"
    },
    "authoredOn": {
        "@id": "pav:authoredOn",
        "@type": "xsd:dateTime"
    },
    "authoredBy": {
        "@id": "pav:authoredBy",
        "@type": "@id"
    },
    "curatedOn": {
        "@id": "pav:curatedOn",
        "@type": "xsd:dateTime"
    },
    "curatedBy": {
        "@id": "pav:curatedBy",
        "@type": "@id"
    },
    "contributedOn": {
        "@id": "pav:contributedOn",
        "@type": "xsd:dateTime"
    },
    "contributedBy": {
        "@id": "pav:contributedBy",
        "@type": "@id"
    },
    "name": {
        "@id": "foaf:name"
    },
    "orcid": {
        "@id": "roterms:orcid",
        "@type": "@id"
    },

    "history": {
        "@id": "prov:has_provenance",
        "@type": "@id"
    },
    "aggregates": {
      "@id": "ore:aggregates",
      "@type": "@id"
    },
    "mediatype": {
        "@id": "dc:format"
    },
    "folder": {
      "@id": "robundle:inFolder",
      "@type": "@id"
    },
    "proxy": {
      "@id": "robundle:hasProxy",
      "@type": "@id"
    },

    "annotations": {
      "@id": "robundle:hasAnnotation",
      "@type": "@id"
    },
    "content": {
       "@id": "oa:hasBody",
       "@type": "@id"
    },
    "about": {
       "@id": "oa:hasTarget",
       "@type": "@id" 
    }

  }
}

As an example of this processing, below is a Turtle representation after processing the .ro/manifest.json shown as an example in section 3.1 .ro/manifest.json:

Example 5

Generate example

3.1.2 Custom JSON-LD

Applications that support JSON-LD (rather than just JSON) MAY choose to parse and generate additional statements in .ro/manifest.json according to the [JSON-LD] specifications.

Applications generating JSON-LD MAY use a @context list, but SHOULD include http://purl.org/wf4ever/ro-bundle/context.json as the last item in the list to indicate to JSON parsers that the manifest can be parsed as plain JSON according to section 3.1 .ro/manifest.json. Applications SHOULD NOT use @context at deeper nesting levels, except within the top-level @graph.

Applications SHOULD NOT write additional properties directly to JSON-LD nodes defined from section 3.1 .ro/manifest.json. Instead, additional statements SHOULD be made within an additional @graph node according to JSON-LD Named Graphs. @graph SHOULD only be added to the top-level object. For example:

{
  "@context": "http://purl.org/wf4ever/ro-bundle/context.json",
  "id": "bundle:",
  "manifest": "manifest.json",
  "aggregates": [
     "http://example.com/blog/2012",
     "http://example.com/blog/2013"
  ],
  "@graph": [
    { "@id": "http://example.com/blog/2013",
      "dcterms:replaces": "http://example.com/blog/2012" },
    { "@id": "http://example.com/blog/2012",
      "dcterms:isReplacedBy": "http://example.com/blog/2013" }
  ]
}

Note that rather than using the above extension mechanism, it is generally RECOMMENDED to store such additional statements in an annotation body instead, for purposes of provenance and separation of concerns. Although technically valid, it is NOT RECOMMENDED to use the member @graph to embed semantic annotation bodies within annotation nodes, as it would duplicate the content of the annotation body in the bundle and may lead to inconsistencies.

3.2 Alternative manifest representations

In addition to the .ro/manifest.json representation specified in section 3.1 .ro/manifest.json, a Research Object Bundle MAY include the ORE manifest in alternative representations like RDF/XML [RDF-SYNTAX-GRAMMAR] and Turtle [TURTLE], for instance by generating them using the conversion from JSON to RDF algorithm in JSON-LD API [JSON-LD].

Alternative manifests SHOULD have a path starting with .ro/manifest, for instance .ro/manifest.ttl for a Turtle representation.
When multiple manifests are present, applications SHOULD consider .ro/manifest.json as the authoritative representation of the research object.
Alternative manifests SHOULD represent the equivalent RDF graph of .ro/manifest.json (see section 3.1.1 JSON-LD and mapping to RO model).
Alternative manifests SHOULD be listed in the META-INF/container.xml as <rootfile> entries with corresponding media-type attributes.
Any alternative manifest listed as a rootfile MUST minimally represent the same conceptual information as .ro/manifest.json.

4. Identifiers

This section is non-normative.

Objects in a research object bundle are identified within the JSON manifest using different JSON-LD prefixes. These can be thought of as local URI schemes, which resolve to relative URI references based at the root of the ZIP archive.

Prefix	Relative URI reference
(no prefix)	`.ro/`
`bundle:`	`./`
`proxy:`	`.ro/proxies/`
`annotation:`	`.ro/annotations/`
(other)	Absolute URI
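
The prefix table above can be read as a simple lookup. The sketch below maps a manifest identifier to a URI reference relative to the root of the ZIP archive; the names PREFIXES and to_relative_reference are hypothetical, and returning None for absolute URIs is a convention chosen for this example:

```python
# Prefixes and their relative URI references, copied from the table above.
PREFIXES = {
    "bundle": "./",
    "proxy": ".ro/proxies/",
    "annotation": ".ro/annotations/",
}

def to_relative_reference(identifier):
    """Map a manifest identifier to a reference relative to the ZIP root.

    Returns None for absolute URIs, which are external to the bundle.
    """
    if ":" not in identifier:
        # No prefix: a meta-resource relative to the .ro/ folder.
        return ".ro/" + identifier
    prefix, _, rest = identifier.partition(":")
    if prefix in PREFIXES:
        return PREFIXES[prefix] + rest
    return None  # any other scheme is treated as an absolute URI
```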

Due to their nature as ZIP files, Research Object Bundles might be downloaded, copied, moved and republished. In order to avoid ambiguity about RO identity and evolution, each Research Object Bundle serialization is considered to represent a unique Research Object. Thus any of the prefixes above describing resources within the bundle are relative to the root of the ZIP file, and the id identifying the Research Object is set to bundle:, meaning the root represents the RO itself.

4.1 Absolute URIs for bundle resources

This section is non-normative.

Applications which require an absolute URI for identifying a resource within a Research Object Bundle may choose to use one of the approaches presented in this section in combination with resolving against the prefix table above.

No decision has been made on which of these methods, if any, should be recommended; meanwhile, these subsections list advantages and disadvantages to guide the reader.

4.1.1 Nested paths

This section is non-normative.

If an RO bundle is published on an HTTP (or HTTPS) server, then URIs to the bundled resources can be minted by assuming a base URI of the RO Bundle URI with / appended. For instance, if:

http://example.com/example1.robundle

contains the file folder/helloworld.txt (bundle:folder/helloworld.txt in the manifest.json), then we can assume the base URI:

http://example.com/example1.robundle/

and can refer to the file as:

http://example.com/example1.robundle/folder/helloworld.txt

A web server that exposes RO bundles MAY support resolving such nested URIs by internally extracting the resources from the ZIP archive or redirecting to an existing resource, for instance because it is implementing the [ROSRS] API.

Semantically, the distinction between the URI with and without the trailing / is that example1.robundle identifies the RO Bundle, i.e. the ZIP archive, which has attributes such as size in bytes, checksum, etc., while example1.robundle/ identifies the slightly more abstract concept of the Research Object (the aggregation) that is serialized as an RO bundle.

Advantages:

Relative URI references within the RO bundle can be interpreted by using the calculated base URI.
Relative URI references between resources resolve consistently (for instance img/picture.jpeg linked from http://example.com/example1.robundle/document.html resolves to http://example.com/example1.robundle/img/picture.jpeg)
Web server could optionally handle request and expose nested resources (or at least redirect to the ZIP file)
Fragment identifiers within resources are still valid

Disadvantages:

Resolving the nested URIs would generally give a misleading 404 Not Found
Could encourage minting of URIs in namespaces managed by others
Given such a URI, to find the RO bundle, one might have to assume that everything after .robundle should be removed (URL hacking)
Malicious or malformed URI references within RO bundles might be resolved to expose local resources, especially within the file scheme. For instance the reference ../../etc/passwd from file:///tmp/example1.robundle/evil.html could be resolved to file:///etc/passwd

This technique SHOULD NOT be used if:

The URI for the RO bundle is not using a hierarchical scheme (e.g. urn:uuid:f81d4fae-7dec-11d0-a765-00a0c91e6bf6)
The URI for the RO bundle contains a query or fragment identifier, e.g. http://example.com/ro?id=13
The web server is not under the control of the minter of the URIs
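
The nested-path approach, together with two of the conditions above and the path-escape concern listed under disadvantages, can be sketched with Python's standard urllib.parse module. The function name nested_uri is hypothetical, and testing for "://" is only a rough approximation of "hierarchical scheme" chosen for this example:

```python
from urllib.parse import urljoin, urlsplit

def nested_uri(bundle_uri, path):
    """Mint a nested URI for `path` inside the bundle at `bundle_uri`."""
    parts = urlsplit(bundle_uri)
    # Rough approximation: a hierarchical URI has "//" after the scheme.
    if not bundle_uri.startswith(parts.scheme + "://"):
        raise ValueError("bundle URI is not hierarchical")
    if parts.query or parts.fragment:
        raise ValueError("bundle URI contains a query or fragment")
    base = bundle_uri + "/"
    resolved = urljoin(base, path)
    # Refuse references such as ../../etc/passwd that leave the bundle.
    if not resolved.startswith(base):
        raise ValueError("reference resolves outside the bundle")
    return resolved

print(nested_uri("http://example.com/example1.robundle",
                 "folder/helloworld.txt"))
```

Whether the web server is under the control of the minter cannot be checked in code and remains the caller's responsibility.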

4.1.2 Fragment identifiers

The IANA media type registration for application/zip and the +zip suffix of [RFC6839] do not define fragment identifiers for ZIP archives. However, RO bundles identified with the recommended media type of application/vnd.wf4ever.robundle+zip MAY be interpreted to have a fragment identifier that is resolvable as the path within the ZIP archive.

For instance, if:

http://example.com/example1.robundle

contains the file folder/helloworld.txt (bundle:folder/helloworld.txt in the manifest.json), then we can refer to the Research Object as

http://example.com/example1.robundle#

and can refer to the file as:

http://example.com/example1.robundle#folder/helloworld.txt

Advantages:

URIs always resolve (to the ZIP file)
Simple to understand

Disadvantages:

Not formally defined by application/zip
Fragment identifiers are bound by the media type and might not work for other content types (e.g. when an RO bundle was returned due to content negotiation in a server implementing the [ROSRS] API)
Relative URI references in resources are more complicated to resolve, as they would easily seem to refer to siblings of the RO bundle: a URI reference img.jpg from http://example.com/example1.robundle#document.html would seem to refer to http://example.com/img.jpg
Existing fragment identifiers within resources are more complicated to resolve: #para2 in document.html becomes http://example.com/example1.robundle#document.html#para2

4.1.3 jar: scheme

The jar: scheme is used by Java to refer to resources within JAR files. It is specified as part of the documentation for JarURLConnection. In short, a JAR URI is formed by combining jar:, the original URI of the JAR file, the separator !/, and the path within the JAR file. For all practical purposes, an RO bundle, being a ZIP archive, can be interpreted as a JAR file. For instance, if:

http://example.com/example1.robundle

contains the file folder/helloworld.txt (bundle:folder/helloworld.txt in the manifest.json), then we can assume the base URI

jar:http://example.com/example1.robundle!/

and can refer to the file as:

jar:http://example.com/example1.robundle!/folder/helloworld.txt

Advantages:

Somewhat resolvable (within Java)
Somewhat formally defined
Fragment identifiers within resources still valid

Disadvantages:

The jar: scheme is not hierarchical (it does not use //), and so relative URI references within RO bundle resources are not correctly resolved (not even by java.net.URI).
Misleading and potentially confusing: JAR files have their own manifest in META-INF/MANIFEST.MF.
Gives the impression that the RO Bundle is executable with Java

4.1.4 Widget URI scheme

The Widget URI scheme defines how a URI can be formed for the purposes of accessing resources within a ZIP file as if it were an HTTP server. While this is intended for sandboxing packaged web apps, it is equally applicable to Research Object bundles for the purposes of sandboxing.

The Widget URI scheme recommends generating a UUID string [RFC4122] for minting the authority, forming the base URI for the RO bundle. For instance, if:

http://example.com/example1.robundle

contains the file folder/helloworld.txt (bundle:folder/helloworld.txt in the manifest.json), then we generate a new UUID 8191dee8-0b8e-452d-8d64-7706a140185e and refer to the Research Object as

widget://8191dee8-0b8e-452d-8d64-7706a140185e/

and can refer to the file as:

widget://8191dee8-0b8e-452d-8d64-7706a140185e/folder/helloworld.txt

For purposes of security/sandboxing when interpreting RO bundles, the authority should be a v4 UUID generated from random numbers. For purposes of describing the content of an RO bundle at a given URI, the authority should be a name-based UUID using v5 (SHA-1 hashing). For purposes of describing the content of an RO bundle as a bytestream independent of its location (for instance on a USB stick), the authority should be the hexadecimal SHA-256 checksum of the ZIP archive.

Example widget base URIs

widget://15259726-dcbb-42ff-8fc6-36282c98d4e6/ UUID v4 using pseudo-random number
widget://7878e885-327c-5ad4-9868-7338f1f13b3b/ UUID v5 of the URL http://example.com/bundle1.robundle
widget://587cff3ae37d58af6886d656623bd91237759a42d8fe1575a9744898c01d97d7/ SHA-256 of an empty RO bundle
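
The three kinds of authority described above could be generated as in this sketch, using Python's standard uuid and hashlib modules. The function names are hypothetical, and the use of the URL namespace (uuid.NAMESPACE_URL) for the v5 UUID is an assumption, as the text does not name a namespace:

```python
import hashlib
import uuid

def random_base():
    """Base URI with a random v4 UUID authority, for sandboxing."""
    return f"widget://{uuid.uuid4()}/"

def location_base(bundle_url):
    """Base URI with a name-based v5 UUID derived from the bundle URL.

    Assumption: the URL namespace is used as the UUID namespace.
    """
    return f"widget://{uuid.uuid5(uuid.NAMESPACE_URL, bundle_url)}/"

def content_base(bundle_bytes):
    """Base URI whose authority is the SHA-256 hex digest of the archive."""
    return f"widget://{hashlib.sha256(bundle_bytes).hexdigest()}/"
```

The last two functions are deterministic for a given input, which is what makes the resulting base URIs predictable.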

Recommend just ONE of these? Which?

Advantages:

Relative URIs and fragments are valid without modifications
Sandboxed by RO bundle
Defined (HTTP-like) resolution mechanism
Predictable base URIs (UUIDv5 based on URL or SHA-256 based on bytes)

Disadvantages:

Can't be dereferenced
Status is W3C WG Note, was abandoned as W3C Specification (Check: meaning what?)

Working Draft 10 May 2013
