Corpus Utilities

To perform various kinds of operations on a particular Corpus, select its name-link in the Corpora home listing. For example, selecting Corpora > [name-link] (not the Edit link) shows its Operations view, which has an active header section and multiple categories of operations (subviews).

Name and Description

The Corpus's header section shows its name and description, each of which managers can edit by clicking on the text, editing, and confirming by clicking on the edit-box's check-mark.

Edit Production Copy

This link opens an editing view on the Corpus's production copy. See the Corpus Editing page.  See Workflow Overview, EDG for details on the difference between using workflows and directly editing production copies.

Utility Groups: Dashboard, Settings, etc.

Sub-views of various operations on the asset collection are grouped by functional area.

Dashboard View

A model's dashboard summarizes various kinds of status and quality measurements that are relevant for that model's type.

Process

Process summarizes information about tasks and workflows.


Settings View

The Settings view of this collection pertains to its various references (e.g., inclusions, URI, and metadata).

Included-By and Includes

These sections show the references (owl:imports) from and to other models. Note that the choice of models that may be included is restricted to types that are either required or permitted for the current model. See the main Corpora page (under Home > Create) for additional documentation.

Graph URI

This is an internal identifier for any data asset created by EDG.

Document Class

Any editor or manager of this Corpus can select the root class for instances that shall be considered as Documents. This class must be defined in an included Ontology or other asset collection, or locally in the Corpus if Ontology Optimizations are turned off.

Page URL regex

For Corpora defined with a sitemap.xml connector, the configured pages will be limited to the subset of those matching this regular expression.
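As a rough illustration of how such a filter narrows a sitemap, the following sketch applies a regular expression to a list of page URLs. The pattern, the URLs, and the choice of full-string matching are all assumptions made for the example; this page does not state which regex dialect or match semantics EDG applies, so test your own expression against a few known URLs.

```python
import re

# Hypothetical pattern and URLs, for illustration only.
PAGE_URL_REGEX = r"^https://example\.org/docs/.*\.html$"


def filter_pages(urls, pattern):
    """Return the URLs that match the pattern (full-string match assumed)."""
    compiled = re.compile(pattern)
    return [url for url in urls if compiled.fullmatch(url)]


sitemap_urls = [
    "https://example.org/docs/intro.html",
    "https://example.org/docs/guide/setup.html",
    "https://example.org/blog/2020/news.html",
    "https://example.org/docs/archive.pdf",
]

# Only the two HTML pages under /docs/ remain.
selected = filter_pages(sitemap_urls, PAGE_URL_REGEX)
```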

CSS Selectors

For Corpora defined with either a sitemap.xml or a URL list connector, HTML elements matching the Content CSS Selector or Non-content CSS Selector will be respectively the only ones imported from or those pruned from the configured pages. The CSS Selector standard is defined in the W3C Technical Report.

Metadata

The Metadata settings let you view or edit information about the Corpus. There is a rich selection of metadata fields, and it is easy to configure EDG to include additional fields if required. The metadata is organized into sections. The view mode only shows the sections and fields that contain information, while the edit mode shows all sections and all fields.

Overview

This section records descriptive properties of the Corpus, including its official name (if different from the common name or label).

Status

The Status section records the life cycle stages of the Corpus.

User Roles View

Corpus Permission Roles

The User Roles view allows managers of a production collection or a workflow to assign access permissions (viewer, editor, or manager) to various users, either as individuals or as security roles (e.g., from LDAP). A production manager can also assign permissions on its child workflows. For non-managers, this view is read-only.

Whereas a production collection allows settings for any EDG user, a workflow copy only allows access to viewers (at least) of the parent production collection. Because each collection or workflow assigns its permissions separately, a given user can have different access permissions for a particular production copy and one of its child workflows, or for two different workflows. A blank setting excludes access to the user or role. Any user with multiple assignments on a given collection or workflow receives the greatest level assigned. See Workflow Overview - Asset Collection Permissions: Viewer, Editor, and Manager for more information.
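The rule that a user with multiple assignments receives the greatest level can be sketched as follows. This is only a model of the rule as described above, not EDG's implementation; the function name and the use of None or an empty string for a blank setting are assumptions.

```python
from typing import Optional

# Permission levels named in the text, ordered from lowest to highest.
LEVELS = {"viewer": 1, "editor": 2, "manager": 3}


def effective_permission(assignments: list) -> Optional[str]:
    """Return the greatest level among the assignments, or None for no access.

    Each assignment is the level a user holds directly or through a security
    role. Blank settings (None or "") grant nothing.
    """
    granted = [level for level in assignments if level in LEVELS]
    if not granted:
        return None
    return max(granted, key=LEVELS.__getitem__)


# A user who is a viewer individually and an editor through a security role
# ends up with editor access on that collection or workflow.
combined = effective_permission(["viewer", "editor"])
no_access = effective_permission([None, ""])
```

Because each collection and workflow stores its assignments separately, the function would be evaluated once per collection or workflow, which is why the same user can end up with different levels in different places.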

Corpus Governance Roles

For collections with governance roles, this section allows managers to assign users, as individuals, as security roles (e.g., from LDAP), or as job titles. For details on governance roles, see Governance Model Overview > Governance Areas (and Roles).

Import/Export View: Import...

From any Corpus's production or working-copy home page, the Import functions let editors copy graph data into the given Corpus from external sources such as RDF files, spreadsheets, etc.

Import RDF File

Any Corpus can import data from an external RDF file (in a serialized format). The Import > Import RDF File link shows a screen where the Choose File button opens a dialog for picking the external source file.

Choose the source file and identify its format. Decide whether to record new triples in the change history (use with care!) and then click Finish to complete the import. A message will indicate whether the import was successful.

When importing RDF into a working copy, each added triple can be recorded as an entry in the change history, where it will be available to all the relevant reports. When importing into a production copy, the Record each new triple in change history checkbox gives you the option of recording the additions in the change history; note that this is not recommended when importing large amounts of data.

Import File using Script (Customization)

EDG can also incorporate custom scripts for importing arbitrary types of text files, including XML, JSON or spreadsheet files. Each such importer must be set up by a power user or Administrator based on a SPARQLMotion script. Once set up, the custom importers would show up on the Import view as shown in the following screenshot:

The common requirement of these script-based importers is that they take a single text file as input and output new RDF data. Script-based importers can be activated per model type (e.g. only for Ontologies) or even for only one specific model instance. They are a powerful mechanism to simplify repeatable tasks for end users.

The following steps assume that the reader is familiar with SPARQLMotion. To create such a script:

  1. Create a new RDF/SPARQLMotion file with TBC-ME. The file name must end with .sms.ttl.
  2. Into that file, import the namespace http://topbraid.org/teamworkscripts (from /teamwork.topbraidlive.org/system/teamworkscripts.ttl).
  3. Also import the XYprojects.ui.ttlx file for the vocabulary type that you want the script to be activated for. For example, if you want to add a script for Ontology vocabularies, add an import to ontologyprojects.ui.ttlx.
  4. Use Scripts > Create SPARQLMotion Function/Web Service to create a new service. This service must take a single argument of type xsd:string and return a module of type sml:ReturnRDF. The argument will contain the text content that has been uploaded, e.g. the data from an XML or CSV file.
  5. Have the script produce a graph of new triples that shall be added to the target graph. The script may access the currently active target graph (vocabulary) using sml:ImportCurrentRDF.
  6. At the web service instance that you have created (instance of sm:Function), use the property teamwork:suitableProjectType to specify the vocabulary type(s) that the script should show up for, e.g. ontologyprojects:ProjectType. Alternatively, use teamworkscripts:suitableVocabulary to link the URI of individual vocabularies, starting with urn:x-evn-master: .

Import Single Document

This allows you to import an external file into the Corpus manually, rather than going through a Connector or importing an existing RDF representation of the Corpus. The link shows a screen where the Browse... button opens a dialog for picking a source file. The file's text and metadata are parsed by the Tika content analysis toolkit, which can handle these supported formats. The Show Imported data button on the next screen lets you review the retrieved information. Most supported file formats will present three sections:

  1. common Metadata Properties such as file name, media type, title, creator; 
  2. Content, which is the actual document's text (where applicable);
  3. Other Properties, which include various ones the importer was unable to label and are therefore referred to with their URIs.

Import/Export View: Export...

Export Corpus as a Graph

Any viewer of a collection can export its data in a standard RDF serialization format.

Creating reports and exporting data are available for both production and working copies of a Corpus to anyone who has read access.

To export subsets of Corpus data according to custom criteria and sorting, note that EDG's Search screen provides fine-grained control over the data to display on the Search Results area. That form's gear menu offers several choices to export the results into spreadsheet-compatible formats (e.g., for Excel). See Corpus View or Edit for details on searching.

Select JSON-LD, N-Triples, RDF/XML, or Turtle under the Export Corpus as a Graph header of a production or working copy's Export... view to generate an RDF file representation of the Corpus data in that format. Different browsers may display the result in different ways, or may not display anything at all in the standard browser window; selecting your browser's equivalent of the View Source command will display the actual RDF data. When viewing the source, you can also pick Save As or Save Page As from your browser's File menu to save the RDF data as a local file.

Instead of clicking one of these links, you can right-click it and then pick Save Target As or Save Link As, depending on your browser, to save the RDF representation of the data to a local file. A dialog box will then prompt you for the name and location of the file.

SPARQL Endpoint

This allows users of the collection to run new or saved SPARQL queries on it and to optionally save queries for others. Saved queries can be deleted by their creators and by collection managers. If SPARQL updates have been enabled by an administrator, editors (and managers) can run them, but viewers cannot. Note that the Pivot Table and Geo functions can be slow on some platforms and are not supported for Internet Explorer.
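For readers who want to prepare a query outside the browser, the sketch below builds a query request in the style of the W3C SPARQL 1.1 Protocol (a GET request with a query parameter). It only constructs the request and does not send it. The endpoint URL and the graph URI are hypothetical, and this page does not state whether or where the endpoint is reachable over HTTP in a given installation, so treat both as assumptions to confirm with your administrator.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical values; substitute those of your own installation.
ENDPOINT = "https://edg.example.org/sparql"
GRAPH_URI = "urn:x-example:corpus-graph"

QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_query_request(endpoint, query, default_graph_uri=None):
    """Build (but do not send) a SPARQL Protocol GET request."""
    params = {"query": query}
    if default_graph_uri:
        # Standard SPARQL Protocol parameter naming the graph to query.
        params["default-graph-uri"] = default_graph_uri
    return Request(
        endpoint + "?" + urlencode(params),
        headers={"Accept": "application/sparql-results+json"},
    )


request = build_query_request(ENDPOINT, QUERY, GRAPH_URI)
# Passing the request to urllib.request.urlopen would send it; authentication
# is left out of this sketch.
```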

Saved SPARQL Queries

This lists the SPARQL queries that have been saved for the collection. For each query, it provides a URL that will run it, along with an Export Query button that runs it and shows the results.

Export Saved Search

The Saved Search link shows a screen listing your saved searches. These are searches that you have saved using the Save current search button in the search form of the Editor page.

   

After setting the Result Format for a given search, clicking its Export button will download the search results in that format. Your saved searches are web services, so other systems can also use them as APIs.

Reports View

Anyone with read access to a production Corpus or working copy can generate various standard reports for it. Custom reports are also possible.

Problems and Suggestions Report

For any Corpus (production copy or working copy managed by a workflow), Reports > Problems and Suggestions checks the current state of the Corpus against all of its applicable quality rules (i.e., its shapes and the validity constraints they define) and enrichment rules. A message box shows the rule-processing progress and then shows the report. Note that the report results are also reflected in the Dashboard > Completeness and Validity display.

Users can also enable validity checking when they are viewing individual resources in the form. This setting then applies across all of the asset collections the user works with. See the View or Edit documentation for details.

To develop custom extensions to this feature, see EDG Developer Guide > Extending the Problems and Suggestions Reports.

View Shapes and Constraints

This link lists all of the SHACL shapes and constraints that are currently applicable to the given Corpus. Editors of the Corpus can disable them individually (internally, this sets sh:deactivated to true). They are then disabled for the asset collection in which the change was made and for any asset collections that include it. To disable them more globally, e.g., for all Corpora, use the Ontology modeling features of EDG.

Note that shapes not only define rules about valid values for properties; they also specify that a given property is available for a given type of asset. When you use the View Shapes and Constraints page to disable them, you are disabling the field. The Ontology modeling features of EDG give you finer control: you can keep the field but disable some of the constraints defined for it.

View Change History

Click Reports > View Change History to show the Change History view. For a production copy, this shows all the changes made since it has been created. For a workflow, this shows only changes made within the working copy managed by the workflow.

Clicking the Search button on the Change History screen displays a time-stamped list of the saves made in the Matching Changes panel, and clicking one of those lines displays details about what changes were made as part of that save operation in the Details of Selected Change panel. Below, the change made on July 30th has just been clicked, showing that three values were added and one was deleted as part of the change made with a particular save operation.

If you are logged in as a user who is editor or manager of the vocabulary/asset collection or a workflow where the change was made, then a link Revert this Change will appear in the bottom panel. Click on this link to undo this operation. This will in fact create a new "forward" edit in the change history, with yourself as author. Note that this feature should be used with care, because reverting some steps from the middle of the change history may lead to orphan resources in your model.
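The idea that a revert is recorded as a new "forward" edit, rather than by erasing the original entry, can be sketched as below. The Change class and both functions are hypothetical names for illustration, and triples are reduced to plain strings; this is not EDG's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    author: str
    added: frozenset
    deleted: frozenset


def apply_change(graph: set, change: Change) -> set:
    """Return the graph after removing the deleted and adding the added triples."""
    return (graph - change.deleted) | change.added


def revert(change: Change, reverting_user: str) -> Change:
    """Build the inverse change, authored by the user who reverts."""
    return Change(author=reverting_user, added=change.deleted, deleted=change.added)


original = {"t1", "t2"}
edit = Change("joan", added=frozenset({"t3"}), deleted=frozenset({"t2"}))

history = [edit]
after_edit = apply_change(original, edit)

# The revert is appended to the history; the original entry stays in place.
undo = revert(edit, reverting_user="joe")
history.append(undo)
after_revert = apply_change(after_edit, undo)
```

In this simplified model the graph returns to its original state only when no later change touched the same triples, which is one way to see why reverting steps from the middle of the history calls for care.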

If you are logged in as a user who is editor or manager of an asset collection and look at a change performed in a working copy as part of a workflow, then a link Commit this Change to production will appear in the bottom panel. You can click on this link to move the change history entry (in the example above, the three additions and the deletion) out of the workflow copy and into the production copy, essentially cherry-picking which change from a workflow copy you want to accept. As with the Revert feature mentioned above, this feature should be used with care, because committing some steps from the middle of the change history may lead to creating data statements that are disconnected from the rest of the information. For example, when you commit a change that has modified some attribute of a newly created code, then you should also make sure that the change that created the code in the first place has also been committed.

Before you click the Search button, you can narrow the scope of the search by filling out any or all of the fields at the top of the form:

  • creator Enter the name of a particular EDG user to only see changes by that user. This field uses typeahead, so that if you have users named "Joe" and "Joan" and only type in "Jo", these two names will appear in a drop-down list for you to pick from.

  • date Enter a date in the first date field to see all changes after that date, a date in the second field to see all changes before that date, or in both fields to see the changes within a particular date range. (There's no need to actually type in the date value; clicking in either field displays a calendar where you can then click on the date you want to enter.)

  • status Enter "committed" or "uncommitted" to only list changes with one of these status values.
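The way the three filters combine can be sketched as follows: every field that is filled in must match, and empty fields are ignored. The record structure, the year in the sample dates, and the choice to treat both date bounds as inclusive are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ChangeEntry:
    creator: str
    saved_on: date
    status: str  # "committed" or "uncommitted"


def search_changes(entries, creator=None, after=None, before=None, status=None):
    """Return the entries that satisfy every filter that was supplied."""
    matches = []
    for entry in entries:
        if creator is not None and entry.creator != creator:
            continue
        if after is not None and entry.saved_on < after:
            continue
        if before is not None and entry.saved_on > before:
            continue
        if status is not None and entry.status != status:
            continue
        matches.append(entry)
    return matches


# Hypothetical history entries.
history = [
    ChangeEntry("Joe", date(2020, 7, 28), "committed"),
    ChangeEntry("Joan", date(2020, 7, 30), "uncommitted"),
    ChangeEntry("Joe", date(2020, 8, 2), "uncommitted"),
]

joe_uncommitted = search_changes(history, creator="Joe", status="uncommitted")
late_july = search_changes(history, after=date(2020, 7, 29), before=date(2020, 7, 31))
```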

Comparison Report

For a production Corpus, this report shows its differences from another, user-selected Corpus. For a working copy, it shows the differences from its parent production version. Note that differences do not extend to the contents of included asset collections. The report lists each changed asset and the properties that were changed, showing the changed values. If a value was added, it is shown in green. If it was deleted, it is shown in pink.

For example, the following shows what happens after the preferred label property for "South Korea" is edited, an alternative label is added, and "Seoul" is added as a narrower value of the "South Korea" resource (renamed to "Republic of Korea").

The right hand side of each change contains a link View Change that displays a dialog box with details of the change log entry that caused that particular change. Depending on your permissions, you can revert or commit the change in that dialog box. See View Change History for further information on reverting and committing individual changes.
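Conceptually, the comparison amounts to a set difference over the statements of the two graphs, as in the sketch below. The identifiers and the alternative label value are invented for illustration, loosely following the "South Korea" example; the function is not EDG's implementation.

```python
def compare(base, other):
    """Return the triples added to and deleted from base to obtain other."""
    return {"added": other - base, "deleted": base - other}


# Hypothetical triples for the production copy and its working copy.
production = {
    ("ex:SouthKorea", "skos:prefLabel", "South Korea"),
}
working_copy = {
    ("ex:SouthKorea", "skos:prefLabel", "Republic of Korea"),
    ("ex:SouthKorea", "skos:altLabel", "South Korea"),
    ("ex:SouthKorea", "skos:narrower", "ex:Seoul"),
}

report = compare(production, working_copy)
# An edited value appears as one deleted triple plus one added triple.
```

Only the triples of the two graphs themselves are compared here, which mirrors the note that differences do not extend to included asset collections.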

Statistics

For any Corpus (production or working copy), Reports > Graph Statistics displays details about the Corpus's node distribution. The following shows the statistics for the sample Reference Dataset: Country Codes.

Corpus contents

For any Corpus with a connected data source (production or working copy), the Reports > Corpus contents action lists all documents that were either manually imported or retrieved from a remote location with a connector.

Each line in the table represents a single document. It shows the URL of the original document (whether a web page or a downloadable file), its media type, the date it was last downloaded from its remote location to the EDG cache, and a hyperlink, shown as a page icon, for downloading the cached copy.

Note this report is not available for Corpora configured with No Connector. The Corpus editor can be used instead.

Workflows View

This view allows users to start workflows, and it lists both the in-progress and completed workflows, if any.

Start new Workflow

This button opens a form for starting a workflow. The new workflow requires a name and allows you to enter an optional description, both of which remain editable by managers. For more information, see Workflow Overview and related pages.

Workflows in Progress

If there are any in-progress workflows for this collection, they are listed in the Workflows in Progress table. To access a particular workflow, select its row and then click Go to Workflow to open its utilities view. You will see a page showing the status of the workflow and, depending on the workflow status and your role, allowing you to move the workflow to the next state.

All changes to be processed by a given workflow are stored in a working copy associated with it. A workflow can be used to process changes to multiple assets or changes to one specific asset. To view or modify existing information, or to enter new information, click Go to Working Copy. You will see all of the information and, if you have the necessary permissions, be able to modify it. The changes you make remain local to the workflow until it completes successfully and the changes are committed to the production copy of the asset collection. If the workflow was created for a specific asset, the name of the asset appears in the row and the Go to Asset button is clickable. Clicking the button opens a form displaying information about this asset.

The contents of the Workflows in Progress table can be printed and exported in a spreadsheet format.

Completed Workflows

This table works similarly to Workflows in Progress except that it lists the workflows that reached the terminal state. Typically, this means that changes have been finalized and committed to the asset collection. Users can view the history of workflow transitions. Each completed workflow shows its number of changed statements (triples), giving users information about the volume of changes made as part of the workflow. For completed workflows with extensive changes, preserving such history of changed triples might occupy considerable space. Therefore, asset collection managers can select a completed workflow and use the Archive action to remove the audit trail from the change history. The change records are copied into a file that an administrator can access if the change history details are ever needed again.

Tasks View

The Tasks feature allows users to associate tasks with asset collection resources. If the Tasks item is not listed in asset collections' main utility (operations) view, then see EDG Configuration Parameters for how an administrator can enable the Tasks activated configuration parameter.

Tasks for [Corpus NAME]

When this feature is active, tasks can be associated with either a top-level asset collection or with a resource it contains, such as a class, property, or individual code. The Tasks management view of an asset collection shows the tasks associated with it at any level. At the bottom of the view, the Create Task link displays a dialog box where you can add a new task's description and user assignment. Once you click this dialog box's OK button, EDG adds the task to the list for this data asset, where you can reset its status or who it's assigned to with that form's drop-down lists. You can then filter the list display by these values.

When editing a particular asset resource within a collection, the Tasks button on its details edit form allows viewing and creating tasks about the resource. The button also indicates the number of tasks assigned to the resource.

Newly created tasks are, by default, assigned to the manager of the asset collection, who can then reassign tasks to other users. A user assigned to a task can change its status and enter comments about tasks.

Administrators can activate a feature to Send task emails in the EDG Configuration Parameters. When activated, users with an email address (e.g. via LDAP) will receive emails whenever a task gets assigned to them, or if a property of an assigned task has changed.

Comments View

The Comments feature allows users to associate comments with asset collections and asset resources. If the Comments item is not listed in asset collections' main utility (operations) view, then see EDG Configuration Parameters for how an administrator can enable the Comments activated configuration parameter.

Recent Comments

When viewing or editing a resource such as a class, instance, or taxonomy concept, the Comments button in the lower-right shows how many comments have been added to the selected resource for this production or workflow copy. Clicking the button displays a dialog box where you can see previous comments and add your own under the "Add Comment" title; click the OK button when you are finished.

Comments have a status such as "open," "declined" or "resolved." The status of those can be changed using a drop-down list to the right of each comment entry. If you also have the TopBraid Explorer (Viewer) application, the display can also include comments from those viewers, marked with (via TopBraid Explorer).

To get a list of the 100 most recent comments for a production or workflow copy, select its Comments management view. These comments can be filtered by status, for example, to only display the "open" comments.


When resources such as concepts, classes, or instances are deleted, their comments are not automatically deleted with them. These are known as "orphan comments." If there are any orphan comments associated with a given asset collection, the Comments view will include a hypertext link saying "Delete the X orphan comments about entities that no longer exist," where X is the number of orphan comments associated with this asset collection. Clicking this link will delete these comments.
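The notion of an orphan comment can be sketched as a comment whose subject is no longer among the collection's resources. The record layout and function names below are hypothetical and only illustrate the definition given above.

```python
def find_orphan_comments(comments, existing_resources):
    """Return the comments whose subject resource no longer exists."""
    return [c for c in comments if c["about"] not in existing_resources]


def delete_orphan_comments(comments, existing_resources):
    """Return the comments that remain after the orphans are removed."""
    return [c for c in comments if c["about"] in existing_resources]


# Hypothetical data: ex:Busan was deleted after a comment was made on it.
resources = {"ex:SouthKorea", "ex:Seoul"}
comments = [
    {"about": "ex:Seoul", "text": "Check spelling", "status": "open"},
    {"about": "ex:Busan", "text": "Add alt label", "status": "open"},
]

orphans = find_orphan_comments(comments, resources)
remaining = delete_orphan_comments(comments, resources)
```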

Manage View

Each collection's Manage view is only available to its managers.

Create a Cloned Version

Managers of a particular Corpus can use the Create a Cloned Version function to create one or more named clones of the Corpus. A new clone will have the same content and user roles as the original production instance. However, neither the change history nor the working copies will be cloned.

Cloning is often used to "branch off" a version of the Corpus, so that it can be referenced and imported separately from the current version. For example, one could start with a Corpus called "People." Then, on reaching a milestone, one could create a clone and call it "People 1.0." Now, any other Corpus that explicitly should only use terms from version 1.0 could change its includes to that version only, while the ongoing work towards version 2.0 will continue on the main "People" Corpus.

Clear

Managers can Clear a particular Corpus, which deletes all of its content, history, working copies, comments, and tasks. The empty Corpus itself and its user roles will be preserved. This feature can be used prior to file imports, to replace the whole content with an externally generated version.

Delete

Managers can delete a Corpus via its Delete link, which raises a message box to confirm the deletion. Clicking OK will delete the Corpus production instance and any working copies and history data.

A deleted Corpus is not recoverable.

External Graph URI

This defines a base URI that automatically maps with EDG's internal Graph URI (see utilities > Settings) during imports and exports. A manager can either edit this manually or set it automatically by importing an RDF file when no other value exists for the external URI. Also, RDF file import automatically redirects any owl:imports statements to the local copies. Thus, a manager can create a new Ontology and then import an RDF file to pre-populate it correctly.

The inverse mapping happens when graphs are exported back to RDF files: their external Graph URI is used instead of the internal urn:x-evn-master:... URIs.
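The two directions of this mapping can be pictured with the sketch below, which rewrites graph URIs in a list of triples. All URIs shown, including the suffixes after urn:x-evn-master:, are hypothetical examples, and the functions are illustrative only; they are not EDG's import or export code.

```python
OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"

# Hypothetical internal and external graph URIs, for illustration only.
INTERNAL_TO_EXTERNAL = {
    "urn:x-evn-master:people": "http://example.org/ontologies/people",
    "urn:x-evn-master:places": "http://example.org/ontologies/places",
}


def remap(triples, mapping):
    """Replace any subject or object found in the mapping; leave the rest as-is."""
    return [(mapping.get(s, s), p, mapping.get(o, o)) for s, p, o in triples]


def export_graph(triples):
    """On export, internal graph URIs are replaced with their external URIs."""
    return remap(triples, INTERNAL_TO_EXTERNAL)


def import_graph(triples):
    """On import, external URIs (e.g. in owl:imports) point to local copies."""
    external_to_internal = {ext: internal for internal, ext in INTERNAL_TO_EXTERNAL.items()}
    return remap(triples, external_to_internal)


internal = [("urn:x-evn-master:people", OWL_IMPORTS, "urn:x-evn-master:places")]
exported = export_graph(internal)
round_trip = import_graph(exported)
```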

Configure Notifications

For each Corpus, EDG can send notification messages to users in selected roles when certain kinds of changes happen to it. The Manage > Configure Notifications link provides a page listing all available Notification Events together with check-boxes to select the governance roles that will be notified:

The association of users with the governance roles for this collection is configured via governance areas. The user settings can be specified directly as individual users or indirectly as either user security roles or job titles. See Governance Model Overview for a discussion.

JIRA Project Key

Note that this item only appears if an administrator has set up the EDG Administration: JIRA Integration Parameters.

JIRA is Atlassian's web application for team issue-tracking. EDG's JIRA launch-in-context (LiC) feature allows users who are working in both EDG and JIRA to launch from editing particular EDG asset items into related JIRA searches and new items.

If an administrator has set up the EDG JIRA feature, then each collection manager can optionally set a JIRA project key string for the asset collection, where the key identifies a specific project in the JIRA application. Setting the project key then enables JIRA LiC functions for collection editors. When editing any asset item, editors can use its gear menu to create or search for related JIRA issues. See Corpus View or Edit – Manage > JIRA Launch-in-Context for details.

Setting the project key also adds a JIRA link to the collection's utility view header, which launches into JIRA to show the configured project's open items.

Refresh All Documents

This action will retrieve all configured pages, in a background process. This process will run until it is done harvesting the target data source or until the EDG server is stopped.

The action is not available for Corpora configured with No Connector.

Check for Updates

This action will retrieve only pages identified as new or changed since their last retrieval, in a background process. This process will run until it is done harvesting the target data source or until the EDG server is stopped.

The action is not available for Corpora configured with No Connector.
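The difference between the two actions can be sketched as below: a full refresh retrieves every configured page, while an update check retrieves only pages that are new or have changed since their last retrieval. How EDG actually identifies a changed page is not described here; comparing a remote modification time with the cached retrieval time is an assumption made for the example, as are all names and URLs.

```python
from datetime import datetime


def pages_for_refresh(remote_pages):
    """Refresh All Documents: every configured page is retrieved."""
    return list(remote_pages)


def pages_for_update(remote_pages, cache):
    """Check for Updates: only new or changed pages are retrieved.

    remote_pages maps URL -> last modification time (assumed to be known);
    cache maps URL -> time of the last retrieval into the cache.
    """
    due = []
    for url, modified in remote_pages.items():
        retrieved = cache.get(url)
        if retrieved is None or modified > retrieved:
            due.append(url)
    return due


# Hypothetical pages and cache state.
remote = {
    "https://example.org/docs/a.html": datetime(2020, 7, 1),
    "https://example.org/docs/b.html": datetime(2020, 8, 1),
    "https://example.org/docs/c.html": datetime(2020, 8, 5),
}
cached = {
    "https://example.org/docs/a.html": datetime(2020, 7, 15),
    "https://example.org/docs/b.html": datetime(2020, 7, 15),
}

to_update = pages_for_update(remote, cached)
to_refresh = pages_for_refresh(remote)
```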

Corpus-CONNECTOR-TYPE Configuration

Any manager of this Corpus can modify the parameters that were specified when it was created. External data sources can be on remote networks that are not necessarily under the creator's control, and the connector should reflect any changes to them. The settings differ by connector type; each connector's form is documented in the user interface.

While the connector's parameters can be edited, the Corpus' type of data source (one of the four options) itself cannot.

This link to the connector-specific configuration panel is not available for Corpora configured with No Connector.

SPIN Constraint Libraries (data quality rules)

This link lets managers select which of the available constraint libraries to apply to the given Corpus. Each library contains data quality rules that are checked through various operations, such as editing and the collection's Reports (whose results, in turn, may be reflected in the collection's Dashboards). The library selections apply to each collection separately.

Note that activating a large number of constraint rules might affect the performance of interactive editing on large collections. In such situations, one might consider deactivating some constraint libraries during editing and then re-activating them before running Reports.
