
Introduction

Reference data are standardized codes or data entities that are typically used by multiple applications as lists or tables. In fact, they are often called "code tables." An individual code table may seem like a simple thing, but a well-managed collection of code tables and related reference data spread across an enterprise is a resource that can bring great value to that enterprise—or cause great problems if it is not well maintained. EDG lets you control your reference data so that you can put it to work for you as efficiently as possible.

For additional information on reference datasets see:

This document is organized by roles showing how:

  • Reference Data Stewards can create and modify enterprise reference datasets and ontologies, import reference data, and manage information about it.

  • Data Stewards will typically create reference datasets that reflect reference data in sources they are responsible for. They will then align these with the enterprise reference datasets for the same entity.
  • Data Managers can export and provision reference data for use in their applications.

  • Business Analysts and other users can consult EDG to learn more about codes and code sets important to their work.

Accessing the EDG Application

To work through this guide, use a browser to access the EDG web-application running in one of the following environments.

  1. Create an EDG trial evaluation at TopQuadrant, and run EDG from the TopQuadrant servers. Submit an EDG evaluation request or contact TopQuadrant.

  2. Use TopBraid Composer - Maestro Edition (TBC-ME), and run its demonstration version of EDG. Download and install TBC-ME.

  3. Install EDG on a server accessible to your network (which could also be a local Tomcat server, via localhost). For a custom install, contact TopQuadrant and see EDG Server Installation and Integration. This will also require separate uploading of the EDG samples project: sample.teamwork.topbraidlive.org, from TBC-ME.

For the TBC-ME option, launch TBC-ME and then start the demo version of EDG via the top menu: TopBraid Applications > Open TopBraid EDG. Browse to http://localhost:8083/edg. Logging in as Administrator requires no password for the demo version. All asset collection types are available in the demo version.

For the other two options, the system administrator or TopQuadrant will provide you with a URL, a username, and a password. Browse to the URL and log in. Server licensing will determine the availability of the various asset collection types.

The EDG User Interface

For a basic orientation to the user interface, see EDG User Guide - Overview.

Reference Data Management

Getting Started for the Reference Data Steward

Defining the Structure for Reference Dataset

Each reference dataset designates some ontology class as the dataset's main entity, which defines the type of the dataset's reference instances, i.e., the individual code items. Maintaining ontologies as separate enterprise-wide collections is a best practice that facilitates reuse of concepts (classes and associated properties) and supports connectivity between reference datasets that may need to refer to each other. For example, the Market Identifier Codes dataset (provided as a sample in TBC-ME) uses country codes to define the geographic location of markets. Instead of redefining country codes, MIC codes reference the ISO country codes managed in their own dataset. Hence any change to the country codes is made in one place, the ISO Country Code dataset, and is reflected in all datasets that use it, such as MIC. This behavior is enabled by creating an enterprise ontology that specifies a relationship between market identifier codes and country codes.

Ontologies describe business entities, including entities for which you will govern reference data (codes). Ontologies can be thought of as a powerful flexible representation of business glossaries. An ontology may contain a class (entity) such as country, product type, industry and so on. Each of these entities can have different fields (properties) making it easy to support different types of reference data. Reference datasets in EDG are not limited to having only a handful of predefined fields such as a code and a description. They can have any property you may need to capture. For example, a reference dataset for country codes may have properties such as the various ISO codes, capital, gross national product, and language.

In order to create reference data, we need to first define the corresponding entity and its properties in an ontology.

Select the Ontologies item to show the list of user-accessible ontologies along with some metadata about each one. EDG supports the definition of a single enterprise ontology or a set of individual ontologies (for example, per department or business area) which can be combined with one another using an "includes" mechanism.

This tutorial uses the ontology: Enterprise Ontology - Example, which is included with TopBraid Composer ME. If it is not present in your TopBraid EDG server application, an administrator can upload its entire TBC-ME project using either (1) TBC-ME project:   sample.teamwork.topbraidlive.org  > Export... > TopBraid Composer > Deploy Project to TopBraid Live Server or (2) manually zipping that same TBC-ME project into file:   sample.teamwork.topbraidlive.org.zip and using the EDG application server > Server Administration > Project Upload and selecting the zipped project file.

For information on creating a new ontology, see Create New Ontology in the User Guide. If you have access to the Enterprise Ontology - Example collection, its title will show as a hypertext link. (If not, an EDG administrator can upload its project if necessary and make it accessible to you.) Selecting the ontology's name link shows its utilities view, which provides many utility-function subgroups, such as Settings, Import/Export, and Manage. There is a link in the header to edit the production copy (or just to view it, depending on the user's authorization). Viewers of the production collection can create child Workflows (via its utilities view), which allow editing of isolated working copies (extensions) of the production state. See Workflow Overview for details.


Click on Edit Production Copy to start viewing and editing the ontology's content.

The ontology editor's left pane shows a Class Hierarchy with classes (entities) and their properties (both attribute/datatype and relationship/object properties) shown as nodes in a tree. Because the Enterprise Ontology - Example is set to allow instances (objects) in addition to the classes, a left-lower panel shows the instances, if any, for each class.


The coloured buttons at the top of the class hierarchy, next to the quick search field, create a new class, attribute property, relationship property, or property shape for the selected class in the hierarchy. As shown in the screenshot above, clicking on a node in the tree (such as the class Market Identifier Code) displays information about it in the form to the right of the tree.

The Edit button opens an editor to modify the information displayed in the form. To delete the currently displayed resource, choose "Delete..." from the gear wheel icon to the left of the Edit button. The gear wheel menu also provides access to several other operations, such as the creation of custom forms or the designation of a property as a primary key for a class.

Later in this tutorial a reference dataset of airport codes will be created and populated with data from a spreadsheet. The following fragment is a sample of this spreadsheet:

| Airport | City | Country | Country Code | IATA Code | Latitude | Longitude |
|---|---|---|---|---|---|---|
| Keflavik International Airport | Keflavik | Iceland | IS | KEF | 63.985 | -22.605556 |
| Patreksfjordur | Patreksfjordur | Iceland | IS | PFJ | 65.55583 | -23.965 |
| Reykjavik | Reykjavik | Iceland | IS | RKV | 64.13 | -21.940556 |
| Siglufjordur | Siglufjordur | Iceland | IS | SIJ | 66.13333 | -18.916667 |
| Vestmannaeyjar | Vestmannaeyjar | Iceland | IS | VEY | 63.4243 | -20.278875 |
| Sault Ste Marie | Sault Sainte Marie | Canada | CA | YAM | 46.485 | -84.509445 |
| Winnipeg St Andrews | Winnipeg | Canada | CA | YAV | 50.05639 | -97.0325 |
| Shearwater | Halifax | Canada | CA | YAW | 44.63972 | -63.499444 |
| St Anthony | St. Anthony | Canada | CA | YAY | 51.39194 | -56.083056 |

To add model support for this information, create or extend an ontology to identify the main entity (class) and the properties (attributes and relationships) to be used in the new dataset.

Create a class named 'Airport' that will be used as the main entity in the reference dataset. To do this, select the top-level class named 'Thing' in the class hierarchy, click the yellow button under the hierarchy, enter the name "Airport" and click OK. The edit dialog box will appear in the middle pane. Add a comment, if desired, and click the Save Changes button at the top of the pane.

Create attributes for Airport by selecting the Airport class and then selecting the green icon for Create Attribute (Datatype Property) displayed at the bottom of the Class Hierarchy. Select the Airport class each time to create the following attributes:

| Attribute Name (Label) | Description (Comment) | Range of Values |
|---|---|---|
| airport city | Main city served by airport. May be spelled differently from the airport's name. | string |
| IATA airport code | An IATA airport code, also known as an IATA location identifier, IATA station code or simply a location identifier, is a three-letter code designating many airports around the world, defined by the International Air Transport Association (IATA). | string |
| latitude | A horizontal position of a location on the Earth according to a geographical coordinate system in decimal degrees, usually to six significant digits. Positive latitude is above the equator (North), and negative latitude is below the equator (South). | float |
| longitude | A vertical position of a location on the Earth according to a geographical coordinate system in decimal degrees, usually to six significant digits. Positive longitude is East of the prime meridian, and negative longitude is West of the prime meridian. | float |

Note that an attribute for the airport name has not been created. This is because each entity has a built-in attribute "label" which is intended to hold its name.

As an alternative to manually entering classes and properties, you can use Import > Import Schema from Spreadsheet to automatically create them from the first row of the spreadsheet and then adjust as necessary.

TopBraid EDG will always create a globally unique resource identifier, a URI, for each entry in a reference dataset. To enable this, you need to select the field whose values will be used in the URI creation. The selected field is declared to be a primary key for the entity. Therefore, the field used as a primary key must always have unique values for a given class of codes. Typically the code field (or one of the code fields) will be used as the key.
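The relationship between a primary key and a URI can be sketched in Python as follows. This is only an illustration under stated assumptions: the `START_OF_URI` prefix and the `make_uri` and `build_uris` helpers are hypothetical, and EDG's actual URI construction rules are not described in this guide.

```python
from urllib.parse import quote

# Hypothetical namespace prefix, standing in for the "start of URIs" value.
START_OF_URI = "http://example.org/airports/"


def make_uri(start_of_uri: str, key_value: str) -> str:
    """Build an identifier by appending the percent-encoded key to the prefix."""
    return start_of_uri + quote(key_value.strip(), safe="")


def build_uris(start_of_uri: str, key_values: list[str]) -> dict[str, str]:
    """Map each key to its URI, rejecting duplicate keys.

    A duplicate key would yield the same URI twice, which is why the
    primary-key field must hold unique values.
    """
    uris: dict[str, str] = {}
    for key in key_values:
        if key in uris:
            raise ValueError(f"duplicate primary key value: {key!r}")
        uris[key] = make_uri(start_of_uri, key)
    return uris


uris = build_uris(START_OF_URI, ["KEF", "RKV", "YAM"])
print(uris["KEF"])  # http://example.org/airports/KEF
```

The duplicate check is the point of the sketch: two rows with the same key could not be given distinct identifiers.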

You can set the primary key directly in the ontology (as was done for countries) or set it locally in the reference dataset itself. We will use the latter method, since we will create two reference datasets for the same entity that will use different properties as primary keys.

Next, click the blue button under the class hierarchy to create a relationship property named "airport country". In the comment field, describe it as "A country where an airport is located". Set its range of values to the Country class (http://topbraid.org/schema/enterprise#Country). To do so, start typing "Count" in the range field and pick "Country" as it appears in the autocomplete. (Failing to set the range can cause problems when it's time to import data into the new reference dataset.)

Since the primary key for ISO Country is its two-character ISO country code and the spreadsheet contains this information, EDG will be able to create a relationship between airports and countries as we import the spreadsheet data. Note that we have not created a field for the country name. Country names are already maintained as part of the country codes, so adding a name field here would only duplicate them.
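Taken together, the Airport entity, its attributes, and its relationship to Country can be pictured as a pair of record types. The sketch below is a loose analogy in Python, not EDG's ontology format; the class and field names are hypothetical renderings of the labels used in this tutorial.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Country:
    iso_code: str  # primary key: two-character ISO country code
    label: str     # country name, maintained with the country codes


@dataclass
class Airport:
    label: str                # built-in "label" attribute holds the airport name
    airport_city: str         # range: string
    iata_airport_code: str    # range: string
    latitude: float           # decimal degrees, positive is North
    longitude: float          # decimal degrees, positive is East
    airport_country: Country  # relationship property whose range is Country


iceland = Country(iso_code="IS", label="Iceland")
kef = Airport("Keflavik International Airport", "Keflavik", "KEF",
              63.985, -22.605556, iceland)
print(kef.airport_country.label)  # Iceland
```

Because the airport holds a reference to a Country rather than a name string, the country's name is read from the country record and is never stored on the airport.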

In the next step a reference dataset will be created that will store reference data for the airports.

Creating Reference Datasets

Go back to the home page by clicking the TopBraid logo in the upper-left. The reference dataset that we're about to create will be associated with a particular "governance area". Before adding the dataset, select the Governance Areas link located in the left menu under the Governance Model section, click the Create Data Subject Area button, and add a subject area with the label Logistics.


Now you're ready to create the dataset. (If you are not using a TopQuadrant evaluation account to work through this tutorial, then before your own reference data installation can let users create new reference datasets, it must be configured to work with a backend database for storage. See Configuring the persistence technology for new vocabularies and assets for further information.)

Choose Reference Dataset in the Choose type dropdown.

You will see the following page:

Enter Airports as the label (or name) of the dataset and for its description enter: Reference dataset of airports with IATA codes. The Ontology to Include option lists the ontologies that are available to the user, which in turn will provide the class for the dataset's main entity. In this case, select the Enterprise Ontology Example, which has the Airport class that you defined. Click Submit.

Setting the Main Entity

The ontology used for creating a reference dataset will typically contain several classes (entity types). After creating the reference dataset and before importing the airports data, you need to tell TopBraid EDG what reference data will be in the dataset. This is done by identifying the "main entity" for the dataset. In our example, it is the Airport class. There are two ways to set the main entity initially.

  • If the main entity is unset, then editing the reference dataset will first require the main entity class to be selected. Edit the reference dataset and select the desired class, Airport, from the provided dropdown listing the ontology's classes.
  • A reference dataset's main entity class can also be set or changed via its utility: Settings > Metadata > Edit > main entity (class).

Make the Airport class the main entity via one of these two methods.

If a primary key for the main entity has been defined in the ontology itself, the dataset uses it. Otherwise, as in our example, you are offered a choice to identify the primary key property in the dataset itself. Since it is used for the URI construction, a primary key must be defined before any data is loaded into a dataset.

Documenting a Reference Dataset as an Enterprise Reference Dataset

Your organization may have several reference datasets in EDG that contain codes for a given entity. For example, you may have different existing applications and corresponding sources that already store and use airport codes. The goal of standing up a system for managing reference data is to achieve alignment across your existing reference data and to streamline its management. This alignment takes time. At least initially, you may have, in addition to a "master" reference dataset that you want to be the definitive source of reference data for a given entity across all systems, other reference datasets that capture what each of your systems is using.

To differentiate between your master reference dataset for airport codes and other "in situ" reference datasets, go to Settings > Metadata > Edit > is enterprise dataset. Set this flag to true and click Save.

If another reference dataset is created for the same entity, it could be mapped to the enterprise dataset using Crosswalks. TopBraid EDG can auto-create crosswalks between two datasets. It also offers crosswalk web services to translate between codes.

Setting the Primary Key in the Reference Dataset

Click on the Edit button. TopBraid EDG will ask you to select a property to serve as a primary key.

Select the "IATA airport code" property. For the Start of URIs value, you can accept the default value or override it. Click the Set primary key for this Dataset button.

You will now see a tabular display that lets you create reference data. However, instead of entering data manually, we will import it from the airports spreadsheet.

Importing Reference Data

On the Airports reference dataset, select utility: Import/Export > Import Spreadsheet using Pattern. Then click Choose File to select the spreadsheet. (Download the airports.xlsx spreadsheet to get a local copy to import.) This page has two more fields:

  • Sheet index: by default this is 1. This spreadsheet has only one worksheet, so there is no need to edit it.

  • Entity type: a list of classes from the included ontology (the enterprise ontology) to indicate which one is being populated by the airport. Ensure that Airport is selected.

Clicking Next shows several potential patterns for spreadsheet data. Select No Hierarchy. (Note: Reference data supports managing hierarchies as well as flat lists.)

The next step is to map the spreadsheet columns to the properties of the Airport class as shown below, which maps the columns to the properties defined above and to the built-in "label" property. Note that the Altitude column was deliberately left unmapped; only the columns you map will be imported. The Country column was also not mapped because it contains country names that are already managed as part of the official ISO Country Codes dataset.
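The effect of such a column mapping can be sketched in Python. This is an illustration of the idea rather than EDG's import mechanism: the `COLUMN_TO_PROPERTY` dictionary and the `map_row` helper are hypothetical, and two rows of the sample fragment are written as CSV so that the sketch needs only the standard library.

```python
import csv
import io

# Two rows copied from the sample fragment of the airports spreadsheet.
SAMPLE = """Airport,City,Country,Country Code,IATA Code,Latitude,Longitude
Keflavik International Airport,Keflavik,Iceland,IS,KEF,63.985,-22.605556
Reykjavik,Reykjavik,Iceland,IS,RKV,64.13,-21.940556
"""

# Spreadsheet column -> property of the Airport class. Columns that are not
# listed (here, Country) are simply not imported.
COLUMN_TO_PROPERTY = {
    "Airport": "label",
    "City": "airport city",
    "Country Code": "airport country",
    "IATA Code": "IATA airport code",
    "Latitude": "latitude",
    "Longitude": "longitude",
}

# Properties whose range of values is float.
FLOAT_PROPERTIES = {"latitude", "longitude"}


def map_row(row: dict[str, str]) -> dict[str, object]:
    """Keep only mapped columns and convert float-ranged properties."""
    record: dict[str, object] = {}
    for column, prop in COLUMN_TO_PROPERTY.items():
        value = row[column].strip()
        record[prop] = float(value) if prop in FLOAT_PROPERTIES else value
    return record


records = [map_row(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
print(records[0]["IATA airport code"])  # KEF
```

The country is carried only as its two-character code, which is what later allows it to be matched against the primary key of the country dataset.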

Click the Finish button. (The spreadsheet has over 5,700 rows and may take a minute to import.) After the data is imported, click on Edit Production Copy from the home screen of your Airports dataset to view the reference dataset. The first time this operation is performed with a new dataset for which you have not set up the main entity, a dialog box will appear to identify the main entity of the reference dataset. Select Airport and click the Continue button.

A page appears containing a grid with the imported data, a details form to the right of the grid and a form hidden on the left (click a blue bar to see it) that will show details for each selected grid row and allow editing. Double-clicking a row displays its information in a pop-up form. All the panels are resizable.

The grid displays 25 rows at a time by default. This default can be changed by resetting the field at the top left corner of the grid as shown above.

The columns displayed in the grid can be modified by unchecking boxes in the advanced search form and then clicking the Search button. The Search Form screen can be accessed by clicking on the Advanced Search link in the top right corner of the search pane. To uncheck a column, click on its check sign and choose from the options in the pop-up menu that appears. Once you have selected the columns you want to see in the main pane, click Search and the table will reflect your choice.

To save the current configuration of columns as a default for all users, click on the "Save current search as default" icon at the bottom of the Search form. (Note that all buttons have mouse-over text with an explanation of the button's purpose.)

Reference datasets can be organized in hierarchies as well as in flat lists. If a reference dataset contains hierarchical relationships between codes, these can be viewed and modified by clicking on the tree icon to the left of the chosen Airport class in the page header.

Including other Reference Datasets

As shown in the first screen shot of the editor, the Airport Country column shows URIs of the countries and not their names or the code values. This happens because the reference dataset describing the country codes has not been included in the Airports dataset. We can fix this by going back to the Settings menu for the Airports reference dataset and including the appropriate graph. Click on Includes. In the pop-up window, select "Country Codes" to include it in the reference dataset. Instances of the Country class are now included in the Airports dataset by reference, meaning that the data is not copied.

Referencing another dataset in this manner maintains country reference data in one place. If a country is renamed (for example, Cape Verde, an island country in West Africa, is renamed to the Republic of Cabo Verde), the update needs to occur in only one place, the ISO Country dataset. All datasets that include ISO Country will see this change immediately. At the same time, you will have access to country names and all other country information from any reference dataset that includes country codes. The names and other reference data for countries are stored in the Country Codes dataset.
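The benefit of including by reference rather than copying can be shown with a small Python sketch. The dictionaries below are a hypothetical stand-in for the two datasets, not EDG's storage model, and the Cape Verde airport record is illustrative rather than taken from the sample fragment.

```python
# Country data lives in one place, keyed by its primary key (the ISO code).
country_codes = {
    "IS": {"label": "Iceland"},
    "CV": {"label": "Cape Verde"},
}

# Airports store only the country's key, not a copy of its name.
airports = {
    "KEF": {"label": "Keflavik International Airport", "airport country": "IS"},
    # Illustrative record, not part of the sample fragment in this tutorial.
    "SID": {"label": "Amilcar Cabral International Airport", "airport country": "CV"},
}


def country_label(iata_code: str) -> str:
    """Resolve an airport's country name through the shared country dataset."""
    key = airports[iata_code]["airport country"]
    return country_codes[key]["label"]


before = country_label("SID")
# The rename happens once, in the country dataset only.
country_codes["CV"]["label"] = "Republic of Cabo Verde"
after = country_label("SID")
print(before, "->", after)  # Cape Verde -> Republic of Cabo Verde
```

No airport record was touched, yet every lookup now returns the new name. A copied name would have needed an update in each dataset that held it.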

Once the reference dataset for countries is included, EDG looks up the range of the 'airport country' property and automatically matches each value to the primary key defined for ISO Country, the 'iso 3166 2-alpha country code' attribute. Click the Edit Production Copy link at the top of the page to return to the Editor. Note that country names now appear in the Airport Country column instead of URIs. These names come directly from ISO Country. Clicking on the link will provide other information about the country directly from the ISO Country dataset.

Click on an instance of the Airports dataset so it is displayed in the bottom pane. The airport country property is now populated with a label from the ISO Country dataset. In addition, the link will navigate to a form that displays information from ISO Country about the selected country. You can also click on an arrow shown directly after the country name to open a form with the information about the country in a new window. This way, you can look at an airport and its associated country side by side.

The class used to populate the grid can be switched from 'Airport' to other classes included in the Airports dataset by using the dropdown field to the left of the user name in the header, which currently has 'Airport' chosen. The Search form also changes to support searching for countries using their specific metadata fields.

Included data, such as the Country Codes data referenced by the Airports dataset, can be viewed and searched, but modifications to included data are not permitted. Included data can only be modified by editing the included reference dataset directly. You will be able to edit only codes for the main entity or one of its subclasses.

Managing Metadata for a Reference Dataset

The metadata settings of any asset collection can be viewed/edited in its utilities (home) > Settings view. Expanding its Dataset Status and Property Definition sections will show its metadata information as follows.

We have provided the main entity and the short description information earlier in this tutorial. In the form's Overview section, the related entity value is automatically derived by the reference dataset as any class (entity) that is connected to the main class. Country appears because this is the main class for the Country Codes dataset now included. The last updated field is also automatically recorded.

When a dataset is first created, the status is automatically set to "Under development". You can edit this value when the status of the dataset changes.

TopBraid EDG is shipped with some predefined status values. They are configurable if your organization needs a different set of values.

The Property Definitions section shows the description of each property of the main class as it was entered in the ontology. You can also add additional descriptions local to the dataset; these will not be part of the enterprise ontology.

Click the Edit button to see more available fields. In the Dataset Status section of the form, set is external dataset to "true", because IATA codes are maintained by the IATA Association, which publishes updates bi-annually. Change the status code to Approved. Click Save changes at the top of the Metadata section.

Once the status of a reference dataset is approved for use, you will no longer be able to delete codes from the dataset, but you will be able to change information about them.
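The rule can be summarized with a toy Python model. This is a sketch of the behavior described above, not EDG code; the `ReferenceDataset` class and its methods are hypothetical.

```python
class ReferenceDataset:
    """Toy model of the rule: approved datasets allow edits but not deletions."""

    def __init__(self) -> None:
        self.status = "Under development"  # initial status of a new dataset
        self.codes: dict[str, dict[str, str]] = {}

    def update_code(self, key: str, **changes: str) -> None:
        """Changing information about a code is allowed in any status."""
        self.codes.setdefault(key, {}).update(changes)

    def delete_code(self, key: str) -> None:
        """Deleting a code is refused once the dataset is approved."""
        if self.status == "Approved":
            raise PermissionError("codes cannot be deleted from an approved dataset")
        del self.codes[key]


dataset = ReferenceDataset()
dataset.update_code("KEF", label="Keflavik International Airport")
dataset.status = "Approved"
dataset.update_code("KEF", city="Keflavik")  # still permitted after approval
```

Calling `dataset.delete_code("KEF")` at this point raises `PermissionError`, while the earlier update succeeds.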

Creating Reference Data Facts

In the Metadata tab, expand the Reference Dataset Facts section and enter the following "fact":

IATA codes should not be confused with the FAA identifiers of US airports. Most FAA identifiers agree with the corresponding IATA codes, but some do not, such as Saipan whose FAA identifier is GSN and its IATA code SPN, and some coincide with IATA codes of non-US airports.

Note that the text area allows rich text, including hyperlinks. To add a link, select the text to be hyperlinked, such as "Saipan", and click the chain link in the icon box. Add the hyperlink in the text box that appears.

Click on the plus + icon to the left of the fact field name to add an additional entry and enter this additional fact there:

Since "Q" is used for international communications, IATA airport codes never begin with "Q".

Save your changes. The fact is now part of the metadata for the dataset and can be referenced, searched, etc.
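Facts like these are recorded in EDG as documentation. If a consuming application wanted to turn them into a check, it might look like the following Python sketch. The `check_iata_code` function is hypothetical and not an EDG feature, and treating codes as uppercase is an assumption of the sketch.

```python
import re

# Assumption: codes are written as exactly three uppercase ASCII letters.
_THREE_LETTERS = re.compile(r"[A-Z]{3}")


def check_iata_code(code: str) -> list[str]:
    """Return a list of problems found; an empty list means the code passes."""
    problems = []
    if not _THREE_LETTERS.fullmatch(code):
        problems.append("must be exactly three uppercase letters")
    if code.upper().startswith("Q"):
        problems.append('must not begin with "Q"')
    return problems


for code in ["KEF", "QAA", "yam", "YAMX"]:
    print(code, check_iata_code(code))
```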

You can define facts at a dataset level and also specify them for a given code in the reference dataset. If you want to do the latter, you need to include in your reference dataset a pre-built Reference Data Facts ontology. Your EDG administrator can also specify this inclusion as a system-wide setting for all reference data.

Setting the Dataset type to external

You may want to differentiate private (internal) reference data from public (external) such as ISO country codes.

In the Metadata section of the Settings tab, click the Edit button again. You will see a new section on the form called Subscription; this is used to capture subscription-related information for external reference datasets. Add "IATA Association" to the "sourced from" field. You will only need to type the first few letters of its name, because only one defined organization begins with those letters. Click the Save Changes button.

For additional information, see Reference Dataset Utilities - Settings > Metadata.

TopBraid EDG is shipped with predefined metadata fields for reference datasets. They are configurable if your organization needs different metadata. EDG is a semantic, model-based solution. Configuration is done using steps similar to those used to modify ontology models to accommodate new reference data.

Assigning Access Privileges to other Users

For any collection, including reference datasets, a user can have one of the following permission roles (see Asset Collection Permissions for more information):

  • Manager: A Manager has the most capabilities. In addition to all the tasks that an Editor can perform, a Manager can delete an entire dataset, change the default columnar view for all users, and control the access privileges that other users have over a particular dataset by assigning Manager, Editor, or Viewer roles to them. A Manager can also reassign and change the status of all tasks, even those that are not assigned to them. A person who creates a vocabulary automatically becomes its Manager.
  • Editor: In addition to being able to perform all the tasks that a Viewer can perform, an Editor can make changes to the dataset's metadata and to the reference data itself. When a production copy has a working copy associated with it, an Editor can also publish or reject that working copy's set of changes to the production copy.
  • Viewer: A Viewer can browse a dataset, viewing all the reference data (as well as any change history associated with that data) and the metadata associated with a dataset. A Viewer can create saved searches, export data, and record information about the usage of a reference dataset. A Viewer can create and view tasks, add comments, and change the status of a task assigned to them. A Viewer can also create a new working copy of the dataset. The Viewer then becomes the Manager of the new working copy, but for those changes to affect the production copy they must be approved by an Editor of the production dataset.
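Because each role includes the capabilities of the one below it, the three roles behave like an ordered scale. A minimal Python sketch of that idea follows. The action names are hypothetical labels chosen for this example, not EDG identifiers, and only a few of the capabilities listed above are modeled.

```python
from enum import IntEnum


class Role(IntEnum):
    """Permission roles ordered so that a higher role includes the lower ones."""
    VIEWER = 1
    EDITOR = 2
    MANAGER = 3


# Minimum role needed for a few of the actions described above.
REQUIRED_ROLE = {
    "browse_dataset": Role.VIEWER,
    "create_working_copy": Role.VIEWER,
    "edit_reference_data": Role.EDITOR,
    "publish_working_copy": Role.EDITOR,
    "delete_dataset": Role.MANAGER,
    "assign_roles": Role.MANAGER,
}


def is_allowed(role: Role, action: str) -> bool:
    """True if the role is at least the minimum required for the action."""
    return role >= REQUIRED_ROLE[action]


print(is_allowed(Role.EDITOR, "publish_working_copy"))  # True
print(is_allowed(Role.VIEWER, "edit_reference_data"))   # False
```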

To give others access to the dataset, go to the User Roles tab on the dataset's home page.

Permission levels can be set for (1) individual users and (2) user security roles (e.g., from Tomcat or LDAP). The list of users you will see on this tab can include individual users and LDAP roles. A Manager can assign Manager, Editor and Viewer privileges to each user or user group. The User Roles page is also used to set up governance roles (as defined in the Governance model) for individual reference datasets. Governance roles can also be defined at the business area or data subject area that a reference dataset is associated with.

Governance roles provide an alternative approach to assigning permissions: if a user has a governance role for an asset collection, specified either directly or indirectly through a subject area, they automatically get Viewer permission.

Modifying Reference Data

Dominica's main airport, the Melville Hall Airport, was just renamed to the Douglas-Charles Airport in tribute to its late prime ministers, Rosie Douglas and Pierre Charles. While your next bi-annual update from the IATA Association will reflect this change, you need to make it ahead of receiving the update.

Click on Edit Production Copy from the dataset's home page. Search for Dominica by its code, DMA, using the search form's "airport country" field to get the two airports in Dominica. Click on Melville Hall to display its information, then click the Edit button at the bottom of the screen. Change its label value to "Douglas-Charles Airport". If you want to include a log message about your change, check Enter log message before clicking the Save Changes button.

To create a new airport, click on the New button in the button-row above the airports table in the central pane.

The EDG keeps a complete audit trail of all changes. Click the "Show History" check box to see the audit trail.

Export, Collaboration, and other Activities

Some of data stewards' tasks overlap with the tasks of other users. For example, stewards may build exports of reference data, but so do data managers. These overlapping activities, including collaboration between users working with reference data, are covered in the Getting Started Guides for Data Manager and Getting Started for the Business Analyst.

Creating a Crosswalk

Some systems may use a different local set of codes for the same entity - in our case, Airport. In these cases, you will want to map local, in-situ codes to the enterprise reference dataset for airports.

First, let's extend the ontology to create a new property for the class Airport, calling it "local airport code".

Now, create a new reference dataset. You can do this from the Governance Areas page as described previously. Alternatively, go to the EDG home page and click on the Reference Datasets link located on the left navigation menu under Asset Collections. You will see a page listing all Reference Datasets you have access to. This page includes a Create New Reference Dataset link. When a dataset is created this way, it will not be associated with any governance area. You can add an association to a governance area later by updating the dataset's metadata under Settings.

Let's assume that it is a dataset used by a hypothetical Flight Tracker application and call it Flight Tracker Airport Codes. Include the Enterprise Ontology in it.

Click on Edit. When asked, set the main entity to Airport and click on Continue. When asked, set the primary key to local airport code. Adjust the start of the URI as necessary.

Create a few New York area airports using data from the table below.

 

Airport               Local Code
La Guardia            1
JFK                   2
Westchester County    3
Newark                4
Islip                 5

Create a new Crosswalk from the Flight Tracker Airport Codes dataset to the enterprise reference dataset Airports as shown in the image below. Click Finish.

You can now map the two sets of airport codes manually or automatically. TopBraid EDG supports many-to-many mappings. Click on Edit to view the crosswalk. Initially, it has no mappings. To map manually, position your cursor in a row in the target dataset (yellow background) and start typing the name of an airport.

An autocomplete list will appear. Select your choice from the dropdown and click on the green + button to create the mapping. You can also add a note to describe the mapping if desired.

To auto-map, select utilities > Reports > Problems and Suggestions.

TopBraid EDG will generate some suggested mappings for you based on the airport names. Move the confidence level to 30% in the slider to filter out unlikely suggestions.

You can now accept suggestions one by one, or move the confidence level even higher (to 70%, say), accept all top suggestions, and then individually pick any lower-confidence suggestions you want to apply. From the generated list, we want to accept the La Guardia, Newark Liberty, and Westchester Co mappings. The official name of the Islip airport on Long Island is Long Island MacArthur, so it was not found. Add this mapping manually. Your crosswalk should now look as follows:


To see more information about the mapped airports, including their IATA codes, you can double-click on a row. The form will open in a separate window. For more on working with crosswalks, see the Crosswalk User Guide pages.
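The idea behind name-based suggestions and the confidence slider can be illustrated with a small Python sketch. This is not how TopBraid EDG computes its suggestions (the source does not describe that algorithm); it assumes a simple character-similarity score from the standard library, and the enterprise airport names used here are illustrative labels taken from the walkthrough above.

```python
from difflib import SequenceMatcher


def suggest_mappings(local_names, enterprise_names, min_confidence=0.3):
    """Return (local, enterprise, score) tuples scoring at or above min_confidence.

    The score is difflib's similarity ratio on lower-cased names. This is an
    assumed stand-in for whatever measure EDG actually uses.
    """
    suggestions = []
    for local in local_names:
        for enterprise in enterprise_names:
            score = SequenceMatcher(None, local.lower(), enterprise.lower()).ratio()
            if score >= min_confidence:
                suggestions.append((local, enterprise, round(score, 2)))
    # Highest-confidence suggestions first, like sorting the suggestion list.
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


# Illustrative names only; the real datasets hold the authoritative labels.
local = ["La Guardia", "Newark", "Islip"]
enterprise = ["La Guardia", "Newark Liberty", "Long Island MacArthur"]

for local_name, enterprise_name, score in suggest_mappings(local, enterprise, 0.3):
    print(f"{local_name} -> {enterprise_name} ({score:.0%})")
```

With this particular measure, "Islip" and "Long Island MacArthur" share too few characters to reach 30%, which is consistent with that mapping having to be added manually. Raising min_confidence plays the same role as moving the slider: fewer, more reliable suggestions.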

Documenting the Use of a Reference Dataset

If you are using TopBraid EDG for Metadata Management or TopBraid EDG for Business Glossaries together with TopBraid EDG-RD, you can document the use of a reference dataset in your applications catalog, data assets catalog, and/or business glossary. See the relevant User Guides for more details.

Getting Started for the Data Manager

While this section can serve as a standalone tutorial, it assumes that the Airports reference dataset described in the Getting Started for the Data Steward section has already been created.

Defining Reference Data Export

As a data manager, you may need to distribute reference data for use in your data source. Export is one way of doing this. Reference data can be exported in full or as subsets of data defined through search criteria. After finding the reference dataset you need (see Locating a Reference Dataset for information on locating a reference dataset), click the dataset's Export tab to view the available exports. (Examples in this section use the Airports reference dataset.)

This tab includes an option to export all information available in a dataset. There may also be exports that focus on specific subsets of data; these are accessible from the Export Saved Search link. If there is no export that suits you, click the View/Edit Production Copy link and save a new search via the search form, which is the left pane that toggles open and closed. Assuming that your application doesn't need latitude and longitude information, deselect these from the search form. Start typing "US" in the airport country field and pick "USA" from the autocomplete. Click on the Search button and the results will appear in the grid. Different export formats can be chosen by clicking on the gear wheel icon at the bottom of the search form. Export formats include TSV, XML, and JSON.

If these results fit your needs and you expect to pull this data from the dataset on a periodic basis, save the search by clicking the Save current search... button at the bottom of the search form and giving your search a name such as "US airports". Saved searches are web services that you can use to automate distribution of reference data.

Click the Show saved searches... button to display a list of saved searches. Selecting one and clicking the Select button under the list will fill out the Search Form as specified by the selected search, and you can then click the Search button to re-run the saved search.

When selecting a saved search from the list, note that above the Select button is a URL that can be used as a RESTful web service call to invoke the search.

See Search Within a Reference Dataset for more information on using the Search form.

Viewing Saved Searches

Go back to the dataset's home page and select the Export tab if it is not already selected, then click Export Saved Search.

TopBraid EDG remembers the last tab visited on a home page of a reference dataset. When you come back to the home page, it will automatically select the most recently viewed tab.

Export Saved Searches will list all searches saved, including a description of the fields they include. The format for exporting the saved search can be selected by using the Result Format field. Formats include CSV/TSV, XML, and JSON.

Click the Export button with the Text/CSV result format selected to get a comma-separated values file that you can either save or open in Excel.

The URL of the saved search is displayed in the Service URL field, including a unique id for the saved search. This URL can be copied as-is and included in any third-party application that needs to extract the codes in the saved search. The format is set to text/csv, but can be manually changed to any of the export formats supported by Saved Search.

Saved searches can be deleted in the Search view for the dataset. Choose "Show saved searches...", select the search to remove and click Delete.
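As a rough illustration of how a third-party application might consume a saved search, the Python sketch below changes the result format on a copied Service URL and parses a CSV response. The URL, the "format" parameter name, and the column names are placeholders, not the real EDG service interface; copy the actual Service URL from the Export Saved Search page and check which parameter carries the format. Authentication is also assumed to be handled elsewhere.

```python
import csv
import io
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import urlopen


def with_format(service_url, result_format, param="format"):
    """Return service_url with the query parameter `param` set to result_format.

    The parameter name is an assumption; inspect the copied Service URL
    to find the one your EDG installation actually uses.
    """
    parts = urlsplit(service_url)
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    query.append((param, result_format))
    return urlunsplit(parts._replace(query=urlencode(query)))


def parse_csv_export(text):
    """Turn CSV text (header row first) into a list of dicts, one per code."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_codes(service_url):
    """Download a CSV saved-search result and parse it (not called in this sketch)."""
    with urlopen(service_url) as response:
        return parse_csv_export(response.read().decode("utf-8"))


# Placeholder URL standing in for the value shown in the Service URL field.
example_url = "https://edg.example.org/saved-search?id=PLACEHOLDER&format=text/csv"
print(with_format(example_url, "application/json"))
```

Keeping the parsing separate from the download makes it easy to test the consumer against a saved CSV file before pointing it at a live server.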


Using TopBraid EDG-RD Web Services

TopBraid EDG includes pre-built services for validating your locally stored reference data against the datasets managed by EDG. It also includes crosswalk services for translating from one set of codes to another set of mapped (crosswalked) codes. See the relevant guides for more details on how to use these services.
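To make the two kinds of service concrete, here is a minimal local sketch in Python of what validation and crosswalk translation amount to. It does not call the EDG services (their endpoints and parameters are documented in the guides mentioned above); the function names are hypothetical, and the example assumes the enterprise Airports dataset is keyed by IATA code, with local codes taken from the Flight Tracker table earlier in this guide.

```python
from collections import defaultdict


def find_invalid_codes(local_codes, reference_codes):
    """Return the local codes that do not appear in the reference dataset."""
    reference = set(reference_codes)
    return [code for code in local_codes if code not in reference]


def build_crosswalk(pairs):
    """Build a many-to-many crosswalk from (source_code, target_code) pairs."""
    crosswalk = defaultdict(set)
    for source, target in pairs:
        crosswalk[source].add(target)
    return crosswalk


def translate(code, crosswalk):
    """Return all target codes mapped to `code`, or an empty list if unmapped."""
    return sorted(crosswalk.get(code, set()))


# Flight Tracker local codes mapped to IATA codes (assumed enterprise key).
local_to_iata = build_crosswalk(
    [("1", "LGA"), ("2", "JFK"), ("3", "HPN"), ("4", "EWR"), ("5", "ISP")]
)

print(find_invalid_codes(["1", "4", "9"], local_to_iata.keys()))  # ['9']
print(translate("4", local_to_iata))  # ['EWR']
```

Storing a set of targets per source code keeps the structure compatible with many-to-many mappings, where one local code may correspond to several enterprise codes.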

Getting Started for the Business Analyst

Finding a Reference Dataset

While this section can serve as a standalone tutorial, it assumes that the Airports reference dataset described in the Getting Started for the Data Steward section has already been created.

When you click on the Reference Datasets link in the left-hand navigator, you will see a list of reference datasets you have access to. This list can be long, especially in large organizations with lots of different reference data. To find a specific dataset, use Find Asset Collection.

If you know of collections (e.g., ontologies or reference datasets) in your EDG system that do not appear, you might not have the appropriate viewing or editing privileges for them. Each such collection requires a manager to provide access by setting you (or your security role) as a viewer, at least. See the collection type's User Roles utility documentation for details about these steps.

The Find Asset Collection search form lets you enter more specific search criteria against a number of preselected fields. You can use a combination of search criteria, for example, a status value of "Approved" and a related entity of "Country". You can also use the Search Any Text field to enter a text string that by default will be matched against any textual information contained within the reference dataset. (Your administrator may configure this feature to only search over a subset of text fields, such as the name and description of a reference dataset.) If you need to target your text search to one specific field such as description, click the triangle button to the right of the field to get menu options, select "text contains" and enter "country". Without this option, a field search, unlike Search Any Text (which assumes a partial match), will try to find an exact match: for example, a description that contains only the word "country".

You can use the same approach to find crosswalks. Alternatively, if you find the reference dataset you are interested in, such as the enterprise dataset for Airports, you can click on Settings on the reference dataset's view to see all crosswalks it participates in and navigate to the crosswalk of interest.
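The difference between an exact field match and the "text contains" option described above can be shown with a few lines of Python. This is only an illustration of the two matching modes, not EDG's search implementation; case-insensitive comparison and the sample descriptions are assumptions.

```python
def matches(value, query, mode="exact"):
    """Compare a field value to a query in 'exact' or 'contains' mode.

    Case-insensitive comparison is an assumption made for this sketch.
    """
    value, query = value.strip().lower(), query.strip().lower()
    if mode == "contains":
        return query in value
    if mode == "exact":
        return value == query
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical dataset descriptions.
descriptions = ["Country", "Country codes used for shipping", "Airport codes"]

print([d for d in descriptions if matches(d, "country", "exact")])
print([d for d in descriptions if matches(d, "country", "contains")])
```

The exact mode returns only the description consisting solely of the word, while the contains mode also returns longer descriptions that mention it.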

Finding a Code

To find a specific code, go to Find Code in the left-hand navigator.

You can also use the Search the EDG facility as described in Getting Started with Business Glossaries.

Viewing a Dataset's Metadata

The Settings tab > Metadata on a reference dataset's utilities (main) view contains descriptive and contextual information about the dataset, grouped into sections. Note that empty sections might not be displayed until Metadata is placed into Edit mode.

Also, some "dependent" (variant) sections might not appear unless certain conditions apply. For example, setting Dataset Status > is external dataset > true identifies a reference dataset as external/public, which makes available the Subscription section with associated fields describing the source of the public reference data and how and when it gets updated. The Property Definition (Semantic Analysis) section has a field-by-field description of each field in the dataset's main entity class.

Using Reference Dataset and Data Facts

As a Business Analyst you may have a report that needs to include a data feed that uses FAA airport identifiers. Reviewing the data, the FAA identifiers in the data seem to match the IATA codes, but you want to double-check that this would correctly integrate with the rest of your data which uses IATA codes.

Go to the home page of the Airports dataset, click the Metadata tab if it is not already displayed, and expand the Reference Dataset Facts section. You will learn that while many FAA identifiers are identical to IATA codes, there are also differences. Assuming that these are the same codes would have led to errors in the integrated reports. To correctly integrate the data, you should ask a steward to build a crosswalk between the two sets of codes.

Creating Tasks and Asking Questions about a Code

You may want to ask the reference data governance team to add FAA identifiers to the reference dataset because you believe this information will be useful not only for your immediate task but also for other applications, and should therefore be managed with the rest of the reference data.

Reference datasets let you log requests and questions in the form of Tasks. Tasks can be associated with an individual code or with the entire dataset. To create a task for the entire dataset, go to the Tasks tab on the dataset's home page, click the Create Task link and enter:

"Most of my data is coded with IATA codes, but I am starting to integrate new data feeds that use FAA identifiers. Please expand the dataset to include FAA identifiers."

You can select which user to assign the task to. By default it will be assigned to the dataset's manager. Click the OK button.

The task is now displayed in the tab. Tasks can be filtered by assignee and by status. Clicking on a task opens it in a popup dialog that lets users post responses or ask for additional information by adding Comments.

To create a task for a specific airport, click View Production Copy, navigate to the code you want to associate a task with, and click on the Task icon at the bottom of the form that shows detailed information about the code. Once a task is created, the number (0) in the icon will change to reflect the number of outstanding tasks.

Next Steps

You are now ready to explore the EDG User Guide - Overview to learn more about the many capabilities of TopBraid EDG, including workflows for team collaboration, importing more complex spreadsheets, and more.
