
Introduction

Reference data are standardized codes or data entities that are typically used by multiple applications as lists or tables. In fact, they are often called "code tables." An individual code table may seem like a simple thing, but a well-managed collection of code tables and related reference data spread across an enterprise is a resource that can bring great value to that enterprise—or cause great problems if it is not well maintained. EDG lets you control your reference data so that you can put it to work for you as efficiently as possible.

For additional information on reference datasets see:

This document is organized by roles showing how:

  • Reference Data Stewards can create and modify enterprise reference datasets and ontologies, import reference data, and manage information about it.

  • Data Stewards can create reference datasets that reflect reference data in sources they are responsible for. They can then use crosswalks to align these with the enterprise reference datasets for the same entity.
  • Data Managers can export and provision reference data for use in their applications.

  • Business Analysts and other users can consult EDG to learn more about codes and code sets important to their work.

Accessing the EDG Application

To work through this guide, use a browser to access the EDG web-application running in one of the following environments.

  1. Create an EDG trial evaluation at TopQuadrant, and run EDG from the TopQuadrant servers. Submit an EDG evaluation request or contact TopQuadrant.

  2. Use TopBraid Composer - Maestro Edition (TBC-ME), and run its demonstration version of EDG. Download and install TBC-ME.

  3. Install EDG on a server accessible to your network (which could also be a local Tomcat server, via localhost). For a custom install, contact TopQuadrant and see EDG Server Installation and Integration. This will also require separate uploading of the EDG samples project: sample.teamwork.topbraidlive.org, from TBC-ME.

For the TBC-ME option, launch TBC-ME and then start the demo version of EDG via the top menu: TopBraid Applications > Open TopBraid EDG. Browse to http://localhost:8083/edg. Logging in as Administrator requires no password for the demo version. All asset collection types are available in the demo version.

For the other two options, the system administrator or TopQuadrant will provide you with a URL, a username, and a password. Browse to the URL and log in. Server licensing will determine the availability of the various asset collection types.

The EDG User Interface

For a basic orientation to the user interface, see EDG User Guide - UI Overview.

Reference Data Management

Getting Started for the Reference Data Steward

Defining the Structure for a Reference Dataset

Each reference dataset designates some ontology class as the dataset's main entity, which defines the type of the dataset's reference instances, i.e., the individual code items.

Ontologies describe business entities, including entities for which you will govern reference data (codes). Ontologies can be thought of as a powerful, flexible representation of business glossaries. An ontology may contain a class (entity) such as country, product category, industry and so on. Each of these entities can have different fields (properties), making it easy to support different types of reference data. Reference datasets in EDG are not limited to a handful of predefined fields such as a code and a description. They can have any property you may need to capture. For example, a reference dataset for country codes may have properties such as the various ISO codes, capital, gross national product, and language.

In order to create reference data, we need to first define the corresponding entity and its properties in an ontology.

Select Ontologies in the left-hand navigation menu to see the list of ontologies you have access to. EDG supports the definition of a single enterprise ontology or a set of individual ontologies (for example, per department or business area) which can be combined with one another using an "includes" mechanism.

This tutorial uses the ontology Enterprise Ontology - Example, which is included with TopBraid Composer ME. If it is not present in your TopBraid EDG server application, an administrator can upload its entire TBC-ME project using either (1) TBC-ME project: sample.teamwork.topbraidlive.org > Export... > TopBraid Composer > Deploy Project to TopBraid Live Server or (2) manually zipping that same TBC-ME project into the file sample.teamwork.topbraidlive.org.zip, then using the EDG application server > Server Administration > Project Upload and selecting the zipped project file.

TopBraid EDG ships with a number of sample ontologies and datasets. If you are using TBC-ME or an evaluation server, you will typically see the available samples. One of the sample ontologies is called Enterprise Ontology - Example. If you have access to it, its title will show as a hypertext link. In this tutorial, we will be extending this model with definitions necessary to support a new reference dataset. Alternatively, a new ontology can be created. For information on creating a new ontology, see Create New Ontology in the User Guide.

Click on the ontology's name to go to a page where you can perform various operations with an ontology - make changes to it, import data into it, export it, etc.

Users that have edit privileges can make ad hoc changes to a given ontology or dataset. Otherwise, they must follow a more formal process of modifying an ontology by using Workflows, which sandbox all changes into an isolated working copy until they are reviewed and approved. See Workflow Overview for details. In this tutorial we will make the change without using a workflow.

Click on the Ontology tab to start viewing and editing the ontology's content. You will see several panels. The left pane shows a Class Hierarchy with classes (entities) and their properties (both attribute/datatype and relationship/object properties) shown as nodes in a tree. Below the Class Hierarchy, you will see a panel that lets you create class members or instances. For example, if you select the class Country, you will be able to create countries. Best practice, however, is to keep schema and data separate. To facilitate this, you can click on the Manage tab and switch the ontology into No Instances mode. You will need Manager permission for the ontology in order to see the Manage tab.

The colored buttons at the top of the class hierarchy, next to the quick search field, add a new class, attribute property, relationship property, or property shape to the class selected in the hierarchy. As shown in the screenshot above, clicking on a node in the tree (such as the class Country) displays information about it in the form to the right of the tree.

To the right of this View/Edit form, which displays information about any resource you select in the class tree, you may see a Search form (collapsed in the screenshot above). The search form panel can be collapsed and expanded by clicking on the blue "candy stripe" pattern in the vertical divider to the right of the View/Edit form.

The Edit button at the top of the View/Edit form, switches the form into edit mode, making all fields on the form editable. It may also display and let you edit fields that currently have no data and, thus, you will not see them in the view mode. Alternatively, you can edit values for each field in-line by clicking on the pencil icon that will appear when you position your mouse to the right of the field's name.

Later in this tutorial a reference dataset of airport codes will be created and populated with data from a spreadsheet. The following fragment is a sample of this spreadsheet:

| Airport | City | Country | Country Code | IATA Code | Latitude | Longitude |
|---|---|---|---|---|---|---|
| Keflavik International Airport | Keflavik | Iceland | IS | KEF | 63.985 | -22.605556 |
| Patreksfjordur | Patreksfjordur | Iceland | IS | PFJ | 65.55583 | -23.965 |
| Reykjavik | Reykjavik | Iceland | IS | RKV | 64.13 | -21.940556 |
| Siglufjordur | Siglufjordur | Iceland | IS | SIJ | 66.13333 | -18.916667 |
| Vestmannaeyjar | Vestmannaeyjar | Iceland | IS | VEY | 63.4243 | -20.278875 |
| Sault Ste Marie | Sault Sainte Marie | Canada | CA | YAM | 46.485 | -84.509445 |
| Winnipeg St Andrews | Winnipeg | Canada | CA | YAV | 50.05639 | -97.0325 |
| Shearwater | Halifax | Canada | CA | YAW | 44.63972 | -63.499444 |
| St Anthony | St. Anthony | Canada | CA | YAY | 51.39194 | -56.083056 |
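To inspect data like this before importing it, a short script can help. The following Python sketch is not part of EDG: it parses a few of the sample rows from CSV text and converts the coordinates to numbers. The `SAMPLE_CSV` text and the `load_airports` helper are illustrative assumptions, and the real airports.xlsx workbook would need a spreadsheet library rather than the `csv` module.

```python
import csv
import io

# A few rows copied from the sample above, written as CSV text.
# Assumption: CSV is used only to keep the sketch stdlib-only; the
# tutorial's actual file is an .xlsx workbook.
SAMPLE_CSV = """Airport,City,Country,Country Code,IATA Code,Latitude,Longitude
Keflavik International Airport,Keflavik,Iceland,IS,KEF,63.985,-22.605556
Reykjavik,Reykjavik,Iceland,IS,RKV,64.13,-21.940556
Shearwater,Halifax,Canada,CA,YAW,44.63972,-63.499444
"""


def load_airports(text):
    """Parse CSV text into a list of dicts with float coordinates."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["Latitude"] = float(row["Latitude"])
        row["Longitude"] = float(row["Longitude"])
        rows.append(row)
    return rows


airports = load_airports(SAMPLE_CSV)
for airport in airports:
    print(airport["IATA Code"], airport["Airport"],
          airport["Latitude"], airport["Longitude"])
```

Converting the coordinates up front surfaces malformed numeric cells early, since `float()` raises `ValueError` on anything that is not a number.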

To add model support for this information, create a class named 'Airport' that will be used as the main entity in the reference dataset. To do this, select the top-level class named 'Thing' in the class hierarchy, click the yellow button in the header of the Class Hierarchy pane, enter the name "Airport" and click OK.

You will see the newly created class displayed in the Edit/View pane.

If desired, provide a description of your new class in the comment field, then click the Save Changes button at the top of the pane.

Create attributes for Airport by selecting the Airport class and clicking on the green icon at the top of the Class Hierarchy pane. Select the Airport class each time to create the following attributes:

| Attribute Name (Label) | Description (Comment) | Range of Values |
|---|---|---|
| airport city | Main city served by airport. May be spelled differently from the airport's name. | string |
| IATA airport code | An IATA airport code, also known as an IATA location identifier, IATA station code or simply a location identifier, is a three-letter code designating many airports around the world, defined by the International Air Transport Association (IATA). | string |
| latitude | A horizontal position of a location on the Earth according to a geographical coordinate system in decimal degrees, usually to six significant digits. Positive latitude is above the equator (North), and negative latitude is below the equator (South). | float |
| longitude | A vertical position of a location on the Earth according to a geographical coordinate system in decimal degrees, usually to six significant digits. Positive longitude is East of the prime meridian, and negative longitude is West of the prime meridian. | float |

Note that an attribute for the airport name has not been created. This is because each entity has a built-in attribute "label" which is intended to hold its name.
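The resulting entity can be pictured as a simple record type. This Python sketch is only an illustration of the Airport class and its attributes, not how EDG stores them. The validation rules (three uppercase letters, coordinate ranges) are assumptions drawn from the attribute descriptions; in EDG, constraints like these would normally be expressed as property shapes in the ontology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Airport:
    """Illustrative stand-in for the Airport class defined in the ontology."""
    label: str               # built-in "label" attribute holds the airport name
    airport_city: str        # "airport city"
    iata_airport_code: str   # "IATA airport code"
    latitude: float          # decimal degrees, positive = North
    longitude: float         # decimal degrees, positive = East

    def __post_init__(self):
        code = self.iata_airport_code
        # Assumed check: IATA codes are three letters (uppercase here).
        if not (len(code) == 3 and code.isalpha() and code.isupper()):
            raise ValueError(f"expected a three-letter code, got {code!r}")
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError(f"latitude out of range: {self.latitude}")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError(f"longitude out of range: {self.longitude}")


reykjavik = Airport("Reykjavik", "Reykjavik", "RKV", 64.13, -21.940556)
print(reykjavik)
```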

As an alternative to manually entering classes and properties, you can use Import > Import Schema from Spreadsheet to automatically create them from the first row of the spreadsheet and then adjust as necessary.

TopBraid EDG will always create a globally unique resource identifier, a URI, for each entry in a reference dataset. To enable this, you need to select the field whose values will be used in the URI creation. The selected field is declared to be a primary key for the entity. Therefore, the field used as a primary key must always have unique values for a given class of codes. Typically the code field (or one of the code fields) will be used as the key.

You can set the primary key directly in the ontology (as was done for countries) or set it locally in the reference dataset itself. We will use the latter method since we will create two reference datasets for the same entity that use different properties as primary keys.
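The uniqueness requirement can be checked before choosing a key field. The sketch below is a rough illustration under assumptions, not EDG's actual algorithm: it reports duplicate key values and builds a URI by appending the percent-encoded key to a prefix. The `http://example.org/...` prefix and both helper names are hypothetical.

```python
from collections import Counter
from urllib.parse import quote


def duplicate_keys(rows, key_field):
    """Return the key values that occur more than once, sorted."""
    counts = Counter(row[key_field] for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


def mint_uri(start_of_uri, key_value):
    """Assumed scheme: URI prefix followed by the percent-encoded key value."""
    return start_of_uri + quote(str(key_value), safe="")


rows = [{"IATA Code": "KEF"}, {"IATA Code": "RKV"}, {"IATA Code": "KEF"}]
print(duplicate_keys(rows, "IATA Code"))  # values that would collide
print(mint_uri("http://example.org/airports/Airport-", "RKV"))
```

A field that yields any duplicates here could not serve as the primary key, because two entries would map to the same URI.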

Next, click the blue button at the top of the class hierarchy to create a relationship property named "airport country". In the comment field, describe it as "A country where an airport is located". Set its range of values to the Country class ( http://topbraid.org/schema/enterprise#Country ): start typing "Count" in the range field and pick "Country" when it appears in the autocomplete. (Failing to set the range can cause problems when it is time to import data into the new reference dataset.)

Since the primary key for ISO Country is its two-character ISO country code and the spreadsheet contains this information, EDG will be able to create a relationship between airports and countries as we import spreadsheet data. Note that we have not created a field for the country name; names of the countries are already maintained as part of the country codes, so a separate name field on Airport would store that information redundantly.
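Linking by key rather than by name can be sketched in a few lines. In the Python illustration below, the country URIs and the `link_airport_country` helper are hypothetical; the point is only that the two-character code in each spreadsheet row is enough to find the matching country resource.

```python
# Hypothetical country resources keyed by their two-character ISO code.
COUNTRY_URIS = {
    "IS": "http://example.org/countries/Country-IS",
    "CA": "http://example.org/countries/Country-CA",
}


def link_airport_country(row, country_uris):
    """Return the country URI for a spreadsheet row, or None if the code is unknown."""
    code = row["Country Code"].strip().upper()
    return country_uris.get(code)


row = {"Airport": "Reykjavik", "Country": "Iceland", "Country Code": "IS"}
print(link_airport_country(row, COUNTRY_URIS))
```

The "Country" name column plays no part in the lookup, which is why it does not need its own field on Airport.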

In the next step a reference dataset will be created that will store reference data for the airports.

For more information on working with ontologies, and especially creating property shapes that will let you validate reference data, see User Guide.

Creating Reference Datasets

Go back to the home page by clicking the TopBraid EDG logo in the upper-left. You can now click on the Reference Datasets link in the left hand navigator menu, see the page with all reference datasets you have access to and create a new reference dataset using a link on that page.

However, we want to associate the reference dataset we're about to create with a particular "governance area". We can do this by creating the dataset directly from the Governance Areas page.

Governance areas group asset collections according to an organization's business or data subject concerns. Governance areas are used to define a delineated part of stewardship. They partition and delegate ownership of assets, and define a meaningful context for assets that are associated with a governance area.

Select the Governance Areas link located in the left menu under the Governance Model section. First, create a new governance area: click the Create Data Subject Area button and add a data subject area with the label Logistics.


Not every user will have permissions necessary to modify governance areas. If you can't create a new governance area, contact your EDG Administrator.

If you are not using TBC-ME or TopQuadrant's evaluation account to work through this tutorial, then before your own TopBraid EDG installation can let users create new reference datasets, it must be configured to work with a backend database for storage. See Configuring the persistence technology for further information.

Now you're ready to create the dataset. Choose Reference Dataset in the Choose type dropdown.

You will see the following page:

Enter Airports as the label (or name) of the dataset and for its description enter: Reference dataset of airports with IATA codes. The Ontology to Include option lists the ontologies that are available to the user, which in turn will provide the class for the dataset's main entity. In this case, select Enterprise Ontology - Example, which has the Airport class that you defined. Click Submit. You will see a message that the dataset was created and you will be forwarded to the Import page where you can load data. However, before we can do this, we must finish setting up the new dataset by identifying its main entity and specifying the primary key for the entity, since we didn't specify the primary key in the ontology.

Setting the Main Entity

The ontology used for creating a reference dataset will typically contain several classes (entity types). After creating the reference dataset and before importing the airports data, you need to tell TopBraid EDG what reference data will be in the dataset. This is done by identifying the "main entity" for a dataset. In our example, it is the Airport class. There are two ways to set the main entity initially.

  • If the main entity is unset, then editing the reference dataset will first require the main entity class to be selected. Click on the Codes tab and select Airport from the provided dropdown, which lists classes from the included ontology.
  • A reference dataset's main entity class can also be set or changed via its utility: Settings > Metadata > Edit > main entity (class).

Make the Airport class the main entity using the first method.

Setting the Primary Key in the Reference Dataset

After you click Codes and set the main entity, TopBraid EDG will ask you to select a property to serve as a primary key. Since it is used for the URI construction, a primary key must be defined before any data is loaded into a dataset.

Select the "IATA airport code" property. For the Start of URIs value you can accept the default value or override it. Click the Set primary key for this Dataset button.

You will now see a tabular display that lets you create reference data. However, instead of manually entering data, we will import it from the airports spreadsheet.

Importing Reference Data

Select Import > Import Spreadsheet using Pattern. Then click Choose File to select the spreadsheet. (Download the airports.xlsx spreadsheet to get a local copy to import.) This page has two more fields:

  • Sheet index: by default this is 1. This spreadsheet has only one worksheet, so there is no need to edit it.

  • Entity type: a list of classes from the included ontology (the enterprise ontology) to indicate which one is being populated by the import. Ensure that Airport is selected.

Clicking Next shows several potential patterns for spreadsheet data. Select No Hierarchy. (Note: Reference data supports managing hierarchies as well as flat lists. However, the spreadsheet we are importing does not contain any hierarchical structures.)

The next step is to map the spreadsheet columns to the properties of the Airport class as shown below, which maps the columns to the properties defined above and to the built-in "label" property. Note that in the image below Altitude column was not mapped by choice - to demonstrate that only mapped columns will be imported. The Country column was also not mapped because it contains country names that are already managed as part of the ISO Country Codes reference dataset - also included in the samples shipped with TopBraid EDG.
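The effect of the column mapping can be sketched as a dictionary from spreadsheet columns to property names. The mapping below is an assumption reconstructed from the description above (the mapping screen itself is not reproduced here), and the Altitude cell value is a placeholder, not real data. Columns that are not mapped never reach the result.

```python
# Assumed mapping: spreadsheet column -> Airport property.
# "Country" and "Altitude" are deliberately left out, as described above.
COLUMN_TO_PROPERTY = {
    "Airport": "label",
    "City": "airport city",
    "Country Code": "airport country",
    "IATA Code": "IATA airport code",
    "Latitude": "latitude",
    "Longitude": "longitude",
}


def apply_mapping(row, mapping):
    """Keep only mapped columns, renamed to their target properties."""
    return {prop: row[col] for col, prop in mapping.items() if col in row}


row = {
    "Airport": "Keflavik International Airport",
    "City": "Keflavik",
    "Country": "Iceland",
    "Country Code": "IS",
    "IATA Code": "KEF",
    "Latitude": "63.985",
    "Longitude": "-22.605556",
    "Altitude": "n/a",  # placeholder value for the unmapped column
}
imported = apply_mapping(row, COLUMN_TO_PROPERTY)
print(imported)
```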

Click the Finish button. After the data is imported, click on the Codes tab to view the reference dataset.

A page appears containing a table with the imported data. Clicking on a row displays information in a new browser tab.

The table displays 25 rows at a time by default. This default can be changed by resetting the field at the top left corner of the table as shown above.

The columns displayed in the table can be modified.

To save the current configuration of columns as a default for all users, click on the "Save current search as default" icon at the bottom of the Search form. (Note that all buttons have mouse-over text with an explanation of the button's purpose.)

Reference datasets can be organized in hierarchies as well as in flat lists. If a reference dataset contains hierarchical relationships between codes, these can be viewed and modified by clicking on the tree icon to the left of the chosen Airport class in the page header.

Including other Reference Datasets

As shown in the first screenshot of the reference data, the Airport Country column contains URIs of the countries and not their names or code values. This happens because the reference dataset describing the country codes has not been included in the Airports dataset. To fix this, click on the Settings tab and then on Includes. In the pop-up window, start typing Country and then select "Country Codes" to include it in the Airports dataset. Instances of the Country class are now included in the Airports dataset by reference, meaning the data is not copied but referenced.

Referencing another dataset in this manner ensures that reference data for countries is maintained in one place. If a country is renamed (for example, Cape Verde, an island country in West Africa, is renamed to the Republic of Cabo Verde), the update needs to occur in only one place, the ISO Country dataset. All datasets that include ISO Country will see this change immediately. At the same time, you will have access to country names and all other country information from any reference dataset that includes country codes, because the names and other reference data for countries are stored in the Country Codes dataset.
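The difference between including by reference and copying can be shown with a small model. This Python sketch is an illustration only, not EDG's implementation; the `Dataset` class and the country URI are hypothetical. A lookup falls through to included datasets, so a rename in the country dataset is visible through the Airports dataset without any copy being made.

```python
class Dataset:
    """Toy dataset that can include other datasets by reference."""

    def __init__(self, name):
        self.name = name
        self.records = {}    # URI -> dict of property values
        self.includes = []   # referenced Dataset objects, never copied

    def include(self, other):
        self.includes.append(other)

    def lookup(self, uri):
        """Find a record locally first, then in included datasets."""
        if uri in self.records:
            return self.records[uri]
        for dataset in self.includes:
            found = dataset.lookup(uri)
            if found is not None:
                return found
        return None


CV = "http://example.org/countries/Country-CV"  # hypothetical URI
countries = Dataset("Country Codes")
countries.records[CV] = {"label": "Cape Verde"}
airports = Dataset("Airports")

before_include = airports.lookup(CV)        # country data not visible yet
airports.include(countries)
after_include = airports.lookup(CV)["label"]
countries.records[CV]["label"] = "Republic of Cabo Verde"  # edited in one place
after_rename = airports.lookup(CV)["label"]
print(before_include, after_include, after_rename)
```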

Once the reference dataset for countries is included, EDG will automatically match countries to the values of the "airport country" property. Click on the Codes tab. Note that Country codes appear in the Airport Country column instead of URIs as before. These codes come directly from the ISO Country dataset.

Click on any of the rows to see a View/Edit form for the selected airport.

The "airport country" property is now populated with a country code from the ISO Country dataset. Clicking on a country code link will open up a form that will show you other information about the country directly from the ISO Country dataset.

You can change the "focus" of the table from Airports to other data by using the dropdown field to the left of the user name in the header. Currently, 'Airport' is chosen. You can switch the focus to any other class related to Airport. In our case, the only related class is Country.

Included data, such as the Country Codes data referenced by the Airports dataset, can be viewed and searched, but modifications to included data are not permitted. Included data can only be modified by editing the included reference dataset directly. You will be able to edit only codes for the main entity or one of its subclasses.

Managing Metadata for a Reference Dataset

Reference datasets (and, in general, any asset collection in EDG) can have metadata such as name, description, status, etc. The metadata associated with an asset collection can be viewed and edited on the Settings page. Click on Settings and scroll down to the Metadata section. Expand the Dataset Status and Property Definitions sub-sections to see available information about the Airports dataset.

We have identified the main entity and entered the short description information earlier in this tutorial. In the Overview sub-section, the related entity value is automatically derived by the reference dataset as any class (entity) that is connected to the main class. Country appears because this is the main class for the Country Codes dataset now included. The last updated field is also automatically recorded.

When a dataset is first created, the status is automatically set to "Under development". You can edit this value whenever the status of the dataset changes.

TopBraid EDG is shipped with some predefined status values. They are configurable if your organization needs a different set of values.

The Property Definitions sub-section shows the description of each property of the main class as it was entered in the ontology. You can also add additional descriptions local to the dataset; these will not be part of the enterprise ontology.

Click the Edit button to see more available fields. You may want to differentiate private (internal) reference data from public (external) data such as ISO country codes. Set is external dataset to "true" in the Dataset Status section of the form. IATA codes are maintained by the IATA Association, which publishes updates bi-annually. Change the status code to Approved. Click Save changes at the top of the Metadata section.

Once the status of a reference dataset is approved for use, you will no longer be able to delete codes from the dataset, but you will be able to change information about them.
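The rule above can be modeled as a status-dependent guard. This is a sketch of the behavior as described, with hypothetical class and method names, and not EDG code: deletion is refused once the status is Approved, while updates remain possible.

```python
class ReferenceDataset:
    """Toy model of a dataset whose status restricts deletion."""

    def __init__(self):
        self.status = "Under development"   # initial status, per the text above
        self.codes = {}                     # primary key -> property values

    def update_code(self, key, **changes):
        """Changing information about a code is allowed in any status."""
        self.codes[key].update(changes)

    def delete_code(self, key):
        """Deleting a code is refused once the dataset is approved."""
        if self.status == "Approved":
            raise PermissionError("codes cannot be deleted from an approved dataset")
        del self.codes[key]


dataset = ReferenceDataset()
dataset.codes["KEF"] = {"label": "Keflavik International Airport"}
dataset.codes["RKV"] = {"label": "Reykjavik"}
dataset.delete_code("RKV")            # allowed while under development
dataset.status = "Approved"
dataset.update_code("KEF", label="Keflavik Intl")   # still allowed
try:
    dataset.delete_code("KEF")
except PermissionError as error:
    print("refused:", error)
```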

Documenting a Reference Dataset as an Enterprise Reference Dataset

Your organization may have several reference datasets in EDG that contain codes for a given entity. For example, you may have different existing applications and corresponding sources that already store and use airport codes. The goal of standing up a system for managing reference data is to achieve alignment across your existing reference data and to streamline its management. This alignment takes time. At least initially, in addition to a "master" reference dataset that you want to be the definitive source of reference data for a given entity across all systems, you may have reference datasets that capture what each of your systems is using.

To differentiate between your master reference dataset for airport codes and other "in situ" reference datasets, click Edit in the Metadata section of Settings and find the is enterprise dataset field. Set this flag to true and click Save.

If another reference dataset is created for the same entity, it could be mapped to the enterprise dataset using Crosswalks. TopBraid EDG can auto-create crosswalks between two datasets. It also offers crosswalk web services to translate between codes.
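Conceptually, a crosswalk is a set of pairs linking codes in one dataset to codes in another. The following Python sketch illustrates translating in both directions; it is not EDG's crosswalk web service API, and the `Crosswalk` class and the `LOCAL-...` codes are hypothetical. Storing sets on both sides lets one code map to several codes.

```python
from collections import defaultdict


class Crosswalk:
    """Toy crosswalk holding (source code, target code) pairs."""

    def __init__(self):
        self._forward = defaultdict(set)    # source code -> target codes
        self._backward = defaultdict(set)   # target code -> source codes

    def add(self, source_code, target_code):
        self._forward[source_code].add(target_code)
        self._backward[target_code].add(source_code)

    def translate(self, source_code):
        """Target codes mapped from a source code (empty list if none)."""
        return sorted(self._forward.get(source_code, ()))

    def reverse(self, target_code):
        """Source codes mapped to a target code (empty list if none)."""
        return sorted(self._backward.get(target_code, ()))


crosswalk = Crosswalk()
crosswalk.add("LOCAL-1", "LGA")   # hypothetical local code for La Guardia
crosswalk.add("LOCAL-2", "JFK")
crosswalk.add("LOCAL-9", "JFK")   # two local codes may point at one target
print(crosswalk.translate("LOCAL-1"), crosswalk.reverse("JFK"))
```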

Creating Reference Data Facts

In the Metadata section of Settings, expand the Reference Dataset Facts section and enter the following "fact":

IATA codes should not be confused with the FAA identifiers of US airports. Most FAA identifiers agree with the corresponding IATA codes, but some do not, such as Saipan whose FAA identifier is GSN and its IATA code SPN, and some coincide with IATA codes of non-US airports.

Note that the text area displayed allows rich text, including hyperlinks. The links above can be replaced by selecting the text to be hyperlinked, such as "Saipan", and clicking the chain link in the icon box. Add the hyperlink to the text box that appears.

Click on the plus + icon to the left of the fact field name to add an additional entry and enter this additional fact there:

Since "Q" is used for international communications, IATA airport codes never begin with "Q".

Save your changes. The fact is now part of the metadata for the dataset and can be referenced, searched, etc.

You can define facts at a dataset level and also specify them for a given code in the reference dataset. If you want to do the latter, you need to include in your reference dataset a pre-built Reference Data Facts ontology. Your EDG administrator can also specify this inclusion as a system-wide setting for all reference data.

Entering Subscription Information for External (public) Reference Data

In the Metadata section of the Settings tab click the Edit button again. You will see a new sub-section on the form called Subscription; this is used to capture subscription-related information for external reference datasets. Add "IATA Association" to the "sourced from" field. You will only need to type the first few letters of its name, because the reference data knows that only one defined organization begins with those letters. Click the Save Changes button.

For additional information, see Reference Dataset Utilities - Settings > Metadata.

TopBraid EDG is shipped with predefined metadata fields for reference datasets. They are configurable if your organization needs different metadata. EDG is a semantic, model-based solution. Configuration is done using steps similar to those used to modify ontology models to accommodate new reference data.

Assigning Access Privileges to other Users

For any asset collection in EDG, including reference datasets, a user can have one of the following permission roles (see Asset Collection Permissions for more information):

  • Viewer A Viewer can browse a dataset, viewing all the reference data (as well as any change history associated with that data) and the metadata associated with a dataset. A Viewer can create saved searches and export data. They can create and view tasks, add comments and change status of a task assigned to them. A viewer can also start a workflow. The Viewer then becomes the Manager of the working copy that is associated with the workflow. However, these changes will not affect the reference dataset until they are approved and committed by a user that has Editor permission for the dataset.
  • Editor In addition to being able to perform all activities that a Viewer can perform, an Editor can make changes to the dataset's metadata and to the reference data itself.
  • Manager A Manager has the most capabilities. In addition to all the activities that an Editor can perform, a Manager can delete an entire dataset, they can change the default columnar view for all users and they control the access privileges that other users have over a particular dataset by assigning Manager, Editor, or Viewer permission roles to them. They can also reassign and change the status of all tasks, even those that are not assigned to them. A person who creates a reference dataset automatically becomes its Manager.
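Because each role includes everything the role below it can do, the three roles can be modeled as an ordered scale. The Python sketch below is an illustration of that idea only; the action names and the `can` helper are hypothetical and do not correspond to an EDG API.

```python
from enum import IntEnum


class Role(IntEnum):
    """Permission roles, ordered so that higher roles include lower ones."""
    VIEWER = 1
    EDITOR = 2
    MANAGER = 3


# Minimum role assumed for a few of the activities listed above.
REQUIRED_ROLE = {
    "browse": Role.VIEWER,
    "export": Role.VIEWER,
    "start_workflow": Role.VIEWER,
    "edit_data": Role.EDITOR,
    "edit_metadata": Role.EDITOR,
    "delete_dataset": Role.MANAGER,
    "assign_roles": Role.MANAGER,
}


def can(role, action):
    """True if the role is at least the minimum required for the action."""
    return role >= REQUIRED_ROLE[action]


print(can(Role.VIEWER, "export"), can(Role.EDITOR, "delete_dataset"))
```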

To give others access to the dataset, go to the User Roles tab on the dataset's home page.

Permission levels can be set for (1) individual users and (2) user security roles (e.g., from Tomcat or LDAP). The list of users you will see on this tab can include individual users and LDAP roles. A Manager can assign Manager, Editor and Viewer privileges to each user or user group. The User Roles page is also used to set up governance roles (as defined in the Governance model) for individual reference datasets. Governance roles can also be defined at the level of the business area or data subject area that a reference dataset is associated with.

Governance roles provide an alternative approach to assigning permissions: if a user has any governance role for a reference dataset (or any other asset collection), specified either directly for the dataset or indirectly for a subject area the dataset belongs to, they will automatically get Viewer permission.

Modifying Reference Data

Dominica's main airport, the Melville Hall Airport, was just renamed to the Douglas-Charles Airport in tribute to its late prime ministers, Rosie Douglas and Pierre Charles. While your next bi-annual update from the IATA Association will reflect this change, you need to make it ahead of receiving the update.

Click on the Codes tab. Search for Dominica by its code, DMA, using the search form's "airport country" field to get the two airports in Dominica. Click on Melville Hall to display its information, then click the Edit button at the bottom of the screen. When you rename its label value to "Douglas-Charles Airport", you can check the Enter log message box before clicking the Save Changes button if you want to include a log message about your change.

TopBraid EDG keeps a complete audit trail of all changes. Click the "Show History" check box to see the audit trail.
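An audit trail of this kind amounts to an append-only list of change records. The sketch below shows the idea in Python under assumptions: the `Change` fields, the `AuditedDataset` class, and the `airport-1` identifier are hypothetical and do not reflect EDG's internal change model.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One entry in the change history."""
    when: datetime
    user: str
    resource: str
    prop: str
    old_value: Optional[str]
    new_value: str
    message: str


class AuditedDataset:
    """Toy dataset that records every value change."""

    def __init__(self):
        self.records = {}
        self.history = []   # append-only list of Change entries

    def set_value(self, user, resource, prop, new_value, message=""):
        old_value = self.records.setdefault(resource, {}).get(prop)
        self.records[resource][prop] = new_value
        self.history.append(Change(datetime.now(timezone.utc), user, resource,
                                   prop, old_value, new_value, message))


dataset = AuditedDataset()
dataset.set_value("steward", "airport-1", "label", "Melville Hall Airport")
dataset.set_value("steward", "airport-1", "label", "Douglas-Charles Airport",
                  message="Renamed ahead of the next IATA update")
for change in dataset.history:
    print(change.user, change.prop, change.old_value, "->", change.new_value)
```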

To create a new airport, click on the New button in the button-row above the airports table.

Export, Collaboration, and other Activities

Some of data stewards' tasks overlap with the tasks of other users. For example, stewards may build exports of reference data, but so do data managers. These overlapping activities, including collaboration between users working with reference data, are covered in the Getting Started Guides for Data Manager and Getting Started for the Business Analyst.

Creating a Crosswalk

Some systems may use a different local set of codes for the same entity - in our case, Airport. In these cases, you will want to map local, in-situ codes to the enterprise reference dataset for airports.

First, let's extend the ontology to create a new property for the class Airport, calling it "local airport code".

Now, create a new reference dataset. You can do this from the Governance Areas page as described previously. Alternatively, go to the EDG home page and click on Reference Datasets, located on the left navigation menu under Asset Collections. You will see a page listing all reference datasets you have access to. This page includes a Create New Reference Dataset link. When a dataset is created this way, it will not be associated with any governance area. You can add an association to a governance area later by updating the dataset's metadata under Settings.

Let's assume that it is a dataset used by a hypothetical Flight Tracker application and call it Flight Tracker Airport Codes. Include the Enterprise Ontology in it.

Click on Edit. When asked, set the main entity to Airport and click on Continue. When asked, set the primary key to be local airport code. Adjust the start of the URI as necessary.

Create a few New York area airports using data from the table below.

 

| Airport | Local Code |
|---|---|
| La Guardia | 1 |
| JFK | 2 |
| Westchester County | 3 |
| Newark | 4 |
| Islip | 5 |

Create a new Crosswalk from the Flight Tracker Airport Codes dataset to the enterprise reference dataset Airports as shown in the image below. Click Finish.

You can now map the two sets of airport codes manually or automatically. TopBraid EDG supports many-to-many mappings. Click on Edit to view the crosswalk. Initially, it has no mappings. To map manually, position your cursor in a row in the target dataset (yellow background) and start typing the name of an airport.

An autocomplete list will appear. Select your choice from the dropdown and click on the green + button to create the mapping. You can also add a note to describe the mapping if desired.
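As a mental model only (this is not EDG's internal representation), a many-to-many crosswalk can be pictured as a set of source/target pairs, each with an optional note. The sketch below uses local codes from the table above and the real IATA codes for those airports. The Mapping and Crosswalk classes and their field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One crosswalk row: a source code mapped to a target code."""
    source: str     # local code, e.g. a Flight Tracker airport code
    target: str     # enterprise code, here the airport's IATA code
    note: str = ""  # optional free-text description of the mapping


class Crosswalk:
    """Many-to-many mapping stored as a set of (source, target) pairs."""

    def __init__(self):
        self._mappings = set()

    def add(self, source, target, note=""):
        self._mappings.add(Mapping(source, target, note))

    def targets_for(self, source):
        """All target codes mapped to a given source code."""
        return sorted({m.target for m in self._mappings if m.source == source})

    def sources_for(self, target):
        """All source codes mapped to a given target code."""
        return sorted({m.source for m in self._mappings if m.target == target})


crosswalk = Crosswalk()
crosswalk.add("1", "LGA")  # La Guardia
crosswalk.add("2", "JFK")  # JFK
crosswalk.add("4", "EWR", note="Newark Liberty")

print(crosswalk.targets_for("4"))
print(crosswalk.sources_for("JFK"))
```

Because nothing restricts a code to a single pair, one local code can map to several enterprise codes and vice versa.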

To auto-map, select Reports > Problems and Suggestions.

TopBraid EDG will generate some suggested mappings for you based on the airport names. Move the confidence level to 30% in the slider to filter out unlikely suggestions.

You can now accept suggestions one by one, or move the confidence level even higher (to, say, 70%), accept all top suggestions, and then individually pick any lower-confidence suggestions you want to apply. From the generated list, we want to accept the La Guardia mapping, the Newark Liberty mapping and the Westchester Co mapping. The official name of the Islip airport on Long Island is Long Island MacArthur, so it was not found. Add this mapping manually. Your crosswalk should now look as follows:


To see more information about the mapped airports, including their IATA codes, you can double-click on a row. The form will open in a separate window. For more on working with crosswalks, see the Crosswalk User Guide pages.
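The guide does not describe how EDG computes its suggestions. To illustrate how name-based suggestions and a confidence threshold interact, the sketch below substitutes Python's difflib.SequenceMatcher as a stand-in similarity measure. The enterprise names are only the ones mentioned in this section, and the suggest_mappings function is hypothetical.

```python
from difflib import SequenceMatcher

LOCAL_NAMES = ["La Guardia", "JFK", "Westchester County", "Newark", "Islip"]
# Subset of enterprise airport names mentioned in this guide.
ENTERPRISE_NAMES = [
    "La Guardia",
    "Newark Liberty",
    "Westchester Co",
    "Long Island MacArthur",
]


def suggest_mappings(local_names, enterprise_names, min_confidence):
    """Return (local, enterprise, score) triples at or above the threshold.

    The score is a character-level similarity ratio between 0 and 1,
    used here as a stand-in for a confidence level.
    """
    suggestions = []
    for local in local_names:
        for enterprise in enterprise_names:
            score = SequenceMatcher(None, local.lower(), enterprise.lower()).ratio()
            if score >= min_confidence:
                suggestions.append((local, enterprise, round(score, 2)))
    # Highest-confidence suggestions first.
    return sorted(suggestions, key=lambda s: s[2], reverse=True)


for local, enterprise, score in suggest_mappings(LOCAL_NAMES, ENTERPRISE_NAMES, 0.3):
    print(f"{local} -> {enterprise} ({score:.0%})")
```

In this sketch "Islip" shares too few characters with "Long Island MacArthur" to clear either threshold, which mirrors why that mapping has to be added by hand.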

Documenting the Use of a Reference Dataset

If you are using TopBraid EDG for Metadata Management or TopBraid EDG for Business Glossaries together with TopBraid EDG-RD, you can document the use of a reference dataset in your applications catalog, data assets catalog and/or business glossary. See relevant User Guides for more details.

Getting Started for the Data Manager

While this section can serve as a standalone tutorial, it assumes that all steps described in the Getting Started for the Data Steward section have already been completed: the Airports reference dataset has been created and populated with data, and you have access to it.

Defining Reference Data Export

As a data manager, you may need to distribute reference data for use in your data source. Export is one way of doing this. Reference data can be exported in full or as subsets of data defined through search criteria. After finding the reference dataset you need, click the dataset's Export tab to view the available exports. (Examples in this section use the Airports reference dataset.)

This tab includes an option to export all information available in a dataset. There may also be exports that focus on specific subsets of data; these are accessible from the Export Saved Search link. If there is no export that suits you, click the View/Edit Production Copy link and save a new search via the search form, which is the left pane that toggles open and closed. Assuming that your application doesn't need latitude and longitude information, deselect these from the search form. Start typing "US" in the airport country field and pick "USA" from the autocomplete. Click on the Search button and results will appear in the grid. Different export formats can be chosen by clicking on the gear wheel icon at the bottom of the search form. Export formats include TSV, XML, and JSON.
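Conceptually, a subset export like this combines a row filter (airport country is USA) with a column selection (everything except latitude and longitude). The following sketch shows that idea on a few in-memory rows. It does not call EDG; the field names, labels and approximate coordinates are illustrative assumptions.

```python
import csv
import io
import json

# Illustrative rows only; field names are assumed and coordinates are approximate.
AIRPORTS = [
    {"iata_code": "JFK", "label": "John F Kennedy Intl", "airport_country": "USA",
     "latitude": 40.64, "longitude": -73.78},
    {"iata_code": "EWR", "label": "Newark Liberty", "airport_country": "USA",
     "latitude": 40.69, "longitude": -74.17},
    {"iata_code": "DOM", "label": "Douglas-Charles Airport", "airport_country": "DMA",
     "latitude": 15.55, "longitude": -61.30},
]

COLUMNS = ["iata_code", "label", "airport_country"]  # latitude/longitude deselected


def search(rows, country, columns):
    """Keep rows for one country and only the selected columns."""
    return [
        {col: row[col] for col in columns}
        for row in rows
        if row["airport_country"] == country
    ]


def to_tsv(rows, columns):
    """Serialize rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


us_airports = search(AIRPORTS, "USA", COLUMNS)
print(to_tsv(us_airports, COLUMNS))
print(json.dumps(us_airports, indent=2))
```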

If these results fit your needs and you expect to pull this data from the dataset on a periodic basis, save the search by clicking the Save current search... button at the bottom of the search form and giving your search a name such as "US airports". Saved searches are web services that you can use to automate distribution of reference data.

Click the Show saved searches... button  to display a list of saved searches. Selecting one and clicking the Select button under the list will fill out the Search Form as specified by the selected search, and you can then click the Search button to re-run the saved search.

When selecting a saved search from the list, note that above the Select button is a URL that can be used as a RESTful web service call to invoke the search.

See  Search Within a Reference Dataset  for more information on using the Search form.

Viewing Saved Searches

Go back to the dataset's home page and select the Export tab if it is not already selected, then click Export Saved Search.

TopBraid EDG remembers the last tab visited on a home page of a reference dataset. When you come back to the home page, it will automatically select the most recently viewed tab.

Export Saved Search lists all saved searches, including a description of the fields they include. The format for exporting the saved search can be selected using the Result Format field. Formats include CSV/TSV, XML, and JSON.

Click the Export button with the Text/CSV result format selected to get a comma-separated values file that you can either save or open in Excel.

The URL of the saved search is displayed in the Service URL field, including a unique id for the saved search. This URL can be copied as-is and included in any third-party application needing to extract the codes in the saved search. The format is set at text/csv, but can be manually modified to any of the export formats supported by Saved Search.
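A third-party application could consume the Service URL along the following lines. This is a minimal sketch using only the Python standard library. It assumes the URL returns CSV text and that no authentication is needed, which may not hold for your installation. The function names and the sample column names are hypothetical, and the URL itself must be copied from the Service URL field.

```python
import csv
import io
from urllib.request import urlopen


def parse_csv_export(text):
    """Turn CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fetch_saved_search(service_url, timeout=30):
    """Download a saved search as CSV and parse it.

    Authentication is not handled here; add it as your server requires.
    """
    with urlopen(service_url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return parse_csv_export(response.read().decode(charset))


# Offline demonstration: made-up CSV text standing in for a real response.
sample = "iata_code,label\nJFK,John F Kennedy Intl\nEWR,Newark Liberty\n"
rows = parse_csv_export(sample)
print(rows[0]["iata_code"])
```

To run it against a live server, pass the copied URL to fetch_saved_search instead of parsing the sample text.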

Saved searches can be deleted in the Search view for the dataset. Choose "Show saved searches...", select the search to remove and click Delete.


Using TopBraid EDG-RD Web Services

TopBraid EDG includes pre-built services for validating your locally stored reference data against the datasets managed by EDG. It also includes crosswalk services for translating from one set of codes to another set of mapped (crosswalked) codes. See the relevant Guides for more details on how to use these services.
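The logic behind these two kinds of service can be sketched in a few lines. This is not the EDG service API, whose endpoints and parameters are documented in the relevant Guides; it only illustrates what validation and translation mean. The function names are hypothetical, and the data reuses the New York area airports from the crosswalk example with their real IATA codes.

```python
# Enterprise reference codes (IATA) for the New York area airports.
REFERENCE_CODES = {"LGA", "JFK", "HPN", "EWR", "ISP"}

# Crosswalk from local Flight Tracker codes to enterprise codes.
# Values are lists because a crosswalk may be many-to-many.
CROSSWALK = {
    "1": ["LGA"],
    "2": ["JFK"],
    "3": ["HPN"],
    "4": ["EWR"],
    "5": ["ISP"],
}


def find_invalid_codes(local_codes, reference_codes):
    """Validation: return the local codes missing from the reference set."""
    return sorted(set(local_codes) - set(reference_codes))


def translate(code, crosswalk):
    """Translation: return the mapped codes, or an empty list if unmapped."""
    return list(crosswalk.get(code, []))


print(find_invalid_codes(["JFK", "ZZZZ", "LGA"], REFERENCE_CODES))
print(translate("3", CROSSWALK))
```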

Getting Started for the Business Analyst

Finding a Reference Dataset

While this section can serve as a standalone tutorial, it assumes that all steps described in the Getting Started for the Data Steward section have already been completed: the Airports reference dataset has been created and populated with data, and you have access to it.

When you click on the Reference Datasets link in the left-hand navigator, you will see a list of reference datasets you have access to. This list can be long, especially in large organizations with lots of different reference data. To find a specific dataset, use Find Asset Collection.

If you know of collections (e.g., ontologies or reference datasets) in your EDG system that do not appear, you might not have the appropriate viewing or editing privileges for them. Each such collection requires a manager to provide access by setting you (or your security role) as a viewer, at least. See the collection type's User Roles utility documentation for details about these steps.

The Find Asset Collection search form lets you enter more specific search criteria against a number of preselected fields. You can use a combination of search criteria, for example a status value of "Approved" and a related entity of "Country". You can also use the Search Any Text field to enter a text string that by default will be matched against any textual information contained within the reference dataset. (Your administrator may configure this feature to only search over a subset of text fields, such as the name and description of a reference dataset.) If you need to target your text search to only one specific field such as description, click the triangle button to the right of the field to get menu options, select "text contains" and enter "country". Otherwise, unlike Search Any Text, which assumes a partial match, the field search will try to find an exact match, for example a description that contains only the word "country".
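The difference between an exact match and a "text contains" match can be illustrated with a small sketch. The two functions and the sample descriptions are hypothetical, and treating the comparison as case-insensitive is an assumption rather than documented EDG behavior.

```python
def exact_match(field_value, term):
    """True only when the whole field equals the search term."""
    return field_value.strip().lower() == term.strip().lower()


def text_contains(field_value, term):
    """True when the search term appears anywhere in the field."""
    return term.strip().lower() in field_value.lower()


# Made-up dataset descriptions for illustration.
descriptions = ["country", "ISO country codes", "Airport codes"]

print([d for d in descriptions if exact_match(d, "country")])
print([d for d in descriptions if text_contains(d, "country")])
```

The exact match keeps only the first description, while "text contains" also keeps "ISO country codes".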

You can use the same approach to find crosswalks. Alternatively, if you find the reference dataset you are interested in, such as the enterprise dataset for Airports, you can click on its Settings tab to see all crosswalks it participates in and navigate to the crosswalk of interest.

Finding a Code

To find a specific code, go to Find Code in the left hand side navigator.

You can also use the Search the EDG facility as described in Getting Started with Business Glossaries.

Viewing Dataset's Metadata

The Settings tab > Metadata  section contains descriptive and contextual information about the dataset, grouped into sub-sections. Note that empty sub-sections might not be displayed until Metadata is placed into Edit mode.

Also, some "dependent" (variant) sections might not appear unless certain conditions apply. For example, setting Dataset Status > is external dataset > true identifies a reference dataset as external/public, which makes available the Subscription section with fields describing the source of the public reference data and how and when it gets updated. The Property Definition (Semantic Analysis) section has a field-by-field description of each field in the dataset's main entity class.

Using Reference Dataset and Data Facts

As a Business Analyst you may have a report that needs to include a data feed that uses FAA airport identifiers. Reviewing the data, the FAA identifiers in the data seem to match the IATA codes, but you want to double-check that this would correctly integrate with the rest of your data which uses IATA codes.

Expand the Reference Dataset Facts sub-section of the Metadata section. You will learn that while many FAA identifiers are identical to IATA codes, there are also differences. Assuming that these are the same codes would have led to errors in the integrated reports. To correctly integrate the data, you should ask a steward to build a crosswalk between the two sets of codes.

Creating Tasks and asking Questions about a Code

You may want to ask the reference data governance team to add FAA identifiers to the reference dataset because you believe this information will be useful not only for your immediate task but also for other applications, and should therefore be managed with the rest of the reference data.

Reference datasets let you log requests and questions in the form of Tasks. Tasks can be associated with an individual code or with the entire dataset. To create a task for the entire dataset, go to the Tasks tab for a dataset, click the Create Task link and enter:

"Most of my data is coded with IATA codes, but I am starting to integrate new data feeds that use FAA identifiers. Please expand the dataset to include FAA identifiers."

You can select which user to assign the task to. By default it will be assigned to the dataset's manager. Click the OK button.

The task is now displayed in the tab. Tasks can be filtered by assignee and by status. Clicking on a task opens it in a popup dialog that lets users post responses or ask for additional information by adding Comments.

To create a task for a specific airport, click Codes, select the code you want to associate a task with and click on the Task icon at the top of the form that shows detailed information about a code. Once a task is created, the number (0) in the icon will change to reflect the number of outstanding tasks.

Next Steps

You are now ready to explore the EDG User Guide - Overview to learn more about the many capabilities of TopBraid EDG, including workflows for team collaboration, importing more complex spreadsheets, and more.
