Data Visualization … Visualized Badly

March 2nd, 2012 by Bradley Shoebottom

Originally submitted at O’Reilly

Intentional Communication from Data to Display

Designing Data Visualizations

Data Visualization …Visualized Badly

By That Ontology Guy from Fredericton, NB on 3/2/2012

 

2 out of 5

Pros: Easy to understand, Concise

Cons: Not comprehensive enough, Too basic

Best Uses: Student

First, the positives: it is a good introductory read on the subject. An average person could likely read it in under an hour and take away some basics. (I read it on the bus on the way home.) I would immediately recommend reading some of the many references listed for more detailed information. I did like the distinction between data visualizations that are largely one-off builds, which almost require a project in themselves to create, and basic, effective visualizations. Most people only have time for the basic visualizations. Missing was a description of how some people “jazz up” bar charts by making them “3-D”, which makes the data too hard to compare.

Now the negatives: I received this book as a reviewer’s copy in hard copy. I wish I had had the PDF, because the 32 pages of color graphics (in a book of only 93 pages) meant I immediately had to find a color printer to print them off so I could really understand them. The book’s dimensions also limited the size of the figures and thus their interpretability.

Given the introductory nature of the subject coverage and the color and size demands of the figures, I feel this book would have made a great website. As it stands, it is a basic primer good for a school or public library.


That Ontology Guy is going to CMS/DITA 2012

February 28th, 2012 by Bradley Shoebottom

I am headed to CMS/DITA 2012 in April. The Center for Information Development Management hosts a conference every year on the XML-based Darwin Information Typing Architecture (DITA). DITA is used in technical writing for user manuals and learning materials.

I will be giving a presentation on the use of RDF, OWL, and microformats to improve search results on your content. Basically, RDF, OWL, and microformats add structured metadata to your content, allowing search engines to find the content more easily and users to do things like faceted search on the content. An OWL ontology can even accommodate synonyms, acronyms, and incorrect spellings of your terms and still give you good search results (kind of like how Google suggests corrected search terms for you).

Here is my abstract:

“DITA XML offers great advantages for authoring content and re-using it. It also has semantically rich tags that can aid search. Finding the proper technical content may be a challenge even with the DITA Subject Schema map that creates relationships between metadata. Unfortunately, the Subject Schema map has very basic logic. RDF, OWL, and microformats offer more expressive relationships between content, thus improving the search results for end users. Bradley discusses the advantages of using RDF, OWL, and microformats with your XML. He also describes a semantic search implementation using OWL and XML.”

I will be blogging about the conference, likely while I am there.

Who is that Ontology Guy?

May 19th, 2011 by Bradley Shoebottom

If you have been following my posts, you might be wondering by now “Who is that Ontology guy?” “What does he do?” “What value does he bring to the organization?”

What does an ontologist do?

Officially, I would have the title of Ontology Engineer, responsible for ontology engineering. I might also be called a knowledge engineer, which is less precise, and in my organization I have the job title of Information Architect. My specific role is to build formal semantic knowledge models linking concepts together with their appropriate relationships. An example would be to build a formal representation of the linkages between employee information (title, skills, work location, and contact information) and the organization’s structure (business units, managers).

Some specific information projects that an ontology engineer would be needed for include:

  • Semantic integration of databases
  • Knowledge exploration
  • Automatic document tagging
  • Automatic summarization of a document
  • Business intelligence mashups
  • Decision making support
  • Common metadata vocabulary for manually marking up a document.

See my Getting Started with Semantics for more information.

Ontology engineer skills

An ontology engineer has several skill sets they need to be successful:

  • technical
  • project management
  • relational.

Ontology engineer technical skills

The technical skills relate to how to think formally about knowledge and how it is organized (library science and computer science data structuring), what questions need to be answered or queried (SPARQL queries), the tools needed to model the knowledge (Protege, TopBraid Composer), and how ontologies help create web services (semantic web layers) that allow users to interact with data, forms, and websites.

Ontology engineer project management skills

The project management skills relate to being able to divide the project into stages such as:

  • ontology planning,
  • ontology design,
  • ontology development,
  • ontology testing, and
  • ontology maintenance

These stages are run as sprints that build the ontology as modules that can be re-used in other projects. The key is to be able to do iterative development and testing cycles.

Ontology engineer relationship building skills

The relational skills concern how the ontology engineer:

  • understands the objectives of the client,
  • elicits information from the client’s subject matter experts to build the ontology, and
  • works with other highly specialized team members such as text mining engineers, application developers, usability testers, data architects, and business system architects.

So who makes a good ontology engineer?

Often an ontology engineer is a subject matter expert with a flair for thinking abstractly and conceptually, and with a good grasp of knowledge structures such as those used in library science or structured data architectures. The ontology engineer is conversant with classification schemes, taxonomies, controlled vocabulary lists, and other knowledge organizing methods.

The value of the ontologist

The ontologist brings value to the organization by formalizing the relationships between ideas, business practices, data, and between databases. The ontologist brings “rigour” to the organization so that more information can be made available to end users.  So, the ontologist helps design the “structures” needed to provide the correct amount of related information.

Presentation of Semantic Search Research Results at AINA 2011

April 6th, 2011 by Bradley Shoebottom

Here is a copy of the presentation delivered by Dr Chris JO Baker at the AINA 2011 (25th IEEE International Conference on Advanced Information Networking and Applications) conference in Singapore in February 2011.

He presents the research paper, which documents the development of text mining algorithms that elicit information from unstructured text and populate an ontology used in a knowledge base by technical support workers. The research was conducted by Alex Kouznetsov, Bradley Shoebottom, Johannes B Laurila, and Chris Baker.

Identifying Classes and Instances

March 23rd, 2011 by Bradley Shoebottom

One of the trickiest modelling dilemmas an ontologist faces is when to call something a class and when to call it an instance or entity. (See my earlier blog What is an ontology?) If you over-design your ontology and turn instances into classes, you end up with classes that hold only “singleton” instances. While this “over-designed” model is useful for exploring knowledge at the conceptual level, it leads to a bloated ontology that actually makes it harder for end users to find the instance that is of interest to them. It may actually be better to have fewer classes and more instances.

A recent example of this occurred to me when driving between Moncton and Sackville, New Brunswick. We are a French/English bilingual province. The road signs are annotated like “Chemin Shediac Road”. In proper French, it is “Chemin de Shediac”. The “de” caught my attention because it is a hint that there is an instance and a class. The class is “Chemin” or “Road”, and the instance is “Chemin de Shediac”. In English, it is sometimes hard to pick up on the instance/class correlation because English uses fewer “de” or “of” constructions. In English, as in German, we like to string four nouns together to make up a new name. Technically, each noun can be a new class/subclass structure, but we miss it because of the lack of “of”. A good example of this lack of an “of” construction is the “8005DI AC Power Supply”. The high-level class is “Power Supply”, followed by the subclass “AC Power Supply” (Alternating Current), followed by the subclass “DI” (Dual Input: 110 and 220 volts), followed by the instance “8005”.
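To make the power supply example concrete, here is a minimal Python sketch of the class/subclass/instance chain. The dictionary names and the label “DI AC Power Supply” for the dual-input subclass are my own assumptions for illustration; a real ontology would hold this in OWL rather than in Python dictionaries.

```python
# Hypothetical subclass links taken from the "8005DI AC Power Supply" example.
# "DI AC Power Supply" is an assumed label for the Dual Input subclass.
SUBCLASS_OF = {
    "AC Power Supply": "Power Supply",
    "DI AC Power Supply": "AC Power Supply",
}

# The instance and the most specific class it belongs to.
INSTANCE_OF = {"8005": "DI AC Power Supply"}


def class_chain(instance):
    """Return the classes of an instance, from most specific to most general."""
    chain = []
    cls = INSTANCE_OF[instance]
    while cls is not None:
        chain.append(cls)
        cls = SUBCLASS_OF.get(cls)  # None once we reach the top-level class
    return chain


print(class_chain("8005"))
```

Reading the chain from right to left gives back the noun string on the product label, which is why the missing “of” hides the structure.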

So what is the practical purpose of this?

Knowing this language construct and relationship can help you understand when to have a class and when to have an instance. It also means you can create a natural language processing rule for text mining that focuses on “de” or “of” to help populate your ontology automatically. You could use your text mining software to look for “of” constructions (JAPE rules in GATE, for example) to help you build your ontology faster. You can even search for this construct in your PDFs.
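As a rough illustration of such a rule, the sketch below uses a plain Python regular expression rather than JAPE. The pattern, the function name, and the sample sentence are assumptions made for this example; a real pipeline would work on tokenized, part-of-speech-tagged text and would need to filter false positives.

```python
import re

# Assumed pattern: a capitalized head noun, then "of" or "de",
# then one or more capitalized words forming the name.
OF_PATTERN = re.compile(
    r"\b([A-Z][\w-]*)\s+(?:of|de)\s+([A-Z][\w-]*(?:\s+[A-Z][\w-]*)*)"
)


def find_class_instance_candidates(text):
    """Return (class candidate, instance candidate) pairs for 'X of Y' / 'X de Y'."""
    return [(m.group(1), m.group(0)) for m in OF_PATTERN.finditer(text)]


sample = "Turn left on Chemin de Shediac, then follow the Bay of Fundy coast."
for cls, inst in find_class_instance_candidates(sample):
    print(f"class candidate: {cls!r}, instance candidate: {inst!r}")
```

The head noun becomes the candidate class and the whole phrase becomes the candidate instance, which an ontologist would still review before adding it to the ontology.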

Testing Times for the Contact Center and Semantics

December 22nd, 2010 by Bradley Shoebottom

Here is a presentation Dr Chris Baker recently gave at the ESTC 2010 Conference in Vienna. Innovatia presented our results from testing the TopQuadrant Ensemble search interface and the Ontotext KIM search interface. Both search interfaces dramatically improved the efficiency of search for technical support agents in a contact center. See the slide-deck presentation here. Here is the video recording.

Abstract:

Funded by the Atlantic Canada Opportunities Agency and the Atlantic Innovation Foundation, Innovatia Inc. have pioneered the design and testing of semantic technologies for use by Contact Centre agents who provide Technical Support to customers in the Telecommunications sector. This is in response to increasing contact center costs for companies whose products and information support services must rapidly evolve. Numerous opportunities exist for increasing the productivity of knowledge workers involved in searching separate and disconnected product-specific knowledge bases, case resolution databases, training manuals and technical documentation. Our technical solution comprises an OWL-DL knowledge base populated from a wide variety of document formats with sentence-triples generated by a telecommunications-specific text mining pipeline that leverages document segmentation techniques to instantiate with state of the art precision and recall. This talk will outline critical features of the platform and the performance of agents using the prototype with various user interface tools from commercial vendors with customized features to test this search paradigm. Pilot studies required agents to troubleshoot customer queries from fault symptoms to root causes and to identify procedures to resolve the causes of faults. During testing agents used visual queries, advanced dynamic form-based search, and SPARQL queries saved as frequently asked questions. In particular we have validated that Tier 1 agents can navigate a sequence of work-flows resulting in an increase in findability from 75 to 100% under real time conditions and overall search time has been reduced by 50%. Moreover, time savings can be made at the critical junction of case hand-off from Tier 1 to Tier 2 agents, such that Tier 1 agents can achieve better performance. We review contact center performance metrics and comment on the suitability of Semantic Technologies for this business process as well as issues related to roll-out and adoption of this platform across the enterprise.

Getting Started With Semantics in the Enterprise

November 9th, 2010 by Bradley Shoebottom

Here is the presentation I am giving at AWOSS on Wednesday, 10 November 2010 in Moncton, NB.

Presentation – Getting Started With Semantics In The Enterprise

That Ontology Guy is going to TopQuadrant Product Training

September 24th, 2010 by Bradley Shoebottom

I’m really pumped! Innovatia (my employer) is sending me off to get advanced product training on the TopQuadrant semantic product line. I will learn how to:

Here are the training topics in more detail.

Innovatia uses TopBraid Composer, Maestro edition, for its robust industrial capabilities to design semantic ontologies for our customers and for our semantic Research and Development efforts for the semantic enterprise, funded by the Atlantic Innovation Fund-IV. (We design ontologies in OWL 2. We also use the Protege semantic editor for some of its capabilities.)

The best part of the training is that it is at TopQuadrant headquarters in Alexandria, Virginia. Road trip!

Watch for my tweets at: bradleyshoebottom

Where can ontologies be used?

September 20th, 2010 by Bradley Shoebottom

The quick answer is “anywhere” there are relationships between ideas. To date, ontologies have been implemented in three forms:

  • High level or upper ontologies that describe relationships between ideas.
  • Reference ontologies that link data like bibliographies, social networks, project information and simple knowledge structures.
  • Domain ontologies that describe in detail a particular subject, enterprise, or knowledge area. The ontology is linked to an extensive set of source documentation to aid navigation to that information.

See Lightweight, Domain Ontologies Development Methodology for a longer explanation.

So who has employed ontologies to date?

Bioinformatics has been one of the early adopters (Semantic Bioinformatics), mainly of reference ontologies. Biology is a more structured area of knowledge than most, so it has been easier for scientists to build hierarchies from pre-existing ones (like the species hierarchy) and to define concrete classes and object relationships. This has proven useful for text-mining scientific abstracts. In cancer semantics, for example, it shows where there are more or fewer publications, and so which kinds of cancer need further research (because of a lack of published material).

Ontologies have also been implemented around geospatial information, so that if you come across a place and are not sure where it is, a map will open showing you the location. Ontologies are also useful for relationship matching on social media websites, such as dating sites or sites for finding “friends” with similar interests or locations to your own.

Ontologies can be useful for organizing information when there are classes that are equivalent to each other but, for ease of navigation, you want both in the ontology: for example, “state” for American users and “province” for Canadian users. Ontologies can also help keep track of lexical semantics, synonyms, and various spellings or syntaxes of words, such as “New York” and “NY City”. Ontologies have recently gone further afield and created equivalencies in concepts between languages. Ontologies also have functionality to track changes over time, so it is easy to determine, for example, how many books on a particular subject were published over a period of years. And finally, ontologies have built-in functions so that you can quickly do calculations (see the Quantity, Units, Dimensions and Types Specification). For example, a website might give all of its measurements in the imperial system, and you can click the unit to change it to metric and have the number automatically re-calculated. The author of the document does not have to encode all the possible conversions.
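The equivalence and synonym ideas can be sketched in a few lines of Python. The table below is purely illustrative: the canonical names (including the made-up “FirstLevelRegion”) and the function names are my assumptions, and an OWL ontology would express the same thing with equivalent classes and labels rather than a dictionary.

```python
# Hypothetical lookup: each variant label points to one canonical concept.
# "FirstLevelRegion" is an invented name for the shared state/province concept.
CANONICAL = {
    "state": "FirstLevelRegion",
    "province": "FirstLevelRegion",
    "new york": "New York",
    "ny city": "New York",
}


def canonical(term):
    """Return the canonical concept for a label, or None if it is unknown."""
    return CANONICAL.get(term.strip().lower())


def same_concept(a, b):
    """True when both labels are known and resolve to the same concept."""
    concept = canonical(a)
    return concept is not None and concept == canonical(b)


print(same_concept("State", "province"))
print(same_concept("New York", "NY City"))
print(same_concept("state", "NY City"))
```

A search for either label can then be answered with content tagged under the other, which is what makes navigation easier for both audiences.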

Implementing ontologies in enterprises offers the potential for substantial time efficiencies and for linking relationships between documents and other data (like author). However, many of these ontologies require custom design to be useful, so implementation has been slower. Google Search Appliance offers both a hardware and historic algorithm solution, but a basic ontology suitable for the enterprise still needs to be created and kept running in the background. There is great potential for creating re-usable ontologies for common knowledge domains that can then be sold to enterprises as part of a solution.

What is an ontology?

April 16th, 2010 by Bradley Shoebottom

What is an Ontology? The Inaugural Post by “That Ontology Guy”

(Bradley Shoebottom)

This blog posting is the first in a series describing the knowledge I have acquired as an information architect, then Knowledge Engineer (Ontologist), at Innovatia. The initial six-part series will cover ontologies broadly:

  1. What is an ontology? (the subject of this posting)
  2. Where ontologies can be used (the whole point of the semantic web)
  3. Who creates ontologies? (the knowledge engineer)
  4. How to create an ontology and implement it (high level)
  5. Tools used in knowledge engineering (based on previous methodology)
  6. How to represent the ontology – (Querying paradigms)

After these six postings, I will begin to go into the details of each. Readers are invited to visit this (link) to see how all the information fits together.

This blog will also have resources:

  • links to other blogs, sites, standards,
  • semantic products,
  • research articles and case studies
  • events

So without further ado about the purpose of this blog….

An ontology is a semantic knowledge representation of a particular subject. Think of it as the conceptual model that helps organize a person’s, an organization’s, or a society’s thoughts about objects, people, or places. The terms “objects”, “people”, and “places” themselves form an ontology that describes part of nature at a high level. The ontology is semantic because it attempts to classify concepts that sometimes have multiple meanings. This definition is used by information science and those interested in knowledge management.

The name “ontology” is itself semantic, because philosophers consider ontology to be the study of being, existence, or reality and the basic categories of being. (See Ontology in Wikipedia) It is this last part, the categories of being, that information scientists and knowledge managers focus on. (See Ontology (Information Science) in Wikipedia) Ontologies can be powerful tools to help people find information in a complex world where there are many synonyms for a word, depending on how the relationship is contextualized. It is this idea of context that often defines the semantic meaning of concepts. The ontology acts as a formal definition of how the concepts will be used in a particular case. Information scientists do not attempt to define the world with one encompassing ontological model; rather, they typically implement ontologies in a narrower subject area where it is easier to define concepts, their examples, and relationships.

Ontologies are often developed in a two-dimensional way using forms in specialized software, but their power lies in the ability to represent them in 3D-like diagrams and, using special software, to explore the many facets of how concepts and their examples are linked together. (See the Figure of a Basic Ontology.)

Figure of a Basic Ontology


An ontology is made up of

  • concepts,
  • relationships (properties),
  • examples, and
  • events.

When classes and relationships are combined, they form “triples” of Subject, Relationship, Object, where the subject and object are classes. It is these triples that form the building blocks of simple to more complex queries. A concept is also known as a class. Text mining specialists call a class a T-box when they are populating the ontology with examples. (These are the solid blue rectangular items in the figure above.) The examples are entities, sometimes called instances or individuals of the concept or class. (See the blue outlined rectangular box “International School Bus”.) Text mining specialists call these individuals “populated A-boxes” after populating the ontology. Finally, classes and instances are defined by their properties.

There are two kinds of properties:

  • Relations or Object Properties – define how classes are related to each other, for example class Vehicle hasComponent Vehicle Component. Object properties can be defined in both directions – Vehicle Component isPartOf Vehicle.
  • Attributes or Data Properties – define specific attributes of the individual Instance such as a string for its name, an integer for a measurement, or a date/time for an event instance.
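Triples and the two kinds of properties can be sketched as a small in-memory store in Python. This is only an illustration: the `Triple` type, the `type` and `hasName` property names, and the assumption that “International School Bus” is an instance of Vehicle are mine, and a real project would use OWL tooling instead.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Object properties that are defined in both directions, as in the text.
INVERSE_OF = {"hasComponent": "isPartOf", "isPartOf": "hasComponent"}

# Hypothetical data; "type" and "hasName" are illustrative property names.
TRIPLES = {
    Triple("Vehicle", "hasComponent", "Vehicle Component"),  # object property
    Triple("International School Bus", "type", "Vehicle"),   # instance of a class
    Triple("International School Bus", "hasName", "International School Bus"),  # data property
}


def with_inverses(triples):
    """Add the inverse triple for every object property that declares one."""
    inferred = set(triples)
    for t in triples:
        inverse = INVERSE_OF.get(t.predicate)
        if inverse is not None:
            inferred.add(Triple(t.obj, inverse, t.subject))
    return inferred


def query(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    )


for t in query(with_inverses(TRIPLES), predicate="isPartOf"):
    print(t)
```

The wildcard query is a very small stand-in for what a SPARQL query does against a real triple store.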

Classes and properties can have axioms placed on them to prevent illogical conclusions. For example, you can assert the axiom that Car is disjoint from Truck (that is, a Car cannot also be a Truck). This prevents potential inference errors in the ontology.
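A disjointness axiom can be pictured as a consistency check. The Python sketch below is an assumption-laden illustration (the individual names and the `find_disjoint_violations` helper are invented); an OWL reasoner does this kind of check for you.

```python
# Pairs of classes declared disjoint, as in the Car/Truck example.
DISJOINT_PAIRS = [frozenset({"Car", "Truck"})]

# Hypothetical individuals and the classes asserted for each of them.
ASSERTED_TYPES = {
    "vehicle_1": {"Car"},
    "vehicle_2": {"Car", "Truck"},  # contradicts the axiom
}


def find_disjoint_violations(asserted_types):
    """Return (individual, class_a, class_b) for every broken disjointness axiom."""
    violations = []
    for individual in sorted(asserted_types):
        for pair in DISJOINT_PAIRS:
            if pair <= asserted_types[individual]:  # both classes are asserted
                class_a, class_b = sorted(pair)
                violations.append((individual, class_a, class_b))
    return violations


print(find_disjoint_violations(ASSERTED_TYPES))
```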

You can also have Rules in terms of if-then logic statements. This allows you to define relationships based on specific instance data properties for more precision in the ontology. For example, if a male has a sibling with a child, and that child is defined as a Female, then that child is classified as a Niece.
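The niece rule can be written out as a tiny if-then inference in Python. The people and the data structures here are invented for illustration; in practice the rule would be expressed in a rule language that works alongside the ontology.

```python
# Hypothetical facts: (person, sibling), (parent, child), and gender per person.
HAS_SIBLING = {("Bob", "Alice")}
HAS_CHILD = {("Alice", "Carol"), ("Alice", "Dave")}
GENDER = {"Bob": "Male", "Alice": "Female", "Carol": "Female", "Dave": "Male"}


def infer_nieces():
    """If a male has a sibling with a female child, that child is his niece."""
    nieces = set()
    for person, sibling in HAS_SIBLING:
        if GENDER.get(person) != "Male":
            continue
        for parent, child in HAS_CHILD:
            if parent == sibling and GENDER.get(child) == "Female":
                nieces.add((person, child))
    return nieces


print(infer_nieces())
```

Dave is also Alice's child, but because his gender property is Male the rule does not classify him as a niece.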

Object and Data properties can have restrictions put on them. Typical examples include a maximum or minimum number of values (called cardinality) and limits on which values are allowed. For example, you can restrict the Person class to Male and Female, or restrict the Day of the week to the range Monday to Sunday.
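Restrictions behave much like validation rules. The following Python sketch checks allowed values and a maximum cardinality; the property names and the one-birth-date limit are assumptions chosen for the example, not part of any particular ontology.

```python
# Allowed values per property, following the examples in the text.
ALLOWED_VALUES = {
    "gender": {"Male", "Female"},
    "dayOfWeek": {"Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday"},
}

# Hypothetical cardinality restriction: at most one birth date per person.
MAX_CARDINALITY = {"birthDate": 1}


def violations(properties):
    """Check a {property: [values]} mapping against the restrictions."""
    problems = []
    for prop, values in properties.items():
        allowed = ALLOWED_VALUES.get(prop)
        if allowed is not None:
            problems += [f"{prop}: value {v!r} is not allowed"
                         for v in values if v not in allowed]
        limit = MAX_CARDINALITY.get(prop)
        if limit is not None and len(values) > limit:
            problems.append(f"{prop}: {len(values)} values exceeds maximum of {limit}")
    return problems


print(violations({"dayOfWeek": ["Funday"], "birthDate": ["1970-01-01", "1971-01-01"]}))
```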

There can also be Functions applied to data properties. For example, you can define a mathematical function to convert one set of units to another when only one is specified (metric to imperial, or inches to feet). (See the Quantities, Units, Dimensions and Data Types in OWL and XML website)
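A conversion function of this kind can be sketched by storing one factor per unit relative to a base unit. The table and the `convert` function below are my own illustration in Python, not the mechanism the units specification actually uses.

```python
import math

# Length of one unit expressed in metres (the assumed base unit).
TO_METRES = {
    "inch": 0.0254,
    "foot": 0.3048,
    "centimetre": 0.01,
    "metre": 1.0,
}


def convert(value, from_unit, to_unit):
    """Convert a length by going through the base unit."""
    return value * TO_METRES[from_unit] / TO_METRES[to_unit]


# Only the inch value is stored; the other units are calculated on demand.
stored_inches = 12
print(round(convert(stored_inches, "inch", "foot"), 6))
print(round(convert(stored_inches, "inch", "centimetre"), 6))
print(math.isclose(convert(1, "metre", "centimetre"), 100.0))
```

Because every unit is defined once against the base unit, the author never has to encode each pairwise conversion.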

Lastly, you can specify events in an ontology in terms of dates and times. This allows you to track trends.

Summary of Terms in an Ontology

Part of the Ontology   Synonyms                     Description
Class                  Concept, T-Box               A grouping of like individuals
Relations              Object Property, Slot        Link Classes together and also link Individuals together
Individuals            Examples, Instances, A-Box   Examples of the class
Events                 -                            Describe time in terms of date and time and beginning and end
Attributes             Data Properties, Slot        Describe specific aspects of an individual
Functions              -                            Applied against individual attributes
Restrictions           -                            Applied against Classes and Attributes
Rules                  -                            Applied against axioms
Axioms                 -                            Applied against classes and relations

Ontologies can be divided into Upper Level Ontologies and Domain Ontologies. Domain Ontologies cover very specific subjects, such as the domain for units, distance, or time. Upper Level Ontologies link domain ontologies together. For example, the Events, People, and Places Ontology would be an Upper Level Ontology. A Domain Ontology for Events could be one that describes weather events in more detail, versus one that describes political events. Most Domain Ontologies actually import other domain ontologies, so that the whole ontology is easier to maintain and domain specialists can keep it up to date.

Resources:

Upcoming schedule:

  1. Where ontologies can be used (the whole point of the semantic web)
  2. Who creates ontologies? (the knowledge engineer)
  3. How to create an ontology and implement it (high level)
  4. Tools used in knowledge engineering (based on previous methodology)
  5. How to represent the ontology – (Querying paradigms)