Linked Open Vocabularies - Big Picture

What is LOV about?

The vocabularies LOV is about are the many dialects (RDFS and OWL ontologies) used in the growing linked data Web. The most popular ones now form a core of Semantic Web standards, either de jure (SKOS, Dublin Core, FRBR …) or de facto (FOAF, Event Ontology …), but many more are published and used. Not only does linked data leverage a growing set of vocabularies, but the vocabularies themselves rely more and more on each other by reusing, refining or extending terms, stating equivalences, and declaring metadata.

The objective of LOV is to provide easy access to this ecosystem of vocabularies. In particular, by making explicit the ways vocabularies link to each other and by providing metrics on how they are used in the linked data cloud, LOV aims to improve their understanding, visibility, usability, and overall quality.

Who needs LOV?

LOV targets both vocabulary users and vocabulary managers.

  • Vocabulary users are provided with a global view of available vocabularies, complete with valuable metadata that enables them to select the best available vocabularies for describing their data, and to assess the reliability of their publishers and publication process.
  • Vocabulary managers are provided with feedback on the usability of what they maintain and publish, and with common best practices their publications should follow in order to remain reliably usable in the long run.

Technical and social sustainability

LOV provides a technical platform for search and quality assessment across the vocabulary ecosystem, but it also aims at promoting sustainable social management of this ecosystem. Committing to a vocabulary is a social contract with its creators and publishers, including trust in their sustainability. During the first year of the project, we had numerous exchanges with vocabulary managers, pointing out possible improvements or corrections. Most of the time these received very positive feedback, adding to the overall quality and consistency of the ecosystem. We hope that future technical improvements of the LOV database and related services will go along with growing social interaction with vocabulary managers.

Beyond the LOV project, we have the vision of a future linked data Web supported by a living Vocabulary Alliance gathering as many stakeholders as possible in the long-term conservation of vocabularies.

LOV Team

Bernard Vatant

Short Bio: A graduate of ENSET (Cachan, France) in 1975, Bernard Vatant taught mathematics from 1975 to 1997, then moved to new knowledge technologies. Since 2000 he has been a senior consultant for Mondeca, in charge of ontologies and data migration to Semantic Web standards, and as such has been involved in several working groups and standards bodies such as ISO (ISO 13250 Topic Maps, ISO 25964 standard on Thesauri) and W3C (OWL, SKOS). His contributions to the Linked Data community include ontologies, as well as participation in the modelling and migration to semantic standards of reference vocabularies such as the European Community Thesaurus Eurovoc or the French Official Geographical Code, in collaboration with INSEE.
Slogan: What is not translatable, never stop translating it.

Dr. Pierre-Yves Vandenbussche

Short Bio: Pierre-Yves Vandenbussche received a PhD (2011) in Information Technology from Paris VI University (France). He is currently a researcher at Mondeca and at an INSERM (French medical research) laboratory. His research interests are mainly in the application of Semantic Web technologies to data publication and the management of knowledge organization systems. His contributions to the Linked Data community include the SPARQL Endpoints Status and Challenging Time! applications, as well as participation in the design and migration to semantic standards of reference vocabularies such as the Anatomico-pathological thesaurus.
Slogan: LOV is all!

LOV Dataset

The LOV dataset contains the description of RDFS vocabularies or OWL ontologies used or usable by datasets in the Linked Data Cloud. Those descriptions contain metadata either formally declared by the vocabulary publishers or added by the LOV curators. Beyond usual metadata using Dublin Core, voiD, or BIBO, new and original description elements are added, using the VOAF vocabulary to state how vocabularies rely on, extend, specify, annotate or otherwise link to each other. Those relationships make the LOV dataset a growing ecosystem of interlinked vocabularies supported by an equally growing social network of creators, publishers and curators.

To be included in the LOV dataset, a vocabulary has to satisfy the following requirements:

  • To be expressed in one of the Semantic Web ontology languages: RDFS or some species of OWL
  • To be published and freely available on the Web
  • To be retrievable by content negotiation from its namespace URI
  • To be small enough to be easily integrated and re-used, in part or as a whole, by other vocabularies

As an indication of what “small enough” means, currently (as of end of 2011) the median vocabulary size in LOV is around 10 classes and 20 properties, and more than 80% of vocabularies have fewer than 100 elements (classes and properties). The largest vocabulary currently in LOV has more than 500 elements. And of course, to be included a vocabulary has to come under our radar. We monitor as many sources as possible, including the descriptions of datasets in CKAN, announcements on various Semantic Web lists, buzz on Twitter, Google+, etc. But if we have missed your favourite vocabulary, help us and suggest it.
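The content-negotiation requirement above can be illustrated with a minimal sketch using only the Python standard library. The list of accepted media types and the helper names (`build_request`, `looks_like_rdf`, `is_negotiable`) are assumptions made for illustration; the text does not say how LOV itself performs this check.

```python
import urllib.request

# Assumed RDF media types; LOV may request others.
RDF_ACCEPT = "application/rdf+xml, text/turtle;q=0.9"
RDF_MEDIA_TYPES = {"application/rdf+xml", "text/turtle"}


def build_request(namespace_uri):
    """Prepare a GET request that asks the server for an RDF representation."""
    return urllib.request.Request(namespace_uri, headers={"Accept": RDF_ACCEPT})


def looks_like_rdf(content_type):
    """Return True when a Content-Type header names one of the requested RDF types."""
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in RDF_MEDIA_TYPES


def is_negotiable(namespace_uri, timeout=10.0):
    """Dereference the namespace URI and report whether an RDF document came back."""
    with urllib.request.urlopen(build_request(namespace_uri), timeout=timeout) as resp:
        return looks_like_rdf(resp.headers.get("Content-Type", ""))
```

Calling `is_negotiable` on a namespace URI performs a real HTTP request, so its result depends on how the vocabulary is served at that moment.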

Note on prefixes:

Each vocabulary is represented in LOV by a prefix standing for its namespace, e.g., “foaf” for the Friend of a Friend Vocabulary or “dc” for Dublin Core terms. As far as possible, we have respected the prefix recommended explicitly (through a vann:preferredNamespacePrefix declaration) or implicitly (use in the vocabulary source) by the vocabulary publisher. But some exceptions to this rule are found in LOV, in two cases:

  • The same prefix can be recommended by more than one vocabulary, but we want each prefix to be unambiguous across the whole LOV space. In that case, the first listed vocabulary has priority, and the newcomer(s) are given a substitute prefix.
  • If the prefix used or recommended by the vocabulary publisher is too long, we define a shorter one in LOV. The rule of thumb is to try to stay under 5 letters and never go over 7 letters, keeping in mind that those prefixes are used for both display and quick reference, and that 5 letters among 26 allow more than 11 million different combinations (26^5 = 11,881,376).
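The two exceptions above can be sketched as a small function. This is only an illustration: the way a substitute prefix is built here (truncation, then a numeric suffix) is a hypothetical choice, since the text does not say how LOV curators pick substitutes, and "under 5 letters" is read as "at most 5".

```python
MAX_PREFIX_LEN = 7        # "never go over 7 letters"
PREFERRED_PREFIX_LEN = 5  # rule-of-thumb target length (assumed to mean "at most 5")


def assign_prefix(recommended, taken):
    """Return a prefix that is unique among `taken` and at most MAX_PREFIX_LEN long."""
    candidate = recommended
    if len(candidate) > MAX_PREFIX_LEN:
        # Hypothetical shortening rule: plain truncation to the preferred length.
        candidate = candidate[:PREFERRED_PREFIX_LEN]
    if candidate not in taken:
        return candidate
    # The first listed vocabulary keeps the prefix; the newcomer gets a substitute.
    # Hypothetical substitution rule: append a counter, trimming to respect the limit.
    counter = 2
    while True:
        suffix = str(counter)
        substitute = candidate[:MAX_PREFIX_LEN - len(suffix)] + suffix
        if substitute not in taken:
            return substitute
        counter += 1


# Number of distinct 5-letter prefixes over a 26-letter alphabet.
FIVE_LETTER_COMBINATIONS = 26 ** 5
```

For example, with `taken = {"foaf"}`, a second vocabulary recommending "foaf" would receive "foaf2" under these assumed rules, while "dc" would be accepted unchanged.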

"LOV Aggregator" feature

The "LOV Aggregator" feature aggregates all vocabularies into a single endpoint/dump file. The latest version of each vocabulary is checked on a daily basis. This endpoint is used to extract data about vocabularies, to generate statistics ("LOV Stats" feature), and to support search ("LOV Search" feature).

When a vocabulary is aggregated, an explicit rdfs:isDefinedBy link to the vocabulary it belongs to is added for each of its elements (classes and properties).
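As a minimal sketch of this step, triples can be modelled as plain `(subject, predicate, object)` tuples of URI strings. The function name `add_is_defined_by` and the set of types used to recognise classes and properties are assumptions for illustration; the actual aggregator may detect elements differently.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"

# Assumed list of types that mark a resource as a class or a property.
ELEMENT_TYPES = {
    RDFS + "Class",
    OWL + "Class",
    RDF + "Property",
    OWL + "ObjectProperty",
    OWL + "DatatypeProperty",
}


def add_is_defined_by(triples, vocabulary_uri):
    """Return the triples plus one rdfs:isDefinedBy link per class or property."""
    elements = {
        s for (s, p, o) in triples
        if p == RDF + "type" and o in ELEMENT_TYPES
    }
    links = {(e, RDFS + "isDefinedBy", vocabulary_uri) for e in elements}
    return set(triples) | links
```

Returning a set means a link that the publisher already declared is not duplicated.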

"LOV Search" feature

The "LOV Search" feature lets you search for an existing element (property, class or vocabulary) in the Linked Open Vocabularies catalogue.

Results ranking is based on several metrics:

  • Relevancy of the element labels to the query string
  • Importance of the matched element labels
  • Number of occurrences of the element in the LOV dataset
  • Number of vocabularies in the LOV dataset that refer to the element
  • Number of occurrences of the element in the LOD
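The text lists the ranking metrics but not how they are combined. Purely as an illustration, the sketch below combines them with a weighted sum; the weights, the log scaling of the counts, and all names (`Candidate`, `score`, `rank`) are hypothetical and are not the actual LOV ranking formula.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    uri: str
    label_relevance: float   # 0..1, how well the labels match the query string
    label_importance: float  # 0..1, importance of the label that matched
    lov_occurrences: int     # occurrences of the element in the LOV dataset
    lov_vocabularies: int    # LOV vocabularies that refer to the element
    lod_occurrences: int     # occurrences of the element in the LOD


# Hypothetical weights, one per metric, in the order of the list above.
WEIGHTS = (0.4, 0.2, 0.15, 0.15, 0.1)


def score(c):
    """Weighted sum of the five metrics; counts are log-scaled to damp outliers."""
    features = (
        c.label_relevance,
        c.label_importance,
        math.log1p(c.lov_occurrences),
        math.log1p(c.lov_vocabularies),
        math.log1p(c.lod_occurrences),
    )
    return sum(w * f for w, f in zip(WEIGHTS, features))


def rank(candidates):
    """Return the candidates ordered from best to worst score."""
    return sorted(candidates, key=score, reverse=True)
```

A weighted sum is only one possible design; it is used here because it keeps the contribution of each metric easy to inspect.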

"LOV Stats" feature

LOV Stats are computed over all the vocabularies aggregated in the LOV Aggregator and provide metrics about vocabulary elements. The "LOV Distribution" metric is the number of vocabularies in LOV that refer to a particular element. "LOV popularity" is the number of other vocabulary elements that refer to a particular one. "LOD popularity" is the number of occurrences of a vocabulary element in the LOD (we thank the OpenLink LOD cloud cache endpoint server for sharing their data).
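The first two metrics can be sketched over the aggregated data, assuming it is available as `(vocabulary, subject, predicate, object)` tuples. Treating "refers to" as "appears in object position of a statement" is an assumption made here for simplicity; the function name `lov_metrics` is likewise hypothetical.

```python
from collections import defaultdict


def lov_metrics(quads):
    """Map each referenced element to (LOV distribution, LOV popularity).

    distribution: number of distinct vocabularies with a statement pointing at it
    popularity:   number of distinct other elements pointing at it
    Anything found in object position is counted, so callers may want to
    filter out literals beforehand.
    """
    vocabularies = defaultdict(set)
    referrers = defaultdict(set)
    for vocab, subject, _predicate, obj in quads:
        if subject != obj:  # an element referring to itself is not "another" element
            vocabularies[obj].add(vocab)
            referrers[obj].add(subject)
    return {e: (len(vocabularies[e]), len(referrers[e])) for e in vocabularies}
```

For instance, if two vocabularies together contain three distinct elements that point at `foaf:Person`, the function returns `(2, 3)` for it.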

"LOV Suggest" feature

The "LOV Suggest" feature lets you submit a new vocabulary for inclusion in the LOV catalogue. After your vocabulary URI has been validated, you will be able to correct your vocabulary before submitting it. Some recommendations for vocabulary metadata description may be of help.

The LOV dataset is licensed under Creative Commons CC BY 3.0. It is developed in the framework of the Datalift project and supported by the Open Knowledge Foundation (OKFN).
If you have any remarks, suggestions or questions, please contact the editors.