Linked Open Vocabularies

About LOV

Vocabularies describe and link Data on the Web

LOV stands for Linked Open Vocabularies. The name is derived from LOD, which stands for Linked Open Data. We assume that the reader is somewhat familiar with the latter concept; otherwise, a visit to http://linkeddata.org/ or http://www.w3.org/2013/data/ will help to figure it out before reading further.

Data on the Web use properties (aka predicates) and classes (aka types) to describe people, places, products, events, and any kind of thing whatsoever. In the data "Mary is a person, her family name is Watson, she lives in the city of San Francisco", "Person" is the class of Mary, "City" is the class of San Francisco, and "family name" and "lives in" are properties used to describe a person, the latter also acting as a link between a person and a place.
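The "Mary" example can be written down as subject-predicate-object triples. The following is a minimal sketch in plain Python, not a real RDF library: the `http://example.org/` namespaces and the term names `Person`, `City`, `familyName` and `livesIn` are hypothetical and chosen only for illustration; only the `rdf:type` URI is a standard identifier.

```python
# Hypothetical namespaces (assumptions): one for vocabulary terms, one for the things described.
VOCAB = "http://example.org/vocab#"
DATA = "http://example.org/data/"
# Standard RDF property that links a thing to its class.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# "Mary is a person, her family name is Watson, she lives in the city of San Francisco"
triples = [
    (DATA + "Mary", RDF_TYPE, VOCAB + "Person"),
    (DATA + "Mary", VOCAB + "familyName", "Watson"),
    (DATA + "Mary", VOCAB + "livesIn", DATA + "SanFrancisco"),
    (DATA + "SanFrancisco", RDF_TYPE, VOCAB + "City"),
]


def classes_of(subject, triples):
    """Return the classes (types) asserted for a subject."""
    return {o for s, p, o in triples if s == subject and p == RDF_TYPE}


def properties_used(triples):
    """Return every property used in the data, excluding the typing property."""
    return {p for _, p, _ in triples if p != RDF_TYPE}


def links_between_things(triples):
    """Return triples whose object is itself a described thing, i.e. a link."""
    subjects = {s for s, _, _ in triples}
    return [(s, p, o) for s, p, o in triples if p != RDF_TYPE and o in subjects]
```

Here `livesIn` is the only property whose value is another described thing, which is what makes it a link between a person and a place, whereas `familyName` simply carries a literal value.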

A vocabulary in LOV gathers definitions of a set of classes and properties (together simply called terms of the vocabulary), useful to describe specific types of things, or things in a given domain or industry, or things at large but for a specific usage. Terms of vocabularies also provide the links in linked data, in the above case between a Person and a City. The definitions of terms provided by the vocabularies bring clear semantics to descriptions and links, thanks to the formal language they use (some dialect of RDF such as RDFS or OWL). In short, vocabularies provide the semantic glue enabling Data to become meaningful Data.

Vocabularies are also data

Thanks to the very nature of the Web and the RDF stack of standards, vocabularies can themselves be expressed as Web data. Vocabulary terms, like people and places and all the things data are about, are identified by public URIs and can be described in some RDF dialect. They can be linked inside a vocabulary and across vocabularies. Metadata can be attached to vocabularies to capture information such as creator, publisher, version number, and date of publication. Thanks to the recursive data model of RDF, those metadata use other vocabularies, forming a growing ecosystem.

Quality vocabularies are in LOV

The oldest RDF vocabularies were published at the turn of the century, just after the first RDF specification in 1999. Since then, thousands of them have been published and used. Some are stable recommendations, published by standards bodies like the W3C or the Dublin Core Metadata Initiative. Some are library metadata formats provided by the Library of Congress or other large national libraries (BnF, DNB, etc.). Many more are published and used by actors as diverse as large media corporations such as the BBC, national administrations such as INSEE, the European Community, universities and research projects, and some are just published by individuals and put on the community table, in the tradition and spirit of the open, collaborative Web. And a growing number have been forgotten by their publishers and have broken URIs or obsolete content, although their terms can still be found in data. This is the open Web...

LOV provides a choice of several hundred such vocabularies, selected on quality requirements including URI stability and availability on the Web, use of standard formats and publication best practices, quality metadata and documentation, an identifiable and trustworthy publishing body, and a proper versioning policy.

LOV provides sustainable resources

LOV started in 2011, in the framework of a French research project (http://datalift.org). Its main initial objective was to help publishers and users of linked data and vocabularies to assess what was available for their needs, to reuse it as far as possible, and to insert their own vocabulary production seamlessly in the ecosystem.

The Open Knowledge Foundation kindly provided technical hosting from July 2012 until July 2018.

Since July 2018, LOV has been hosted by the Ontology Engineering Group at UPM.

Four years after its launch, and one year after the end of its initial framework project, LOV is supported by a small team of curators and developers. Various solutions are under study to further provide LOV with a sustainable legal framework and business model.

Contribute, join the community

Over the last four years, the Linked Open Vocabularies initiative has gathered a community of data publishers and ontology designers. The LOV Google+ community is now an important place to discuss, report and announce general facts related to vocabularies on the Web. Our system provides a way to suggest the insertion of a new vocabulary. This feature allows a user to check what information the LOV Bot can automatically detect. Drawing on our experience with vocabulary publication, we published a handbook about metadata recommendations for linked open data vocabularies. Hence there are at least three main ways you can contribute to the LOV effort:

Publish quality vocabularies and suggest them for insertion in LOV using the LOV-Suggest form, or contact the LOV team directly; we are always responsive.

Engage in the public conversation around vocabularies by joining the Google+ community.

Build cool applications on top of LOV, using the LOV API.

LOV Features

Vocabulary Documentation

The vocabulary collection is maintained by the LOV team of curators, who are in charge of validating and inserting vocabularies in the LOV database and assigning them a detailed review (updated on a yearly basis). Before a vocabulary is inserted, the LOV team contacts the authors to make sure the vocabulary is published following best practices and meets the quality requirements of the overall LOV ecosystem. When some metadata cannot be extracted automatically (such as the creators of a vocabulary), curators try to add them manually by harvesting information from the documentation or from direct communication with the publisher. Once a vocabulary is included, an automatic script checks it for updates on a daily basis.

The documentation assists any user in understanding the semantics of each vocabulary term, and therefore of any data using it. For instance, information about the creator and publisher is a key indication in case help or clarification is required, or to assess the stability of that artifact. About 55% of vocabularies specify at least one creator, contributor or editor. We augment this information using manually gathered information, leading to inclusion of data about the creator in over 85% of vocabularies in LOV.

The database stores each version of a vocabulary over time since its first available issue. For each version, LOV stores a file backup on its server, even if the original files are no longer available from their original source. To embrace the complexity of the vocabulary ecosystem and assess the impact of a modification, one needs to know in which vocabularies and datasets a particular vocabulary term is referenced. LOV provides a unique entry point to such information.

Data Access
The LOV system (code and data) is published under the Creative Commons Attribution 4.0 license (CC BY 4.0). Three options are offered for users and applications to access LOV data:
- download data dumps of the LOV catalogue in RDF Notation 3 format, or of the LOV catalogue together with the latest version of each vocabulary in RDF N-Quads format,
- run SPARQL queries on the LOV SPARQL endpoint,
- use the LOV system Application Programming Interfaces (APIs), which provide the same services as the user interfaces.
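As an illustration of the second option, the sketch below builds a SPARQL request using only the Python standard library. It assumes the endpoint accepts GET requests as described by the SPARQL 1.1 Protocol; `SPARQL_ENDPOINT` is a placeholder address (an assumption, not the real LOV URL), and the query is a deliberately generic one that does not rely on how the LOV catalogue is modelled.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: placeholder address. Replace it with the endpoint URL published by LOV.
SPARQL_ENDPOINT = "https://example.org/lov/sparql"

# Generic query: list up to ten classes that have at least one instance.
QUERY = """
SELECT DISTINCT ?class
WHERE { ?s a ?class }
LIMIT 10
"""


def build_query_url(endpoint, query):
    """Return a GET URL carrying the query in the `query` parameter (SPARQL 1.1 Protocol)."""
    return endpoint + "?" + urlencode({"query": query.strip()})


def run_query(endpoint, query):
    """Send the query and return the parsed JSON results (needs network access)."""
    request = Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request) as response:
        return json.load(response)


# Building the URL needs no network access; run_query() is only defined here, not called.
url = build_query_url(SPARQL_ENDPOINT, QUERY)
```

Calling `run_query(SPARQL_ENDPOINT, QUERY)` against a real endpoint should return a dictionary in the SPARQL JSON results format, where the rows are found under `results` and then `bindings`.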
Vocabulary Search Engine
For every vocabulary in LOV, terms (classes, properties, datatypes, instances) are indexed and a full-text search feature is offered. Compared to other existing ontology search engines, the ranking algorithm of the Linked Open Vocabularies search engine is not only based on term popularity in datasets but also takes into account the term's popularity within the LOV ecosystem and, most importantly, assigns a different score depending on which label property a search term matched. We distinguish four different label property categories that a search term can match. We will take as an example the search term "person":
- local name (URI without the namespace). Whereas a URI is not supposed to carry any meaning, it is a convention to use a compressed form of a term label to construct the local name. It therefore becomes an important artifact for term matching, to which the highest score is assigned. An example of a local name matching the term "person" is "http://schema.org/Person".
- primary labels. The highest score is also assigned for matches on the rdfs:label, dce:title, dcterms:title and skos:prefLabel properties. An example of a primary label matching the term "person" is rdfs:label "Person"@en.
- secondary labels. We define as secondary label properties rdfs:comment, dce:description, dcterms:description and skos:altLabel. A medium score is assigned for matches on these properties. An example of a secondary label matching the term "person" is dcterms:description "Examples of a Creator include a person, an organization, or a service."@en.
- tertiary labels. Finally, all properties not falling in the previous categories are considered tertiary labels, to which a low score is assigned. An example of a tertiary label matching the term "person" is http://metadataregistry.org/uri/profile/RegAp/name "Person"@en.
As a result, a term matching a value for the property rdfs:label will have a higher score than if it matches a value for the property rdfs:comment. Because these labels differ in nature, we apply different indexing tokenizers and scoring methods to each category.
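The four label categories can be sketched as a simple weighting scheme. This is only an illustration of the idea, not the LOV ranking algorithm: the numeric weights are hypothetical (the text only says highest, medium and low), matching is a plain case-insensitive substring test, and popularity and tokenization are left out entirely.

```python
PRIMARY = {"rdfs:label", "dce:title", "dcterms:title", "skos:prefLabel"}
SECONDARY = {"rdfs:comment", "dce:description", "dcterms:description", "skos:altLabel"}

# Hypothetical weights (assumption): only their relative order comes from the text.
WEIGHTS = {"local_name": 1.0, "primary": 1.0, "secondary": 0.5, "tertiary": 0.1}


def local_name(uri):
    """Return the part of the URI after the last '#' or '/'."""
    return uri.rstrip("/#").replace("#", "/").rsplit("/", 1)[-1]


def category(prop):
    """Classify a label property as primary, secondary or tertiary."""
    if prop in PRIMARY:
        return "primary"
    if prop in SECONDARY:
        return "secondary"
    return "tertiary"


def match_score(term, query):
    """Return the best weight among all places where the query matches, or 0.0."""
    q = query.lower()
    scores = [0.0]
    if q in local_name(term["uri"]).lower():
        scores.append(WEIGHTS["local_name"])
    for prop, value in term["labels"]:
        if q in value.lower():
            scores.append(WEIGHTS[category(prop)])
    return max(scores)


# Two terms taken from the examples in the text.
person_class = {
    "uri": "http://schema.org/Person",
    "labels": [("rdfs:label", "Person")],
}
creator_property = {
    "uri": "http://purl.org/dc/terms/creator",
    "labels": [
        ("rdfs:label", "Creator"),
        ("dcterms:description",
         "Examples of a Creator include a person, an organization, or a service."),
    ],
}
```

With these weights, searching for "person" ranks `person_class` (matched on its local name and primary label) above `creator_property` (matched only on a secondary label).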

Application Ecosystem

Applications using LOV

OntoMaton facilitates ontology search and tagging functionalities within Google Spreadsheets. It has been developed by the ISA Team at the University of Oxford's e-Research Centre.

YASGUI is a feature-packed, user-friendly interface to access any SPARQL endpoint, both on your local computer and on remote ones. YASGUI provides syntax checking and highlighting, assists you by autocompleting your prefixes, properties, classes, and endpoints, allows you to bookmark queries, and much more. It has been developed by Laurens Rietveld, funded by Data to Semantics.

Datalift is an original platform dedicated to the exploitation of data. In Datalift, the input data are raw data coming from multiple heterogeneous formats (databases, CSV, XML, RDF, RDFa, GML, Shapefile, ...). The output data are "Linked Data", also called semantic and interconnected data. The Datalift platform is actively involved in the Web's transition to Linked Data. It has been developed by the Datalift organisation.

OntoWiki facilitates the visual presentation of a knowledge base as an information map, with different views on instance data. It enables intuitive authoring of semantic content, with an inline editing mode for editing RDF content, similar to WYSIWYG for text documents. It has been developed by the Agile Knowledge Engineering and Semantic Web (AKSW) research team.

RDFUnit

RDFUnit is a test-driven data-debugging framework that can run automatically generated (based on a schema) and manually generated test cases against an endpoint. All test cases are executed as SPARQL queries using a pattern-based transformation approach. It has been developed by the Agile Knowledge Engineering and Semantic Web (AKSW) research team.

OOSP: Online Ontology Set Picker

Online Ontology Set Picker (OOSP) allows users to select, from major repositories, ontologies that satisfy a user-defined set of metrics. Its main purpose is to allow ontological tool designers to rapidly build custom benchmarks on which they can test different features. It could also serve for usage studies of different ontology language constructs and for pattern spotting. It has been developed by the Department of Knowledge and Information Engineering (Prague's University of Economics) research team.

Useful Related Applications

Vapour is a validation service to check whether semantic web data is correctly published according to the current best practices, as defined by the Linked Data principles. It has been developed by Fundación CTIC.

RDF Triple-Checker

RDF Triple-Checker helps you find typos and common errors in RDF data and in RDF vocabularies. It has been developed by Christopher Gutteridge, a member of the Southampton ECS Web Team.

LOV4IoT

LOV4IoT is a catalog of vocabularies dedicated to the Internet of Things. Like LOV, it is meant to help domain knowledge experts reuse vocabularies and foster interoperability. It has been developed by Amelie Gyrard as part of the M3 framework.

OOPS! (OntOlogy Pitfall Scanner!) helps you to detect some of the most common pitfalls appearing when developing ontologies. It has been developed by María Poveda, a member of the Ontology Engineering Group (OEG).