Mark up your content! Microdata & schema.org
The Schema.org vocabulary, combined with a markup syntax, lets us annotate our content properly and so be ready for this age of Big Data and AI assistants.
Let’s start from the very beginning ...
A brief history of the Web
Web 1.0: The Web as a source of information (1993+)
Back in 1993 the Web was intended as a system of interlinked documents accessed via the Internet, a huge network of computers. It became the largest source of information ever. And that was the initial concept of the Web: a big source of information.
Web 2.0: The Read-Write Web (2002+)
Then authors started to use Ajax, and people started to share photos and videos, interact on social networks, publish on blogs, and contribute to wikis. The Web was no longer a read-only place but a “living” thing that its visitors contribute to and modify.
Web 3.0: The Semantic Web (2011+)
We are building towards a Web whose information serves not just humans but also robots and crawlers. They can then use that information to provide better analysis and higher-quality answers to queries.
This is done by marking up meaningful pieces of content in a standardized way.
The Semantic way
The path towards a semantic web started with the technical means available. In our case, this was HTML. This markup language was created to describe the structure (syntax) of the information on a page, not the meaning (semantics) of its content.
Then HTML5 introduced new semantic elements to better describe content: header, article, aside, section, nav, footer, and others. This gave page authors greater expressiveness and opened new possibilities for services that process the data:
- Browser plugins can more easily pull out the text of the article for a cleaner reading experience.
- Search engines can give more weight to article content rather than the advertising in the sidebar.
- Screen reader software can use the structural elements such as nav to make textual content more accessible to people with disabilities.
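As a rough illustration of the first point, here is a minimal Python sketch of how a reader-mode tool might pull the article text out of a page by relying on the article element alone. It uses only the standard library; the names ArticleText, article_text, and PAGE are invented for this example, and the sample page is made up.

```python
from html.parser import HTMLParser

# Made-up sample page: navigation, one article, and an advertising sidebar.
PAGE = """
<nav><a href="/">Home</a></nav>
<article><h1>Stage 21</h1><p>The race ended in Paris.</p></article>
<aside>Buy our bikes!</aside>
"""


class ArticleText(HTMLParser):
    """Collects text that appears inside <article> elements only."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # greater than 0 while inside an <article>
        self.chunks = []  # non-empty text fragments found so far

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def article_text(html):
    parser = ArticleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    print(article_text(PAGE))  # prints: Stage 21 The race ended in Paris.
```

The navigation link and the sidebar text are skipped simply because they sit outside the article element, which is exactly the kind of hint the older, purely presentational markup could not give.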
While these new elements provide extremely useful extra information about the sections of content, they do not really describe what the HTML document is about.
The answer to this need is really what the Semantic Web is all about: to give a better understanding of the meaning behind information to computers, through enhanced markup.
The idea is to enhance the markup with additional data so a computer can make sense of the web pages more easily. This additional data should be structured in a standardized way so that it can be accessed in an easy and universal way. For that we need to define a syntax (how to write information) and a vocabulary (standard words used to identify things).
Schema.org is the vocabulary proposed in 2011 by the major search engines (Google, Bing, and Yahoo, later joined by Yandex). Microdata and JSON-LD are the most widely used syntaxes for this vocabulary.
We cannot approach the information as a whole. Instead, we must split it into meaningful “pieces” or things (people, places, events, companies, products, movies, etc.) and their relationships.
Wondering what the enhanced markup looks like? Let’s look at a simple example to get an idea.
HTML without structured data markup
<p>Christopher Froome was sponsored by Sky in the Tour de France.</p>
In this case, we are going to identify two things (Person, Organization), their relationship (sponsor), and a property (url).
We will present the example using the schema.org vocabulary with the two most widespread syntaxes:
HTML with microdata & schema.org schema markup
<p itemscope itemtype="http://schema.org/Person">
<span itemprop="name">Christopher Froome</span> was sponsored by
<span itemprop="sponsor" itemscope itemtype="http://schema.org/Organization">
<a itemprop="url" href="http://www.skysports.com/">Sky</a></span> in the Tour de France.</p>
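To get a feel for how a crawler might read this markup, here is a minimal Python sketch built on the standard library's html.parser. It is an illustration under simplifying assumptions, not a conforming microdata parser: it only understands itemscope, itemtype, and itemprop, takes the value of a link from its href, and uses the element text for everything else. The names MicrodataSketch, extract_microdata, and EXAMPLE_HTML are made up for this example, and EXAMPLE_HTML is a well-formed copy of the snippet above.

```python
from html.parser import HTMLParser

# Well-formed copy of the example snippet (assumed to be the intended markup).
EXAMPLE_HTML = """
<p itemscope itemtype="http://schema.org/Person">
<span itemprop="name">Christopher Froome</span> was sponsored by
<span itemprop="sponsor" itemscope itemtype="http://schema.org/Organization">
<a itemprop="url" href="http://www.skysports.com/">Sky</a></span> in the Tour de France.</p>
"""


class MicrodataSketch(HTMLParser):
    """Simplified microdata reader; handles only itemscope/itemtype/itemprop."""

    def __init__(self):
        super().__init__()
        self.items = []  # top-level items found in the document
        self.stack = []  # one entry per currently open element

    def _current_item(self):
        # The item started by the innermost open itemscope element, if any.
        for entry in reversed(self.stack):
            if entry["item"] is not None:
                return entry["item"]
        return None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self._current_item()
        entry = {"tag": tag, "item": None, "prop": None, "owner": None, "text": []}
        prop = attrs.get("itemprop")
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "props": {}}
            entry["item"] = item
            if prop and parent is not None:
                parent["props"][prop] = item  # nested item, e.g. sponsor
            else:
                self.items.append(item)
        elif prop and parent is not None:
            if tag == "a" and "href" in attrs:
                parent["props"][prop] = attrs["href"]  # links carry their URL
            else:
                entry["prop"], entry["owner"] = prop, parent  # value is the text
        self.stack.append(entry)

    def handle_data(self, data):
        # Text belongs to every open element that is collecting a property value.
        for entry in self.stack:
            if entry["prop"] is not None:
                entry["text"].append(data)

    def handle_endtag(self, tag):
        if not any(entry["tag"] == tag for entry in self.stack):
            return  # stray closing tag: ignore it
        while self.stack:
            entry = self.stack.pop()
            if entry["prop"] is not None:
                text = "".join(entry["text"]).strip()
                entry["owner"]["props"][entry["prop"]] = text
            if entry["tag"] == tag:
                break


def extract_microdata(html):
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.items


if __name__ == "__main__":
    for item in extract_microdata(EXAMPLE_HTML):
        print(item["type"], item["props"])
```

Run on the example, the sketch finds a single Person item whose name is "Christopher Froome" and whose sponsor is a nested Organization item carrying the url. The plain sentence around those spans is ignored, which is the point: the attributes tell the machine which words matter and how they relate.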
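HTML with JSON-LD & schema.org schema markup
<script type="application/ld+json">
{
"@context": "http://schema.org",
"@type": "Person",
"name": "Christopher Froome",
"sponsor": { "@type": "Organization", "url": "http://www.skysports.com/" }
}
</script>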
Including structured data markup in web content helps Google and other search engines to better index and understand the content. Some data can also be used to create and display rich snippets within the search results.
Rich snippets don’t have a direct effect on rankings, but they can strongly influence click-through rate.
Google Knowledge Graph
Local business and postal address markup clearly identifies a business and its location for features such as the Knowledge Graph or cards in Google Now and Google Glass.
Some email clients support Schema.org-based structured data markup.
In the case of Gmail, this means that the emails can display quick action buttons that let users take actions directly from their inboxes.
This approach makes it possible for email assistant tools to extract the structured data and make it available through mobile notifications, maps, calendars, etc.
For example, Microsoft's Cortana (for Windows 10 and Windows phones) pulls flight information from structured data in email messages to render a flight information card.
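For the Gmail quick action buttons mentioned above, the structured data is typically JSON-LD embedded in the HTML body of the message. The Python sketch below builds a schema.org EmailMessage with a ViewAction and wraps it in a script tag. The function name view_action_markup, the example.com URL, and the button label are placeholders invented for this example, and a real sender would normally also have to meet Gmail's own sender requirements before any button is shown.

```python
import json


def view_action_markup(target_url, button_label, description):
    """Return a <script> block describing a 'go to' action for an email."""
    data = {
        "@context": "http://schema.org",
        "@type": "EmailMessage",
        "description": description,
        "potentialAction": {
            "@type": "ViewAction",
            "target": target_url,
            "name": button_label,
        },
    }
    # Escape "</" so the JSON can never close the script element early.
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(view_action_markup(
        "https://example.com/orders/1234",
        "View order",
        "Track the status of your order",
    ))
```

The resulting block would be placed inside the HTML part of the email, alongside the normal human-readable content.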
iOS Spotlight/Siri uses Schema.org for search features including aggregate ratings, offers, products, prices, interaction counts, organizations, images, phone numbers, and potential website search actions. Other AI assistants, such as Amazon Echo, Microsoft Cortana, and Google Now, probably use (or will use) structured data from websites to deliver better results.
- Major sites have published structured data
- Today, Schema.org is used by more than 25% of the pages found in the major search indices.
- Better marked-up content means more relevant search results, and therefore better reach to your target audience.
- Structured data is, or will be, the best way to optimize for AI-driven search.
“The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”
Tim Berners-Lee, 2001
Inventor of the Web