Putting the Web first
Ruben Verborgh
Ghent University – imec
The Web is the differentiating factor
for the Semantic Web.
If things don't work on the Web,
we have a serious problem.
Where are the Semantic Web
app developers?
- Which apps have powerful reasoning capabilities?
- Yet reasoners produce excellent results in our papers!
- Which apps query Linked Data from the live Web?
- Yet querying runs well on our local machines!
- Which apps use Linked Data from multiple sources?
- Yet federation works fine in our university basements!
My research focuses on bringing
the Semantic Web back to the Web.
- It's not simply a matter of engineering.
- Just because it's fast doesn't mean it scales.
- It's a matter of redesigning.
- Putting scale before speed.
- It's a matter of measuring.
- We should measure and compare multiple dimensions.
Design things that work
like the rest of the Web.
- Do you know any Web developer
who gives public access to a MySQL database?
- Of course not!
- They build Web APIs to restrict queries.
- Then why would the same thing suddenly
work with a public RDF database?
Public SPARQL endpoints do not scale,
and never will.
- More than half of all public SPARQL endpoints
have an uptime of ≤ 95%.
- The rest of the Web measures uptime by the number of nines.
- The average endpoint is down
for more than 1.5 days each month:
even 95% uptime means 1.5 days of downtime
in a 30-day month.
- We cannot build reliable applications on top of this.
- This problem is inherent to such complex interfaces.
- Engineering and faster endpoints cannot fix it.
The true potential of Linked Data
lies in connecting datasets.
- Problems get progressively worse
if we want to query multiple datasets:
availabilities multiply
(see the quick check after this list).
- 1 dataset: 95% available
- 2 datasets: 90%
- 5 datasets: 77%
- Yet this is our main differentiator!
- Back to hosting our own private endpoints?
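These percentages follow from multiplying independent availabilities; a rough check in Python, assuming every endpoint is independently up 95% of the time:

# Combined availability when a query needs n endpoints,
# each independently available 95% of the time.
for n in (1, 2, 5):
    print(n, 'dataset(s):', round(0.95 ** n * 100), '% available')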
Linked Data publishing so far
has been a story of two extremes:
data dumps and SPARQL endpoints.
Possible Linked Data interfaces exist
in between those two extremes.
Linked Data Fragments is a uniform view
on Linked Data interfaces.
Every Linked Data interface
publishes specific fragments
of a Linked Data set.
We designed a new trade-off mix
with low cost and high availability.
A Triple Pattern Fragments interface
is low-cost and enables clients to query.
SPARQL queries are executed by clients,
which split them into fragments the interface supports.
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT ?person ?name WHERE {
  ?person rdfs:label ?name;
          rdf:type dbpedia-owl:Artist;
          dbpedia-owl:birthPlace ?city.
  ?city rdfs:label "Paris"@en.
}
LIMIT 100
Datasource: http://fragments.dbpedia.org/2016-04/en
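For illustration, a minimal client-side sketch in Python with the requests library; the parameter names (subject, predicate, object), the page parameter, and the Turtle response format are assumptions here, since a real client discovers them from the fragment's hypermedia controls:

import requests

def fetch_fragment(fragments_url, s=None, p=None, o=None, page=1):
    # Request one page of the fragment that matches a single triple pattern.
    params = {'page': page}
    if s: params['subject'] = s
    if p: params['predicate'] = p
    if o: params['object'] = o
    response = requests.get(fragments_url, params=params,
                            headers={'Accept': 'text/turtle'})
    return response.text  # matching triples + estimated count + paging metadata

# The client decomposes the SPARQL query above into its triple patterns,
# starts with the most selective pattern (smallest estimated count),
# and extends the partial solutions with the remaining patterns.
DBPEDIA = 'http://fragments.dbpedia.org/2016-04/en'
print(fetch_fragment(DBPEDIA,
                     p='http://www.w3.org/2000/01/rdf-schema#label',
                     o='"Paris"@en'))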
Evaluating queries over a federation means
asking multiple servers for fragments.
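Continuing the sketch above with the same hypothetical fetch_fragment helper; the second source URL is just a placeholder:

# Ask every interface in the federation for the same triple pattern
# and merge the results; count metadata guides the join order.
SOURCES = [
    'http://fragments.dbpedia.org/2016-04/en',
    'http://example.org/other-dataset/fragments',  # placeholder for a second TPF interface
]

def fetch_federated(s=None, p=None, o=None):
    return [fetch_fragment(source, s, p, o) for source in SOURCES]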
There's no silver bullet.
There's no single metric.
- Most query papers focus on execution time.
- Execution can always be made faster with centralization.
- What if that time cannot be achieved on the public Web?
- To build a Semantic Web,
we need to ask the important questions.
- How does it scale?
- What does it cost to scale?
- Do the most important results arrive sooner?
We compared Triple Pattern Fragments
against SPARQL endpoints.
We ran the Berlin SPARQL benchmark with:
- 1–244 simultaneous clients
- 1 cache
- 1 server
The query throughput is lower,
but resilient to high client numbers.
The server traffic is higher,
but individual requests are lighter.
Caching is significantly more effective,
as clients reuse fragments for queries.
The server requires much less CPU,
allowing higher availability at lower cost.
Federation is the killer use case
for Linked Data on the Web.
- Linked Data excels at data integration.
- Do our results extend to federation as well?
- We tested this with FedBench:
- state-of-the-art federation with 9 SPARQL endpoints
- TPF client federation with 9 TPF interfaces
In federated scenarios, light interfaces
can achieve fast query times as well.
Triple patterns are not the final answer.
No interface ever will be.
- Publication and querying always involve trade-offs:
- execution time
- bandwidth
- server cost
- client cost
- …
- Triple Pattern Fragments demonstrate how far
we get with simple servers and smart clients.
If we want to see intelligent clients,
we should stop building intelligent servers.
- Server-side intelligence doesn't scale.
- Client-side intelligence is the real challenge.
- Servers should enable clients to act intelligently.
I challenge you to explore the axis
between those extremes
to find and measure other trade-offs.