Putting the Web first
Ruben Verborgh
Ghent University – imec
The Web is the differentiating factor
for the Semantic Web.
If things don't work on the Web,
we have a serious problem.
Where are the Semantic Web
app developers?
- Which apps have powerful reasoning capabilities?
- Yet reasoners produce excellent results in our papers!
- Which apps query Linked Data from the live Web?
- Yet querying runs well on our local machines!
- Which apps use Linked Data from multiple sources?
- Yet federation works fine in our university basements!
My research focuses on bringing
the Semantic Web back to the Web.
- It's not simply a matter of engineering.
- Just because it's fast doesn't mean it scales.
- It's a matter of redesigning.
- Putting scale before speed.
- It's a matter of measuring.
- We should measure and compare multiple dimensions.
Design things that work
like the rest of the Web.
- Do you know any Web developer
who gives public access to a MySQL database?
- Of course not!
- They build Web APIs to restrict queries.
- Then why would the same thing suddenly
work with a public RDF database?
Public SPARQL endpoints do not scale,
and never will.
- More than half of all public SPARQL endpoints
have an uptime of ≤ 95%.
- The rest of the Web measures uptime by the number of nines.
- The average endpoint is down
for more than 1.5 days each month:
even 95% uptime means 1.5 days of downtime
in a 30-day month.
- We cannot build reliable applications on top of this.
- This problem is inherent to such complex interfaces.
- Engineering and faster endpoints cannot fix it.
The true potential of Linked Data
lies in connecting datasets.
- Problems get progressively worse
if we want to query multiple datasets:
availabilities multiply
(see the quick check after this list).
- 1 dataset: 95% available
- 2 datasets: 90%
- 5 datasets: 77%
- Yet this is our main differentiator!
- Back to hosting our own private endpoints?
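These percentages follow from multiplying independent availabilities; a rough check in Python, assuming every endpoint is independently up 95% of the time:

# Combined availability when a query needs n endpoints,
# each independently available 95% of the time.
for n in (1, 2, 5):
    print(n, 'dataset(s):', round(0.95 ** n * 100), '% available')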
Linked Data publishing so far
has been a story of two extremes:
data dumps and SPARQL endpoints.
Possible Linked Data interfaces exist
in between those two extremes.
Linked Data Fragments is a uniform view
on Linked Data interfaces.
Every Linked Data interface
publishes specific fragments
of a Linked Data set.
We designed a new trade-off mix
with low cost and high availability.
A Triple Pattern Fragments interface
is low-cost and enables clients to query.
SPARQL queries are executed by clients,
which split them into fragments the interface supports.
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>

SELECT ?person ?name WHERE {
  ?person rdfs:label ?name;
          rdf:type dbpedia-owl:Artist;
          dbpedia-owl:birthPlace ?city.
  ?city rdfs:label "Paris"@en.
}
LIMIT 100
Datasource: http://fragments.dbpedia.org/2016-04/en
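For illustration, a minimal client-side sketch in Python with the requests library; the parameter names (subject, predicate, object), the page parameter, and the Turtle response format are assumptions here, since a real client discovers them from the fragment's hypermedia controls:

import requests

def fetch_fragment(fragments_url, s=None, p=None, o=None, page=1):
    # Request one page of the fragment that matches a single triple pattern.
    params = {'page': page}
    if s: params['subject'] = s
    if p: params['predicate'] = p
    if o: params['object'] = o
    response = requests.get(fragments_url, params=params,
                            headers={'Accept': 'text/turtle'})
    return response.text  # matching triples + estimated count + paging metadata

# The client decomposes the SPARQL query above into its triple patterns,
# starts with the most selective pattern (smallest estimated count),
# and extends the partial solutions with the remaining patterns.
DBPEDIA = 'http://fragments.dbpedia.org/2016-04/en'
print(fetch_fragment(DBPEDIA,
                     p='http://www.w3.org/2000/01/rdf-schema#label',
                     o='"Paris"@en'))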
Evaluating queries over a federation means
asking multiple servers for fragments.
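Continuing the sketch above with the same hypothetical fetch_fragment helper; the second source URL is just a placeholder:

# Ask every interface in the federation for the same triple pattern
# and merge the results; count metadata guides the join order.
SOURCES = [
    'http://fragments.dbpedia.org/2016-04/en',
    'http://example.org/other-dataset/fragments',  # placeholder for a second TPF interface
]

def fetch_federated(s=None, p=None, o=None):
    return [fetch_fragment(source, s, p, o) for source in SOURCES]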
There's no silver bullet.
There's no single metric.
- Most query papers focus on execution time.
- Execution can always be made faster with centralization.
- What if that time cannot be achieved on the public Web?
- To build a Semantic Web,
we need to ask the important questions.
- How does it scale?
- What does it cost to scale?
- Do the most important results arrive sooner?
We compared Triple Pattern Fragments
against SPARQL endpoints.
We ran the Berlin SPARQL benchmark with:
- 1–244 simultaneous clients
- 1 cache
- 1 server
The query throughput is lower,
but resilient to high client numbers.
The server traffic is higher,
but individual requests are lighter.
Caching is significantly more effective,
as clients reuse fragments for queries.
The server requires much less CPU,
allowing higher availability at lower cost.
Federation is the killer use case
for Linked Data on the Web.
- Linked Data excels at data integration.
- Do our results extend to federation as well?
- We tested this with FedBench:
- state-of-the-art federation with 9 SPARQL endpoints
- TPF client federation with 9 TPF interfaces
In federated scenarios, light interfaces
can achieve fast query times as well.
Triple patterns are not the final answer.
No interface ever will be.
- Publication and querying always involve trade-offs:
- execution time
- bandwidth
- server cost
- client cost
- …
- Triple Pattern Fragments demonstrate how far
we get with simple servers and smart clients.
If we want to see intelligent clients,
we should stop building intelligent servers.
- Server-side intelligence doesn't scale.
- Client-side intelligence is the real challenge.
- Servers should enable clients to act intelligently.
I challenge you to explore the axis
between those extremes
to find and measure other trade-offs.