Putting the Web first
    Ruben Verborgh
    Ghent University – imec
   
  
  
  
    
      The Web is the differentiating factor
      for the Semantic Web.
    
    
      If things don't work on the Web,
      we have a serious problem.
    
   
  
  
  
    Where are the Semantic Web
    app developers?
    
      - Which apps have powerful reasoning capabilities?
        - Yet reasoners produce excellent results in our papers!
      - Which apps query Linked Data from the live Web?
        - Yet querying runs well on our local machines!
      - Which apps use Linked Data from multiple sources?
        - Yet federation works fine in our university basements!
    
   
  
    My research focuses on bringing
    the Semantic Web back to the Web.
    
      - It's not simply a matter of engineering.
        - Just because it's fast doesn't mean it scales.
      - It's a matter of redesigning.
        - Putting scale before speed.
      - It's a matter of measuring.
        - We should measure and compare multiple dimensions.
       
    
   
  
    Design things that work
    like the rest of the Web.
    
      - Do you know any Web developer
        who gives public access to a MySQL database?
        
          - Of course not!
 
          - They build Web APIs to restrict queries.
 
        
       
      - Then why would the same thing suddenly
        work with a public RDF database?
        
       
    
   
  
    Public SPARQL endpoints do not scale,
    and never will.
    
      - More than half of all public SPARQL endpoints
        have an uptime of ≤ 95%.
        - The rest of the Web measures uptime in numbers of nines.
      - The average endpoint is down
        for more than 1.5 days each month.
        - We cannot build reliable applications on top of this.
      - This problem is inherent to such complex interfaces.
        - Engineering and faster endpoints cannot fix it.
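The 1.5-day figure follows directly from the uptime numbers; a quick sanity check (illustrative arithmetic only):

```python
# 95% uptime over a 30-day month leaves 5% downtime:
uptime = 0.95
days_per_month = 30
downtime_days = (1 - uptime) * days_per_month  # ≈ 1.5 days per month
print(round(downtime_days, 2))  # prints 1.5
```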
 
       
    
   
  
    The true potential of Linked Data
    lies in connecting datasets.
    
      - Problems get progressively worse
        if we want to query multiple datasets.
          - 1 dataset: 95%
          - 2 datasets: 90%
          - 5 datasets: 77%
      - Yet this is our main differentiator!
      - Back to hosting our own private endpoints?
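The percentages follow from assuming independent failures: the chance that all datasets are up at once is the product of their individual uptimes. A minimal check:

```python
# Joint availability of n endpoints, each with 95% uptime,
# assuming failures are independent
uptime = 0.95
for n in (1, 2, 5):
    print(n, f"{uptime ** n:.0%}")  # 95%, 90%, 77% — the figures above
```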
        
      
 
    
   
  
  
    
      Linked Data publishing so far
      has been a story of two extremes.
    
    
  
  
    
      Possible Linked Data interfaces exist
      in between those two extremes.
    
    
  
  
    
      Linked Data Fragments is a uniform view
      on Linked Data interfaces.
    
    
    
      Every Linked Data interface
      publishes specific fragments
      of a Linked Data set.
    
   
  
    
      We designed a new trade-off mix
      with low cost and high availability.
    
    
  
  
    
      A Triple Pattern Fragments interface
      is low-cost and enables clients to query.
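A Triple Pattern Fragments server only has to answer one simple kind of request: which triples match a single pattern, and how many match in total. A minimal in-memory sketch (sample triples and the `fragment` helper are illustrative; a real interface adds paging and hypermedia controls):

```python
# Minimal sketch of the server side of a Triple Pattern Fragments
# interface: match one triple pattern (None = variable) against the
# dataset and return the triples plus a total count — the metadata
# that lets clients plan their queries. Sample data is illustrative.
TRIPLES = [
    ("dbpedia:Degas", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:Degas", "dbpedia-owl:birthPlace", "dbpedia:Paris"),
    ("dbpedia:Picasso", "rdf:type", "dbpedia-owl:Artist"),
]

def fragment(s=None, p=None, o=None):
    matches = [t for t in TRIPLES
               if (s is None or t[0] == s)
               and (p is None or t[1] == p)
               and (o is None or t[2] == o)]
    return {"triples": matches, "totalCount": len(matches)}

print(fragment(p="rdf:type")["totalCount"])  # prints 2
```

Because each request is this cheap, responses are trivially cacheable and the server's per-request cost stays nearly constant.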
    
    
  
  
  
    
      SPARQL queries are executed by clients,
      by splitting them into supported fragments.
    
    SELECT ?person ?name WHERE {
      ?person rdfs:label ?name;
              rdf:type dbpedia-owl:Artist;
              dbpedia-owl:birthPlace ?city.
      ?city rdfs:label "Paris"@en.
    }
    LIMIT 100
    Datasource: http://fragments.dbpedia.org/2016-04/en
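The client-side splitting can be sketched in a few lines: repeatedly evaluate the triple pattern with the fewest matches (the count metadata each fragment reports), bind its variables, and recurse on the remaining patterns. An illustrative in-memory stand-in for the HTTP interface, with made-up sample data mirroring the query above:

```python
# Sketch of client-side query execution over triple pattern fragments:
# always fetch the pattern with the smallest match count first, bind
# its variables, then recurse on the rest. In-memory stand-in for the
# HTTP interface; sample data is illustrative.
TRIPLES = [
    ("dbpedia:Degas", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:Degas", "dbpedia-owl:birthPlace", "dbpedia:Paris"),
    ("dbpedia:Picasso", "rdf:type", "dbpedia-owl:Artist"),
    ("dbpedia:Picasso", "dbpedia-owl:birthPlace", "dbpedia:Malaga"),
    ("dbpedia:Paris", "rdfs:label", '"Paris"@en'),
]

def is_var(term):
    return term.startswith("?")

def fragment(pattern):
    """One fragment: all triples matching a pattern (variables = wildcards)."""
    return [t for t in TRIPLES
            if all(is_var(p) or p == v for p, v in zip(pattern, t))]

def solve(patterns, binding=None):
    binding = binding or {}
    if not patterns:
        yield binding
        return
    resolved = [tuple(binding.get(t, t) for t in p) for p in patterns]
    # greedy step: the pattern with the fewest matches goes first
    i = min(range(len(patterns)), key=lambda i: len(fragment(resolved[i])))
    rest = resolved[:i] + resolved[i + 1:]
    for triple in fragment(resolved[i]):
        new = dict(binding)
        new.update({p: v for p, v in zip(resolved[i], triple) if is_var(p)})
        yield from solve(rest, new)

query = [("?person", "rdf:type", "dbpedia-owl:Artist"),
         ("?person", "dbpedia-owl:birthPlace", "?city"),
         ("?city", "rdfs:label", '"Paris"@en')]
for b in solve(query):
    print(b["?person"])  # prints dbpedia:Degas
```

Starting from the most selective pattern keeps the number of fragment requests low; the real client makes the same greedy choice using each fragment's reported count rather than local inspection.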
   
  
  
    
      Evaluating queries over federations means
      asking multiple servers for fragments.
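From the client's perspective, federation barely changes the algorithm: the same triple pattern request simply goes to several servers, and the returned fragments (and their counts) are merged. A minimal sketch with illustrative in-memory sources standing in for servers:

```python
# Sketch of federated fragment retrieval: send the same triple
# pattern (None = variable) to every source and merge the results.
# Both sources are illustrative in-memory stand-ins for servers.
DBPEDIA = [("dbpedia:Degas", "rdf:type", "dbpedia-owl:Artist")]
VIAF = [("viaf:Degas", "owl:sameAs", "dbpedia:Degas")]

def fragment(source, pattern):
    return [t for t in source
            if all(p is None or p == v for p, v in zip(pattern, t))]

def federated_fragment(pattern, sources=(DBPEDIA, VIAF)):
    triples, total = [], 0
    for source in sources:
        part = fragment(source, pattern)
        triples.extend(part)   # merge the fragments
        total += len(part)     # sum the count metadata
    return {"triples": triples, "totalCount": total}

print(federated_fragment((None, None, "dbpedia:Degas"))["totalCount"])  # prints 1
```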
    
    
  
  
  
  
  
    
      There's no silver bullet.
      There's no single metric.
    
    
      - Most query papers focus on execution time.
        
          - Execution can always be made faster with centralization.
 
          - What if that time cannot be achieved on the public Web?
 
        
       
      - To build a Semantic Web,
        we need to ask the important questions.
        
          - How does it scale?
 
          - What does it cost to scale?
 
          - Do the most important results arrive sooner?
 
        
       
    
   
  
    
      We compared Triple Pattern Fragments
      against SPARQL endpoints.
    
    We ran the Berlin SPARQL benchmark with:
    
      - 1–244 simultaneous clients
 
      - 1 cache
 
      - 1 server
 
    
   
  
    
      The query throughput is lower,
      but resilient to high client numbers.
    
    
  
  
    
      The server traffic is higher,
      but individual requests are lighter.
    
    
  
  
    
      Caching is significantly more effective,
      as clients reuse fragments across queries.
    
    
  
  
    
      The server requires much less CPU,
      allowing higher availability at lower cost.
    
    
  
  
    
      Federation is the killer use case
      for Linked Data on the Web.
    
    
      - Linked Data excels at data integration.
 
      - Do our results extend to federation as well?
 
      - We tested this with FedBench:
        
          - state-of-the-art federation with 9 SPARQL endpoints
 
          - TPF client federation with 9 TPF interfaces
 
        
       
    
   
  
    
      In federated scenarios, light interfaces
      can achieve fast query times as well.
    
    
  
  
  
    
      Triple patterns are not the final answer.
      No interface ever will be.
    
    
      - Publication and querying always involve trade-offs.
        
          - execution time
 
          - bandwidth
 
          - server cost
 
          - client cost
 
          - …
 
        
       
      - Triple Pattern Fragments demonstrate how far
        we get with simple servers and smart clients.
       
    
   
  
    
      If we want to see intelligent clients,
      we should stop building intelligent servers.
    
    
      - Server-side intelligence doesn't scale.
 
      - Client-side intelligence is the real challenge.
 
      - Servers should enable clients to act intelligently.
 
    
   
  
    
      I challenge you to explore the axis
      to find and measure other trade-offs.