LSQ

The Linked SPARQL Queries Dataset

LSQ is a Linked Dataset describing SPARQL queries extracted from the logs of a variety of prominent public SPARQL endpoints. We argue that this dataset has many uses for the SPARQL research community. For example, it can be used to generate benchmarks on the fly by selecting real-world queries with specific characteristics that we describe, to analyse which SPARQL (1.1) query features are most often used to interrogate endpoints, to characterize the behavior of the different types of agents that use these endpoints, or to find out what queries agents ask about a given resource.

Full PDF Version of LSQ

A full version of LSQ (explaining the complete details) can be downloaded here.

SPARQL Endpoint

The LSQ SPARQL endpoint is available online. It currently contains the RDFized query logs of DBpedia 3.5.1, Linked Geo Data, Semantic Web Dog Food, and the British Museum. If you want to query a particular query log, you can specify the following named graphs.

A list of example queries can be found at the bottom of this page.

Data Dumps

The LSQ DBpedia, Linked Geo Data, Semantic Web Dog Food, and British Museum data dumps are available online in Turtle format.

Raw Query Logs

The LSQ DBpedia, Linked Geo Data, Semantic Web Dog Food, and British Museum raw query logs are available online.

Do You Want Your Own Local LSQ SPARQL Endpoint?

You can download the Virtuoso 7.1 endpoints for both Windows and Linux. Start-up instructions are given below.

For Windows, go to the bin folder and run the start.bat file.
A Virtuoso instance of LSQ will be started at http://localhost:8890/sparql.

For Linux, go to the bin folder and execute:
$ ./start_virtuoso.sh
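Once the local endpoint is running, it can also be queried programmatically. The following is a minimal Python sketch, assuming the endpoint accepts standard SPARQL 1.1 Protocol GET requests (query text in the `query` parameter) and honours an `Accept: application/sparql-results+json` header. The helper name `build_sparql_request` is hypothetical; only the endpoint address comes from the notes above.

```python
import urllib.parse
from urllib.request import Request

# Address of the local LSQ Virtuoso instance, as given in the start-up notes.
LOCAL_ENDPOINT = "http://localhost:8890/sparql"


def build_sparql_request(query: str, endpoint: str = LOCAL_ENDPOINT) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol GET request.

    Hypothetical helper: the query text is URL-encoded into the standard
    `query` parameter and JSON results are requested via the Accept header.
    """
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


if __name__ == "__main__":
    req = build_sparql_request("SELECT * WHERE { ?s ?p ?o } LIMIT 5")
    print(req.full_url)
    # Sending it requires a running endpoint, e.g.:
    # from urllib.request import urlopen
    # with urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the encoding step easy to inspect before any network call is made.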

Datahub Entry

LSQ is also available on Datahub.

VoID Statistics

Our SPARQL endpoint directly provides the VoID statistics for each query log. Below are the named graphs for VoID. You can click directly on the named graphs to see the VoID statistics on the fly.

An example query to get the DBpedia VoID statistics is given below:

CONSTRUCT {?s ?p ?o} FROM <http://dbpedia.org/void>
WHERE
{
?s ?p ?o
}
OR
SELECT * FROM <http://dbpedia.org/void>
WHERE {?s ?p ?o}

The output of the SELECT query can be seen here
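If the SELECT results are requested in the W3C SPARQL 1.1 Query Results JSON format, they can be flattened into plain rows for further processing. The Python sketch below shows one way to do this; the helper name `bindings_to_rows` and the sample response are hypothetical and hand-written, not actual output of the LSQ endpoint.

```python
import json
from typing import Dict, List, Optional


def bindings_to_rows(response_text: str) -> List[Dict[str, Optional[str]]]:
    """Flatten a SPARQL 1.1 JSON results document into a list of dicts.

    Each row maps every projected variable (from head.vars) to the bound
    term's value, or to None when the variable is unbound in that row.
    """
    doc = json.loads(response_text)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        rows.append({v: binding[v]["value"] if v in binding else None
                     for v in variables})
    return rows


# Hypothetical, hand-written response in the W3C JSON results format.
SAMPLE = """{
  "head": {"vars": ["s", "p", "o"]},
  "results": {"bindings": [
    {"s": {"type": "uri", "value": "http://example.org/dataset"},
     "p": {"type": "uri", "value": "http://rdfs.org/ns/void#triples"},
     "o": {"type": "literal", "value": "42",
           "datatype": "http://www.w3.org/2001/XMLSchema#integer"}}
  ]}
}"""

if __name__ == "__main__":
    for row in bindings_to_rows(SAMPLE):
        print(row["s"], row["p"], row["o"])
```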

Schema Diagram

The LSQ vocabulary can be downloaded here.

LSQ Schema Diagram

Source Code

LSQ is open source. You can check out the source code from the LSQ GitHub repository. If you want to RDFize your own query log, you can use the following:

Package: org.aksw.simba.dataset.lsq

Class: LogRDFizer

Use Cases

Below are some concrete queries, relevant to the use cases discussed in the paper, that can be issued against the LSQ SPARQL endpoint.

UC1 Facilitating Benchmark Generation

LSQ can help users generate custom benchmarks by selecting real-world queries that meet certain criteria. The example query below returns up to 50 queries from the Semantic Web Dog Food log that use FILTER, return more than 10 results, contain fewer than 5 triple patterns, and run in under 50 ms. It thus combines structural and data-driven criteria that are useful for creating custom benchmarks.

PREFIX lsqv: <http://lsq.aksw.org/vocab#>
PREFIX sp: <http://spinrdf.org/sp#>
SELECT ?query  FROM <http://data.semanticweb.org>
 WHERE {
    ?id sp:text ?query ; lsqv:resultSize ?rs ; lsqv:triplePatterns ?tp ;
        lsqv:runTimeMs ?rt ; lsqv:usesFeature lsqv:Filter . 
   FILTER (?rs > 10 && ?tp <5  && ?rt < 50 ) }
LIMIT 50
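Because the selection criteria are plain FILTER thresholds, a benchmark generator can produce variants of this query by substituting different values. Below is a minimal Python sketch of such a template; the function `benchmark_query` and its parameter names are hypothetical, while the graph IRI and the vocabulary terms are taken from the query above.

```python
def benchmark_query(min_results: int = 10, max_triple_patterns: int = 5,
                    max_runtime_ms: int = 50, feature: str = "Filter",
                    graph: str = "http://data.semanticweb.org",
                    limit: int = 50) -> str:
    """Return a UC1-style SPARQL query with the given thresholds filled in.

    Hypothetical helper. `feature` is the local name of an LSQ feature such
    as Filter, Union, Distinct or Limit; `graph` is assumed to be a trusted IRI.
    """
    for name, value in (("min_results", min_results),
                        ("max_triple_patterns", max_triple_patterns),
                        ("max_runtime_ms", max_runtime_ms), ("limit", limit)):
        # Only non-negative integers are interpolated into the query text.
        if not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer")
    if not feature.isidentifier():
        raise ValueError("feature must be a plain local name, e.g. 'Filter'")
    return f"""PREFIX lsqv: <http://lsq.aksw.org/vocab#>
PREFIX sp: <http://spinrdf.org/sp#>
SELECT ?query FROM <{graph}>
WHERE {{
  ?id sp:text ?query ; lsqv:resultSize ?rs ; lsqv:triplePatterns ?tp ;
      lsqv:runTimeMs ?rt ; lsqv:usesFeature lsqv:{feature} .
  FILTER (?rs > {min_results} && ?tp < {max_triple_patterns} && ?rt < {max_runtime_ms})
}}
LIMIT {limit}"""


if __name__ == "__main__":
    print(benchmark_query(min_results=0, feature="Union", limit=10))
```

With the default arguments, the generated query is equivalent to the one above.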

UC2 SPARQL Adoption

The LSQ dataset can also be used to gain insights into how the SPARQL query language is used in practice, for example to find out how features are used and combined, or which kinds of joins are most common. Two example queries are given below.

----The number of queries using both UNION and FILTER----
PREFIX lsqv: <http://lsq.aksw.org/vocab#>
SELECT (COUNT(?queryId) AS ?unionFilterCount)
WHERE {  ?queryId  lsqv:usesFeature lsqv:Union , lsqv:Filter . }

----The number of empty-result queries with path joins----
PREFIX lsqv: <http://lsq.aksw.org/vocab#>
SELECT (COUNT(DISTINCT ?id) AS ?pathQueries)
WHERE {
  ?id lsqv:joinVertex  ?joinVertex ; lsqv:resultSize 0 . 
  ?joinVertex lsqv:joinVertexType lsqv:Path . } 
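The UNION/FILTER query checks a single pair of features. To survey all feature combinations at once, one could fetch the (query, feature) rows (for example with `SELECT ?id ?feature WHERE { ?id lsqv:usesFeature ?feature }`) and count feature pairs client-side. The Python sketch below assumes those rows have already been retrieved; the helper name and the sample rows are hypothetical. The count for the (Filter, Union) pair corresponds to what the first query above computes.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Iterable, Tuple


def feature_pair_counts(rows: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many queries use each unordered pair of features.

    `rows` are (query id, feature) pairs, one per
    `?id lsqv:usesFeature ?feature` triple; duplicates are ignored.
    """
    features_per_query = defaultdict(set)
    for query_id, feature in rows:
        features_per_query[query_id].add(feature)
    pairs = Counter()
    for features in features_per_query.values():
        # Sorting makes (Filter, Union) and (Union, Filter) the same key.
        pairs.update(combinations(sorted(features), 2))
    return pairs


if __name__ == "__main__":
    rows = [("q1", "Filter"), ("q1", "Union"), ("q2", "Filter"),
            ("q2", "Distinct"), ("q3", "Union"), ("q3", "Filter")]
    for pair, n in feature_pair_counts(rows).most_common():
        print(pair, n)
```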

UC3 Caching

The LSQ dataset can also be used to find useful patterns to cache and commonly repeated queries, or to create realistic caching benchmarks using the execution timestamps. The query below finds the most frequently executed queries that take a long time to compute (more than 10,000 ms) but have small result sizes (fewer than 100 results), so their results can be cached cheaply.

PREFIX lsqv: <http://lsq.aksw.org/vocab#>
PREFIX sp: <http://spinrdf.org/sp#>
SELECT ?query (COUNT(?exs) AS ?exsCount)
 WHERE {
    ?id sp:text ?query ; lsqv:resultSize ?rs ; lsqv:execution ?exs ; lsqv:runTimeMs ?rt . 
    FILTER (?rs < 100 && ?rt > 10000)}
  GROUP BY ?query ORDER BY DESC(?exsCount)
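The execution timestamps (dct:issued) also allow a cache policy to be evaluated by replaying a log in time order. The Python sketch below replays (timestamp, query text) pairs through a least-recently-used cache of fixed capacity and reports the hit rate. The LRU policy, the capacity, and the helper name `lru_hit_rate` are assumptions for illustration, not part of LSQ; the test timestamps are borrowed from the sample RDF representation on this page.

```python
from collections import OrderedDict
from datetime import datetime
from typing import Iterable, Tuple


def lru_hit_rate(executions: Iterable[Tuple[str, str]], capacity: int) -> float:
    """Replay executions in time order through an LRU cache; return the hit rate.

    Each execution is (ISO 8601 timestamp with UTC offset, query text); the
    query text itself is used as the cache key.
    """
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    # Replay in issue order; offsets are honoured by comparing datetimes.
    ordered = sorted(executions, key=lambda e: datetime.fromisoformat(e[0]))
    cache = OrderedDict()
    hits = 0
    for _issued, query in ordered:
        if query in cache:
            hits += 1
            cache.move_to_end(query)  # mark as most recently used
        else:
            cache[query] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used entry
    return hits / len(ordered) if ordered else 0.0
```

Comparing hit rates across capacities gives a rough indication of how much a small result cache would help for a given log.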

UC4 Usability

From the LSQ dataset, one can derive a list of queries that resulted in parse errors, runtime errors, or empty results. One can also look at which agents issued such queries and how their queries evolved over time. The query below is a small example that looks for parse errors encountered by a given agent, ordered by time.

PREFIX lsqv: <http://lsq.aksw.org/vocab#>
PREFIX lsqr: <http://lsq.aksw.org/res/>
PREFIX sp: <http://spinrdf.org/sp#>
PREFIX dct: <http://purl.org/dc/terms/>  
SELECT ?query ?time ?error
 WHERE {
    ?id sp:text ?query ; lsqv:parseError ?error ; lsqv:execution ?ex . 
    ?ex dct:issued ?time ; 
        lsqv:agent lsqr:A-WlFJE0QQRlhBVRNGRx1QGVdaRhNsN2YUW15R .
 }
ORDER BY ?time
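To follow how each agent's queries evolved, execution records can be grouped per agent and ordered by time on the client. The Python sketch below assumes records of the form (agent, ISO 8601 dct:issued timestamp, query text) have already been retrieved; the record layout and the helper name `agent_timelines` are hypothetical. The test timestamps come from the sample RDF representation of query q483 on this page, with agent identifiers abbreviated.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (agent, ISO 8601 dct:issued timestamp, query text).
Record = Tuple[str, str, str]


def agent_timelines(records: Iterable[Record]) -> Dict[str, List[Tuple[datetime, str]]]:
    """Group execution records by agent, each group sorted by issue time."""
    timelines: Dict[str, List[Tuple[datetime, str]]] = defaultdict(list)
    for agent, issued, query in records:
        timelines[agent].append((datetime.fromisoformat(issued), query))
    for executions in timelines.values():
        executions.sort(key=lambda e: e[0])  # earliest execution first
    return dict(timelines)
```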

UC5 Optimisation

Given a particular workload of queries, an optimiser can decide how to configure indexes, etc., to improve the performance of typical queries. Administrators can use LSQ to derive default statistics about what is most common across different databases. For example, the query below counts how many queries containing path joins return zero results, which may motivate optimisations that pre-filter empty paths. A similar query could find the path queries that take the longest time, which may suggest materialising indexes for specific paths.

PREFIX lsqv: <http://lsq.aksw.org/vocab#>
SELECT (COUNT(DISTINCT ?id) AS ?pathQueries)
WHERE {
  ?id lsqv:joinVertex  ?joinVertex ; lsqv:resultSize 0 . 
  ?joinVertex lsqv:joinVertexType lsqv:Path .
    } 

UC6 Meta-querying

The final example query given below shows how one can find all the queries relating to a given resource, in this case Michael Jackson.

PREFIX sp:<http://spinrdf.org/sp#>
PREFIX lsqv: <http://lsq.aksw.org/vocab#>
SELECT  DISTINCT  ?query
 WHERE {
    ?id sp:text ?query . 
    { ?id lsqv:mentionsSubject <http://dbpedia.org/resource/Michael_Jackson> }
    UNION
    { ?id lsqv:mentionsObject <http://dbpedia.org/resource/Michael_Jackson> } 
 }
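The same meta-query can be generated for any resource. Because the resource IRI is embedded between angle brackets, it should first be checked against the characters that the SPARQL 1.1 grammar forbids inside an IRI reference. The Python sketch below does this; the helper name `mentions_query` is hypothetical.

```python
import re

# Characters SPARQL 1.1 forbids inside an IRIREF (<...>), plus control chars and space.
_ILLEGAL_IRI_CHARS = re.compile(r'[<>"{}|^`\\\x00-\x20]')


def mentions_query(resource_iri: str) -> str:
    """Build a UC6-style query for all LSQ queries mentioning `resource_iri`.

    Hypothetical helper: the IRI is validated so that it can be embedded
    safely between angle brackets in the generated query.
    """
    if not resource_iri or _ILLEGAL_IRI_CHARS.search(resource_iri):
        raise ValueError(f"not a valid IRI reference: {resource_iri!r}")
    return f"""PREFIX sp: <http://spinrdf.org/sp#>
PREFIX lsqv: <http://lsq.aksw.org/vocab#>
SELECT DISTINCT ?query
WHERE {{
  ?id sp:text ?query .
  {{ ?id lsqv:mentionsSubject <{resource_iri}> }}
  UNION
  {{ ?id lsqv:mentionsObject <{resource_iri}> }}
}}"""


if __name__ == "__main__":
    print(mentions_query("http://dbpedia.org/resource/Michael_Jackson"))
```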

Some Other Examples

Here are a few other SPARQL queries that might be of interest:

---How did an agent try to correct queries with parse errors?---
PREFIX lsqv:<http://lsq.aksw.org/vocab#>
PREFIX lsqr:<http://lsq.aksw.org/res/>
PREFIX sp:<http://spinrdf.org/sp#>
PREFIX dct:<http://purl.org/dc/terms/>  
SELECT   ?query ?time 
 WHERE {
    ?id sp:text ?query .
    ?id lsqv:parseError ?error .
    ?id lsqv:execution ?executions. 
    ?executions dct:issued  ?time .
    ?executions lsqv:agent lsqr:A-WlFJE0QQRlhBVRNGRx1QGVdaRhNsN2YUW15R
}
ORDER BY ?time

---Get the average of different SPARQL query features from the DBpedia query log---
PREFIX lsqv:<http://lsq.aksw.org/vocab#>
PREFIX sp:<http://spinrdf.org/sp#> 
SELECT (AVG(?resultSize) AS ?resultSizeAvg) (AVG(?bgps) AS ?bgpsAvg)
(AVG(?triplePatterns) AS ?triplePatternsAvg) (AVG(?joinVertices) AS ?joinVerticesAvg)
(AVG(?meanJoinVerticesDegree) AS ?meanJoinVerticesDegreeAvg)
(AVG(?meanTriplePatternSelectivity) AS ?meanTriplePatternSelectivityAvg)
(AVG(?runTime) AS ?runTimeAvg)
FROM <http://dbpedia.org>
 WHERE {
    ?id sp:text ?query .
    ?id lsqv:resultSize ?resultSize .
    ?id lsqv:bgps ?bgps.
    ?id lsqv:triplePatterns ?triplePatterns .
    ?id lsqv:joinVertices ?joinVertices .
    ?id lsqv:meanJoinVerticesDegree   ?meanJoinVerticesDegree .
    ?id lsqv:meanTriplePatternSelectivity ?meanTriplePatternSelectivity .
    ?id lsqv:runTimeMs ?runTime .
}

---Top queries by number of executions---
PREFIX lsqv:<http://lsq.aksw.org/vocab#>
PREFIX sp:<http://spinrdf.org/sp#>
SELECT DISTINCT ?id (COUNT(?executions) AS ?executionsCount)
 WHERE {
    ?id sp:text ?query .
    ?id lsqv:execution ?executions.   
}
  GROUP BY ?id
  ORDER BY DESC(COUNT(?executions))

---Top agents by number of executions---
PREFIX lsqv:<http://lsq.aksw.org/vocab#>
PREFIX sp:<http://spinrdf.org/sp#>
SELECT DISTINCT ?agent (COUNT(?executions) AS ?executionsCount)
 WHERE {
    ?id sp:text ?query .
    ?id lsqv:execution ?executions .
    ?executions lsqv:agent  ?agent.  
     }
  GROUP BY ?agent
  ORDER BY DESC(COUNT(?executions))

Sample RDF Representation

A sample RDF representation of a query is given below.

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix lsqr: <http://lsq.aksw.org/res/> . 
@prefix lsqrd: <http://lsq.aksw.org/res/SWDF-> . 
@prefix lsqv: <http://lsq.aksw.org/vocab#> . 
@prefix sp: <http://spinrdf.org/sp#> . 
@prefix dct: <http://purl.org/dc/terms/> . 
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix swdf: <http://data.semanticweb.org/ns/swc/ontology#> .

# QUERY INSTANCE META-DATA
lsqrd:q483 lsqv:endpoint <http://data.semanticweb.org/sparql> ; 
  sp:text """SELECT DISTINCT ?prop 
  WHERE { 
         ?obj rdf:type swdf:SessionEvent .    
         ?obj ?prop ?targetObj .    
         FILTER (isLiteral(?targetObj)) } 
         LIMIT 150""" . 

# STRUCTURAL META-DATA         
lsqrd:q483 lsqv:bgps 1 ; lsqv:triplePatterns 2 ; lsqv:joinVertices 1 ; 
  lsqv:meanJoinVerticesDegree 2.0 ;  
  lsqv:usesFeature  lsqv:Filter , lsqv:Distinct , lsqv:Limit ;       
  lsqv:mentionsSubject "?obj" ;
  lsqv:mentionsPredicate "?prop" , rdf:type ;
  lsqv:mentionsObject "?targetObj" , swdf:SessionEvent ; 
  lsqv:joinVertex lsqrd:q483-obj . 
lsqrd:q483-obj lsqv:joinVertexDegree 2 ; lsqv:joinVertexType lsqv:Star .

# DATA-SENSITIVE META-DATA 
lsqrd:q483 lsqv:resultSize 16 ;  lsqv:runTimeMs 6 ;
  lsqv:meanTriplePatternSelectivity 0.5007155695730322 .

# QUERY EXECUTION META-DATA
lsqrd:q483 lsqv:execution lsqrd:q483-e1 , lsqrd:q483-e2 , lsqrd:q483-e3 , lsqrd:q483-e4 . 
lsqrd:q483-e1 lsqv:agent lsqr:A-WlxKE0QQRlhCUBdGRx1QGVRbQRNsN2YUWF5W  ; 
  dct:issued "2014-05-22T17:08:17+01:00"^^xsd:dateTimeStamp . 
lsqrd:q483-e2 lsqv:agent lsqr:A-WlxKE0QQRlhCUBdGRx1QGVRdRBNsN2YUW1pS  ; 
  dct:issued "2014-05-20T14:34:35+01:00"^^xsd:dateTimeStamp . 
lsqrd:q483-e3 lsqv:agent lsqr:A-WlxKE0QQRlhCUBdGRx1QGVRdRBNsN2YUW1pS  ; 
  dct:issued "2014-05-20T14:28:37+01:00"^^xsd:dateTimeStamp . 
lsqrd:q483-e4 lsqv:agent lsqr:A-WlxKE0QQRlhCUBdGRx1QGVRdRBNsN2YUW1pS  ; 
  dct:issued "2014-05-20T14:24:13+01:00"^^xsd:dateTimeStamp . 

# SPIN REPRESENTATION 
lsqrd:q483   a sp:Select ;
  sp:distinct true ; sp:limit "150"^^xsd:long ;
  sp:resultVariables ( [ sp:varName  "prop"^^xsd:string ] ) ;
  sp:where ( 
     [ sp:subject [ sp:varName  "obj"^^xsd:string ] ;
       sp:predicate  rdf:type ;
       sp:object <http://data.semanticweb.org/ns/swc/ontology#SessionEvent>
     ]  
     [ sp:subject    [ sp:varName  "obj"^^xsd:string ] ;
       sp:predicate  [ sp:varName  "prop"^^xsd:string ] ;
       sp:object  [ sp:varName  "targetObj"^^xsd:string ] 
     ] 
     [ a sp:Filter ;
       sp:expression  [ a sp:isLiteral ; sp:arg1 [ sp:varName  "targetObj"^^xsd:string ] ]
     ]
 ) .
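The structural metadata above can be derived from a query's triple patterns. The Python sketch below recomputes the join-vertex statistics of q483 from its two triple patterns. It relies on join-vertex definitions that we assume rather than take from this page: a join vertex is a subject/object term occurring in more than one triple pattern; it is a Star if it appears only as subject, a Sink if only as object, a Path if exactly once as each, and Hybrid otherwise. These definitions, and the choice to ignore the predicate position, are assumptions of this sketch, not the LSQ implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object); variables start with "?"


def join_vertices(patterns: List[Triple]) -> Dict[str, Tuple[int, str]]:
    """Return {term: (degree, type)} for every join vertex of one BGP.

    Assumed definitions: degree counts subject (outgoing) plus object
    (incoming) occurrences; terms occurring once are not join vertices.
    """
    outgoing: Dict[str, int] = defaultdict(int)
    incoming: Dict[str, int] = defaultdict(int)
    for s, _p, o in patterns:
        outgoing[s] += 1
        incoming[o] += 1
    result = {}
    for term in set(outgoing) | set(incoming):
        out_deg, in_deg = outgoing[term], incoming[term]
        degree = out_deg + in_deg
        if degree < 2:
            continue  # occurs in a single pattern: not a join vertex
        if in_deg == 0:
            kind = "Star"
        elif out_deg == 0:
            kind = "Sink"
        elif in_deg == 1 and out_deg == 1:
            kind = "Path"
        else:
            kind = "Hybrid"
        result[term] = (degree, kind)
    return result


if __name__ == "__main__":
    q483 = [("?obj", "rdf:type", "swdf:SessionEvent"), ("?obj", "?prop", "?targetObj")]
    print(join_vertices(q483))
```

For q483 this yields a single Star join vertex ?obj of degree 2, matching the lsqv:joinVertices, lsqv:meanJoinVerticesDegree and lsqv:joinVertexType values in the sample.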

Issue Tracker and Mailing List

If you notice any issues, you can use the LSQ issue tracker. Alternatively, you can contact any of the LSQ team members; we will be happy to reply as soon as possible. You can also use the LSQ Google mailing list for further discussion and suggestions.

Query Logs Results

Our complete query log results (statistics about the queries) can be downloaded here. Please refer to the LSQ paper for details.

How Can I Add My Queries?

If you have a query log, you can contact the maintainer of LSQ. In the future, we plan to provide an API for adding queries to our public endpoint on the fly. We are currently RDFizing the Strabon and BioPortal query logs, which will be available soon. Stay tuned!

LSQ Team

We are very thankful to Richard Cyganiak (TopQuadrant), Jens Lehmann (AKSW, Uni. Leipzig), Dimitris Kontokostas (AKSW, Uni. Leipzig), Ivan Ermilov (AKSW, Uni. Leipzig), and Hugh Glaser (Ethos VO Ltd) for providing query logs.