The query below returns orgs related to prestigious science award winners.
It is moderately complex and returns 2365 rows covering 2048 distinct ?org values (i.e. almost no Cartesian product).
base <http://trr.ontotext.com/resource/>
prefix trr: <http://trr.ontotext.com/resource/ontology/>
prefix ps: <http://www.wikidata.org/prop/statement/>
prefix pq: <http://www.wikidata.org/prop/qualifier/>
prefix wd: <http://www.wikidata.org/entity/>
prefix wdt: <http://www.wikidata.org/prop/direct/>
prefix bd: <http://www.bigdata.com/rdf#>
prefix wikibase: <http://wikiba.se/ontology#>

select ?orgId ?GRID ?orgLabel ?officialName ?orgDescription ?countryLabel ?locationLabel ?year
       ?officialWebsite ?orgURL ?identifierWD ?identifierGRID ?sourceURL ?linkWD ?linkGRID
with {select distinct ?award {
  ?award wdt:P31/wdt:P279* wd:Q11448906. # science award
  ?award wdt:P444 []. # review score
}} as %AWARD
with {select distinct ?person {
  include %AWARD
  ?person wdt:P166 ?award.
}} as %PERSON
with {select distinct ?org {
  include %PERSON
  ?person
    wdt:P108 | # employer
    wdt:P436 | # member of (learned society)
    wdt:P69 | # educated at
    p:P512/pq:P69 | # academic degree / educated at
    p:P166/pq:P1416 # won award / affiliation. This may not be a notable award, but I can't write the correct union with include %AWARD
    ?org.
  filter not exists {?org wdt:P31/wdt:P279* wd:Q170584} # not a project
}} as %ORG
{
  include %ORG
  optional {?org wdt:P1448 ?officialName}
  optional {?org wdt:P17 ?country}
  optional {?org wdt:P131 ?location} # located in administrative territorial entity
  optional {?org wdt:P580|wdt:P571 ?date bind(year(?date) as ?year)} # inception|start date
  optional {?org wdt:P856 ?officialWebsite}
  optional {?org wdt:P2427 ?GRID}
  bind(strafter(str(?org),str(wd:)) as ?orgId)
  bind(uri(concat("organization/Wikidata/", ?orgId)) as ?orgURL)
  bind(uri(concat("source/Wikidata/", ?orgId)) as ?sourceURL)
  bind(uri(concat("identifier/Wikidata/", ?orgId)) as ?identifierWD)
  bind(uri(concat("identifier/GRID/", ?GRID)) as ?identifierGRID)
  bind(uri(concat("https://www.wikidata.org/wiki/", ?orgId)) as ?linkWD)
  bind(uri(concat("https://www.grid.ac/institutes/", ?GRID)) as ?linkGRID)
  service wikibase:label {bd:serviceParam wikibase:language "en,fr,it,de,nl"}
}
I want to insert the orgs into another repo, but I can't use a federated INSERT query because of T211107.
So I tried to use CONSTRUCT to get the data as Turtle:
base <http://trr.ontotext.com/resource/>
prefix trr: <http://trr.ontotext.com/resource/ontology/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix ps: <http://www.wikidata.org/prop/statement/>
prefix pq: <http://www.wikidata.org/prop/qualifier/>
prefix wd: <http://www.wikidata.org/entity/>
prefix wdt: <http://www.wikidata.org/prop/direct/>
prefix bd: <http://www.bigdata.com/rdf#>
prefix wikibase: <http://wikiba.se/ontology#>

construct {
  ?org_URL a trr:Organization;
    trr:name ?orgLabel;
    trr:altName ?officialName;
    trr:description ?orgDescription;
    trr:country ?countryLabel;
    trr:location ?locationLabel;
    trr:startDate ?YEAR;
    trr:webLink ?official_Website;
    trr:identifier ?identifier_WD, ?identifier_GRID;
    trr:source ?source_URL;
    trr:status "raw".
  ?source_URL a trr:Source;
    trr:src "Wikidata";
    trr:webLink ?link_WD;
    trr:semanticLink ?org.
  ?identifier_WD a trr:Identifier;
    trr:type "Wikidata";
    trr:id ?orgId;
    trr:webLink ?link_WD;
    trr:semanticLink ?org.
  ?identifier_GRID a trr:Identifier;
    trr:type "GRID";
    trr:id ?GRID;
    trr:webLink ?link_GRID;
    trr:source ?source_URL.
}
#select ?orgId ?GRID ?orgLabel ?officialName ?orgDescription ?countryLabel ?locationLabel ?year ?officialWebsite ?orgURL ?identifierWD ?identifierGRID ?sourceURL ?linkWD ?linkGRID
with {select distinct ?award {
  ?award wdt:P31/wdt:P279* wd:Q11448906. # science award
  ?award wdt:P444 []. # review score
}} as %AWARD
with {select distinct ?person {
  include %AWARD
  ?person wdt:P166 ?award.
}} as %PERSON
with {select distinct ?org {
  include %PERSON
  ?person
    wdt:P108 | # employer
    wdt:P436 | # member of (learned society)
    wdt:P69 | # educated at
    p:P512/pq:P69 | # academic degree / educated at
    p:P166/pq:P1416 # won award / affiliation. This may not be a notable award, but I can't write the correct union with include %AWARD
    ?org.
  filter not exists {?org wdt:P31/wdt:P279* wd:Q170584} # not a project
}} as %ORG
{
  include %ORG
  optional {?org wdt:P1448 ?officialName}
  optional {?org wdt:P17 ?country}
  optional {?org wdt:P131 ?location} # located in administrative territorial entity
  optional {?org wdt:P580|wdt:P571 ?date bind(year(?date) as ?year)} # inception|start date
  optional {?org wdt:P856 ?officialWebsite}
  optional {?org wdt:P2427 ?GRID}
  bind(strafter(str(?org),str(wd:)) as ?orgId)
  bind(uri(concat("organization/Wikidata/", ?orgId)) as ?orgURL)
  bind(uri(concat("source/Wikidata/", ?orgId)) as ?sourceURL)
  bind(uri(concat("identifier/Wikidata/", ?orgId)) as ?identifierWD)
  bind(uri(concat("identifier/GRID/", ?GRID)) as ?identifierGRID)
  bind(uri(concat("https://www.wikidata.org/wiki/", ?orgId)) as ?linkWD)
  bind(uri(concat("https://www.grid.ac/institutes/", ?GRID)) as ?linkGRID)
  bind(uri(?officialWebsite) as ?official_Website)
  bind(uri(?orgURL)          as ?org_URL)
  bind(uri(?identifierWD)    as ?identifier_WD)
  bind(uri(?identifierGRID)  as ?identifier_GRID)
  bind(uri(?sourceURL)       as ?source_URL)
  bind(uri(?linkWD)          as ?link_WD)
  bind(uri(?linkGRID)        as ?link_GRID)
  bind(strdt(?year,xsd:gYear) as ?YEAR)
  service wikibase:label {bd:serviceParam wikibase:language "en,fr,it,de,nl"}
}
The WDQ UI returns only 15k triples (and I can't save Turtle, see T211177).
The WDQ endpoint https://query.wikidata.org/sparql can return text/turtle, but the result is again incomplete (only 209 trr:Organization instead of 2048).
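For anyone who wants to reproduce the endpoint behaviour outside the UI, here is a minimal sketch in Python (standard library only) of how the CONSTRUCT query could be sent with an Accept header of text/turtle. The function names, the timeout and the User-Agent string are my own placeholders, not anything prescribed by the service.

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

def build_turtle_request(query, user_agent="org-export-sketch/0.1 (placeholder contact)"):
    """Build a POST request that asks the endpoint for Turtle.

    The User-Agent value is a placeholder; replace it with something
    that identifies you.
    """
    data = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=data,
        headers={
            "Accept": "text/turtle",
            "Content-Type": "application/x-www-form-urlencoded",
            "User-Agent": user_agent,
        },
        method="POST",
    )

def fetch_turtle(query, timeout=120):
    """Send the query and return the response body as text (needs network access)."""
    with urllib.request.urlopen(build_turtle_request(query), timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling fetch_turtle with the CONSTRUCT query text and writing the result to a file gives the Turtle that the counts above refer to; the sketch does nothing to work around the truncation itself.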
So I'm forced to save the first query as CSV and then RDFize it locally using this tarql query:
prefix trr: <http://trr.ontotext.com/resource/ontology/>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

construct {
  ?org_URL a trr:Organization;
    trr:name ?orgLabel;
    trr:altName ?officialName;
    trr:description ?orgDescription;
    trr:country ?countryLabel;
    trr:location ?locationLabel;
    trr:startDate ?YEAR;
    trr:webLink ?official_Website;
    trr:identifier ?identifier_WD, ?identifier_GRID;
    trr:source ?source_URL;
    trr:status "raw".
  ?source_URL a trr:Source;
    trr:src "Wikidata";
    trr:webLink ?link_WD;
    trr:semanticLink ?org.
  ?identifier_WD a trr:Identifier;
    trr:type "Wikidata";
    trr:id ?orgId;
    trr:webLink ?link_WD;
    trr:semanticLink ?org.
  ?identifier_GRID a trr:Identifier;
    trr:type "GRID";
    trr:id ?GRID;
    trr:webLink ?link_GRID;
    trr:source ?source_URL.
} where {
  bind(uri(?officialWebsite) as ?official_Website)
  bind(uri(?orgURL)          as ?org_URL)
  bind(uri(?identifierWD)    as ?identifier_WD)
  bind(uri(?identifierGRID)  as ?identifier_GRID)
  bind(uri(?sourceURL)       as ?source_URL)
  bind(uri(?linkWD)          as ?link_WD)
  bind(uri(?linkGRID)        as ?link_GRID)
  bind(strdt(?year,xsd:gYear) as ?YEAR)
}
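To illustrate what this mapping does per CSV row, here is a rough Python approximation (not tarql itself, and deliberately partial: it covers only the organization type, its name and the GRID identifier link). It assumes the CSV header matches the variable names of the first query and that empty cells mean "unbound", so the corresponding triples are simply skipped. The helper names and the sample values in the usage note are hypothetical.

```python
import csv
import io

TRR = "http://trr.ontotext.com/resource/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def iri(value):
    """Wrap an absolute IRI string in N-Triples angle brackets."""
    return "<" + value + ">"

def literal(value):
    """Serialize a plain string literal with basic N-Triples escaping."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'

def row_to_triples(row):
    """Partial mapping of one CSV row; empty cells produce no triple."""
    org = row.get("orgURL")
    if not org:
        return []
    triples = [(iri(org), iri(RDF_TYPE), iri(TRR + "Organization"))]
    if row.get("orgLabel"):
        triples.append((iri(org), iri(TRR + "name"), literal(row["orgLabel"])))
    if row.get("identifierGRID"):
        triples.append((iri(org), iri(TRR + "identifier"), iri(row["identifierGRID"])))
    return triples

def csv_to_ntriples(text):
    """Convert CSV text to a list of N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        for s, p, o in row_to_triples(row):
            lines.append(f"{s} {p} {o} .")
    return lines
```

For example, a row with orgURL, orgLabel and identifierGRID all filled in yields three triples, while a row with an empty identifierGRID cell yields two, which is why orgs without a GRID id get no trr:identifier link to a GRID node.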
The result is 42197 triples and 2049 orgs (trr:Organization).
(Let me know if you'd like the tarql and counting commands for testing.)
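As a rough way to check the triple and organization counts, here is a small Python sketch. It assumes the output has been serialized as N-Triples (one triple per line, full IRIs), which is not the Turtle shown above; the function name is my own.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
TRR_ORG = "<http://trr.ontotext.com/resource/ontology/Organization>"

def count_ntriples(lines):
    """Return (number of triples, number of distinct trr:Organization subjects).

    Assumes one triple per line; blank lines and '#' comment lines are ignored.
    """
    triples = 0
    orgs = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        triples += 1
        # Subject and predicate contain no spaces; the rest is the object plus " ."
        subject, predicate, rest = line.split(None, 2)
        obj = rest.rstrip(" .")
        if predicate == RDF_TYPE and obj == TRR_ORG:
            orgs.add(subject)
    return triples, len(orgs)
```

It can be fed an open file, e.g. count_ntriples(open("orgs.nt", encoding="utf-8")), where orgs.nt is a hypothetical file name. Comparing the two numbers for the endpoint output and the tarql output shows how much the endpoint result is truncated.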