Difference between revisions of "Pysotsog"

From BITPlan Wiki
 
(39 intermediate revisions by the same user not shown)
Line 5: Line 5:
 
|title=pysotsog is a python library for scholars to help navigate the conceptual knowledge graph consisting of authors,organizations,papers,scientific events,scientific event series
 
|title=pysotsog is a python library for scholars to help navigate the conceptual knowledge graph consisting of authors,organizations,papers,scientific events,scientific event series
 
|url=https://github.com/WolfgangFahl/pysotsog
 
|url=https://github.com/WolfgangFahl/pysotsog
|version=0.0.5
+
|version=0.1.0
|date=2022-11-16
+
|date=2022-11-23
|since=2022-11-16
+
|since=2022-12-04
 
|storemode=property
 
|storemode=property
 
}}
 
}}
 +
= Motivation =
 +
[https://en.wikipedia.org/wiki/Standing_on_the_shoulders_of_giants Standing on the shoulders of giants] is a core motto for scholars when doing research.
 +
To pursue this motto scholars need to be able to navigate the conceptual knowledge graph depicted in the diagram below.
 +
This knowledge graph is implemented in [https://www.wikidata.org/wiki/Wikidata:Main_Page Wikidata], [https://dblp.org/ dblp], library catalogs such as [https://www.tib.eu/de/ TIB] and the general internet.
 +
Quite a few items for the relevant entities are accessible via the [https://scholia.toolforge.org/ scholia] portal.
 +
 +
pysotsog is a Python library to improve the search, navigation and general accessibility of the items in this scholarly knowledge graph.
 +
 +
<graphviz format='svg'>
 +
/**
 +
* Wolfgang Fahl 2022-05-11
 +
* updated 2022-11-17
 +
*
 +
* Entities and Properties
 +
*/
 +
digraph links {
 +
  rankdir = TB;
 +
  scholar [
 +
      shape="box"
 +
      href="https://www.wikidata.org/wiki/Q5"
 +
  ]
 +
  institution [
 +
      shape="box"
 +
      href="https://www.wikidata.org/wiki/Q4671277"
 +
  ]
 +
  paper [
 +
      shape="box"
 +
      href="https://www.wikidata.org/wiki/Q13442814"
 +
  ]
 +
  proceedings [
 +
      color="blue"
 +
      fontcolor="blue"
 +
      shape="box"
 +
      href="https://www.wikidata.org/wiki/Q1143604"
 +
  ]
 +
  paper -> scholar [
 +
      label="author"
 +
      href="https://www.wikidata.org/wiki/Property:P50"
 +
  ]
 +
  scholar -> institution [
 +
      label="affiliated with"
 +
  ]   
 +
  paper -> paper [
 +
      label="cites"
 +
      href="https://www.wikidata.org/wiki/Property:P2860"
 +
  ]
 +
  paper -> proceedings [
 +
      label="published in"
 +
      href="https://www.wikidata.org/wiki/Property:P1433"
 +
  ]
 +
  eventseries [
 +
      color="blue"
 +
      fontcolor="blue"
 +
      shape="box"
 +
      href="https://www.wikidata.org/wiki/Q47258130"
 +
  ]
 +
  event [
 +
      color="blue"
 +
      fontcolor="blue"
 +
      shape="box"
 +
      href="https://www.wikidata.org/wiki/Q2020153"
 +
  ]
 +
 +
  event -> eventseries [
 +
      color="blue"
 +
      fontcolor="blue"
 +
      label="part of the series" 
 +
      href="https://www.wikidata.org/wiki/Property:P179"
 +
  ]
 +
  proceedings ->  event [
 +
      color="blue"
 +
      fontcolor="blue"
 +
      label="is proceedings of"
 +
      href="https://www.wikidata.org/wiki/Property:P4745"
 +
  ]
 +
  { rank = same; event; eventseries; }
 +
 +
}
 +
</graphviz>
 +
== Diagram ==
 +
<uml>
 +
/'pysotsog:sotsog: Standing on the shoulders of giants - with direct access to the clouds
 +
updated 2022-12-26
 +
     
 +
authors:Wolfgang Fahl
 +
'/
 +
title  pysotsog:sotsog: Standing on the shoulders of giants - with direct access to the clouds see https://wiki.bitplan.com/index.php/Pysotsog updated 2022-12-26
 +
hide circle
 +
package skg {
 +
  class Scholar {
 +
    wikiDataId
 +
    name
 +
    gndId
 +
    dblpId
 +
    orcid
 +
    linkedInId
 +
    googleScholarUser
 +
    homepage
 +
    givenName
 +
    familyName
 +
    gender
 +
    image
 +
    Semantic_Scholar_author_ID
 +
 +
  }
 +
  class Institution {
 +
    wikiDataId
 +
    short_name
 +
    inception
 +
    country
 +
    image
 +
    located_in
 +
    official_website
 +
 +
  }
 +
  class Paper {
 +
    wikiDataId
 +
    title
 +
    doi
 +
    DBLP_publication_ID
 +
    publication_date
 +
 +
  }
 +
  class Event {
 +
    wikiDataId
 +
    title
 +
    location
 +
    point_in_time
 +
    official_website
 +
 +
  }
 +
  class EventSeries {
 +
    wikiDataId
 +
    short_name
 +
    title
 +
    official_website
 +
    DBLP_venue_ID
 +
    VIAF_ID
 +
    inception
 +
    gndId
 +
 +
  }
 +
  class Proceedings {
 +
    wikiDataId
 +
    short_name
 +
    title
 +
    publication_date
 +
    full_work_available_at_URL
 +
 +
  }
 +
  class Country {
 +
    wikiDataId
 +
    name
 +
    iso_code
 +
    homepage
 +
    population
 +
    coordinate_location
 +
 +
  }
 +
}
 +
</uml>
 +
 +
= Demo =
 +
http://sotsog.bitplan.com
 +
 +
= Search strategy =
 +
sotsog searches are specialized. They will try to select results by relevance.
 +
E.g. if you search for the country "Singapore", the disambiguation makes sure that the ghost town "Singapore" in Wikidata is ignored, since it is less relevant to the scientific context than the city-state of Singapore.
 +
<source lang='bash' highlight='1'>
 +
wd search Singapore
 +
Q334      Singapore city-state in maritime Southeast Asia
 +
Q3306197  Central Area, Singapore city centre of Singapore
 +
Q4420036  Singapore in the Straits Settlements period of Singapore History
 +
Q3484945  Singapore 1947 film by John Brahm
 +
Q7522845  Singapore ghost town in Michigan
 +
Q5124558  Civil Service College college for Singapore government employees
 +
Q7522857  Singapore 1980  song by 2 Plus 1
 +
Q30628723  Singapore settlement in South Africa
 +
Q110537331 Singapore ship built in 1924
 +
Q98150266  Singapore 2002 children's nonfiction book
 +
Q20470370  Singapore listed historical ship in Sweden
 +
Q30276503  Singapore preserved British 0-4-0ST locomotive
 +
Q48990479  Singapore British-bred Thoroughbred racehorse
 +
Q11893609  Singapore album by Frederik
 +
Q7522855  Singapore 1960 film directed by Shakti Samanta
 +
Q97987607  SINGAPORE Barque built in Aberdeen in 1833
 +
Q115262842 Singapore geographic township in Ontario, Canada
 +
Q84264331  Singapore ship built in 2004
 +
Q170422    2010 Summer Youth Olympics 2010 edition of the Summer Youth Olympics
 +
Q40176    Singapore MRT rapid transit system in Singapore
 +
...
 +
Q7522845  Singapore ghost town in Michigan
 +
</source>
 +
 +
The above search is first filtered by relevant classes: the P31*/P279* relations of Wikidata are followed to find items that have a base class which is part of the concept skg shown above.
 +
 +
For relevant "neighbors" of our instances the same holds true, depending on the relevance calculated as a function of frequency and/or value.
 +
E.g. [https://www.wikidata.org/wiki/Q334 Singapore (Q334)] is rated high since the frequency of scientific academic conferences in Singapore is high enough to make it relevant for the [https://www.wikidata.org/wiki/Property:P17 wdt:P17 country property] of an event [https://query.wikidata.org/#%23%20truly%20tabular%20aggregate%20query%20for%20%0A%23%20Q2020153%3Aacademic%20conference%0A%23%20generated%20by%20trulytabular.py%20version%200.4.7%20on%202022-11-20T15%3A01%3A55.471873%0APREFIX%20rdf%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F1999%2F02%2F22-rdf-syntax-ns%23%3E%0APREFIX%20rdfs%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0APREFIX%20schema%3A%20%3Chttp%3A%2F%2Fschema.org%2F%3E%0APREFIX%20wd%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0APREFIX%20wdt%3A%20%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0APREFIX%20wikibase%3A%20%3Chttp%3A%2F%2Fwikiba.se%2Fontology%23%3E%0APREFIX%20xsd%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema%23%3E%0A%0ASELECT%20%0A%20%20%28COUNT%20%28%3FcountryItem%29%20AS%20%3FcountryCount%29%0A%20%20%3Fcountry%0A%23%20%20%28COUNT%20%28DISTINCT%20%3FlocationItem%29%20AS%20%3FlocationCount%29%0A%23%20%3Flocation%0AWHERE%20%7B%0A%20%20%23%20instanceof%20Q2020153%3Aacademic%20conference%0A%20%20%3Facademic_conferenceItem%20wdt%3AP279%2a%2Fwdt%3AP31%2a%20wd%3AQ2020153.%0A%20%20%23%20label%0A%20%20%3Facademic_conferenceItem%20rdfs%3Alabel%20%3Facademic_conference.%20%20%0A%20%20FILTER%20%28LANG%28%3Facademic_conference%29%20%3D%20%22en%22%29.%0A%20%20%23%20country%20%28P17%29%0A%20%0A%20%20%20%20%3Facademic_conferenceItem%20wdt%3AP17%20%3FcountryItem.%20%0A%20%20%20%20%3FcountryItem%20rdfs%3Alabel%20%3Fcountry.%0A%20%20%20%20FILTER%20%28LANG%28%3Fcountry%29%20%3D%20%22en%22%29.%0A%20%20%23%20location%20%28P276%29%0A%20%20%23OPTIONAL%20%7B%20%0A%20%20%23%20%20%3Facademic_conferenceItem%20wdt%3AP276%20%3FlocationItem.%20%0A%20%20%23%20%20%3FlocationItem%20rdfs%3Alabel%20%3Flocation.%0A%20%20%23%20%20FILTER%20%28LANG%28%3Flocation%29%20%3D%20%22en%22%29.%0A%20%20%23%7D%0A%7D%0AGROUP%20BY%0A%20%20%3Fcountry%0A%20%20%23%3Flocation%0AORDER%20BY%20DESC%20%28%3FcountryCount%29%20%0A%23%28DESC%20%3FlocationCount%29%20 try query!].
 +
 +
To get the frequency information we scan our sources regularly, e.g. using Wikidata, dblp and conferencecorpus as the sources.
 +
In the first phase the relevance calculations will be focussed on scientific events since this work is part of the ConfIDent project.
 +
 +
A special case is "academic field", "topic of work" and the like, which are only tracked for the most relevant items, e.g. the most relevant 1,000, 10,000, 100,000, 1,000,000, 10,000,000 and so on. This is a "long-tail" issue that will only be covered by counting links and keeping as much relevance information as is technically and organizationally a "low hanging fruit".
Here no specific class handling is done anymore: the class info is only available by following the link. This does not mean that search by topic is impossible: the topic itself will be searchable and a reverse search "WhatLinksHere" is possible, but no further structure is available directly in the sotsog code or infrastructure.
 +
 +
{{pip|pysotsog}}
 +
 +
= Open Source access =
 +
<source lang='bash'>
 +
git clone https://github.com/WolfgangFahl/pysotsog
 +
cd pysotsog
 +
pip install .
 +
</source>
 +
== Testing ==
 +
<source lang='bash' highlight='1-2'>
 +
pip install green
 +
green
 +
...
 +
Ran 14 tests in 13.228s using 4 processes
 +
 +
OK (passes=14)
 +
</source>
 +
 +
= Usage =
 +
== Command line ==
 +
<source lang='bash'>
 +
sotsog -h
 +
usage: sotsog [-h] [-d] [-la LANG] [-li LIMIT] [-V] [search ...]
 +
 +
python Library for Scholars to achieve "Standing on the shoulders of giants"
 +
 +
  Created by Wolfgang Fahl on 2022-11-16.
 +
  Copyright 2022 Wolfgang Fahl. All rights reserved.
 +
 +
  Licensed under the Apache License 2.0
 +
  http://www.apache.org/licenses/LICENSE-2.0
 +
 +
  Distributed on an "AS IS" basis without warranties
 +
  or conditions of any kind, either express or implied.
 +
 +
USAGE
 +
 +
positional arguments:
 +
  search                search terms
 +
 +
options:
 +
  -h, --help            show this help message and exit
 +
  -d, --debug          show debug info
 +
  -la LANG, --lang LANG
 +
                        language code to use
 +
  -li LIMIT, --limit LIMIT
 +
                        limit the number of search results
 +
  -V, --version        show program's version number and exit
 +
</source>
 +
=== Examples ===
 +
==== Scholar ====
 +
[https://scholia.toolforge.org/author/Q80 Tim Berners-Lee]
 +
<source lang='bash' highlight="1">
 +
sotsog Tim Berners-Lee
 +
Tim Berners-Lee(Q80):English computer scientist, inventor of the World Wide Web (born 1955)✅
 +
Scholar➜Tim Berners-Lee:
 +
  wikiDataId=http://www.wikidata.org/entity/Q80
 +
  gndId=121649091
 +
  dblpId=b/TimBernersLee
 +
  orcid=0000-0003-1279-3709
 +
  homepage=http://www.w3.org/People/Berners-Lee/
 +
  givenName=http://www.wikidata.org/entity/Q1369663
 +
  familyName=http://www.wikidata.org/entity/Q18375238
 +
  gender=http://www.wikidata.org/entity/Q6581097
 +
  image=http://commons.wikimedia.org/wiki/Special:FilePath/Sir%20Tim%20Berners-Lee%20%28cropped%29.jpg
 +
opening https://scholia.toolforge.org/author/Q80 in browser
 +
</source>
 +
==== Paper ====
 +
[https://scholia.toolforge.org/work/Q55693402 We Need a Magna Carta for the Internet]
 +
<source lang='bash' highlight="1">
 +
sotsog We Need a Magna Carta for the Internet
 +
We Need a Magna Carta for the Internet(Q55693402):✅
 +
Paper➜We Need a Magna Carta for the Internet:
 +
  wikiDataId=http://www.wikidata.org/entity/Q55693402
 +
  DOI=10.1111/NPQU.11475
 +
  publication_date=2014-07-01 00:00:00
 +
opening https://scholia.toolforge.org/work/Q55693402 in browser
 +
</source>
 +
 +
==== Institution ====
 +
[https://scholia.toolforge.org/organization/Q273263 RWTH Aachen]
 +
<source lang='bash' highlight="1">
 +
sotsog RWTH
 +
RWTH Aachen University(Q273263):university in Aachen, Germany✅
 +
Institution➜RWTH Aachen University:
 +
  wikiDataId=http://www.wikidata.org/entity/Q273263
 +
  short_name=RWTH Aachen
 +
  inception=1870-10-10 00:00:00
 +
  country=http://www.wikidata.org/entity/Q183
 +
  image=http://commons.wikimedia.org/wiki/Special:FilePath/1196-18-rwth-aachen-hg-von-hendrik-brixius.jpg
 +
  located_in=http://www.wikidata.org/entity/Q1017
 +
  official_website=http://www.rwth-aachen.de
 +
opening https://scholia.toolforge.org/organization/Q273263 in browser
 +
</source>
 +
==== Event Series ====
 +
[https://scholia.toolforge.org/event-series/Q3570023 WWW]
 +
<source lang='bash' highlight="1">
 +
sotsog WWW     
 +
The Web Conference(Q3570023):conference series✅
 +
EventSeries➜The Web Conference:
 +
  wikiDataId=http://www.wikidata.org/entity/Q3570023
 +
  short_name=WWW
 +
  title=The Web Conference
 +
  official_website=http://www.iw3c2.org/conferences/
 +
  DBLP_venue_ID=conf/www
 +
  inception=1994-01-01 00:00:00
 +
  gndId=1092529268
 +
opening https://scholia.toolforge.org/event-series/Q3570023 in browser
 +
</source>
 +
==== Event ====
 +
[https://scholia.toolforge.org/event/Q109551429 VNC 2021]
 +
<source lang='bash' highlight="1">
 +
sotsog VNC 2021
 +
2021 IEEE Vehicular Networking Conference (VNC)(Q109551429):2021 edition of VNC Conference on Vehicular Networking✅
 +
Event➜2021 IEEE Vehicular Networking Conference (VNC):
 +
  wikiDataId=http://www.wikidata.org/entity/Q109551429
 +
  title=2021 IEEE Vehicular Networking Conference (VNC)
 +
  location=http://www.wikidata.org/entity/Q3012
 +
  official_website=https://ieee-vnc.org/2021/
 +
opening https://scholia.toolforge.org/event/Q109551429 in browser
 +
</source>
 +
==== Proceedings ====
 +
[https://scholia.toolforge.org/venue/Q115118238 Proceedings of the 35th International Workshop on Description Logics (DL 2022)]
 +
<source lang='bash' highlight="1">
 +
sotsog "Proceedings of the 35th International Workshop on Description Logics (DL 2022)"
 +
Proceedings of the 35th International Workshop on Description Logics (DL 2022)(Q115118238):Proceedings of DL 2022 workshop✅
 +
Proceedings➜Proceedings of the 35th International Workshop on Description Logics (DL 2022):
 +
  wikiDataId=http://www.wikidata.org/entity/Q115118238
 +
  short_name=DL 2022
 +
  title=Proceedings of the 35th International Workshop on Description Logics (DL 2022)
 +
  publication_date=2022-11-03 00:00:00
 +
  full_work_available_at_URL=http://ceur-ws.org/Vol-3263/
 +
opening https://scholia.toolforge.org/venue/Q115118238 in browser
 +
</source>
 +
= MediaWiki extension Semantic Cite support =
 +
The [https://www.mediawiki.org/wiki/Extension:Semantic_Cite Semantic Cite] notation is one of the markups that sotsog supports.
 +
 +
== Example ==
 +
<source lang='bash' highlight='1'>
 +
sotsog -nb --scite Citation.js
 +
Citation.js: a format-independent, modular bibliography tool for the browser and command line(Q60565832):✅
 +
Paper ➞ Citation.js: a format-independent, modular bibliography tool for the browser and command line:
 +
  wikiDataId=http://www.wikidata.org/entity/Q60565832
 +
  doi=10.7287/PEERJ.PREPRINTS.27466V1
 +
  DBLP_publication_ID=journals/peerjpre/Willighagen19
 +
  publication_date=2019-01-05 00:00:00
 +
scite markup:
 +
citation.js: a format-independent, modular bibliography tool for the browser and command line
 +
[[CiteRef::willighagenci]]
 +
{{#scite:
 +
|reference=willighagenci
 +
|type=journal-article
 +
|title=Citation.js: a format-independent, modular bibliography tool for the browser and command line
 +
|authors=Lars G Willighagen
 +
|publisher=PeerJ
 +
|doi=10.7287/peerj.preprints.27466v1
 +
|year=
 +
|retrieved-from=https://dx.doi.org/
 +
|retrieved-on=2022-11-21
 +
}}
 +
</source>
 +
 +
== citation.js: a format-independent, modular bibliography tool for the browser and command line ==
 +
[[CiteRef::willighagenci]]
 +
{{#scite:
 +
|reference=willighagenci
 +
|type=journal-article
 +
|title=Citation.js: a format-independent, modular bibliography tool for the browser and command line
 +
|authors=Lars G Willighagen
 +
|publisher=PeerJ
 +
|doi=10.7287/peerj.preprints.27466v1
 +
|year=
 +
|retrieved-from=https://dx.doi.org/
 +
|retrieved-on=2022-11-21
 +
}}
 +
 +
= Dblp schema =
 +
 +
== dblp sparql query example ==
 +
The diagram below tells us that there is a generic property "webpage" available for the entities
 +
in dblp. The following query searches for linkedin webpages associated with dblp entries:
 +
 +
<source lang='sparql'>
 +
PREFIX dblp: <https://dblp.org/rdf/schema#>
 +
SELECT ?item ?webpage WHERE {
 +
  ?item dblp:webpage ?webpage .
 +
  FILTER REGEX(?webpage, "linkedin")
 +
}
 +
</source>
 +
[https://qlever.cs.uni-freiburg.de/dblp/4nckGG try it!]
 +
 +
== UML Class Diagram ==
 +
This UML Class Diagram has been generated from the dblp OWL schema at https://dblp.org/rdf/schema.
 +
The graphic is in SVG format - just open it in a new tab to zoom in and out.
 +
<uml format='svg'>
 +
scale 0.4
 +
/'
 +
Wolfgang Fahl 2022-11-19
 +
updated 2022-11-19
 +
 
 +
dblp schema https://dblp.org/rdf/schema
 +
converted from owl to plantuml
 +
'/
 +
title  dblp schema https://dblp.org/rdf/schema converted from owl to plantuml updated 2022-11-19
 +
hide circle
 +
package foaf {
 +
  class Document {
 +
  }
 +
}
 +
package dblp {
 +
  note top of AmbiguousCreator
 +
  Ambiguous Creator
 +
  Not an actual creator, but an ambiguous proxy for an unknown number of unrelated actual creators. Associated publications do not have their true creators determined yet.
 +
  end note
 +
  class AmbiguousCreator{
 +
 +
  }
 +
  AmbiguousCreator--Creator:possibleActualCreator
 +
  Creator <|-- AmbiguousCreator
 +
  note top of Informal
 +
  Informal
 +
  An informal or other publication.
 +
  end note
 +
  class Informal{
 +
 +
  }
 +
  Publication <|-- Informal
 +
  note top of Creator
 +
  Creator
 +
  A creator of a publication.
 +
  end note
 +
  class Creator{
 +
    primaryCreatorName:string
 +
    homepage:Document
 +
    creatorNote:string
 +
    orcid:anyUri
 +
    creatorName:string
 +
    affiliation:string
 +
    awardWebpage:Document
 +
    primaryAffiliation:string
 +
    primaryHomepage:Document
 +
 +
  }
 +
  Creator--Publication:editorOf
 +
  Creator--AmbiguousCreator:proxyAmbiguousCreator
 +
  Creator--Creator:coCreatorWith
 +
  Creator--Creator:homonymousCreator
 +
  Creator--Publication:authorOf
 +
  Creator--Creator:coEditorWith
 +
  Creator--Publication:creatorOf
 +
  Creator--Creator:coAuthorWith
 +
  Entity <|-- Creator
 +
  note top of Publication
 +
  Publication
 +
  A publication.
 +
  end note
 +
  class Publication{
 +
    primarydocumentPage:Document
 +
    listedOnTocPage:Document
 +
    yearOfEvent:gYear
 +
    publishedBy:string
 +
    isbn:anyUri
 +
    publishersAddress:string
 +
    publicationNote:string
 +
    title:string
 +
    bibtexType:Entry
 +
    publishedInBookChapter:string
 +
    doi:anyUri
 +
    documentPage:Document
 +
    numberOfCreators:integer
 +
    publishedInSeries:string
 +
    publishedInSeriesVolume:string
 +
    publishedIn:string
 +
    pagination:string
 +
    yearOfPublication:gYear
 +
    monthOfPublication:string
 +
    publishedInJournalVolumeIssue:string
 +
    publishedInJournal:string
 +
    thesisAcceptedBySchool:string
 +
    publishedInJournalVolume:string
 +
    publishedInBook:string
 +
 +
  }
 +
  Publication--Creator:createdBy
 +
  Publication--Creator:authoredBy
 +
  Publication--Creator:editedBy
 +
  Publication--Signature:hasSignature
 +
  Publication--Publication:publishedAsPartOf
 +
  Entity <|-- Publication
 +
  note top of Data
 +
  Data
 +
  Research data or artifacts.
 +
  end note
 +
  class Data{
 +
 +
  }
 +
  Publication <|-- Data
 +
  note top of Person
 +
  Person
 +
  An actual person, who is a creator of a publication.
 +
  end note
 +
  class Person{
 +
 +
  }
 +
  Creator <|-- Person
 +
  note top of AuthorSignature
 +
  Author Signature
 +
  The information that links a publication to an author.
 +
  end note
 +
  class AuthorSignature{
 +
 +
  }
 +
  Signature <|-- AuthorSignature
 +
  note top of Inproceedings
 +
  Inproceedings
 +
  A conference or workshop paper.
 +
  end note
 +
  class Inproceedings{
 +
 +
  }
 +
  Publication <|-- Inproceedings
 +
  note top of Withdrawn
 +
  Withdrawn
 +
  A withdrawn publication item.
 +
  end note
 +
  class Withdrawn{
 +
 +
  }
 +
  Publication <|-- Withdrawn
 +
  note top of EditorSignature
 +
  Editor Signature
 +
  The information that links a publication to an editor.
 +
  end note
 +
  class EditorSignature{
 +
 +
  }
 +
  Signature <|-- EditorSignature
 +
  note top of Editorship
 +
  Editorship
 +
  An edited publication.
 +
  end note
 +
  class Editorship{
 +
 +
  }
 +
  Publication <|-- Editorship
 +
  note top of Incollection
 +
  Incollection
 +
  A part/chapter in a book or a collection.
 +
  end note
 +
  class Incollection{
 +
 +
  }
 +
  Publication <|-- Incollection
 +
  note top of Reference
 +
  Reference
 +
  A reference work entry.
 +
  end note
 +
  class Reference{
 +
 +
  }
 +
  Publication <|-- Reference
 +
  note top of Group
 +
  Group
 +
  A creator alias used by a group or consortium of persons.
 +
  end note
 +
  class Group{
 +
 +
  }
 +
  Creator <|-- Group
 +
  note top of Signature
 +
  Signature
 +
  The information that links a publication to a creator.
 +
  end note
 +
  class Signature{
 +
    signatureOrcid:anyUri
 +
    signatureDblpName:string
 +
    signatureOrdinal:integer
 +
 +
  }
 +
  Signature--Creator:signatureCreator
 +
  Signature--Publication:signaturePublication
 +
  note top of Book
 +
  Book
 +
  A book or a thesis.
 +
  end note
 +
  class Book{
 +
 +
  }
 +
  Publication <|-- Book
 +
  note top of Entity
 +
  Entity
 +
  A general, identifiable entity in dblp.
 +
  end note
 +
  class Entity{
 +
    identifier:anyUri
 +
    wikipedia:Document
 +
    archivedWebpage:Document
 +
    wikidata:anyUri
 +
    webpage:Document
 +
 +
  }
 +
  Thing <|-- Entity
 +
  note top of Article
 +
  Article
 +
  A journal article.
 +
  end note
 +
  class Article{
 +
 +
  }
 +
  Publication <|-- Article
 +
}
 +
</uml>

Latest revision as of 17:30, 26 December 2022

OsProject
id  pysotsog
state  active
owner  WolfgangFahl
title  pysotsog is a python library for scholars to help navigate the conceptual knowledge graph consisting of authors,organizations,papers,scientific events,scientific event series
url  https://github.com/WolfgangFahl/pysotsog
version  0.1.0
description  
date  2022-11-23
since  2022-12-04
until  

Motivation

Standing on the shoulders of giants is a core motto for scholars when doing research. To pursue this motto scholars need to be able to navigate the conceptual knowledge graph depicted in the diagram below. This knowledge graph is implemented in Wikidata, dblp, library catalogs such as TIB and the general internet. Quite a few items for the relevant entities are accessible via the scholia portal.

pysotsog is a Python library to improve the search, navigation and general accessibility of the items in this scholarly knowledge graph.

Diagram

Demo

http://sotsog.bitplan.com

Search strategy

sotsog searches are specialized. They will try to select results by relevance. E.g. if you search for the country "Singapore", the disambiguation makes sure that the ghost town "Singapore" in Wikidata is ignored, since it is less relevant to the scientific context than the city-state of Singapore.

wd search Singapore
Q334       Singapore city-state in maritime Southeast Asia
Q3306197   Central Area, Singapore city centre of Singapore
Q4420036   Singapore in the Straits Settlements period of Singapore History
Q3484945   Singapore 1947 film by John Brahm
Q7522845   Singapore ghost town in Michigan
Q5124558   Civil Service College college for Singapore government employees
Q7522857   Singapore 1980  song by 2 Plus 1
Q30628723  Singapore settlement in South Africa
Q110537331 Singapore ship built in 1924
Q98150266  Singapore 2002 children's nonfiction book
Q20470370  Singapore listed historical ship in Sweden
Q30276503  Singapore preserved British 0-4-0ST locomotive
Q48990479  Singapore British-bred Thoroughbred racehorse
Q11893609  Singapore album by Frederik
Q7522855   Singapore 1960 film directed by Shakti Samanta
Q97987607  SINGAPORE Barque built in Aberdeen in 1833
Q115262842 Singapore geographic township in Ontario, Canada
Q84264331  Singapore ship built in 2004
Q170422    2010 Summer Youth Olympics 2010 edition of the Summer Youth Olympics
Q40176     Singapore MRT rapid transit system in Singapore
...
Q7522845   Singapore ghost town in Michigan

The above search is first filtered by relevant classes: the P31*/P279* relations of Wikidata are followed to find items that have a base class which is part of the concept skg shown above.
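
The class filter described above could be expressed as one SPARQL ASK query per candidate item. The following is a minimal sketch under stated assumptions, not the pysotsog implementation: the function name `build_class_filter_query` is hypothetical, the base class ids are the ones linked in the concept graph on this page, and the commonly used property path `wdt:P31/wdt:P279*` (instance of, then any number of subclass-of steps) is assumed.

```python
# Hypothetical sketch: build a SPARQL ASK query that checks whether a Wikidata
# item is an instance of (a subclass of) one of the scholarly base classes.
# The class ids are the ones linked in the concept graph on this page.
SKG_BASE_CLASSES = {
    "scholar": "Q5",
    "institution": "Q4671277",
    "paper": "Q13442814",
    "proceedings": "Q1143604",
    "eventseries": "Q47258130",
    "event": "Q2020153",
}


def build_class_filter_query(qid: str, base_classes=SKG_BASE_CLASSES) -> str:
    """Return an ASK query for the given Wikidata item id, e.g. "Q80"."""
    values = " ".join(f"wd:{cls}" for cls in base_classes.values())
    return (
        "PREFIX wd: <http://www.wikidata.org/entity/>\n"
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "ASK {\n"
        f"  VALUES ?baseClass {{ {values} }}\n"
        # assumed path: instance of, followed by zero or more subclass-of steps
        f"  wd:{qid} wdt:P31/wdt:P279* ?baseClass .\n"
        "}"
    )


print(build_class_filter_query("Q80"))
```

The sketch only builds the query text; sending it to a SPARQL endpoint and caching the answers is left out.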

For relevant "neighbors" of our instances the same holds true, depending on the relevance calculated as a function of frequency and/or value. E.g. Singapore (Q334) is rated high since the frequency of scientific academic conferences in Singapore is high enough to make it relevant for the wdt:P17 country property of an event (try query!).

To get the frequency information we scan our sources regularly, e.g. using Wikidata, dblp and conferencecorpus as the sources. In the first phase the relevance calculations will be focussed on scientific events since this work is part of the ConfIDent project.
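
A frequency-based relevance ranking of the kind described above might look like the following sketch. It is an illustration only: the function name `rank_by_frequency` is hypothetical and the list of event countries is made-up toy data, not a result of scanning the actual sources.

```python
from collections import Counter

# Toy data, made up for illustration: one entry per event, holding the
# Wikidata id of the event's country (wdt:P17).
event_countries = ["Q334", "Q183", "Q334", "Q30", "Q334", "Q183"]


def rank_by_frequency(candidates, observed_values):
    """Order candidate ids by how often they occur in observed_values."""
    counts = Counter(observed_values)
    # Counter returns 0 for unseen ids, so irrelevant candidates sort last;
    # sorted() is stable, so ties keep their original search order.
    return sorted(candidates, key=lambda qid: counts[qid], reverse=True)


# Q7522845 (ghost town) precedes Q334 (city-state) in this input order
print(rank_by_frequency(["Q7522845", "Q334"], event_countries))
```

With this toy data the city-state Q334 moves ahead of the ghost town Q7522845, because only Q334 occurs as a country of an event.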

A special case is "academic field", "topic of work" and the like, which are only tracked for the most relevant items, e.g. the most relevant 1,000, 10,000, 100,000, 1,000,000, 10,000,000 and so on. This is a "long-tail" issue that will only be covered by counting links and keeping as much relevance information as is technically and organizationally a "low hanging fruit". Here no specific class handling is done anymore: the class info is only available by following the link. This does not mean that search by topic is impossible: the topic itself will be searchable and a reverse search "WhatLinksHere" is possible, but no further structure is available directly in the sotsog code or infrastructure.


Installation

pip install pysotsog
# alternatively if your pip is not a python3 pip
pip3 install pysotsog 
# local install from source directory of pysotsog 
pip install .

upgrade

pip install pysotsog  -U
# alternatively if your pip is not a python3 pip
pip3 install pysotsog -U


Open Source access

git clone https://github.com/WolfgangFahl/pysotsog
cd pysotsog
pip install .

Testing

pip install green
green
...
Ran 14 tests in 13.228s using 4 processes

OK (passes=14)

Usage

Command line

sotsog -h
usage: sotsog [-h] [-d] [-la LANG] [-li LIMIT] [-V] [search ...]

python Library for Scholars to achieve "Standing on the shoulders of giants"

  Created by Wolfgang Fahl on 2022-11-16.
  Copyright 2022 Wolfgang Fahl. All rights reserved.

  Licensed under the Apache License 2.0
  http://www.apache.org/licenses/LICENSE-2.0

  Distributed on an "AS IS" basis without warranties
  or conditions of any kind, either express or implied.

USAGE

positional arguments:
  search                search terms

options:
  -h, --help            show this help message and exit
  -d, --debug           show debug info
  -la LANG, --lang LANG
                        language code to use
  -li LIMIT, --limit LIMIT
                        limit the number of search results
  -V, --version         show program's version number and exit
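
For readers who want to script against the same options, the help text above corresponds roughly to the following `argparse` setup. This is a reconstruction from the help output, not the pysotsog source: the function name `build_parser` and the default values for `--lang` and `--limit` are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Reconstruct a parser with the options listed in the help output."""
    parser = argparse.ArgumentParser(
        prog="sotsog",
        description='python Library for Scholars to achieve '
        '"Standing on the shoulders of giants"',
    )
    parser.add_argument("search", nargs="*", help="search terms")
    parser.add_argument("-d", "--debug", action="store_true",
                        help="show debug info")
    # default values below are assumptions; the help text does not state them
    parser.add_argument("-la", "--lang", default="en",
                        help="language code to use")
    parser.add_argument("-li", "--limit", type=int, default=9,
                        help="limit the number of search results")
    parser.add_argument("-V", "--version", action="version",
                        version="sotsog 0.1.0")
    return parser


args = build_parser().parse_args(["-li", "5", "Tim", "Berners-Lee"])
print(" ".join(args.search), args.limit)
```

Because `search` uses `nargs="*"`, unquoted multi-word searches such as `sotsog Tim Berners-Lee` arrive as a list of terms that can be joined back into one search string.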

Examples

Scholar

Tim Berners-Lee

sotsog Tim Berners-Lee
Tim Berners-Lee(Q80):English computer scientist, inventor of the World Wide Web (born 1955)✅
Scholar➜Tim Berners-Lee:
  wikiDataId=http://www.wikidata.org/entity/Q80
  gndId=121649091
  dblpId=b/TimBernersLee
  orcid=0000-0003-1279-3709
  homepage=http://www.w3.org/People/Berners-Lee/
  givenName=http://www.wikidata.org/entity/Q1369663
  familyName=http://www.wikidata.org/entity/Q18375238
  gender=http://www.wikidata.org/entity/Q6581097
  image=http://commons.wikimedia.org/wiki/Special:FilePath/Sir%20Tim%20Berners-Lee%20%28cropped%29.jpg
opening https://scholia.toolforge.org/author/Q80 in browser

Paper

We Need a Magna Carta for the Internet

sotsog We Need a Magna Carta for the Internet
We Need a Magna Carta for the Internet(Q55693402):✅
Paper➜We Need a Magna Carta for the Internet:
  wikiDataId=http://www.wikidata.org/entity/Q55693402
  DOI=10.1111/NPQU.11475
  publication_date=2014-07-01 00:00:00
opening https://scholia.toolforge.org/work/Q55693402 in browser

Institution

RWTH Aachen

sotsog RWTH
RWTH Aachen University(Q273263):university in Aachen, Germany✅
Institution➜RWTH Aachen University:
  wikiDataId=http://www.wikidata.org/entity/Q273263
  short_name=RWTH Aachen
  inception=1870-10-10 00:00:00
  country=http://www.wikidata.org/entity/Q183
  image=http://commons.wikimedia.org/wiki/Special:FilePath/1196-18-rwth-aachen-hg-von-hendrik-brixius.jpg
  located_in=http://www.wikidata.org/entity/Q1017
  official_website=http://www.rwth-aachen.de
opening https://scholia.toolforge.org/organization/Q273263 in browser

Event Series

WWW

sotsog WWW       
The Web Conference(Q3570023):conference series✅
EventSeries➜The Web Conference:
  wikiDataId=http://www.wikidata.org/entity/Q3570023
  short_name=WWW
  title=The Web Conference
  official_website=http://www.iw3c2.org/conferences/
  DBLP_venue_ID=conf/www
  inception=1994-01-01 00:00:00
  gndId=1092529268
opening https://scholia.toolforge.org/event-series/Q3570023 in browser

Event

VNC 2021

sotsog VNC 2021
2021 IEEE Vehicular Networking Conference (VNC)(Q109551429):2021 edition of VNC Conference on Vehicular Networking✅
Event➜2021 IEEE Vehicular Networking Conference (VNC):
  wikiDataId=http://www.wikidata.org/entity/Q109551429
  title=2021 IEEE Vehicular Networking Conference (VNC)
  location=http://www.wikidata.org/entity/Q3012
  official_website=https://ieee-vnc.org/2021/
opening https://scholia.toolforge.org/event/Q109551429 in browser

Proceedings

Proceedings of the 35th International Workshop on Description Logics (DL 2022)

sotsog "Proceedings of the 35th International Workshop on Description Logics (DL 2022)"
Proceedings of the 35th International Workshop on Description Logics (DL 2022)(Q115118238):Proceedings of DL 2022 workshop✅
Proceedings➜Proceedings of the 35th International Workshop on Description Logics (DL 2022):
  wikiDataId=http://www.wikidata.org/entity/Q115118238
  short_name=DL 2022
  title=Proceedings of the 35th International Workshop on Description Logics (DL 2022)
  publication_date=2022-11-03 00:00:00
  full_work_available_at_URL=http://ceur-ws.org/Vol-3263/
opening https://scholia.toolforge.org/venue/Q115118238 in browser
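
Each example above ends by opening a Scholia page whose path depends on the entity type. The mapping can be read off the example outputs and sketched as follows; the names `SCHOLIA_ASPECT` and `scholia_url` are hypothetical and not taken from the pysotsog code.

```python
SCHOLIA_BASE = "https://scholia.toolforge.org"

# entity type -> Scholia path segment, as seen in the example outputs above
SCHOLIA_ASPECT = {
    "Scholar": "author",
    "Paper": "work",
    "Institution": "organization",
    "EventSeries": "event-series",
    "Event": "event",
    "Proceedings": "venue",
}


def scholia_url(entity_type: str, wikidata_id: str) -> str:
    """Build the Scholia URL for an entity type and a Wikidata id or URI."""
    # accept both "Q80" and "http://www.wikidata.org/entity/Q80"
    qid = wikidata_id.rsplit("/", 1)[-1]
    return f"{SCHOLIA_BASE}/{SCHOLIA_ASPECT[entity_type]}/{qid}"


print(scholia_url("Scholar", "http://www.wikidata.org/entity/Q80"))
```

The resulting URL could then be passed to `webbrowser.open` from the Python standard library to open it in the default browser.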

MediaWiki extension Semantic Cite support

The Semantic Cite notation is one of the markups that sotsog supports.

Example

sotsog -nb --scite Citation.js
Citation.js: a format-independent, modular bibliography tool for the browser and command line(Q60565832):✅
Paper ➞ Citation.js: a format-independent, modular bibliography tool for the browser and command line:
  wikiDataId=http://www.wikidata.org/entity/Q60565832
  doi=10.7287/PEERJ.PREPRINTS.27466V1
  DBLP_publication_ID=journals/peerjpre/Willighagen19
  publication_date=2019-01-05 00:00:00
scite markup:
citation.js: a format-independent, modular bibliography tool for the browser and command line
[[CiteRef::willighagenci]]
{{#scite:
|reference=willighagenci
|type=journal-article
|title=Citation.js: a format-independent, modular bibliography tool for the browser and command line
|authors=Lars G Willighagen
|publisher=PeerJ
|doi=10.7287/peerj.preprints.27466v1
|year=
|retrieved-from=https://dx.doi.org/
|retrieved-on=2022-11-21
}}
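
The markup in the example follows a simple pattern: a `CiteRef` annotation followed by a `#scite` parser function call with one `|key=value` line per field. A minimal sketch of such a renderer is shown below; the function name `to_scite` is hypothetical and the field values are copied from the example output above.

```python
def to_scite(reference: str, fields: dict) -> str:
    """Render a CiteRef annotation plus a #scite call for the given fields."""
    lines = [
        f"[[CiteRef::{reference}]]",
        "{{#scite:",
        f"|reference={reference}",
    ]
    # one "|key=value" line per field, in the order the fields were given
    lines += [f"|{key}={value}" for key, value in fields.items()]
    lines.append("}}")
    return "\n".join(lines)


markup = to_scite(
    "willighagenci",
    {
        "type": "journal-article",
        "authors": "Lars G Willighagen",
        "publisher": "PeerJ",
        "doi": "10.7287/peerj.preprints.27466v1",
    },
)
print(markup)
```

Field values are inserted verbatim; a value that itself contains a pipe character or closing braces would need escaping before it is safe to paste into a wiki page.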

citation.js: a format-independent, modular bibliography tool for the browser and command line

1

Dblp schema

dblp sparql query example

The diagram below tells us that there is a generic property "webpage" available for the entities in dblp. The following query searches for linkedin webpages associated with dblp entries:

PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT ?item ?webpage WHERE {
  ?item dblp:webpage ?webpage .
  FILTER REGEX(?webpage, "linkedin")
}

try it!
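
The same query can also be sent from Python using only the standard library. The sketch below makes several assumptions that should be verified before use: the endpoint URL is assumed to be the QLever API address for the dblp dataset, the function names are hypothetical, and a `LIMIT 10` has been added to keep the response small.

```python
import json
import urllib.parse
import urllib.request

# Assumption: QLever API endpoint for the dblp dataset; verify before use.
ENDPOINT = "https://qlever.cs.uni-freiburg.de/api/dblp"

QUERY = """PREFIX dblp: <https://dblp.org/rdf/schema#>
SELECT ?item ?webpage WHERE {
  ?item dblp:webpage ?webpage .
  FILTER REGEX(?webpage, "linkedin")
}
LIMIT 10"""


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a GET request that asks for SPARQL JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query: str, endpoint: str = ENDPOINT):
    """Send the query and return (item, webpage) pairs; needs network access."""
    with urllib.request.urlopen(build_request(query, endpoint), timeout=30) as response:
        data = json.load(response)
    # standard SPARQL JSON results layout: results -> bindings -> variable -> value
    return [
        (binding["item"]["value"], binding["webpage"]["value"])
        for binding in data["results"]["bindings"]
    ]
```

Calling `run_query(QUERY)` performs the actual request; `build_request` is kept separate so the URL can be inspected without network access.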

UML Class Diagram

This UML Class Diagram has been generated from the dblp OWL schema at https://dblp.org/rdf/schema. The graphic is in SVG format - just open it in a new tab to zoom in and out.

References

  1. ^ willighagenci