Difference between revisions of "ConferenceCorpus"

From BITPlan Wiki
(2 intermediate revisions by the same user not shown)
Line 7: Line 7:
 |title=Scientific Event Corpus
 |url=https://github.com/WolfgangFahl/ConferenceCorpus
-|version=0.0.10
+|version=0.1.0
-|date=2021-08-03
+|date=2022-11-20
 |since=2021-07-26
 |storemode=property
Line 81: Line 81:
 title
 ConfIDent  Event
-2022-04-06
+2022-05-11
-[[https://projects.tib.eu/en/confident/ © 2019-2022 ConfIDent project]]
+[[https://projects.tib.eu/en/confident/ © 2019-2022 ConfIDent project and Wolfgang Fahl]]
-see also [[http://ptp.bitplan.com/settings Proceedings Title Parser]]
+see also [[http://cc.bitplan.com Conference Corpus]]
 
 end title
Line 89: Line 89:
 
   class Event << Entity >> {
 
   class Event << Entity >> {
 
   acronym : TEXT  
 
   acronym : TEXT  
 +
  city : TEXT
 +
  country : TEXT
 
   eventId : TEXT  
 
   eventId : TEXT  
 +
  lookupAcronym : TEXT
 
   ordinal : INTEGER  
 
   ordinal : INTEGER  
 +
  region : TEXT
 
   source : TEXT  
 
   source : TEXT  
 
   title : TEXT  
 
   title : TEXT  
 
   year : INTEGER  
 
   year : INTEGER  
 
   }
 
   }
   class event_ceurws << Entity >> {
+
   class event_confref << Entity >> {
   city : TEXT  
+
   area : TEXT  
   country : TEXT  
+
   cityWikidataid : TEXT  
   daterange : TEXT  
+
   countryIso : TEXT  
   debug : BOOLEAN
+
   countryWikidataid : TEXT
   delimiter : TEXT  
+
  dblpSeriesId : TEXT
   description : TEXT  
+
  endDate : TEXT
   enum : TEXT  
+
   keywords : TEXT  
   eventType : TEXT  
+
   location : TEXT  
   extract : TEXT  
+
   ranks : TEXT  
   field : TEXT  
+
   regionIso : TEXT  
   frequency : TEXT  
+
   regionWikidataid : TEXT
 +
  seriesId : TEXT
 +
  seriesTitle : TEXT
 +
  startDate : TEXT
 +
  submissionExtended : INTEGER
 +
  url : TEXT
 +
  }
 +
  class event_gnd << Entity >> {
 +
  acronymCount : INTEGER
 +
  acronyms : TEXT
 +
  cityWikidataid : TEXT
 +
  countryIso : TEXT
 +
  countryWikidataid : TEXT
 +
  date : TEXT
 +
  dateCount : INTEGER
 +
  endDate : DATE
 +
  event : TEXT  
 +
   fulltitle : TEXT  
 +
   homepage : TEXT  
 
   location : TEXT  
 
   location : TEXT  
  lookupAcronym : TEXT
 
  month : TEXT
 
 
   organization : TEXT  
 
   organization : TEXT  
   prefix : TEXT  
+
   place : TEXT  
   province : TEXT  
+
   placeCount : INTEGER
   publish : TEXT  
+
  places : TEXT
   scope : TEXT  
+
  regionIso : TEXT
   syntax : TEXT  
+
  regionWikidataid : TEXT
   topic : TEXT  
+
  startDate : DATE
 +
  variant : TEXT
 +
  variantCount : INTEGER
 +
  variants : TEXT
 +
  }
 +
  class event_wikicfp << Entity >> {
 +
  Final_Version_Due : TEXT
 +
  Notification_Due : TIMESTAMP
 +
  Submission_Deadline : TIMESTAMP
 +
  cityWikidataid : TEXT  
 +
   countryIso : TEXT  
 +
   countryWikidataid : TEXT  
 +
   deleted : INTEGER
 +
  endDate : TIMESTAMP
 +
  eventType : TEXT  
 +
   locality : TEXT  
 +
  regionIso : TEXT
 +
  regionWikidataid : TEXT
 +
  series : TEXT
 +
  seriesId : TEXT
 +
  startDate : TIMESTAMP
 
   url : TEXT  
 
   url : TEXT  
   valid : BOOLEAN
+
   wikiCfpId : INTEGER
   volume : INTEGER  
+
  }
 +
  class event_orclone << Entity >> {
 +
  DblpConferenceId : TEXT
 +
  ISBN : TEXT
 +
  TibKatId : TEXT
 +
  acceptedPapers : INTEGER
 +
  creationDate : TIMESTAMP
 +
  endDate : TIMESTAMP
 +
  eventType : TEXT
 +
  homepage : TEXT
 +
  inEventSeries : TEXT
 +
  lastEditor : TEXT
 +
  modificationDate : TIMESTAMP
 +
  pageTitle : TEXT <<PK>>
 +
  startDate : TIMESTAMP
 +
   submittedPapers : INTEGER  
 +
  url : TEXT
 +
  wikidataId : TEXT
 +
  yearStr : TEXT
 
   }
 
   }
 
   class event_tibkat << Entity >> {
 
   class event_tibkat << Entity >> {
Line 126: Line 184:
 
   bk : TEXT  
 
   bk : TEXT  
 
   changeDate : TEXT  
 
   changeDate : TEXT  
 +
  cityWikidataid : TEXT
 
   corporateCreatorNames : TEXT  
 
   corporateCreatorNames : TEXT  
 
   corporateCreatorTypes : TEXT  
 
   corporateCreatorTypes : TEXT  
 +
  countryIso : TEXT
 +
  countryWikidataid : TEXT
 
   databaseDate : TEXT  
 
   databaseDate : TEXT  
 
   dates : TEXT  
 
   dates : TEXT  
Line 151: Line 212:
 
   pubplace : TEXT  
 
   pubplace : TEXT  
 
   pubyear : TEXT  
 
   pubyear : TEXT  
 +
  regionIso : TEXT
 +
  regionWikidataid : TEXT
 
   sponsorGndId : TEXT  
 
   sponsorGndId : TEXT  
 
   startDate : DATE  
 
   startDate : DATE  
 
   }
 
   }
   class event_gnd << Entity >> {
+
   class event_dblp << Entity >> {
   acronymCount : INTEGER
+
   booktitle : TEXT
   acronyms : TEXT  
+
  cityWikidataid : TEXT
   date : TEXT  
+
  countryIso : TEXT
   dateCount : INTEGER
+
  countryWikidataid : TEXT
 +
  doi : TEXT
 +
  ee : TEXT
 +
  endDate : TIMESTAMP
 +
  isbn : TEXT
 +
  location : TEXT
 +
  mdate : TEXT
 +
  publicationSeries : TEXT
 +
  regionIso : TEXT
 +
  regionWikidataid : TEXT
 +
  series : TEXT
 +
  startDate : TIMESTAMP
 +
  url : TEXT
 +
  }
 +
  class event_crossref << Entity >> {
 +
   cityWikidataid : TEXT  
 +
   countryIso : TEXT  
 +
   countryWikidataid : TEXT
 +
  doi : TEXT
 
   endDate : DATE  
 
   endDate : DATE  
  event : TEXT
 
  fulltitle : TEXT
 
  homepage : TEXT
 
 
   location : TEXT  
 
   location : TEXT  
   organization : TEXT  
+
   month : INTEGER
   place : TEXT  
+
  name : TEXT
   placeCount : INTEGER
+
  number : TEXT  
   places : TEXT  
+
   regionIso : TEXT  
 +
   regionWikidataid : TEXT
 +
   sponsor : TEXT  
 
   startDate : DATE  
 
   startDate : DATE  
   variant : TEXT  
+
   theme : TEXT  
  variantCount : INTEGER
 
  variants : TEXT
 
  }
 
  class event_orclone << Entity >> {
 
  DblpConferenceId : TEXT
 
  ISBN : TEXT
 
  TibKatId : TEXT
 
  acceptedPapers : INTEGER
 
  city : TEXT
 
  country : TEXT
 
  creationDate : TIMESTAMP
 
  endDate : TIMESTAMP
 
  eventType : TEXT
 
  homepage : TEXT
 
  inEventSeries : TEXT
 
  lastEditor : TEXT
 
  modificationDate : TIMESTAMP
 
  pageTitle : TEXT <<PK>>
 
  region : TEXT
 
  startDate : TIMESTAMP
 
  submittedPapers : INTEGER
 
 
   url : TEXT  
 
   url : TEXT  
  wikidataId : TEXT
 
  yearStr : TEXT
 
 
   }
 
   }
 
   class event_wikidata << Entity >> {
 
   class event_wikidata << Entity >> {
   country : TEXT  
+
   cityWikidataid : TEXT  
 
   countryId : TEXT  
 
   countryId : TEXT  
 +
  countryIso : TEXT
 +
  countryWikidataid : TEXT
 
   dblpId : TEXT  
 
   dblpId : TEXT  
 
   describedAtUrl : TEXT  
 
   describedAtUrl : TEXT  
Line 215: Line 274:
 
   proceedings : TEXT  
 
   proceedings : TEXT  
 
   proceedingsLabel : TEXT  
 
   proceedingsLabel : TEXT  
 +
  regionIso : TEXT
 +
  regionWikidataid : TEXT
 
   startDate : TIMESTAMP  
 
   startDate : TIMESTAMP  
 
   url : TEXT  
 
   url : TEXT  
 
   wikiCfpId : TEXT  
 
   wikiCfpId : TEXT  
 
   }
 
   }
  class event_crossref << Entity >> {
+
   Event <|-- event_confref
  doi : TEXT
 
  endDate : DATE
 
  location : TEXT
 
  month : INTEGER
 
  name : TEXT
 
  number : TEXT
 
  sponsor : TEXT
 
  startDate : DATE
 
  theme : TEXT
 
  url : TEXT
 
  }
 
  class event_dblp << Entity >> {
 
  booktitle : TEXT
 
  doi : TEXT
 
  ee : TEXT
 
  endDate : TIMESTAMP
 
  isbn : TEXT
 
  mdate : TEXT
 
  publicationSeries : TEXT
 
  series : TEXT
 
  startDate : TIMESTAMP
 
  url : TEXT
 
  }
 
  class event_wikicfp << Entity >> {
 
  Final_Version_Due : TEXT
 
  Notification_Due : TIMESTAMP
 
  Submission_Deadline : TIMESTAMP
 
  deleted : BOOLEAN
 
  endDate : TIMESTAMP
 
  eventType : TEXT
 
  locality : TEXT
 
  series : TEXT
 
  seriesId : TEXT
 
  startDate : TIMESTAMP
 
  url : TEXT
 
  wikiCfpId : INTEGER
 
  }
 
  class event_orclonebackup << Entity >> {
 
  DblpConferenceId : TEXT
 
  ISBN : TEXT
 
  acceptedPapers : TEXT
 
  city : TEXT
 
  country : TEXT
 
  endDate : TIMESTAMP
 
  eventType : TEXT
 
  homepage : TEXT
 
  inEventSeries : TEXT
 
  pageTitle : TEXT <<PK>>
 
  presence : TEXT
 
  region : TEXT
 
  startDate : TIMESTAMP
 
  submittedPapers : TEXT
 
  url : TEXT
 
  wikiMarkup : TEXT
 
  wikicfpId : TEXT
 
  wikidataId : TEXT
 
  yearStr : TEXT
 
  }
 
  class event_confref << Entity >> {
 
  area : TEXT
 
  city : TEXT
 
  country : TEXT
 
  dblpSeriesId : TEXT
 
  endDate : TEXT
 
  keywords : TEXT
 
  ranks : TEXT
 
  seriesId : TEXT
 
  seriesTitle : TEXT
 
  startDate : TEXT
 
  submissionExtended : BOOLEAN
 
  url : TEXT
 
  }
 
   Event <|-- event_ceurws
 
  Event <|-- event_tibkat
 
 
   Event <|-- event_gnd
 
   Event <|-- event_gnd
 +
  Event <|-- event_wikicfp
 
   Event <|-- event_orclone
 
   Event <|-- event_orclone
 +
  Event <|-- event_tibkat
 +
  Event <|-- event_dblp
 +
  Event <|-- event_crossref
 
   Event <|-- event_wikidata
 
   Event <|-- event_wikidata
  Event <|-- event_crossref
 
  Event <|-- event_dblp
 
  Event <|-- event_wikicfp
 
  Event <|-- event_orclonebackup
 
  Event <|-- event_confref
 
 
}
 
}
 
 
' BITPlan Corporate identity skin params
 
' BITPlan Corporate identity skin params
 
' Copyright (c) 2015-2020 BITPlan GmbH
 
' Copyright (c) 2015-2020 BITPlan GmbH

Revision as of 07:28, 20 November 2022

OsProject

id  ConferenceCorpus
state  active
owner  WolfgangFahl
title  Scientific Event Corpus
url  https://github.com/WolfgangFahl/ConferenceCorpus
version  0.1.0
description  
date  2022-11-20
since  2021-07-26
until  

Free text

Installation

via pip

pip install ConferenceCorpus
# alternatively if your pip is not a python3 pip
pip3 install ConferenceCorpus

upgrade

pip install ConferenceCorpus -U
# alternatively if your pip is not a python3 pip
pip3 install ConferenceCorpus -U

Usage

RESTful API

Examples

Database View with Sqlite

The EventCorpus.db file is in SQLite format.

using sqlite-web

pip install sqlite-web
sqlite_web $HOME/.conferencecorpus/EventCorpus.db

The convenience script ccsqliteweb in the scripts directory kills any sqlite_web process already serving EventCorpus.db and then starts the server in the background using nohup.
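
The database can also be inspected without a web UI. The following is a minimal sketch using only Python's standard sqlite3 module. It assumes the database sits at the path used in the sqlite_web command above; the helper name table_row_counts is hypothetical and not part of ConferenceCorpus.

```python
import sqlite3
from pathlib import Path

# Default location taken from the sqlite_web command above (an assumption
# about your installation, adjust if your database lives elsewhere).
DEFAULT_DB = Path.home() / ".conferencecorpus" / "EventCorpus.db"


def table_row_counts(db_path=DEFAULT_DB):
    """Return {table_name: row_count} for every table in the SQLite file."""
    counts = {}
    con = sqlite3.connect(str(db_path))
    try:
        cur = con.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        )
        for (name,) in cur.fetchall():
            # Quote the identifier so unusual table names do not break the query.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = con.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
    finally:
        con.close()
    return counts


if __name__ == "__main__":
    # sqlite3.connect would create an empty file, so check for existence first.
    if DEFAULT_DB.exists():
        for table, rows in table_row_counts().items():
            print(f"{table}: {rows}")
    else:
        print(f"{DEFAULT_DB} not found")
```

Reading the table list from sqlite_master avoids hard-coding table names, which may differ between versions of the corpus.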

Command Line

aelookup -h
usage: aelookup [-h] [-d] [-e ENDPOINT] [-v] [-u] [-f]
                [--datasources DATASOURCES]

Scientific Event Corpus and Lookup

  Created by Wolfgang Fahl on 2020-06-22.
  Copyright 2020-2021 Wolfgang Fahl. All rights reserved.

  Licensed under the Apache License 2.0
  http://www.apache.org/licenses/LICENSE-2.0

  Distributed on an "AS IS" basis without warranties
  or conditions of any kind, either express or implied.

USAGE

optional arguments:
  -h, --help            show this help message and exit
  -d, --debug           show debug info
  -e ENDPOINT, --endpoint ENDPOINT
                        SPARQL endpoint to use for wikidata queries
  -v, --version         show program's version number and exit
  -u, --uml             output plantuml diagram markup
  -f, --force           force Update - may take quite a time
  --datasources DATASOURCES
                        , delimited list of datasource lookup ids
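
To call aelookup from a script, the options listed in the help text can be assembled into an argument list. This is a sketch, not part of the ConferenceCorpus API: the helper aelookup_command is hypothetical, and the datasource ids "wikidata" and "dblp" in the example are assumed values that the help text does not confirm.

```python
import shutil
import subprocess


def aelookup_command(datasources=None, uml=False, force=False,
                     endpoint=None, debug=False):
    """Build the argv list for aelookup from the options in its help text."""
    cmd = ["aelookup"]
    if debug:
        cmd.append("--debug")
    if endpoint:
        cmd += ["--endpoint", endpoint]
    if uml:
        cmd.append("--uml")
    if force:
        cmd.append("--force")
    if datasources:
        # The help text asks for a comma-delimited list of lookup ids.
        cmd += ["--datasources", ",".join(datasources)]
    return cmd


if __name__ == "__main__":
    # "wikidata" and "dblp" are assumed lookup ids used for illustration only.
    cmd = aelookup_command(datasources=["wikidata", "dblp"], uml=True)
    if shutil.which("aelookup"):
        subprocess.run(cmd, check=True)
    else:
        print("aelookup not on PATH; would run:", " ".join(cmd))
```

Passing the arguments as a list instead of a single shell string avoids quoting problems when an endpoint URL contains special characters.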

Overview

Datasources

You might want to open the diagrams in a new tab to be able to click the links depicted.

Event

EventSeries

Updating the database

OpenResearch

scripts/getbackup

Gets a copy of the nightly OpenResearch backups.

Issues

  1. Issue 33 - Event series completion
  2. Issue 32 - regression TemplateNotFound: fb4common/base.html
  3. Issue 31 - Provide RDF export of the data
  4. Issue 30 - add ordinal distribution query
  5. Issue 29 - add scholar RESTFul API
  6. Issue 28 - add generic search for scholarly items
  7. Issue 27 - openresearch results missing in multiquery
  8. Issue 26 - add bib file import
  9. Issue 25 - make multiquery result available via webapi with content negotiation
  10. Issue 24 - allow updating the database via webserver
  11. Issue 23 - dictOfLod Lookup result via commandline
  12. Issue 22 - add multi query option
  13. Issue 21 - add Webserver
  14. Issue 20 - Work around upstream Nominatim OSM Pythontools issue
  15. Issue 19 - Update Openresearch Samples
  16. Issue 18 - Update requirements.txt
  17. Issue 17 - include ACM digital library as a source
  18. Issue 16 - Steps towards csv upload
  19. Issue 15 - Filter obviously invalid Series and Event entries
  20. Issue 14 - wikiCFP 500 Internal Server and TimeOut Error Handling
  21. Issue 12 - Relevant FTX fields
  22. Issue 11 - Locality fixes
  23. Issue 10 - OpenResearch export option
  24. Issue 9 - offline access to EventCorpus.db
  25. Issue 8 - migrate confref data from Proceedings Title Parser here
  26. Issue 7 - migrate crossref data from proceedings title parser here
  27. Issue 6 - migrate dblp data source here from ptp and dblpconf
  28. Issue 5 - dblp xml parser skips some proceedings titles
  29. Issue 4 - add commandline interface to CorpusLookup
  30. Issue 3 - add python api doc
  31. Issue 2 - Cache all SQL tables in the same SQLite database in a ".conferencecorpus" directory
  32. Issue 1 - There should be a common set of attributes for Event and EventSeries from different datasources