ConferenceCorpus: Difference between revisions

From BITPlan Wiki
Line 81: Line 81:
title
title
ConfIDent  Event
ConfIDent  Event
2021-08-21
2022-04-06
[[https://projects.tib.eu/en/confident/ © 2019-2021 ConfIDent project]]
[[https://projects.tib.eu/en/confident/ © 2019-2022 ConfIDent project]]
see also [[http://ptp.bitplan.com/settings Proceedings Title Parser]]
see also [[http://ptp.bitplan.com/settings Proceedings Title Parser]]


Line 89: Line 89:
   class Event << Entity >> {
   class Event << Entity >> {
   acronym : TEXT  
   acronym : TEXT  
  city : TEXT
  country : TEXT
   eventId : TEXT  
   eventId : TEXT  
  ordinal : INTEGER
   source : TEXT  
   source : TEXT  
   title : TEXT  
   title : TEXT  
  year : INTEGER
  }
  class event_ceurws << Entity >> {
  city : TEXT
  country : TEXT
  daterange : TEXT
  debug : BOOLEAN
  delimiter : TEXT
  description : TEXT
  enum : TEXT
  eventType : TEXT
  extract : TEXT
  field : TEXT
  frequency : TEXT
  location : TEXT
  lookupAcronym : TEXT
  month : TEXT
  organization : TEXT
  prefix : TEXT
  province : TEXT
  publish : TEXT
  scope : TEXT
  syntax : TEXT
  topic : TEXT
   url : TEXT  
   url : TEXT  
   year : INTEGER  
   valid : BOOLEAN
  volume : INTEGER  
  }
  class event_tibkat << Entity >> {
  alternativeTitles : TEXT
  authorGndId : TEXT
  bk : TEXT
  changeDate : TEXT
  corporateCreatorNames : TEXT
  corporateCreatorTypes : TEXT
  databaseDate : TEXT
  dates : TEXT
  ddc : TEXT
  description : TEXT
  documentGenreCode : TEXT
  documentId : TEXT
  documentTypeCode : TEXT
  doi : TEXT
  ean : TEXT
  endDate : DATE
  event : TEXT
  firstid : TEXT
  ftxCreationDate : TEXT
  gndIds : TEXT
  isbn : TEXT
  isbn13 : TEXT
  journalTitle : TEXT
  journalVolumeNumber : TEXT
  location : TEXT
  ppn : TEXT
  publisher : TEXT
  pubplace : TEXT
  pubyear : TEXT
  sponsorGndId : TEXT
  startDate : DATE
  }
  class event_gnd << Entity >> {
  acronymCount : INTEGER
  acronyms : TEXT
  date : TEXT
  dateCount : INTEGER
  endDate : DATE
  event : TEXT
  fulltitle : TEXT
  homepage : TEXT
  location : TEXT
  organization : TEXT
  place : TEXT
  placeCount : INTEGER
  places : TEXT
  startDate : DATE
  variant : TEXT
  variantCount : INTEGER
  variants : TEXT
   }
   }
Note top of event_orclone
[[https://confident.dbis.rwth-aachen.de/or OPENRESEARCH (orclone-api)]]
9470 instances
End note
   class event_orclone << Entity >> {
   class event_orclone << Entity >> {
  DblpConferenceId : TEXT
  ISBN : TEXT
  TibKatId : TEXT
   acceptedPapers : INTEGER  
   acceptedPapers : INTEGER  
  city : TEXT
  country : TEXT
   creationDate : TIMESTAMP  
   creationDate : TIMESTAMP  
   endDate : TIMESTAMP  
   endDate : TIMESTAMP  
Line 110: Line 187:
   lastEditor : TEXT  
   lastEditor : TEXT  
   modificationDate : TIMESTAMP  
   modificationDate : TIMESTAMP  
  ordinal : INTEGER
   pageTitle : TEXT <<PK>>
   pageTitle : TEXT <<PK>>
   region : TEXT  
   region : TEXT  
   startDate : TIMESTAMP  
   startDate : TIMESTAMP  
   submittedPapers : INTEGER  
   submittedPapers : INTEGER  
  url : TEXT
  wikidataId : TEXT
   yearStr : TEXT  
   yearStr : TEXT  
   }
   }
Note top of event_orclonebackup
   class event_wikidata << Entity >> {
[[https://confident.dbis.rwth-aachen.de/or OPENRESEARCH (orclone-backup)]]
   country : TEXT
9325 instances
  countryId : TEXT
End note
  dblpId : TEXT
   class event_orclonebackup << Entity >> {
  describedAtUrl : TEXT
   acceptedPapers : TEXT  
  doi : TEXT  
   endDate : TEXT  
   endDate : TIMESTAMP
   eventType : TEXT  
  eventInSeries : TEXT
  eventInSeriesId : TEXT
  eventTitle : TEXT
  followedById : TEXT  
   gndId : TEXT  
   homepage : TEXT  
   homepage : TEXT  
   inEventSeries : TEXT  
   language : TEXT  
   ordinal : TEXT  
   location : TEXT  
   pageTitle : TEXT <<PK>>
   locationId : TEXT  
   presence : TEXT  
   mainSubject : TEXT  
   region : TEXT  
   ppn : TEXT  
   startDate : TEXT  
   proceedings : TEXT  
   submittedPapers : TEXT  
   proceedingsLabel : TEXT  
   yearStr : TEXT
   startDate : TIMESTAMP
  }
   url : TEXT  
Note top of event_confref
   wikiCfpId : TEXT  
[[http://portal.confref.org ConfRef]]
37945 instances
End note
  class event_confref << Entity >> {
  area : TEXT
  dblpSeriesId : TEXT
  endDate : TEXT
  keywords : TEXT
   ranks : TEXT  
   seriesId : TEXT  
  seriesTitle : TEXT
  startDate : TEXT
  submissionExtended : BOOLEAN
   }
   }
Note top of event_crossref
[[https://www.crossref.org/ CrossRef]]
49280 instances
End note
   class event_crossref << Entity >> {
   class event_crossref << Entity >> {
  cityWikidataid : TEXT
  countryWikidataid : TEXT
   doi : TEXT  
   doi : TEXT  
   endDate : DATE  
   endDate : DATE  
Line 163: Line 226:
   name : TEXT  
   name : TEXT  
   number : TEXT  
   number : TEXT  
  region : TEXT
  regionWikidataid : TEXT
   sponsor : TEXT  
   sponsor : TEXT  
   startDate : DATE  
   startDate : DATE  
   theme : TEXT  
   theme : TEXT  
  url : TEXT
   }
   }
Note top of event_wikidata
   class event_dblp << Entity >> {
[[https://www.wikidata.org/wiki/Wikidata:Main_Page Wikidata]]
   booktitle : TEXT  
7508 instances
   doi : TEXT  
End note
   ee : TEXT  
   class event_wikidata << Entity >> {
   cityId : TEXT  
   countryId : TEXT  
   dblpConferenceId : TEXT  
   endDate : TIMESTAMP  
   endDate : TIMESTAMP  
   eventInSeries : TEXT  
   isbn : TEXT  
   eventInSeriesId : TEXT  
   mdate : TEXT  
   gndId : TEXT  
   publicationSeries : TEXT  
   homepage : TEXT
   series : TEXT  
  language : TEXT
  location : TEXT
  locationId : TEXT
  mainSubject : TEXT
  ordinal : TEXT  
   startDate : TIMESTAMP  
   startDate : TIMESTAMP  
   wikiCfpId : TEXT  
   url : TEXT  
   }
   }
Note top of event_wikicfp
[[http://www.wikicfp.com WikiCFP]]
87987 instances
End note
   class event_wikicfp << Entity >> {
   class event_wikicfp << Entity >> {
   Final_Version_Due : TEXT  
   Final_Version_Due : TEXT  
   Notification_Due : TIMESTAMP  
   Notification_Due : TIMESTAMP  
   Submission_Deadline : TIMESTAMP  
   Submission_Deadline : TIMESTAMP  
  cityWikidataid : TEXT
   deleted : BOOLEAN
  countryWikidataid : TEXT
   deleted : INTEGER
   endDate : TIMESTAMP  
   endDate : TIMESTAMP  
   eventType : TEXT  
   eventType : TEXT  
   locality : TEXT  
   locality : TEXT  
  region : TEXT
  regionWikidataid : TEXT
   series : TEXT  
   series : TEXT  
   seriesId : TEXT  
   seriesId : TEXT  
   startDate : TIMESTAMP  
   startDate : TIMESTAMP  
  url : TEXT
   wikiCfpId : INTEGER  
   wikiCfpId : INTEGER  
   }
   }
Note top of event_dblp
   class event_orclonebackup << Entity >> {
[[https://dblp.org/ dblp computer science bibliography]]
   DblpConferenceId : TEXT  
47891 instances
   ISBN : TEXT  
End note
   acceptedPapers : TEXT  
   class event_dblp << Entity >> {
   city : TEXT  
   booktitle : TEXT  
   country : TEXT  
   cityWikidataid : TEXT  
   countryWikidataid : TEXT  
   doi : TEXT  
   ee : TEXT  
  isbn : TEXT
  location : TEXT
  mdate : TEXT
  publicationSeries : TEXT
  region : TEXT
  regionWikidataid : TEXT
  series : TEXT
  }
Note top of event_or
[[https://www.openresearch.org/mediawiki/ OPENRESEARCH (or-api)]]
9471 instances
End note
  class event_or << Entity >> {
  acceptedPapers : INTEGER
  creationDate : TIMESTAMP
   endDate : TIMESTAMP  
   endDate : TIMESTAMP  
   eventType : TEXT  
   eventType : TEXT  
   homepage : TEXT  
   homepage : TEXT  
   inEventSeries : TEXT  
   inEventSeries : TEXT  
  lastEditor : TEXT
  modificationDate : TIMESTAMP
  ordinal : INTEGER
   pageTitle : TEXT <<PK>>
   pageTitle : TEXT <<PK>>
  presence : TEXT
   region : TEXT  
   region : TEXT  
   startDate : TIMESTAMP  
   startDate : TIMESTAMP  
   submittedPapers : INTEGER
   submittedPapers : TEXT
  url : TEXT
  wikiMarkup : TEXT
  wikicfpId : TEXT
  wikidataId : TEXT
   yearStr : TEXT  
   yearStr : TEXT  
   }
   }
Note top of event_orbackup
   class event_confref << Entity >> {
[[https://www.openresearch.org/mediawiki/ OPENRESEARCH (or-backup)]]
   area : TEXT
9231 instances
  city : TEXT
End note
  country : TEXT
   class event_orbackup << Entity >> {
  dblpSeriesId : TEXT  
   acceptedPapers : TEXT  
   endDate : TEXT  
   endDate : TEXT  
   eventType : TEXT  
   keywords : TEXT  
   homepage : TEXT  
   ranks : TEXT  
   inEventSeries : TEXT  
   seriesId : TEXT  
   ordinal : TEXT
   seriesTitle : TEXT  
  pageTitle : TEXT <<PK>>
  region : TEXT  
   startDate : TEXT  
   startDate : TEXT  
   submittedPapers : TEXT
   submissionExtended : BOOLEAN
   yearStr : TEXT  
   url : TEXT  
   }
   }
  Event <|-- event_ceurws
  Event <|-- event_tibkat
  Event <|-- event_gnd
   Event <|-- event_orclone
   Event <|-- event_orclone
  Event <|-- event_wikidata
  Event <|-- event_crossref
  Event <|-- event_dblp
  Event <|-- event_wikicfp
   Event <|-- event_orclonebackup
   Event <|-- event_orclonebackup
   Event <|-- event_confref
   Event <|-- event_confref
  Event <|-- event_crossref
  Event <|-- event_wikidata
  Event <|-- event_wikicfp
  Event <|-- event_dblp
  Event <|-- event_or
  Event <|-- event_orbackup
}
}



Revision as of 06:05, 6 April 2022

OsProject

id  ConferenceCorpus
state  active
owner  WolfgangFahl
title  Scientific Event Corpus
url  https://github.com/WolfgangFahl/ConferenceCorpus
version  0.0.10
description  
date  2021-08-03
since  2021-07-26
until  

Free text

Installation

via pip

pip install ConferenceCorpus
# alternatively if your pip is not a python3 pip
pip3 install ConferenceCorpus

upgrade

pip install ConferenceCorpus -U
# alternatively if your pip is not a python3 pip
pip3 install ConferenceCorpus -U

Usage

RESTful API

Examples

Database View with SQLite

The EventCorpus.db database is in SQLite format.

using sqlite-web

pip install sqlite-web
sqlite_web $HOME/.conferencecorpus/EventCorpus.db

There is a convenience script, ccsqliteweb, in the scripts directory. It kills any existing sqlite_web EventCorpus.db process and then runs the server in the background using nohup.
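The same database file can also be inspected without a web UI. The following is a minimal sketch using Python's standard sqlite3 module. It assumes only the file location shown in the sqlite_web command above; the function name table_row_counts is made up for this illustration and is not part of ConferenceCorpus.

```python
import sqlite3
from pathlib import Path

# Location taken from the sqlite_web command above.
DEFAULT_DB = Path.home() / ".conferencecorpus" / "EventCorpus.db"


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in the SQLite file.

    Hypothetical helper for illustration; not part of ConferenceCorpus.
    """
    counts = {}
    conn = sqlite3.connect(str(db_path))
    try:
        # sqlite_master lists all schema objects; keep only tables.
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        )]
        for name in names:
            # Quote the identifier, since table names are read from the file.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
    finally:
        conn.close()
    return counts


if __name__ == "__main__":
    # Check first: sqlite3.connect would otherwise create an empty file.
    if DEFAULT_DB.exists():
        for table, count in table_row_counts(DEFAULT_DB).items():
            print(f"{table}: {count}")
    else:
        print(f"{DEFAULT_DB} not found")
```

Reading the table names from sqlite_master avoids hard-coding any table name, so the sketch does not depend on which datasources have been cached locally.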

Command Line

aelookup -h
usage: aelookup [-h] [-d] [-e ENDPOINT] [-v] [-u] [-f]
                [--datasources DATASOURCES]

Scientific Event Corpus and Lookup

  Created by Wolfgang Fahl on 2020-06-22.
  Copyright 2020-2021 Wolfgang Fahl. All rights reserved.

  Licensed under the Apache License 2.0
  http://www.apache.org/licenses/LICENSE-2.0

  Distributed on an "AS IS" basis without warranties
  or conditions of any kind, either express or implied.

USAGE

optional arguments:
  -h, --help            show this help message and exit
  -d, --debug           show debug info
  -e ENDPOINT, --endpoint ENDPOINT
                        SPARQL endpoint to use for wikidata queries
  -v, --version         show program's version number and exit
  -u, --uml             output plantuml diagram markup
  -f, --force           force Update - may take quite a time
  --datasources DATASOURCES
                        , delimited list of datasource lookup ids
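The help text says that --datasources takes a comma-delimited list of lookup ids. As a rough illustration of how such an option can be parsed, here is a sketch that mirrors some of the flags above with argparse. It is not the actual aelookup implementation: build_parser and split_datasources are hypothetical names, the -v option is omitted, and the lookup ids wikidata and dblp in the demo are assumed values that the help output does not confirm.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring options listed in `aelookup -h`."""
    parser = argparse.ArgumentParser(
        prog="aelookup", description="Scientific Event Corpus and Lookup"
    )
    parser.add_argument("-d", "--debug", action="store_true",
                        help="show debug info")
    parser.add_argument("-e", "--endpoint",
                        help="SPARQL endpoint to use for wikidata queries")
    parser.add_argument("-u", "--uml", action="store_true",
                        help="output plantuml diagram markup")
    parser.add_argument("-f", "--force", action="store_true",
                        help="force Update - may take quite a time")
    parser.add_argument("--datasources",
                        help=", delimited list of datasource lookup ids")
    return parser


def split_datasources(value):
    """Split a comma-delimited string into a list of trimmed, non-empty ids."""
    if not value:
        return []
    return [part.strip() for part in value.split(",") if part.strip()]


if __name__ == "__main__":
    # "wikidata" and "dblp" are assumed ids, used only for this demo.
    args = build_parser().parse_args(["-d", "--datasources", "wikidata,dblp"])
    print(args.debug, split_datasources(args.datasources))
```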

Overview

Datasources

You might want to open the diagrams in a new tab to be able to click the links depicted.

Event
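The Event diagram describes a shared set of attributes that the datasource-specific classes (event_wikidata, event_dblp and so on) extend. The sketch below shows one way such a common record could be represented in Python. It assumes the field names that appear in the Event class of the PlantUML markup on this page (acronym, city, country, eventId, ordinal, source, title, year), which may differ between revisions. The names EventRecord and from_row are hypothetical and not part of ConferenceCorpus.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class EventRecord:
    """Hypothetical container for the shared Event attributes (assumed names)."""
    eventId: str
    source: str
    title: Optional[str] = None
    acronym: Optional[str] = None
    city: Optional[str] = None
    country: Optional[str] = None
    ordinal: Optional[int] = None
    year: Optional[int] = None


def from_row(row):
    """Build an EventRecord from a dict, ignoring datasource-specific keys.

    The dict must contain at least 'eventId' and 'source'.
    """
    known = {f.name for f in fields(EventRecord)}
    return EventRecord(**{k: v for k, v in row.items() if k in known})


if __name__ == "__main__":
    # Example values are invented for illustration only.
    row = {"eventId": "demo-1", "source": "wikidata",
           "title": "Demo Conference", "year": 2020, "doi": "ignored"}
    print(from_row(row))
```

Keeping only the shared fields in the record and dropping the rest makes rows from different datasources comparable, at the cost of losing source-specific columns such as doi.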

EventSeries

Updating the database

OpenResearch

scripts/getbackup

Gets a copy of the nightly OpenResearch backups.

Issues

  1. Issue 33 - Event series completion
  2. Issue 32 - regression TemplateNotFound: fb4common/base.html
  3. Issue 31 - Provide RDF export of the data
  4. Issue 30 - add ordinal distribution query
  5. Issue 29 - add scholar RESTFul API
  6. Issue 28 - add generic search for scholarly items
  7. Issue 27 - openresearch results missing in multiquery
  8. Issue 26 - add bib file import
  9. Issue 25 - make multiquery result available via webapi with content negotiation
  10. Issue 24 - allow updating the database via webserver
  11. Issue 23 - dictOfLod Lookup result via commandline
  12. Issue 22 - add multi query option
  13. Issue 21 - add Webserver
  14. Issue 20 - Work around upstream Nominatim OSM Pythontools issue
  15. Issue 19 - Update Openresearch Samples
  16. Issue 18 - Update requirements.txt
  17. Issue 17 - include ACM digital library as a source
  18. Issue 16 - Steps towards csv upload
  19. Issue 15 - Filter obviously invalid Series and Event entries
  20. Issue 14 - wikiCFP 500 Internal Server and TimeOut Error Handling
  21. Issue 12 - Relevant FTX fields
  22. Issue 11 - Locality fixes
  23. Issue 10 - OpenResearch export option
  24. Issue 9 - offline access to EventCorpus.db
  25. Issue 8 - migrate confref data from Proceedings Title Parser here
  26. Issue 7 - migrate crossref data from proceedings title parser here
  27. Issue 6 - migrate dblp data source here from ptp and dblpconf
  28. Issue 5 - dblp xml parser skips some proceedings titles
  29. Issue 4 - add commandline interface to CorpusLookup
  30. Issue 3 - add python api doc
  31. Issue 2 - Cache all SQL tables in the same SQLite database in a ".conferencecorpus" directory
  32. Issue 1 - There should be a common set of attributes for Event and EventSeries from different datasources