PySemanticSlides

From BITPlan Wiki

OsProject

id  PySemanticSlides
state  active
owner  WolfgangFahl
title  PySemanticSlides
url  https://github.com/WolfgangFahl/pySemanticSlides
version  0.0.10
description  
date  2023-02-22
since  2023-02-14
until  


Installation

pip install pySemanticSlides
# alternatively if your pip is not a python3 pip
pip3 install pySemanticSlides 
# local install from source directory of pySemanticSlides 
pip install .

Upgrade

pip install pySemanticSlides -U
# alternatively if your pip is not a python3 pip
pip3 install pySemanticSlides -U


Motivation

Luke Slidewalker is here!

Powerpoint and its derivatives were introduced in 1987 and have been very popular ever since. Unfortunately, Powerpoint notoriously lacks important features that are standard in other publishing software. The main pain points we found are:

  • page numbering
  • internationalization
  • indexing
  • automation
  • general semantic annotations

PySemanticSlides tries to mitigate these pain points based on the progress that has been made in the past decades by the community (a lot) and Microsoft (a small bit).

Powerpoint page numbering

Powerpoint is not very good at page numbering. When exporting Powerpoint slides to PDF, you have to take into account whether hidden slides were included in the export: if slides in the middle of a presentation are hidden, the page numbers in the PDF differ from the slide numbers in the presentation.
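
The effect of hidden slides on PDF page numbers can be illustrated with a small sketch. It assumes that a PDF export without hidden slides simply skips them and numbers the remaining slides consecutively. The function name `pdf_page_numbers` and the list of boolean `hidden` flags are hypothetical and not part of pySemanticSlides.

```python
from typing import List, Optional


def pdf_page_numbers(hidden: List[bool], include_hidden: bool = False) -> List[Optional[int]]:
    """Map each slide (in presentation order) to its page in the PDF export.

    hidden[i] is True if slide i+1 is hidden. A hidden slide is mapped to
    None when hidden slides are not part of the export.
    """
    pages: List[Optional[int]] = []
    next_page = 1
    for is_hidden in hidden:
        if is_hidden and not include_hidden:
            pages.append(None)  # slide does not appear in the PDF
        else:
            pages.append(next_page)
            next_page += 1
    return pages


# four slides, the second one is hidden
flags = [False, True, False, False]
print(pdf_page_numbers(flags))                       # [1, None, 2, 3]
print(pdf_page_numbers(flags, include_hidden=True))  # [1, 2, 3, 4]
```

In this example slide 3 ends up on PDF page 2 or 3 depending on the export setting, which is why a slide number and a PDF page number have to be tracked separately.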

Internationalization

Keeping multiple language versions of presentations with many slides in sync is often a nightmare, because it is hard to reference slides uniquely, e.g. by name. The title of a slide differs from language to language, and the internal name attribute of Powerpoint is hard to manipulate.

Indexing

Indices for books and other publications are standard and often easy to create with the help of the publishing software. Powerpoint has lacked such a feature for decades, which is a real pity. Getting indices for keywords, publications, persons and the like is one of the goals of pySemanticSlides.

Existing Powerpoint indexing solutions

Automation

Automation for Powerpoint has been vendor- and technology-specific and, more often than not, incompatible across platforms. Awkward approaches to the internationalization of scripts have been applied, which were frequently counterproductive. In the meantime, open source libraries for handling presentations have become available, and other presentation editing tools are able to import and export Powerpoint-compatible slide presentations. pySemanticSlides is written in Python and makes use of the python-pptx library.
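
As a rough idea of what python-pptx makes possible, the following sketch collects the title of every slide. It is not the code pySemanticSlides uses; the helper name `slide_titles` is hypothetical. It relies only on the python-pptx attributes `Presentation.slides`, `slide.shapes.title` (which is `None` when a slide has no title placeholder) and `shape.text`.

```python
def slide_titles(presentation):
    """Return the title text of every slide, "" for slides without a title.

    `presentation` is expected to behave like a python-pptx Presentation:
    it has `.slides`, each slide has `.shapes.title` (None if there is no
    title placeholder) and the title shape has `.text`.
    """
    titles = []
    for slide in presentation.slides:
        title_shape = slide.shapes.title
        titles.append(title_shape.text if title_shape is not None else "")
    return titles


# Usage with python-pptx (pip install python-pptx); the path is the
# example presentation shown in the JSON output on this page:
#
#   from pptx import Presentation
#   prs = Presentation("examples/semanticslides/SemanticSlides.pptx")
#   for number, title in enumerate(slide_titles(prs), start=1):
#       print(number, title)
```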

General semantic annotations

This is the core feature of pySemanticSlides. You may add semantic annotations either within your Powerpoint slides, using the notes of each slide, or as an external resource. The definition of the annotations will be linked to a schema definition, which allows arbitrary cross links between slides and your semantic objects.
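
A minimal sketch of how annotations in slide notes could be read, assuming the notes consist of `Key: value` lines with comma-separated values, as in the `notes` field of the JSON example output in the Usage section. The function `parse_notes` is hypothetical and is not the parser pySemanticSlides itself uses.

```python
from typing import Dict, List


def parse_notes(notes: str) -> Dict[str, List[str]]:
    """Parse "Key: value1, value2" lines from slide notes into a dict."""
    annotations: Dict[str, List[str]] = {}
    for line in notes.splitlines():
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # not an annotation line
        # simplification: every value is split at commas, titles included
        values = [v.strip() for v in value.split(",") if v.strip()]
        annotations[key.strip()] = values
    return annotations


notes = (
    "Name: Why_semantify\n"
    "Title: Why semantify your slides?\n"
    "Keywords:  Semantification, FAIR\n"
    "Literature: Furth2018, Fair2016"
)
for key, values in parse_notes(notes).items():
    print(key, values)
```

The `Name` entry gives a slide a language-independent identifier, which is what makes cross links between slides and between language versions possible.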

Open Source access

git clone https://github.com/WolfgangFahl/pySemanticSlides
cd pySemanticSlides
pip install .

Testing

# scripts/tests will also work ...
pip install green
green
..
Captured stdout for tests.test_doi.TestDOI.testFetchMeta
Starting test testFetchMeta, debug=False ...
test testFetchMeta, debug=False took   1.7 s

Captured stdout for tests.test_slidewalker.TestSlideWalker.test_slidewalker
Starting test test_slidewalker, debug=False ...
test test_slidewalker, debug=False took   0.0 s

Ran 2 tests in 2.174s using 16 processes

OK (passes=2)

Usage

Command line

slidewalker -h
usage: slidewalker [-h] [-d] [-f FORMAT] [--includeHidden] [--rootPath ROOTPATH]

SlideWalker - get meta information for all powerpoint presentations in a certain folder

options:
  -h, --help            show this help message and exit
  -d, --debug           show debug info
  -f FORMAT, --format FORMAT
                        output format to create: csv,json or txt
  --includeHidden       exclude hidden slides
  --rootPath ROOTPATH

Example

slidewalker --rootPath examples -f json

JSON output

{
  "SemanticSlides.pptx": {
    "title": "pySemanticSlides",
    "author": "Wolfgang Fahl",
    "created": "2023-02-14 06:41:31",
    "path": "examples/semanticslides/SemanticSlides.pptx",
    "slides": [
      {
        "page": 1,
        "pdf_page": 1,
        "title": "pySemanticSlides",
        "name": "",
        "text": [
          "pySemanticSlides",
          "Semantify• •your• •Presentations",
          ""
        ],
        "notes": ""
      },
      {
        "page": 2,
        "pdf_page": 2,
        "title": "Why semantify your slides?",
        "name": "",
        "text": [
          "Why• •semantify• •your• •slides•?",
          "The •valuable• •content• •of• •your• •presentation• •is• •hidden• •if• •it• •is• not •FAIR:•Findable•\t•Accessible•Interoperable•Reusable",
          ""
        ],
        "notes": "Name: Why_semantify\nTitle: Why semantify your slides?\nKeywords:  Semantification, FAIR\nLiterature: Furth2018, Fair2016"
      }
    ]
  }
}
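
The JSON output can be post-processed with a few lines of Python, for example to build a keyword index. The sketch below assumes the structure shown above (file name, then `slides`, each with `page` and `notes`) and a `Keywords:` line in the notes. The sample data is shortened and the function `keyword_index` is hypothetical.

```python
import json
from collections import defaultdict

# shortened sample in the shape of the slidewalker JSON output
SAMPLE = r"""
{
  "SemanticSlides.pptx": {
    "title": "pySemanticSlides",
    "slides": [
      {"page": 1, "title": "pySemanticSlides", "notes": ""},
      {"page": 2, "title": "Why semantify your slides?",
       "notes": "Name: Why_semantify\nKeywords:  Semantification, FAIR"}
    ]
  }
}
"""


def keyword_index(presentations: dict) -> dict:
    """Map each keyword to a sorted list of (file name, page) pairs."""
    index = defaultdict(list)
    for file_name, presentation in presentations.items():
        for slide in presentation.get("slides", []):
            for line in slide.get("notes", "").splitlines():
                key, _, value = line.partition(":")
                if key.strip().lower() != "keywords":
                    continue
                for keyword in value.split(","):
                    if keyword.strip():
                        index[keyword.strip()].append((file_name, slide["page"]))
    return {keyword: sorted(refs) for keyword, refs in sorted(index.items())}


index = keyword_index(json.loads(SAMPLE))
for keyword, refs in index.items():
    print(keyword, refs)
# FAIR [('SemanticSlides.pptx', 2)]
# Semantification [('SemanticSlides.pptx', 2)]
```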

Powerpoint Textrange runs

Powerpoint's handling of text in slides and notes is based on the concept of text range runs (see powerpoint.textrange.runs). A text run consists of a range of characters that share the same font attributes.

Therefore you need to be careful when applying semantic annotations. The default text run delimiter is the empty string "", so the output changes when you specify a runDelimiter, e.g. a single UTF-8 character such as "_", "↵" or "•".

slidewalker --rootPath examples/semanticslides  | jq -s "first[] | .slides[].text"    
[
  "pySemanticSlides",
  "Semantify your Presentations",
  ""
]
[
  "Why semantify your slides?",
  "The valuable content of your presentation is hidden if it is not FAIR:Findable\tAccessibleInteroperableReusable",
  ""
]
slidewalker --rootPath examples/semanticslides --runDelimiter="_"  | jq -s "first[] | .slides[].text" 
[
  "pySemanticSlides",
  "Semantify_ _your_ _Presentations",
  ""
]
[
  "Why_ _semantify_ _your_ _slides_?",
  "The _valuable_ _content_ _of_ _your_ _presentation_ _is_ _hidden_ _if_ _it_ _is_ not _FAIR:_Findable_\t_Accessible_Interoperable_Reusable",
  ""
]
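
The difference between the two outputs can be reproduced with a plain string join. The sketch assumes that the second line of the first slide consists of the five runs listed below, which is consistent with the delimited output shown above. `join_runs` is a hypothetical helper, not the function used by slidewalker.

```python
from typing import Iterable


def join_runs(runs: Iterable[str], run_delimiter: str = "") -> str:
    """Join the texts of consecutive runs with the given delimiter."""
    return run_delimiter.join(runs)


runs = ["Semantify", " ", "your", " ", "Presentations"]
print(join_runs(runs))        # Semantify your Presentations
print(join_runs(runs, "_"))   # Semantify_ _your_ _Presentations
```

With a non-empty delimiter the run boundaries become visible, but a plain text search for "Semantify your" no longer matches, which matters when annotations are looked up by their text.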