PySemanticSlides

From BITPlan Wiki

Latest revision as of 13:37, 9 August 2024

OsProject
id  PySemanticSlides
state  active
owner  WolfgangFahl
title  PySemanticSlides
url  https://github.com/WolfgangFahl/pySemanticSlides
version  0.1.0
description  
date  2024-08-09
since  2023-02-14
until  


Installation

pip install pySemanticSlides
# alternatively if your pip is not a python3 pip
pip3 install pySemanticSlides 
# local install from source directory of pySemanticSlides 
pip install .

Upgrade

pip install pySemanticSlides  -U
# alternatively if your pip is not a python3 pip
pip3 install pySemanticSlides -U


Motivation

Luke Slidewalker is here!

Powerpoint was introduced in 1987, and it and its derivatives have been very popular ever since. Unfortunately, Powerpoint notoriously lacks important features that are standard in other publishing software. The main pain points we found are:

  • page numbering
  • internationalization
  • indexing
  • automation
  • general semantic annotations

PySemanticSlides tries to mitigate these pain points based on the progress that has been made in the past decades by the community (a lot) and Microsoft (a small bit).

Powerpoint page numbering

Powerpoint is not very good at page numbering. When exporting Powerpoint slides to PDF, you have to take into account whether hidden slides were included in the export, since the page numbers in the PDF will differ from the slide numbers if slides in the middle of a presentation have been hidden.
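
The numbering shift can be illustrated with a minimal Python sketch. It is not taken from the pySemanticSlides sources; the function name pdf_page_numbers and its parameters are hypothetical, and it assumes that hidden slides are simply skipped when they are not part of the export.

```python
from typing import List, Optional


def pdf_page_numbers(hidden: List[bool], include_hidden: bool = False) -> List[Optional[int]]:
    """Map each slide (in deck order) to its page number in the exported PDF.

    hidden[i] is True if slide i+1 is hidden. Slides that do not appear
    in the PDF are mapped to None. (Hypothetical helper for illustration.)
    """
    pdf_pages: List[Optional[int]] = []
    next_page = 1
    for is_hidden in hidden:
        if is_hidden and not include_hidden:
            pdf_pages.append(None)  # slide is skipped in the export
        else:
            pdf_pages.append(next_page)
            next_page += 1
    return pdf_pages


# A deck with four slides, the second one hidden
print(pdf_page_numbers([False, True, False, False]))
# [1, None, 2, 3]
print(pdf_page_numbers([False, True, False, False], include_hidden=True))
# [1, 2, 3, 4]
```

In the first call, slide 3 ends up on PDF page 2, which is exactly the mismatch described above.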

Internationalization

Keeping multiple language versions of presentations with many slides in sync is often a nightmare, given that it is hard to uniquely reference slides, e.g. by name. The title of a slide will differ between languages, and the internal name attribute of Powerpoint is hard to manipulate.

Indexing

Indices for books and other publications are standard and often easy to create with the help of the publishing software. Powerpoint has been lacking such a feature for decades, which is a real pity. Getting indices for keywords, publications, persons and the like is one of the goals of pySemanticSlides.
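
As a rough illustration of what such an index could look like, the following Python sketch builds an inverted index from keywords to page numbers. The function build_keyword_index, the "keywords" key and the sample pages are assumptions made for this example and are not part of the pySemanticSlides API.

```python
from collections import defaultdict
from typing import Dict, List


def build_keyword_index(slides: List[dict]) -> Dict[str, List[int]]:
    """Build an index keyword -> sorted list of pages that mention it.

    Each slide is expected to be a dict with a "page" number and an
    optional "keywords" list (hypothetical structure for illustration).
    """
    index = defaultdict(set)
    for slide in slides:
        for keyword in slide.get("keywords", []):
            index[keyword].add(slide["page"])
    # sort keywords alphabetically and pages numerically, as in a book index
    return {keyword: sorted(pages) for keyword, pages in sorted(index.items())}


# Made-up sample data
slides = [
    {"page": 2, "keywords": ["Semantification", "FAIR"]},
    {"page": 5, "keywords": ["FAIR"]},
    {"page": 6},
]
for keyword, pages in build_keyword_index(slides).items():
    print(f"{keyword}: {', '.join(str(p) for p in pages)}")
# FAIR: 2, 5
# Semantification: 2
```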

Existing Powerpoint indexing solutions

Automation

Automation for Powerpoint has been vendor- and technology-specific and, more often than not, incompatible across platforms. Awkward approaches to the internationalization of scripts have been applied, which were frequently counterproductive. In the meantime, open source libraries for handling presentations have become available, and other presentation editing tools are able to import and export Powerpoint-compatible slide presentations. pySemanticSlides is written in Python and makes use of the python-pptx library.

General semantic annotations

This is the core feature of pySemanticSlides. You may add semantic annotations either within your Powerpoint slides, using the notes of each slide, or as an external resource. The definition of the annotations will be linked to a schema definition that allows arbitrary cross-links between slides and your semantic objects.
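
The notes of the example presentation in the Json output section of this page use simple "Key: value" lines such as "Name: Why_semantify". A minimal Python sketch for reading such notes could look as follows. The function parse_notes and the choice of which keys hold comma-separated lists are assumptions for illustration, not the parser used by pySemanticSlides.

```python
from typing import Dict, List, Union

# Assumption: these keys carry comma-separated lists, all others plain text
LIST_KEYS = {"Keywords", "Literature"}


def parse_notes(notes: str) -> Dict[str, Union[str, List[str]]]:
    """Parse 'Key: value' lines from slide notes into a dict."""
    annotations: Dict[str, Union[str, List[str]]] = {}
    for line in notes.splitlines():
        # split at the first colon only, values may contain further colons
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not an annotation line
        key, value = key.strip(), value.strip()
        if key in LIST_KEYS:
            annotations[key] = [item.strip() for item in value.split(",") if item.strip()]
        else:
            annotations[key] = value
    return annotations


notes = (
    "Name: Why_semantify\n"
    "Title: Why semantify your slides?\n"
    "Keywords:  Semantification, FAIR\n"
    "Literature: Furth2018, Fair2016"
)
print(parse_notes(notes)["Name"])      # Why_semantify
print(parse_notes(notes)["Keywords"])  # ['Semantification', 'FAIR']
```

A stable Name annotation of this kind is language independent, so it could serve as the unique slide reference that titles cannot provide.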

Open Source access

git clone https://github.com/WolfgangFahl/pySemanticSlides
cd pySemanticSlides
pip install .

Testing

# scripts/tests will also work ...
pip install green
green
..
Captured stdout for tests.test_doi.TestDOI.testFetchMeta
Starting test testFetchMeta, debug=False ...
test testFetchMeta, debug=False took   1.7 s

Captured stdout for tests.test_slidewalker.TestSlideWalker.test_slidewalker
Starting test test_slidewalker, debug=False ...
test test_slidewalker, debug=False took   0.0 s

Ran 2 tests in 2.174s using 16 processes

OK (passes=2)

Usage

Command line

Slidewalker

slidewalker -h   
usage: slidewalker [-h] [-a] [-d] [-f FORMAT] [--includeHidden] [--rd RUNDELIM]
                   [--rootPath ROOTPATH] [-V]

SlideWalker - get meta information for all powerpoint presentations in a certain folder

options:
  -h, --help            show this help message and exit
  -a, --about           show about info [default: False]
  -d, --debug           show debug info
  -f FORMAT, --format FORMAT
                        output format to create: csv,json or txt (default: json)
  --includeHidden       exclude hidden slides (default: False)
  --rd RUNDELIM, --runDelimiter RUNDELIM
                        text run delimiter (default: ) suggested: _↵•
  --rootPath ROOTPATH
  -V, --version         show program's version number and exit

Examples

slidewalker --rootPath examples/semanticslides

Json output

You might want to play with the result using an online jq editor such as https://jqplay.org/ and a filter such as

first[] | .slides[] |  .pdf_page,.title,.notes 

see https://jqplay.org/s/bYN-gz3bXWq

{
  "SemanticSlides.pptx": {
    "title": "pySemanticSlides",
    "author": "Wolfgang Fahl",
    "created": "2023-02-14 06:41:31",
    "path": "examples/semanticslides/SemanticSlides.pptx",
    "slides": [
      {
        "page": 1,
        "pdf_page": 1,
        "title": "pySemanticSlides",
        "name": "",
        "text": [
          "pySemanticSlides",
          "Semantify your Presentations",
          ""
        ],
        "notes": ""
      },
      {
        "page": 2,
        "pdf_page": 2,
        "title": "Why semantify your slides?",
        "name": "",
        "text": [
          "Why semantify your slides?",
          "The valuable content of your presentation is hidden if it is not FAIR:Findable\tAccessibleInteroperableReusable",
          ""
        ],
        "notes": "Name: Why_semantify\nTitle: Why semantify your slides?\nKeywords:  Semantification, FAIR\nLiterature: Furth2018, Fair2016"
      }
    ]
  }
}
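
If you prefer Python over jq, the same fields can be extracted with the standard json module. The following is a minimal sketch; the function slide_summaries is a hypothetical helper and the sample is an abbreviated copy of the output above. Unlike the jq filter, it walks over all presentations in the result, not only the first one.

```python
import json
from typing import Iterator, Tuple


def slide_summaries(result: dict) -> Iterator[Tuple[int, str, str]]:
    """Yield (pdf_page, title, notes) for every slide of every presentation."""
    for presentation in result.values():
        for slide in presentation["slides"]:
            yield slide["pdf_page"], slide["title"], slide["notes"]


# Abbreviated copy of the slidewalker output shown above
sample_json = r'''
{
  "SemanticSlides.pptx": {
    "title": "pySemanticSlides",
    "slides": [
      {"page": 1, "pdf_page": 1, "title": "pySemanticSlides", "notes": ""},
      {"page": 2, "pdf_page": 2, "title": "Why semantify your slides?",
       "notes": "Name: Why_semantify\nTitle: Why semantify your slides?"}
    ]
  }
}
'''
result = json.loads(sample_json)
for pdf_page, title, notes in slide_summaries(result):
    print(pdf_page, title)
# 1 pySemanticSlides
# 2 Why semantify your slides?
```

To process real output, the result could be read with json.load(sys.stdin) after piping the slidewalker output into the script.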

Powerpoint Textrange runs

Powerpoint's handling of text in slides and notes is based on the concept of text range runs (see powerpoint.textrange.runs). A text run consists of a range of characters that share the same font attributes.

Therefore you need to be careful when applying semantic annotations. The default text run delimiter is an empty string "", so the output changes when you specify a runDelimiter, e.g. a UTF-8 character such as "_", "↵" or "•".

slidewalker --rootPath examples/semanticslides  | jq -s "first[] | .slides[].text"    
[
  "pySemanticSlides",
  "Semantify your Presentations",
  ""
]
[
  "Why semantify your slides?",
  "The valuable content of your presentation is hidden if it is not FAIR:Findable\tAccessibleInteroperableReusable",
  ""
]
slidewalker --rootPath examples/semanticslides --runDelimiter="•"  | jq -s "first[] | .slides[].text"
[
  "pySemanticSlides",
  "Semantify• •your• •Presentations",
  ""
]
[
  "Why• •semantify• •your• •slides•?",
  "The •valuable• •content• •of• •your• •presentation• •is• •hidden• •if• •it• •is• not •FAIR:•Findable•\t•Accessible•Interoperable•Reusable",
  ""
]
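
The effect of the delimiter can be reproduced with a few lines of Python. This is only a sketch: the function join_runs is hypothetical, and the split of the subtitle into five runs is inferred from the delimited output above, not read from the presentation file.

```python
from typing import List


def join_runs(runs: List[str], run_delimiter: str = "") -> str:
    """Join the texts of consecutive runs, separated by the run delimiter."""
    return run_delimiter.join(runs)


# Assumed run structure of the subtitle, inferred from the "•" output above
runs = ["Semantify", " ", "your", " ", "Presentations"]

print(join_runs(runs))       # Semantify your Presentations
print(join_runs(runs, "•"))  # Semantify• •your• •Presentations

# With a delimiter that does not occur in the text, the runs can be recovered
print(join_runs(runs, "•").split("•") == runs)  # True
```

With the default empty delimiter the run boundaries are lost, which is convenient for reading but hides the fact that a term you want to annotate may be spread over several runs.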