Difference between revisions of "PySemanticSlides"
Revision as of 08:13, 25 February 2023
OsProject
OsProject | |
---|---|
id | PySemanticSlides |
state | active |
owner | WolfgangFahl |
title | PySemanticSlides |
url | https://github.com/WolfgangFahl/pySemanticSlides |
version | 0.0.10 |
description | |
date | 2023-02-23 |
since | 2023-02-14 |
until |
Installation
pip install pySemanticSlides
# alternatively if your pip is not a python3 pip
pip3 install pySemanticSlides
# local install from source directory of pySemanticSlides
pip install .
Upgrade
pip install pySemanticSlides -U
# alternatively if your pip is not a python3 pip
pip3 install pySemanticSlides -U
Motivation
Luke Slidewalker is here!
PowerPoint was introduced in 1987, and it and its derivatives have been very popular ever since. Unfortunately, PowerPoint notoriously lacks important features that are standard in other publishing software. The main pain points we found are:
- page numbering
- internationalization
- indexing
- automation
- general semantic annotations
PySemanticSlides tries to mitigate these pain points based on the progress that has been made in the past decades by the community (a lot) and Microsoft (a small bit).
Powerpoint page numbering
PowerPoint handles page numbering poorly. When slides are exported to PDF, the page numbers depend on whether hidden slides were included in the export: if slides in the middle of a presentation are hidden and left out, every later slide ends up on a PDF page whose number differs from its slide number.
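The numbering shift can be illustrated with a minimal Python sketch. It is not taken from the pySemanticSlides source; the function name `pdf_page_numbers` and its parameters are hypothetical, and it only assumes that hidden slides are skipped when they are not exported.

```python
def pdf_page_numbers(hidden_flags, include_hidden=False):
    """Return the PDF page for each slide, or None if the slide is not exported.

    hidden_flags: one boolean per slide, True if the slide is hidden.
    include_hidden: True if the PDF export contained the hidden slides.
    """
    pages = []
    next_page = 1
    for hidden in hidden_flags:
        if hidden and not include_hidden:
            # a hidden slide that was not exported has no PDF page
            pages.append(None)
        else:
            pages.append(next_page)
            next_page += 1
    return pages


# slide 2 of 4 is hidden
flags = [False, True, False, False]
print(pdf_page_numbers(flags))                       # [1, None, 2, 3]
print(pdf_page_numbers(flags, include_hidden=True))  # [1, 2, 3, 4]
```

Slides 3 and 4 land on PDF pages 2 and 3 in the first case, which is why a slide number alone is not a stable reference.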
Internationalization
Keeping multiple language versions of a presentation with many slides in sync is often a nightmare, because it is hard to reference slides uniquely, e.g. by name. The title of a slide differs between languages, and the internal name attribute of PowerPoint is hard to manipulate.
Indexing
Indices for books and other publications are standard and often easy to create with the help of the publishing software. PowerPoint has lacked such a feature for decades, which is a real pity. Getting indices for keywords, publications, persons and the like is one of the goals of pySemanticSlides.
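As a rough illustration of what such an index could look like, the following Python sketch builds an inverted index from keywords to page numbers. The function `build_keyword_index` and the sample data are hypothetical and are not part of the pySemanticSlides API; the sketch assumes the keywords per slide have already been extracted.

```python
from collections import defaultdict


def build_keyword_index(slide_keywords):
    """Map each keyword to the list of pages on which it occurs.

    slide_keywords: iterable of (page, [keyword, ...]) pairs.
    """
    index = defaultdict(list)
    for page, keywords in slide_keywords:
        for keyword in keywords:
            index[keyword].append(page)
    # sort case-insensitively, the way a printed index would be ordered
    return {keyword: index[keyword] for keyword in sorted(index, key=str.lower)}


# hypothetical input: page 5 does not exist in the example presentation
slides = [(2, ["Semantification", "FAIR"]), (5, ["FAIR"])]
for keyword, pages in build_keyword_index(slides).items():
    print(f"{keyword}: {', '.join(str(page) for page in pages)}")
```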
Existing Powerpoint indexing solutions
Automation
Automation for PowerPoint has been vendor- and technology-specific and, more often than not, incompatible across platforms. Awkward approaches to the internationalization of scripts have been applied, which were frequently counterproductive. In the meantime, open source libraries for handling presentations have become available, and other presentation editing tools are able to import and export PowerPoint-compatible slide presentations. pySemanticSlides is written in Python and makes use of the python-pptx library.
General semantic annotations
This is the core feature of pySemanticSlides. You may add semantic annotations either within your PowerPoint slides, using the notes of each slide, or as an external resource. The definition of the annotations will be linked to a schema definition that allows arbitrary cross links between slides and your semantic objects.
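The example notes in the Json output section of this page use one "Key: value" pair per line. A minimal Python sketch for reading such notes might look as follows. The line format is inferred from that example only, and the helpers `parse_notes` and `split_values` are hypothetical; the parser actually used by pySemanticSlides may differ.

```python
def parse_notes(notes):
    """Parse slide notes with one 'Key: value' pair per line into a dict."""
    annotations = {}
    for line in notes.splitlines():
        # split at the first colon only, so values may contain colons
        key, separator, value = line.partition(":")
        if not separator:
            continue  # not an annotation line
        annotations[key.strip()] = value.strip()
    return annotations


def split_values(value):
    """Split a comma separated value such as a keyword list."""
    return [item.strip() for item in value.split(",") if item.strip()]


notes = (
    "Name: Why_semantify\n"
    "Title: Why semantify your slides?\n"
    "Keywords: Semantification, FAIR\n"
    "Literature: Furth2018, Fair2016"
)
annotations = parse_notes(notes)
print(annotations["Name"])                    # Why_semantify
print(split_values(annotations["Keywords"]))  # ['Semantification', 'FAIR']
```

A language-independent value such as Name could then serve as the stable slide reference that titles cannot provide.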
Open Source access
git clone https://github.com/WolfgangFahl/pySemanticSlides
cd pySemanticSlides
pip install .
Testing
# scripts/tests will also work ...
pip install green
green
..
Captured stdout for tests.test_doi.TestDOI.testFetchMeta
Starting test testFetchMeta, debug=False ...
test testFetchMeta, debug=False took 1.7 s
Captured stdout for tests.test_slidewalker.TestSlideWalker.test_slidewalker
Starting test test_slidewalker, debug=False ...
test test_slidewalker, debug=False took 0.0 s
Ran 2 tests in 2.174s using 16 processes
OK (passes=2)
Usage
Command line
Slidewalker
slidewalker -h
usage: slidewalker [-h] [-a] [-d] [-f FORMAT] [--includeHidden] [--rd RUNDELIM]
[--rootPath ROOTPATH] [-V]
SlideWalker - get meta information for all powerpoint presentations in a certain folder
options:
-h, --help show this help message and exit
-a, --about show about info [default: False]
-d, --debug show debug info
-f FORMAT, --format FORMAT
output format to create: csv,json or txt (default: json)
--includeHidden exclude hidden slides (default: False)
--rd RUNDELIM, --runDelimiter RUNDELIM
text run delimiter (default: ) suggested: _↵•
--rootPath ROOTPATH
-V, --version show program's version number and exit
Examples
slidewalker --rootPath examples/semanticslides
Json output
You might want to play with the result using an online jq editor such as https://jqplay.org/ and a filter such as
first[] | .slides[] | .pdf_page,.title,.notes
see https://jqplay.org/s/bYN-gz3bXWq
{
"SemanticSlides.pptx": {
"title": "pySemanticSlides",
"author": "Wolfgang Fahl",
"created": "2023-02-14 06:41:31",
"path": "examples/semanticslides/SemanticSlides.pptx",
"slides": [
{
"page": 1,
"pdf_page": 1,
"title": "pySemanticSlides",
"name": "",
"text": [
"pySemanticSlides",
"Semantify your Presentations",
""
],
"notes": ""
},
{
"page": 2,
"pdf_page": 2,
"title": "Why semantify your slides?",
"name": "",
"text": [
"Why semantify your slides?",
"The valuable content of your presentation is hidden if it is not FAIR:Findable\tAccessibleInteroperableReusable",
""
],
"notes": "Name: Why_semantify\nTitle: Why semantify your slides?\nKeywords: Semantification, FAIR\nLiterature: Furth2018, Fair2016"
}
]
}
}
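If you prefer Python over jq, the same selection can be sketched with the standard json module. The function name `slide_summaries` is hypothetical, and the sample is a trimmed copy of the output above; with a real run you would read the saved slidewalker output with json.load instead.

```python
import json


def slide_summaries(data):
    """Yield (pdf_page, title, notes) for every slide of every presentation."""
    for presentation in data.values():
        for slide in presentation["slides"]:
            yield slide["pdf_page"], slide["title"], slide["notes"]


# trimmed copy of the JSON output shown above (raw string keeps the \n escapes)
sample = json.loads(r'''
{
  "SemanticSlides.pptx": {
    "title": "pySemanticSlides",
    "slides": [
      {"page": 1, "pdf_page": 1, "title": "pySemanticSlides", "notes": ""},
      {"page": 2, "pdf_page": 2, "title": "Why semantify your slides?",
       "notes": "Name: Why_semantify\nKeywords: Semantification, FAIR"}
    ]
  }
}
''')

for pdf_page, title, notes in slide_summaries(sample):
    print(pdf_page, title, repr(notes))
```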
Powerpoint Textrange runs
PowerPoint's handling of text in slides and notes is based on the concept of TextRange runs (see powerpoint.textrange.runs). A text run consists of a range of characters that share the same font attributes.
Therefore you need to be careful when applying semantic annotations. The default text run delimiter is an empty string "". The output therefore differs when you specify a runDelimiter, e.g. a UTF-8 character such as "_", "↵" or "•".
slidewalker --rootPath examples/semanticslides | jq -s "first[] | .slides[].text"
[
"pySemanticSlides",
"Semantify your Presentations",
""
]
[
"Why semantify your slides?",
"The valuable content of your presentation is hidden if it is not FAIR:Findable\tAccessibleInteroperableReusable",
""
]
slidewalker --rootPath examples/semanticslides --runDelimiter="•" | jq -s "first[] | .slides[].text"
[
"pySemanticSlides",
"Semantify• •your• •Presentations",
""
]
[
"Why• •semantify• •your• •slides•?",
"The •valuable• •content• •of• •your• •presentation• •is• •hidden• •if• •it• •is• not •FAIR:•Findable•\t•Accessible•Interoperable•Reusable",
""
]
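The effect of the delimiter can be reproduced with a small Python sketch. It assumes that the text of a paragraph is simply the run texts joined by the run delimiter, which matches the sample output above but has not been checked against the pySemanticSlides source; `join_runs` is a hypothetical helper.

```python
def join_runs(runs, run_delimiter=""):
    """Join the texts of consecutive runs with the given delimiter."""
    return run_delimiter.join(runs)


# the runs of the subtitle as suggested by the output above
runs = ["Semantify", " ", "your", " ", "Presentations"]
print(join_runs(runs))       # Semantify your Presentations
print(join_runs(runs, "•"))  # Semantify• •your• •Presentations

# a text consisting of a single run is not affected by the delimiter
print(join_runs(["pySemanticSlides"], "•"))  # pySemanticSlides
```

With a visible delimiter you can see where PowerPoint has split a text into runs, which helps to explain why an annotation that looks contiguous in the slide may be spread over several runs.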