Difference between revisions of "SiDIF"

From BITPlan Wiki
 
(8 intermediate revisions by the same user not shown)
Line 1: Line 1:
 +
== What links here ==
 +
{{WhatLinksHere}}
 +
 
== Introduction ==
 
== Introduction ==
 
The {{sidif}} is yet another format for exchanging data between computers.
 
The {{sidif}} is yet another format for exchanging data between computers.
Line 45: Line 48:
 
</graphviz>
 
</graphviz>
  
== SiDIF Implementation ==
+
== SiDIF Implementations ==
see https://github.com/BITPlan/org.sidif.triplestore
+
see  
== Comparison to other formats ==
+
* https://github.com/BITPlan/org.sidif.triplestore for the original Java
 +
* https://github.com/WolfgangFahl/py-sidif for the more recent Python version
 +
 
 +
 
 +
== Comparison to other Knowledge Representation Approaches ==
 
Most other Triple formats are far more complex.
 
Most other Triple formats are far more complex.
 +
See e.g. https://github.com/BITPlan/org.sidif.triplestore/tree/master/src/test/resources/sidif.canonical for some example of triple formats with the same statements
 +
being expressed  in SiDIF
 +
 
* https://en.wikipedia.org/wiki/Cyc
 
* https://en.wikipedia.org/wiki/Cyc
 
e.g. explains the Cyc Statement:
 
e.g. explains the Cyc Statement:
Line 58: Line 68:
 
Paris is capital of France
 
Paris is capital of France
 
</pre>
 
</pre>
 +
The following sections compare three approaches to knowledge representation: RDF, Gellish, and SiDIF, with particular focus on how they handle identity and relationships.
 +
 +
* Gellish: an information representation language, knowledge base and ontology
 +
[[CiteRef::van renssenNonege]]
 +
{{#scite:
 +
|reference=van renssenNonege
 +
|type=journal-article
 +
|title=Gellish: an information representation language, knowledge base and ontology
 +
|authors=A. van Renssen
 +
|journal=ESSDERC 2003. Proceedings of the 33rd European Solid-State Device Research - ESSDERC '03 (IEEE Cat. No. 03EX704)
 +
|publisher=IEEE
 +
|pages=215-228
 +
|doi=10.1109/siit.2003.1251209
 +
|year=None
 +
|retrieved-from=https://doi.org/
 +
|retrieved-on=2024-11-23
 +
}}
 +
=== Theoretical Foundations ===
 +
==== RDF Theory ====
 +
RDF (Resource Description Framework) is based on:
 +
* Statements modeled as triples (subject-predicate-object)
 +
* Uniform Resource Identifiers (URIs) as primary identification mechanism
 +
* Graph-based data model
 +
* Optional fourth element (graph) in RDF Quads for context
 +
 +
==== Gellish Theory ====
 +
Gellish is structured as:
 +
* Fixed tabular format with predefined columns
 +
* Relationship-type encoding system
 +
* Partially qualified naming scheme
 +
* Language-aware design
 +
 +
==== SiDIF Theory ====
 +
SiDIF uses:
 +
* Natural language-style triple statements
 +
* Explicit separation of identity aspects
 +
* Flexible predicate structure
 +
* Multi-level identification system
 +
 +
=== Example Representations ===
 +
The same knowledge represented in each format:
 +
 +
==== RDF Example ====
 +
<pre>
 +
<http://example.org/sensors/12> rdf:type <http://example.org/onto/TemperatureSensor> .
 +
<http://example.org/sensors/12> rdfs:label "Sensor 12" .
 +
<http://example.org/sensors/12> <http://example.org/onto/location> "plant1.line3" .
 +
</pre>
 +
 +
==== Gellish Example ====
 +
<pre>
 +
1|English|Sensor 12|1|is a|2|TemperatureSensor|491197|specialization|
 +
2|English|Sensor 12|1|has|3|location|123456|plant1.line3|
 +
</pre>
 +
 +
==== SiDIF Example ====
 +
<pre>
 +
"Sensor 12" isA TemperatureSensor
 +
"plant1.line3.sensor12" is FQN of "Sensor 12"
 +
"urn:plant1:sensor:12" is PID of "Sensor 12"
 +
"http://plant1.company.com/sensors/12" is URI of "Sensor 12"
 +
"opc://plant1/l3/s12" is OPC_URI of "Sensor 12"
 +
</pre>
 +
 +
=== Critical Analysis ===
 +
==== RDF Limitations ====
 +
* Forces URI usage for identification
 +
* Mixes identity with web location
 +
* Complex syntax reduces readability
 +
* Difficult to handle non-web identifiers
 +
 +
==== Gellish Limitations ====
 +
* Rigid tabular structure
 +
* Names not fully qualified
 +
* Limited identifier flexibility
 +
* Complex relationship encoding
 +
 +
==== SiDIF Advantages ====
 +
* Separates different aspects of identity (name, FQN, PID, URI)
 +
* Natural language readability
 +
* Flexible identifier system
 +
* Easy addition of new identifier types
 +
 +
=== Conclusion ===
 +
While RDF and Gellish each have their strengths for specific use cases, SiDIF offers a more flexible and comprehensive approach to identity management. RDF's URI-centric approach limits its usefulness in non-web contexts, while Gellish's rigid structure makes it difficult to adapt to new requirements. SiDIF's separation of identity aspects (name, FQN, PID, URI) combined with its natural language syntax provides a more versatile and maintainable solution for knowledge representation.
 +
 +
Key benefits of the SiDIF approach:
 +
* Clear separation of identity aspects
 +
* Support for multiple identification systems
 +
* Easy system evolution and maintenance
 +
* Better human readability
 +
* Formal precision through explicit identity qualification
  
 
== Links ==
 
== Links ==
* [https://github.com/BITPlan/org.sidif.triplestore/issues Issues ]
+
* [https://github.com/BITPlan/org.sidif.triplestore/issues org.sidif.triplestore Issues ]
  
 
=== Syntax ===
 
=== Syntax ===
Line 206: Line 308:
 
<!-- Special token -->
 
<!-- Special token -->
 
<TR>
 
<TR>
 +
[[Category:SiDIF]]
 +
 
<TD>
 
<TD>
 
<PRE>
 
<PRE>
Line 264: Line 368:
 
</BODY>
 
</BODY>
 
</HTML>
 
</HTML>
 +
 +
 +
[[Category:frontend]]
 +
[[Category:SiGNaL]]

Latest revision as of 11:51, 23 November 2024

What links here

Introduction

The Simple Data Interchange Format (SiDIF) is yet another format for exchanging data between computers.

SiDIF isA DataInterchangeFormat

is valid SiDIF content.

Examples

City Tokyo

City isA Concept
Tokyo isA City
webpage addsTo City
"http://www.tokyo.jp" is webpage of Tokyo

is valid SiDIF.

SiDIF is based on Triples

Each SiDIF statement has a three-part structure:

  1. subject
  2. predicate
  3. object

that is called a Triple.

Royal family

The Royal92 SiDIF was created via a GEDCOM import. Together with the Model SiDIF and the MetaModel SiDIF, it is the basis for the content of the Royal Family wiki. A good entry point for browsing the structure of that wiki is the Topic table. For example, you could follow these links:

  1. Person Concept derived from the Person Topic
  2. Help for the Person Topic

SiDIF Structure

SiDIF expressions

A SiDIF expression like

Tokyo isA City

consists of three parts:

  • Tokyo is the subject
  • isA is the predicate
  • City is the object

Such a set of subject, predicate and object is called a Triple.
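
The decomposition above can be sketched in a few lines of Python. This is only an illustration, not the API of either SiDIF implementation: the names `Triple` and `parse_simple_triple` are hypothetical, and the sketch assumes the plain three-identifier form with parts separated by whitespace.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One SiDIF statement split into its three parts (hypothetical helper type)."""
    subject: str
    predicate: str
    object: str


def parse_simple_triple(line: str) -> Triple:
    """Split a plain 'subject predicate object' line on whitespace.

    Only the three-identifier form is handled; quoted literals and the
    'is ... of' / 'has' forms are deliberately out of scope for this sketch.
    """
    parts = line.split()
    if len(parts) != 3:
        raise ValueError(f"expected exactly three parts, got {len(parts)}: {line!r}")
    return Triple(*parts)


triple = parse_simple_triple("Tokyo isA City")
print(triple.subject, triple.predicate, triple.object)  # prints: Tokyo isA City
```

A named tuple keeps the three roles addressable by name while still behaving like an ordinary tuple.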

graphical representation

SiDIF Implementations

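see

  • https://github.com/BITPlan/org.sidif.triplestore for the original Java implementation
  • https://github.com/WolfgangFahl/py-sidif for the more recent Python version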


Comparison to other Knowledge Representation Approaches

Most other Triple formats are far more complex. See e.g. https://github.com/BITPlan/org.sidif.triplestore/tree/master/src/test/resources/sidif.canonical for examples of other triple formats in which the same statements are also expressed in SiDIF.

For example, https://en.wikipedia.org/wiki/Cyc explains the Cyc statement:

 (#$capitalCity #$France #$Paris)

with "Paris is the capital of France." which in SiDIF would be:

Paris is capital of France
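
As a rough illustration of this correspondence, the following Python sketch rewrites a binary Cyc assertion into the SiDIF "is ... of" form. It rests on assumptions taken only from the single example above: the first Cyc argument is treated as the owner and the second as the value, and the table `CYC_TO_SIDIF_PROPERTY`, which maps `capitalCity` to `capital`, is hypothetical.

```python
import re

# Hypothetical mapping from Cyc predicate names to SiDIF property names.
CYC_TO_SIDIF_PROPERTY = {"capitalCity": "capital"}

# Matches a binary Cyc assertion such as (#$capitalCity #$France #$Paris).
CYC_PATTERN = re.compile(r"^\(\s*#\$(\w+)\s+#\$(\w+)\s+#\$(\w+)\s*\)$")


def cyc_to_sidif(statement: str) -> str:
    """Rewrite '(#$pred #$owner #$value)' as 'value is property of owner'."""
    match = CYC_PATTERN.match(statement.strip())
    if match is None:
        raise ValueError(f"not a binary Cyc assertion: {statement!r}")
    predicate, owner, value = match.groups()
    # Fall back to the Cyc predicate name when no mapping is known.
    sidif_property = CYC_TO_SIDIF_PROPERTY.get(predicate, predicate)
    return f"{value} is {sidif_property} of {owner}"


print(cyc_to_sidif("(#$capitalCity #$France #$Paris)"))  # prints: Paris is capital of France
```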

The following sections compare three approaches to knowledge representation: RDF, Gellish, and SiDIF, with particular focus on how they handle identity and relationships.

  • Gellish: an information representation language, knowledge base and ontology [1]

Theoretical Foundations

RDF Theory

RDF (Resource Description Framework) is based on:

  • Statements modeled as triples (subject-predicate-object)
  • Uniform Resource Identifiers (URIs) as primary identification mechanism
  • Graph-based data model
  • Optional fourth element (graph) in RDF Quads for context

Gellish Theory

Gellish is structured as:

  • Fixed tabular format with predefined columns
  • Relationship-type encoding system
  • Partially qualified naming scheme
  • Language-aware design

SiDIF Theory

SiDIF uses:

  • Natural language-style triple statements
  • Explicit separation of identity aspects
  • Flexible predicate structure
  • Multi-level identification system

Example Representations

The same knowledge represented in each format:

RDF Example

<http://example.org/sensors/12> rdf:type <http://example.org/onto/TemperatureSensor> .
<http://example.org/sensors/12> rdfs:label "Sensor 12" .
<http://example.org/sensors/12> <http://example.org/onto/location> "plant1.line3" .

Gellish Example

1|English|Sensor 12|1|is a|2|TemperatureSensor|491197|specialization|
2|English|Sensor 12|1|has|3|location|123456|plant1.line3|

SiDIF Example

"Sensor 12" isA TemperatureSensor
"plant1.line3.sensor12" is FQN of "Sensor 12"
"urn:plant1:sensor:12" is PID of "Sensor 12"
"http://plant1.company.com/sensors/12" is URI of "Sensor 12"
"opc://plant1/l3/s12" is OPC_URI of "Sensor 12"
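
To make the separation of identity aspects concrete, here is a minimal Python sketch that groups the identifier statements above by the thing they identify. The function name `identity_aspects` and the regular expression are assumptions made for this illustration; the pattern is tailored to the quoted `"value" is KIND of "name"` lines in this example and is not derived from the SiDIF grammar.

```python
import re
from collections import defaultdict

SIDIF_TEXT = '''
"Sensor 12" isA TemperatureSensor
"plant1.line3.sensor12" is FQN of "Sensor 12"
"urn:plant1:sensor:12" is PID of "Sensor 12"
"http://plant1.company.com/sensors/12" is URI of "Sensor 12"
"opc://plant1/l3/s12" is OPC_URI of "Sensor 12"
'''

# "value" is KIND of "name", with value and name as quoted strings.
IDENTIFIER_PATTERN = re.compile(r'^"([^"]*)"\s+is\s+(\w+)\s+of\s+"([^"]*)"$')


def identity_aspects(text: str) -> dict:
    """Collect every identifier kind (FQN, PID, URI, ...) per named thing."""
    aspects = defaultdict(dict)
    for line in text.splitlines():
        match = IDENTIFIER_PATTERN.match(line.strip())
        if match:  # lines in other forms, such as the isA statement, are skipped
            value, kind, name = match.groups()
            aspects[name][kind] = value
    return dict(aspects)


for kind, value in identity_aspects(SIDIF_TEXT)["Sensor 12"].items():
    print(f"{kind}: {value}")
```

Adding a further identifier type only requires one more statement in the input; the code itself does not change.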

Critical Analysis

RDF Limitations

  • Forces URI usage for identification
  • Mixes identity with web location
  • Complex syntax reduces readability
  • Difficult to handle non-web identifiers

Gellish Limitations

  • Rigid tabular structure
  • Names not fully qualified
  • Limited identifier flexibility
  • Complex relationship encoding

SiDIF Advantages

  • Separates different aspects of identity (name, FQN, PID, URI)
  • Natural language readability
  • Flexible identifier system
  • Easy addition of new identifier types

Conclusion

While RDF and Gellish each have their strengths for specific use cases, SiDIF offers a more flexible and comprehensive approach to identity management. RDF's URI-centric approach limits its usefulness in non-web contexts, while Gellish's rigid structure makes it difficult to adapt to new requirements. SiDIF's separation of identity aspects (name, FQN, PID, URI) combined with its natural language syntax provides a more versatile and maintainable solution for knowledge representation.

Key benefits of the SiDIF approach:

  • Clear separation of identity aspects
  • Support for multiple identification systems
  • Easy system evolution and maintenance
  • Better human readability
  • Formal precision through explicit identity qualification

Links

  • org.sidif.triplestore Issues: https://github.com/BITPlan/org.sidif.triplestore/issues

Syntax

BNF for SiDIF.jjt

TOKENS

/* WHITESPACE AND COMMENTS */
<DEFAULT> SKIP : {
" "
| "\n"
| "\r"
| "\r\n"
| <"#" (~["\n","\r"])* ("\n" | "\r" | "\r\n")>
}
/* TOKENS for Productions */
<DEFAULT> TOKEN : {
<IS: "is">
| <OF: "of">
| <HAS: "has">
}
/* Literals */
<DEFAULT> TOKEN : {
<INTEGER_LITERAL: <DECIMAL_LITERAL> (["l","L"])? | <HEX_LITERAL> (["l","L"])? | <OCTAL_LITERAL> (["l","L"])?>
| <#DECIMAL_LITERAL: ["1"-"9"] (["0"-"9"])*>
| <#HEX_LITERAL: "0" ["x","X"] (["0"-"9","a"-"f","A"-"F"])+>
| <#OCTAL_LITERAL: "0" (["0"-"7"])*>
| <FLOATING_POINT_LITERAL: (["0"-"9"])+ "." (["0"-"9"])* (<EXPONENT>)? (["f","F","d","D"])? | "." (["0"-"9"])+ (<EXPONENT>)? (["f","F","d","D"])? | (["0"-"9"])+ <EXPONENT> (["f","F","d","D"])? | (["0"-"9"])+ (<EXPONENT>)? ["f","F","d","D"]>
| <#EXPONENT: ["e","E"] (["+","-"])? (["0"-"9"])+>
| <CHARACTER_LITERAL: "\'" (~["\'","\\","\n","\r"] | "\\" (["n","t","b","r","f","\\","\'","\""] | ["0"-"7"] (["0"-"7"])? | ["0"-"3"] ["0"-"7"] ["0"-"7"])) "\'">
| <STRING_LITERAL: "\"" (~["\"","\\"] | "\\" (["n","t","b","r","f","\\","\'","\""] | ["0"-"7"] (["0"-"7"])? | ["0"-"3"] ["0"-"7"] ["0"-"7"]))* "\"">
| <DATETIME_LITERAL: <DATE_LITERAL> ((<WHITESPACE>)+ <TIME_LITERAL>)?>
| <#DATE_LITERAL: ["0"-"9"] ["0"-"9"] ["0"-"9"] ["0"-"9"] "-" ["0"-"9"] ["0"-"9"] "-" ["0"-"9"] ["0"-"9"]>
| <TIME_LITERAL: ["0"-"9"] ["0"-"9"] ":" ["0"-"9"] ["0"-"9"] (":" ["0"-"9"] ["0"-"9"])?>
| <TRUE: "true">
| <FALSE: "false">
| <NULL: "null">
}
<DEFAULT> TOKEN : {
<#WHITESPACE: " " | "\t" | "\n" | "\r" | "\f">
}
<DEFAULT> TOKEN : {
<URI: <SCHEME> (~[" ","\t","\n","\r"])+>
| <#SCHEME: "aaa:" | "aaas:" | "about:" | "acap:" | "acct:" | "cap:" | "cid:" | "coap:" | "coaps:" | "crid:" | "data:" | "dav:" | "dict:" | "dns:" | "file:" | "ftp:" | "geo:" | "go:" | "gopher:" | "h323:" | "http:" | "https:" | "iax:" | "icap:" | "im:" | "imap:" | "info:" | "ipp:" | "ipps:" | "iris:" | "iris.beep:" | "iris.xpc:" | "iris.xpcs:" | "iris.lwz:" | "jabber:" | "ldap:" | "mailto:" | "mid:" | "msrp:" | "msrps:" | "mtqp:" | "mupdate:" | "news:" | "nfs:" | "ni:" | "nih:" | "nntp:" | "opaquelocktoken:" | "pkcs11:" | "pop:" | "pres:" | "reload:" | "rtsp:" | "rtsps:" | "rtspu:" | "service:" | "session:" | "shttp:" | "sieve:" | "sip:" | "sips:" | "sms:" | "snmp:" | "soap.beep:" | "soap.beeps:" | "stun:" | "stuns:" | "tag:" | "tel:" | "telnet:" | "tftp:" | "thismessage:" | "tn3270:" | "tip:" | "turn:" | "turns:" | "tv:" | "urn:" | "vemmi:" | "ws:" | "wss:" | "xcon:" | "xcon-userid:" | "xmlrpc.beep:" | "xmlrpc.beeps:" | "xmpp:" | "z39.50r:" | "z39.50s:">
}
/* IDENTIFIER */
<DEFAULT> TOKEN : {
<IDENTIFIER: <LETTER> (<LETTER> | "_" | <DIGIT>)*>
| <#LETTER: ["$","A"-"Z","a"-"z","\u00c0"-"\u00d6","\u00d8"-"\u00f6","\u00f8"-"\u00ff","\u0100"-"\u1fff","\u3040"-"\u318f","\u3300"-"\u337f","\u3400"-"\u3d2d","\u4e00"-"\u9fff","\uf900"-"\ufaff"]>
| <#DIGIT: ["0"-"9","\u0660"-"\u0669","\u06f0"-"\u06f9","\u0966"-"\u096f","\u09e6"-"\u09ef","\u0a66"-"\u0a6f","\u0ae6"-"\u0aef","\u0b66"-"\u0b6f","\u0be7"-"\u0bef","\u0c66"-"\u0c6f","\u0ce6"-"\u0cef","\u0d66"-"\u0d6f","\u0e50"-"\u0e59","\u0ed0"-"\u0ed9","\u1040"-"\u1049"]>
}
// Catch-all tokens. Must be last.
// Any non-whitespace. Causes a parser exception, rather than a
// token manager error (with hidden line numbers).
<DEFAULT> TOKEN : {
<#UNKNOWN: (~[" ","\t","\n","\r","\f"])+>
}

NON-TERMINALS

/*******************************************
* THE SiDIF LANGUAGE GRAMMAR STARTS HERE *
*******************************************/
/* just as list of links */
Links ::= ( Link | Value )+ <EOF>
/**
* a single link assignment
*/
Link ::= ( ( <IDENTIFIER> <IDENTIFIER> <IDENTIFIER> ) | ( <IDENTIFIER> <IS> <IDENTIFIER> <OF> <IDENTIFIER> ) | ( <IDENTIFIER> <HAS> <IDENTIFIER> <IDENTIFIER> ) )
/**
* Literal Value assignment
*/
Value ::= ( Literal <IS> <IDENTIFIER> <OF> <IDENTIFIER> )
/**
* Handle Literal values
*/
Literal ::= ( <INTEGER_LITERAL> | <FLOATING_POINT_LITERAL> | <CHARACTER_LITERAL> | <STRING_LITERAL> | <DATETIME_LITERAL> | <TIME_LITERAL> | <URI> | <TRUE> | <FALSE> | <NULL> )
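
The four statement shapes allowed by the `Link` and `Value` productions can be recognised with a small Python sketch. It is a simplification and not the JavaCC parser itself: identifiers are limited to an ASCII subset of `<IDENTIFIER>`, only string and decimal integer literals are covered, comments are not handled, and the names `tokenize` and `classify` are hypothetical.

```python
import re

# Simplified token patterns; the real grammar accepts many more literal kinds.
TOKEN_PATTERN = re.compile(r'''
    (?P<STRING>"(?:[^"\\]|\\.)*")       # simplified STRING_LITERAL
  | (?P<INTEGER>\d+)                    # decimal subset of INTEGER_LITERAL
  | (?P<WORD>[A-Za-z$][A-Za-z0-9_]*)    # ASCII subset of IDENTIFIER, incl. keywords
''', re.VERBOSE)

KEYWORDS = {"is": "IS", "of": "OF", "has": "HAS"}

# Token sequences accepted by the Link and Value productions.
FORMS = {
    ("IDENTIFIER", "IDENTIFIER", "IDENTIFIER"): "Link: subject predicate object",
    ("IDENTIFIER", "IS", "IDENTIFIER", "OF", "IDENTIFIER"): "Link: is ... of",
    ("IDENTIFIER", "HAS", "IDENTIFIER", "IDENTIFIER"): "Link: has",
    ("LITERAL", "IS", "IDENTIFIER", "OF", "IDENTIFIER"): "Value",
}


def tokenize(line: str) -> list:
    """Return the token kinds of one statement, skipping whitespace."""
    kinds, pos = [], 0
    while pos < len(line):
        if line[pos].isspace():
            pos += 1
            continue
        match = TOKEN_PATTERN.match(line, pos)
        if match is None:
            raise ValueError(f"unexpected character at column {pos}: {line!r}")
        kind = match.lastgroup
        if kind == "WORD":
            # A whole word equal to is/of/has is a keyword, anything else an identifier.
            kind = KEYWORDS.get(match.group(), "IDENTIFIER")
        else:
            kind = "LITERAL"
        kinds.append(kind)
        pos = match.end()
    return kinds


def classify(line: str) -> str:
    """Name the production form a statement matches, or 'no match'."""
    return FORMS.get(tuple(tokenize(line)), "no match")


print(classify("Tokyo isA City"))
print(classify("Paris is capital of France"))
print(classify('"http://www.tokyo.jp" is webpage of Tokyo'))
```

Because `isA` is matched as one whole word before the keyword lookup, it is classified as an identifier rather than as the keyword `is` followed by `A`, which mirrors the longest-match behaviour of a JavaCC token manager.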

References

  1. A. van Renssen. "Gellish: an information representation language, knowledge base and ontology", pp. 215-228. doi: 10.1109/siit.2003.1251209