Multimedia Information Systems VO/KU (706.052/706.053)
Markup Languages for the Web
Denis Helic
IICM, TU Graz
Markup Languages: Definition (1)
- Markup: additional information on how to interpret the text in a document
- Used in all text-processing applications
- Typically, invisible to the user
- e.g. in LaTeX: \section{title of section}
- In HTML <p>new paragraph, blabla...</p>
Markup Languages: Definition (2)
- In RTF (StarOffice - "Hello World"):
{\rtf1\ansi\deff0{\fonttbl{\f0\froman\fprq2\fcharset0 Times;}}
{\colortbl\red0\green0\blue0;\red255\green255\blue255;\red128
\green128\blue128;}{\stylesheet{\s1\snext1 Standard;}}
{\info{\comment StarWriter}{\vern5690}}\deftab720
[...]
\pard\plain \s1 Hello World
\par }
Generalized/Descriptive Markup
- Describe structure of documents only
- Presentation and/or behaviour of documents defined elsewhere
- Separate content (structure) from presentation and/or behaviour
- This principle has a number of advantages
Descriptive Markup: Example
<section id="sect1">
<p>this is a paragraph</p>
<p>this is another paragraph</p>
<p>this is a third paragraph</p>
</section>
Descriptive Markup: Structure
- Structured as nested elements
- Element: markup (tags), attributes, content
- Tags: start tag + end tag
- Attributes: inside start tag
- Content: Text + sub-elements
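The element structure listed above can be inspected programmatically. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree parser and a shortened copy of the example document; the helper name describe is hypothetical and only serves the illustration.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the descriptive markup example.
doc = """<section id="sect1">
<p>this is a paragraph</p>
<p>this is another paragraph</p>
</section>"""

root = ET.fromstring(doc)


def describe(element, depth=0):
    """Return one line per element: tag, attributes and direct text content."""
    text = (element.text or "").strip()
    lines = [f"{'  ' * depth}{element.tag} {element.attrib} {text!r}"]
    for child in element:  # sub-elements are part of the content
        lines.extend(describe(child, depth + 1))
    return lines


structure = describe(root)
print("\n".join(structure))
```

Each line of the result names the tag, the attributes taken from the start tag, and the text content, indented by nesting depth.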
Descriptive Markup: Advantages (1)
- Efficiency of code
- By separating content from presentation and behaviour, files become smaller
- No overhead in markup files
- Presentation and behaviour files stored separately and only once for all markup files
- Saves bandwidth on the Web
Descriptive Markup: Advantages (2)
- Ease of maintenance and consistency in presentation and behaviour
- If presentation and behaviour are stored only once, you can change them in a single place for the whole document collection
- In this way, presentation and behaviour also remain consistent across even large document collections
Descriptive Markup: Advantages (3)
- Device compatibility
- Since markup documents contain only content, it is easy to apply alternative presentations and behaviours
- For displaying the documents on another device, e.g. a handheld device
- Also, accessibility: presenting documents using an alternative output format, e.g. a screen reader format
Descriptive Markup: Advantages (4)
- Web crawlers/robots
- Documents contain only markup and content
- No additional and irrelevant information that is hard for robots to crawl and parse
SGML - Standard Generalized Markup Language (1)
- SGML: a fairly old ISO standard, ISO 8879 (1986)
- Meta-markup language
- Allows other markup languages to be defined with it
- The standard doesn't define a concrete set of markup, only the means by which you can define one
SGML - Standard Generalized Markup Language (2)
- Defines the basic syntax (how markup is distinguished from normal text)
- I.e., you have elements, attributes and entities
- The interpretation/semantics is defined by the processing application
SGML - Standard Generalized Markup Language (3)
- Elements (that are nested) describe the structure of a document
- An element is delimited by a start-tag and a corresponding end-tag
- End-tags are not always present :-(
- I.e., end-tags are not obligatory
SGML - Standard Generalized Markup Language (4)
<p>
this is a paragraph, that holds a quote
<q>
this is the quote
</q>
continue paragraph...
</p>
SGML - Standard Generalized Markup Language (5)
<note type="warning">
in case of emergency...
</note>
SGML - Standard Generalized Markup Language (6)
- Entities: SGML based on ASCII
- Binary data coded as entities
- Special characters coded as entities
<!ENTITY figure1 SYSTEM "fig1.bmp" NDATA BMP>
[...]
<figure entity="figure1">
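How special characters are coded as entities can be sketched in a few lines. This is only an illustration under stated assumptions: Python's standard library has no SGML parser, so the sketch uses its XML tools, whose predefined entities (&lt;, &gt;, &amp;) follow the same idea; the element name note and the sample text are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Text containing characters that would otherwise be read as markup.
raw = "if a < b & b > c"

# Encode the special characters as entities before embedding them.
encoded = escape(raw)

# The parser resolves the entities back into the original characters.
element = ET.fromstring(f"<note>{encoded}</note>")
decoded = element.text

print(encoded)
print(decoded)
```

The encoded form is what is stored in the document; the processing application sees the decoded text again.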
SGML - Standard Generalized Markup Language (7)
- Since SGML doesn't define which markup we have, DTDs are used
- Document type definition (DTD)
- A special file which defines for a particular application which markup we have
- The application itself understands the markup and knows what to do with it
SGML - Standard Generalized Markup Language (8)
- A DTD defines which elements may be used and which elements are required
- Also, how elements are nested, i.e. what the relations between different elements are
- Further, a DTD defines which attributes an element may have and which attributes an element must have
- Finally, a DTD defines the entities
SGML - Standard Generalized Markup Language (9)
- DTD technology leads to a number of different document types
- Syntactical check against the basic SGML syntax
- Syntactical check against the specific DTD syntax
- Valid document (DTD) vs. well-formed document (markup-syntax only)
SGML - Standard Generalized Markup Language (10)
- Problems of SGML
- Very complex
- Specification more than 500 pages + specification of DTD
- Omission of end-tags: difficult for parsers
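Why omitted end-tags are hard on parsers can be seen with a small experiment. The sketch below assumes Python's standard html.parser module as a stand-in for an SGML tokenizer; the class name TagRecorder and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Record the start-tag and end-tag events the tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


recorder = TagRecorder()
# The end-tags of both list items are omitted.
recorder.feed("<ul><li>first<li>second</ul>")
recorder.close()

print(recorder.events)
```

No end event is reported for either li element. An application has to infer where each item ends, which requires knowledge of the document type rather than of the basic syntax alone.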
HyperText Markup Language HTML (1)
- The best-known SGML DTD
- In 1990, Tim Berners-Lee used an SGML DTD
- Added images
- Added links
- Created HTML DTD
HyperText Markup Language HTML (2)
- The original idea was to have a very simple DTD
- To ensure separation of content, presentation and behaviour
- A special application was created to understand the HTML DTD
- The first browser!
HyperText Markup Language HTML (3)
- Problems creating a standard
- "Browser Wars" (Netscape vs. Microsoft)
- Adding presentation specific tags (font, center, ...), colors, ...
- Adding different scripting languages for behaviour
HyperText Markup Language HTML (4)
- Mixture of presentation and content
- Mixture of behaviour and content
- Moved away from descriptive/generalized markup
- The World Wide Web Consortium (W3C) tries to establish standards
HyperText Markup Language HTML (5)
Web Standards Model
- Web Consortium tries to improve the situation
- Defines the standard model
- (X)HTML: markup
- Cascading Style Sheets (CSS): presentation/styles
- JavaScript: behaviour
(X)HTML: What is it?
- Before we look at the Web Standards Model in detail, let us introduce XHTML
- XHTML is a reformulation of HTML in XML
- What is XML?
- eXtensible Markup Language (XML) is a lightweight version of SGML
- I.e., XML is SGML--, not HTML++
XML (1)
- Because of the complexity of SGML, the Web Consortium created a new meta-markup language
- Specification of XML about 50 pages
- Easier for users to use it
- Syntax of XML much cleaner than that of SGML: end-tags needed
- Easier to write parsers
XML (2)
- End-tags always needed
- If an element does not have content: empty-element tag (e.g. <empty/>)
- Quotation marks needed with attributes
- Names are case-sensitive
- Nesting of elements must be correct
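The syntax rules above can be checked with any XML parser. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module; the helper is_well_formed and the sample snippets are hypothetical and cover one rule each.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


samples = {
    "closed and nested": "<p>text <q>quote</q></p>",
    "empty-element tag": "<doc><empty/></doc>",
    "missing end-tag": "<p>text",
    "unquoted attribute": "<note type=warning>text</note>",
    "case mismatch": "<Note>text</note>",
    "wrong nesting": "<p><q>text</p></q>",
}

results = {name: is_well_formed(text) for name, text in samples.items()}
for name, ok in results.items():
    print(f"{name}: {'well-formed' if ok else 'rejected'}")
```

Only the first two samples are accepted. Note that this checks well-formedness only; validating against a DTD would need a validating parser, which this sketch does not use.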
XML (3)
- Since XML strictly separates content from presentation and behaviour, you need additional technologies for the latter two
- I.e. XML is a meta-markup language and does not define semantics of tags
(X)HTML (2)
- XHTML 1.0 -> Modularization of XHTML http://www.w3.org/TR/xhtml-modularization/
- Decomposition of XHTML 1.0 into a number of modules
- Module is a set of elements used for the same purpose
- Structure module: body, head, html, title
- Text module: headings, div, span, etc.
(X)HTML (3)
- Hypertext module: a
- Table module: table, tr, td, etc.
- Each module defined with an XML DTD
- Purpose: create an XHTML dialect by combining only modules that are needed
CSS (1)
- Better control for formatting and layout of HTML elements
- Browsers implement default presentation
- With CSS you can alter the default presentation
- You can influence formats, fonts, colors, ...
CSS (2)
- Specifications (approved by Web Consortium)
- CSS1 - supported by all browsers
- CSS2 - almost completely supported
- CSS3 - the W3C is currently working on it
CSS (3)
- Example of style statements
<style type="text/css">
p.normal { font-size:10pt; color:black; }
p.gross { font-size:12pt; color:black; }
p.klein { font-size:8pt; color:black; }
*.rot { color:red; }
.blau { color:blue; }
/* do not show menu on print: */
@media print {
.menu {display: none;}
}
</style>
CSS (4)
- CSS Statement
- Selector (which element)
- Declaration (which properties to apply for that element)
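The two parts of a CSS statement can be made explicit by splitting a rule into its selector and its declarations. This is a deliberately naive sketch in Python, not a CSS parser: the function parse_rule is hypothetical and assumes a single simple rule without nested braces, comments or at-rules.

```python
def parse_rule(rule):
    """Split one simple CSS rule into (selector, {property: value})."""
    selector, _, rest = rule.partition("{")
    body = rest.rsplit("}", 1)[0]
    declarations = {}
    for part in body.split(";"):
        if ":" in part:
            prop, _, value = part.partition(":")
            declarations[prop.strip()] = value.strip()
    return selector.strip(), declarations


selector, declarations = parse_rule("p.normal { font-size:10pt; color:black; }")
print(selector)
print(declarations)
```

The selector says which elements the rule applies to (here, p elements with class normal); the declarations say which properties are set for them.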
CSS (5)
- Embedding style into HTML pages
- Special HTML element <style>
- Example
- As an attribute of an HTML element
- Example
CSS (7)
- The best alternative: an external style sheet document
- Only then do you get the advantages of separating content and presentation
- Otherwise content and presentation are still mixed
- A single change in style then typically requires changes in a number of HTML documents
JavaScript (1)
- Embedded in HTML code
- You can also include it from external JavaScript files
- Better, because you separate content and behaviour
- Works with Document Object Model (DOM)
- DOM is a tree-like representation of the nested structure of HTML elements
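The tree-like structure of the DOM can be explored outside a browser as well. The sketch below assumes Python's standard xml.dom.minidom module, which implements a subset of the DOM interfaces for well-formed markup; the sample page and the helper outline are hypothetical. In a browser, JavaScript would work on the same kind of tree through the document object.

```python
from xml.dom.minidom import parseString

# A small, well-formed page used only for illustration.
page = (
    "<html><head><title>Demo</title></head>"
    "<body><p id='intro'>Hello</p><p>World</p></body></html>"
)

dom = parseString(page)


def outline(node, depth=0):
    """Return the element names of the subtree, indented by nesting depth."""
    lines = []
    if node.nodeType == node.ELEMENT_NODE:
        lines.append("  " * depth + node.tagName)
        for child in node.childNodes:
            lines.extend(outline(child, depth + 1))
    return lines


tree = outline(dom.documentElement)
print("\n".join(tree))

# DOM-style lookup: collect the text of every p element.
paragraphs = [p.firstChild.data for p in dom.getElementsByTagName("p")]
print(paragraphs)
```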
JavaScript (2)
- In the beginning, there were problems with browser compatibility
- Now it is much better
- Also thanks to various JavaScript libraries
- Libraries hide browser incompatibilities
Reality of Web standards (1)
Reality of Web standards (2)
- Survey made by Jonathan Lane
