Infrastructure:
Markup

Parke Godfrey
21 September 2011
26 September 2011
CSE-2041

Credits

These slides are based in part on ones from the following sources.

Presentation & Rendering

Format Goals

Semantics versus Presentation & Rendering

Our content format should abstract away from how it is to be rendered.

What should this universal format be?

Markup languages provide this abstraction.

XML (eXtensible Markup Language)

XML is a markup language that is entirely semantic based.
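
To illustrate the separation of semantics from rendering, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The recipe document, its tag names, and the two render functions are hypothetical, invented for this example. The tags say what the content is; each renderer decides how it looks.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the tag names describe meaning, not appearance.
SOURCE = """<recipe>
  <title>Pancakes</title>
  <ingredient>flour</ingredient>
  <ingredient>milk</ingredient>
</recipe>"""


def render_text(root):
    """Render the content as plain text."""
    lines = [root.findtext("title").upper()]
    lines += ["- " + item.text for item in root.findall("ingredient")]
    return "\n".join(lines)


def render_html(root):
    """Render the same content as an HTML fragment."""
    items = "".join(
        "<li>" + item.text + "</li>" for item in root.findall("ingredient")
    )
    return "<h1>" + root.findtext("title") + "</h1><ul>" + items + "</ul>"


root = ET.fromstring(SOURCE)
print(render_text(root))
print(render_html(root))
```

Neither renderer needed any change to the source document; only the view differs.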

XML
The Spectrum

XML as a database

  • XML is a data model, as is the relational data model.

  • The XML data model is more flexible in many ways than the relational model.
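
One way to see the flexibility claimed above: in the hypothetical data below, one person has two phone numbers and no email. A single fixed-column table would need a NULL for the missing email and an extra column or table for the repeated phone; the XML tree simply repeats or omits the element. A sketch, assuming made-up element names:

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the second person has two phones and no email.
SOURCE = """<people>
  <person>
    <name>Ann</name>
    <phone>555-0100</phone>
    <email>ann@example.org</email>
  </person>
  <person>
    <name>Bob</name>
    <phone>555-0101</phone>
    <phone>555-0102</phone>
  </person>
</people>"""

root = ET.fromstring(SOURCE)

records = []
for person in root.findall("person"):
    records.append({
        "name": person.findtext("name"),
        # Repeated elements become a list of any length.
        "phones": [phone.text for phone in person.findall("phone")],
        # findtext returns None when the element is absent.
        "email": person.findtext("email"),
    })

for record in records:
    print(record)
```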

XML as a document format

  • XML provides a simple but elegant way to structure documents via markup.

  • XML has become a common standard.

Model-View-Controller (MVC) Paradigm

The Web Ecosystem
Format

The Web follows the MVC paradigm.

We study each of these in Section III: Client-side.

We study the basics of markup and HTML here.

Hypertext Markup Language (HTML)

“Derived” from XML. Instead of free tags, there is a defined list of tags.

HTML
Origins

Originally, derived from Standard Generalized Markup Language (SGML).

Why? This provided existing tools such as parsers.

XML developed in parallel with HTML (and, originally, derives from SGML too).

HTML standards later changed (XHTML, and the XML syntax of HTML5) to define HTML as derived from XML instead. HTML4 itself was still SGML-based.

Why? Tremendous support exists for XML. These tools apply directly to HTML too.

HTML
Whitespace & References

HTML
Structural Elements

HTML
Common Elements

HTML
Figures & Media

Screen units: px, %, em, & pt
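
How the four units relate can be sketched as a conversion to pixels. The function name to_px and its parameters are hypothetical. The sketch assumes the CSS convention that 1in = 96px = 72pt, a font size of 16px (a common browser default, not a guarantee), and that a percentage is taken relative to a parent length passed in by the caller. What a percentage actually refers to depends on the property being set.

```python
PX_PER_INCH = 96  # CSS convention: 1in = 96px
PT_PER_INCH = 72  # 1in = 72pt


def to_px(value, unit, font_size_px=16.0, parent_px=0.0):
    """Convert a length in px, pt, em, or % to pixels (hypothetical helper)."""
    if unit == "px":
        return float(value)
    if unit == "pt":
        # Absolute unit: fixed ratio to pixels.
        return value * PX_PER_INCH / PT_PER_INCH
    if unit == "em":
        # Relative unit: multiples of the current font size.
        return value * font_size_px
    if unit == "%":
        # Relative unit: fraction of the parent's length.
        return value / 100 * parent_px
    raise ValueError("unknown unit: " + unit)


print(to_px(12, "pt"))
print(to_px(2, "em"))
print(to_px(50, "%", parent_px=400))
```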

HTML
The Anchor Element

<a> also anchors the other side of a link!
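
Both sides of a link can be seen in one small page. In this sketch, one <a> carries href (the source side) and another carries id (the target side; older pages used name for this). The page text and the AnchorCollector class are hypothetical; html.parser is Python's standard tokenizer for HTML.

```python
from html.parser import HTMLParser

# Hypothetical page: the first <a> links, the second <a> is linked to.
PAGE = """<p><a href="#tables">Jump to tables</a></p>
<h2><a id="tables">Tables</a></h2>"""


class AnchorCollector(HTMLParser):
    """Collect link sources (href) and link targets (id or name)."""

    def __init__(self):
        super().__init__()
        self.sources = []
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        if "href" in attributes:
            self.sources.append(attributes["href"])
        target = attributes.get("id") or attributes.get("name")
        if target:
            self.targets.append(target)


collector = AnchorCollector()
collector.feed(PAGE)
collector.close()
print(collector.sources, collector.targets)
```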

HTML
Lists

HTML
Tables

HTML
Well‐Formed & Valid

Well‐Formed
declaration (preamble)
one root: <html>
paired open and close tags
no straddling of tag pairs
 
Valid
valid element names (tags)
valid element nesting
valid attribute names
valid attribute values
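
The two levels of checking can be sketched as two separate tests. Well-formedness is checked here by asking Python's standard XML parser to parse the markup; it rejects multiple roots, unpaired tags, and straddling tags, but it does not insist on a declaration. Validity is checked against ALLOWED_CHILDREN, a tiny hypothetical rule set invented for this example. It is not the real HTML grammar and covers only element names and nesting, not attributes.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """True if the markup parses as XML: one root, properly nested pairs."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical rule set: which child elements each element may contain.
ALLOWED_CHILDREN = {
    "html": {"head", "body"},
    "head": {"title"},
    "body": {"p", "ul"},
    "ul": {"li"},
    "title": set(),
    "p": set(),
    "li": set(),
}


def is_valid(markup):
    """True if every element name and nesting obeys ALLOWED_CHILDREN.

    Assumes the markup is already well-formed.
    """
    root = ET.fromstring(markup)
    if root.tag != "html":
        return False
    stack = [root]
    while stack:
        element = stack.pop()
        allowed = ALLOWED_CHILDREN.get(element.tag)
        if allowed is None:
            return False  # unknown element name
        for child in element:
            if child.tag not in allowed:
                return False  # element not permitted at this position
            stack.append(child)
    return True


print(is_well_formed("<html><body><p>hi</body></p></html>"))
print(is_valid("<html><body><li>x</li></body></html>"))
```

The first call shows straddling tags failing the well-formedness test; the second shows a well-formed document failing validity because li is not allowed directly under body in this rule set.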

Scruffy versus Neat
(Loose versus Strict)

Should the format be loosely or strictly enforced?

loose

+

Easier to author pages.

Renderer makes best effort. (Graceful degradation.)

-

Automated tools have a harder time understanding and manipulating content.

Renderer can mess up badly. (Document is harder to parse. Renderer may refuse non-well-formed or invalid documents.)

E.g., HTML4, HTML5

Scruffy versus Neat
(Loose versus Strict)

strict

-

Harder to author pages.

Harder to maintain valid documents.

+

Automated tools can understand and manipulate content.

Renderer knows how to handle the page.

E.g., XHTML
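
The trade-off can be sketched by feeding the same scruffy markup to a loose reader and a strict one. The snippet SCRUFFY (list items never closed) and the helper names are hypothetical. Python's html.parser stands in for a loose reader; it only tokenizes and does not build a tree the way a browser does. The standard XML parser stands in for a strict reader.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical scruffy markup: the <li> elements are never closed.
SCRUFFY = "<ul><li>one<li>two</ul>"
NEAT = "<ul><li>one</li><li>two</li></ul>"


class TagLogger(HTMLParser):
    """Record every start tag seen, without checking nesting."""

    def __init__(self):
        super().__init__()
        self.opened = []

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)


def parse_loose(markup):
    """Best effort: report the start tags, whatever the nesting."""
    logger = TagLogger()
    logger.feed(markup)
    logger.close()
    return logger.opened


def parse_strict(markup):
    """Return the element names in document order, or None if refused."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        return None
    return [element.tag for element in root.iter()]


print(parse_loose(SCRUFFY))
print(parse_strict(SCRUFFY))
print(parse_strict(NEAT))
```

The loose reader accepts both documents; the strict reader refuses the scruffy one but returns a reliable tree for the neat one.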