Hypertext Markup Language (HTML)

In 1966… I knew nothing about computers, but I knew there had to be a better way to produce documents than dictating them, reviewing a draft, marking up the draft with corrections, reviewing the retyped draft, and then, in frustration, seeing that the typist had introduced more errors while making the corrections…

– Charles Goldfarb, The Roots of SGML — A Personal Recollection, 1996.

HTML is the simple and powerful language used to describe web pages, and is still used as the main interface language to the web.

In 1969, the same year the ARPANET was created, Charles Goldfarb, Edward Mosher, and Raymond Lorie invented the Generalized Markup Language (GML) to facilitate text management in large information systems. GML built on the tagging schemes of Rice and Tunnicliffe and added a formal document structure, so that any computer program could automatically process and format the individual parts of a document.

In 1980, an American National Standards Institute (ANSI) committee built on GML and published a working draft of Standard GML, or SGML. Major adopters of the standard included the US Internal Revenue Service and the Department of Defense. A draft international standard was adopted by the European Community in 1985, and a final standard was published as ISO 8879:1986. The final standard was published using a working SGML system developed by Anders Berglund, then of the European Particle Physics Laboratory (CERN).

A few years later another scientist at CERN, Tim Berners-Lee, invented the HyperText Markup Language (HTML) to define the structure of web pages. Tim never planned HTML to be more than a structure into which a wide range of multi-media documents would be fitted, but it was designed well enough that it came to be used to present a wide range of content itself. The main structure of modern HTML was agreed at a meeting at the first WWW Conference held the week of 25 May, 1994, including the incorporation of tables, graphics, and mathematics symbols, as would be expected for a language then aimed at academic work.

HTML is designed to be as simple as possible. Most commands consist of an opening tag in angle brackets, like <tag>, and a closing tag with an added slash, like </tag>; a few, such as the image tag, have only an opening tag. Some of the most common commands are listed below, together with the result displayed when the HTML is read by a web browser.

HTML: The water is <b>very</b> blue.
Web Page Result: The water is very blue.

HTML: The wind is <i>very</i> warm.
Web Page Result: The wind is very warm.

HTML: It’s a small world <img src="http:twenty.netpic1.jpg">
Web Page Result: It’s a small world [image: Small planet earth]

HTML: The main page is <A HREF="/index.htm">here</A>.
Web Page Result: The main page is here.
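The tag pairs in the examples above can also be explored programmatically. The following is a minimal sketch in Python, using the standard library's html.parser module, that records each opening and closing tag and collects the text a browser would display. The class name TagLogger and the function visible_text are illustrative assumptions, not part of any HTML specification.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record tags and visible text from an HTML fragment (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.events = []  # sequence of ("open" | "close", tag name)
        self.text = []    # pieces of text found between tags

    def handle_starttag(self, tag, attrs):
        self.events.append(("open", tag))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

    def handle_data(self, data):
        self.text.append(data)


def visible_text(fragment):
    """Return the text with tags removed, plus the list of tag events."""
    parser = TagLogger()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.text), parser.events


text, events = visible_text("The water is <b>very</b> blue.")
print(text)    # the sentence without its markup
print(events)  # one "open" and one "close" event for the b tag
```

A real browser does far more than this (layout, fonts, images), but the sketch shows the basic idea that tags wrap ordinary text and are removed from what the reader sees.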

Every web page is written in HTML, which is text-based, so it is easily and quickly transmitted across the Internet. In most browsers you can view the HTML for any web page you visit, including this one, by selecting “View Source” from the browser menu, or from the pop-up menu that appears when you right-click on a particular frame. When you are finished viewing the page source you can close that window without affecting any of the pages you are viewing.

Dan Connolly, Jon Bosak, and others at the W3C have also developed a successor to SGML called the Extensible Markup Language (XML), which provides the structure to enable design of a range of languages like HTML for various purposes, and is in wide and increasing use.
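As a rough illustration of a purpose-built tag language, the sketch below parses a small made-up weather vocabulary with Python's standard xml.etree.ElementTree module. The element and attribute names (forecast, day, high, city, name, unit) are hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

# A made-up weather vocabulary; every element and attribute name is hypothetical.
document = """
<forecast city="Geneva">
  <day name="Monday"><high unit="C">21</high></day>
  <day name="Tuesday"><high unit="C">18</high></day>
</forecast>
"""

root = ET.fromstring(document)

# Map each day's name to its forecast high temperature.
highs = {day.get("name"): int(day.find("high").text) for day in root.findall("day")}
print(root.tag, root.get("city"), highs)
```

The tags here describe what the data means rather than how it should look, which is the kind of language XML is meant to make possible.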

Berners-Lee and Dan Connolly published a specification for HTML as RFC 1866, Hypertext Markup Language – 2.0, in November 1995.

Generated HTML. There are several technologies used to dynamically create web pages on the fly whenever they are requested by a visitor, often by some variation on reading values from a database, for example to provide current weather or sports information. These methods often give their pages a different three-letter extension than “.htm”, although the page as it is transmitted over the Internet and received by your browser is still constructed in HTML so your browser can read it. Three leading technologies for web page generation are listed below:

  • .asp – Active Server Pages (ASP)
  • .cgi – Common Gateway Interface (CGI)
  • .php – PHP: Hypertext Preprocessor (PHP)
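Whatever the technology, the general pattern is the same: a program reads values from a database and writes them into HTML text. The following is a minimal sketch of that pattern in Python, using an in-memory SQLite database in place of a real server setup. The weather table, its columns, and the function build_weather_page are assumptions made up for this example.

```python
import sqlite3
from html import escape


def build_weather_page(conn):
    """Return an HTML page listing the rows of a hypothetical `weather` table."""
    rows = conn.execute(
        "SELECT city, temp_c FROM weather ORDER BY city"
    ).fetchall()
    # escape() keeps characters such as < and & in the data from being read as markup.
    items = "\n".join(
        f"<li>{escape(city)}: {temp_c} &deg;C</li>" for city, temp_c in rows
    )
    return f"<html><body><h1>Current weather</h1>\n<ul>\n{items}\n</ul></body></html>"


# Hypothetical sample data standing in for a live weather feed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weather (city TEXT, temp_c INTEGER)")
conn.executemany(
    "INSERT INTO weather VALUES (?, ?)", [("Geneva", 21), ("Boston", 17)]
)

page = build_weather_page(conn)
print(page)
```

The browser receives only the finished HTML text; it has no way to tell that the page was assembled by a program a moment earlier.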

These automated methods can produce powerfully interactive content, but they have a key disadvantage: the data does not exist independently. It depends on the operation of a program, which is much more complicated than data and can have many dependencies of its own. If the program fails to work perfectly, you often get no data or wrong data; perhaps all the temperatures are set to zero and every sports team is tied. A real example from 7 May 2003 appears below, where this message was returned instead of the expected forum web page:

http://searchengineforums.com/searchengine.forums/
action::thread/thread::1044843646/forum::Forum28/

searchengineforums.com: The Search Engines: Google:
Google Quick Start

There seems to have been a slight problem with the database. Please try again by pressing the refresh button in your browser.

Database error in Invalid SQL: SELECT thread FROM forum_stats WHERE forum='Forum28' AND last_post
mysql error: You have an error in your SQL syntax near 'ORDER BY last_post DESC LIMIT 1' at line 1

mysql error number: 1064

Date: Wednesday 07th of May 2003 05:32:11 PM
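The failure above can be reproduced in miniature. The sketch below uses Python with SQLite rather than the MySQL system that produced the original message, and it assumes the query failed because a comparison value was missing before ORDER BY; the source does not show the full statement, so that detail is a guess made for illustration. The function latest_thread_page and the sample row are likewise hypothetical.

```python
import sqlite3


def latest_thread_page(conn, forum):
    """Return HTML naming the newest thread in a forum, or a plain error page."""
    # Assumption for illustration: the value after "<" is missing, so the
    # database rejects the statement before reading any data.
    broken_sql = (
        "SELECT thread FROM forum_stats "
        "WHERE forum = ? AND last_post < "
        "ORDER BY last_post DESC LIMIT 1"
    )
    try:
        row = conn.execute(broken_sql, (forum,)).fetchone()
    except sqlite3.Error:
        # The data is still in the table, but the visitor receives none of it.
        return "<html><body><p>The forum is temporarily unavailable.</p></body></html>"
    thread = row[0] if row else "none"
    return f"<html><body><p>Latest thread: {thread}</p></body></html>"


# Hypothetical sample data; the table and column names follow the error message.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE forum_stats (thread TEXT, forum TEXT, last_post INTEGER)")
conn.execute("INSERT INTO forum_stats VALUES ('1044843646', 'Forum28', 1052328731)")

page = latest_thread_page(conn, "Forum28")
print(page)
```

The row is present in the database throughout, yet the page contains none of it, which is the dependence on a correctly working program described above.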

Resources. The following sites provide additional information about HTML: