Basic Computer Notions Introduction to HTML

Introduction

HTML = Hypertext Markup Language

HTML was invented (along with the World Wide Web itself) by Tim Berners-Lee (and Robert Cailliau) at CERN in 1990 (see A Little History of the World Wide Web)

The Web started to become popular with the release of the Mosaic browser (not the first graphical browser, but the first for Windows and Mac) by NCSA in 1993

The World Wide Web Consortium (W3C) controls the standard for HTML. The specification for HTML 4 was released in December 1997.

HTML is a specific markup language defined within the framework of SGML (Standard Generalized Markup Language). (SGML is very powerful but very complex and never became widespread.)

Separation of form and content

HTML reflects (or should reflect) meaning, not appearance. This facilitates

In any case, the exact appearance will differ from browser to browser and from screen to screen.

Tags

Use of tags in plain-text files:

Identify file contents as HTML

First line of file should define version of HTML - something like

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"
            "http://www.w3.org/TR/REC-html40/strict.dtd">

This line is often omitted, but omitting it can change how some browsers render the page, because they may fall back to a backward-compatible ‘quirks’ mode [ref].

Elements, specified by tags, are organized hierarchically.

The entire file contents (excluding the DOCTYPE line) should be tagged as HTML:

<html>

 ...

</html>

It is recommended that the default text-processing language of the Web page be specified in the <html> tag, e.g.,
<html lang="en-CA"> for Canadian English.
This can help guide searching, text-to-speech output, etc.

Head and body

File should be divided into head and body:
<html>
  <head> 
    ...
  </head>
  <body>
    ...
  </body>
</html>

Head

The head contains information about the document.

The main tag to be used in the head is <title>. The title should be about half a line of text describing the nature of the document.

Most Web browsers will display it separately from the document contents (for example, in the title bar of the window), and it may be used in automatically created indices.

<html>
  <head> 
    <title>Introduction to HTML</title>
  </head>
  <body>
    ...
  </body>
</html>
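
Putting the pieces so far together, a complete file might begin as in the following sketch. It reuses the DOCTYPE line and the lang value shown earlier; the paragraph text is a placeholder invented for this illustration.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"
            "http://www.w3.org/TR/REC-html40/strict.dtd">
<!-- The DOCTYPE line comes first and sits outside the html element. -->
<html lang="en-CA">
  <head>
    <title>Introduction to HTML</title>
  </head>
  <body>
    <p>Placeholder text, declared above as Canadian English.</p>
  </body>
</html>
```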

The head might also contain comments, delimited by <!-- and -->. One use for comments is to record the history of the document:

<html>
  <head> 
    <title>Test document</title>
<!--History
WRJF 1996 Apr 21 Created by WRJF
WRJF 1996 Apr 23 Corrected typos
-->
  </head>
  <body>
    ...
  </body>
</html>

Note that the indenting and line breaks are not necessary. The following would be equivalent:

<html> <head> <title>Test
document</title><!--History
WRJF 1996 Apr 21 Created by WRJF
WRJF 1996 Apr 23 Corrected typos--> </head> <body> ... </body></html>

The head may also contain meta tags, e.g.:

<meta name="description" content="Tutorial on HTML.">
<meta name="keywords" content="HTML, history, tags, resources">
<meta http-equiv="Content-Language" content="en">

and link elements pointing to related files, e.g., a stylesheet:

<link rel="stylesheet" type="text/css" href="../bacon.css">

Body

The body contains the document itself.

<html>
  ...
  <body>
    ...
  </body>
</html>

Paragraphs

When formatting a document for display, the Web browser will

Unless otherwise marked up, a whole page of carefully formatted text will be collapsed into one big paragraph.

To denote individual paragraphs, use the <p> tag.

The terminating </p> tag is not required in HTML (but is required in XHTML).

Do not use <p> to create empty space.
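
A minimal sketch of this behaviour (the sentences are invented for illustration): the line breaks and extra spaces in the source are not preserved, and only the <p> tags separate the text into two paragraphs.

```html
<p>This    sentence is typed
over several lines
with   irregular spacing.</p>

<p>A browser would normally show the text above as one evenly
spaced paragraph, and this text as a second paragraph.</p>
```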

Emphasis

To emphasize some text, use the <em> tag.

To give really strong emphasis, use the <strong> tag.

For example, the text "The word <em>fish</em> is really <strong>important</strong>" will typically be displayed with the word "fish" in italics and the word "important" in bold.

The appearance of emphasized text depends on the browser.

There are explicit tags for italics (<i>) and bold (<b>), but it is not a good idea to use them, because they describe appearance rather than meaning.

There are also several other tags for indicating the significance of text:

cite
citation or a reference (www.w3.org)
dfn
defining instance of the enclosed term (a cat is a small animal with soft fur which often keeps people as pets)
code
a fragment of computer code (print *, 'x=',x)
samp
sample output from programs, scripts, etc. (x= 5.)
kbd
text to be entered by the user (emacs test.txt)
var
an instance of a variable or program argument (x)
abbr
an abbreviated form (e.g.)
acronym
an acronym (HTML)
q
short quotation to be displayed in-line (meow) [the browser should add the quotation marks]
blockquote
long quotation to be displayed as a block (meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow meow)

The default appearance of text flagged in these ways will depend on the browser. The appearance can be modified using style sheets, which has been done for some of the examples above.
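
Because the examples in the list above are shown as rendered text, the following sketch shows how a few of them might be written in the source. The tagged fragments reuse the sample text from the list; the surrounding wording is invented for illustration.

```html
<p>See <cite>www.w3.org</cite> for the specification.</p>
<p>A <dfn>cat</dfn> is a small animal with soft fur.</p>
<p>The statement <code>print *, 'x=',x</code> displays the value
   of <var>x</var>; a run might produce <samp>x= 5.</samp></p>
<p>To edit the file, type <kbd>emacs test.txt</kbd>.</p>
<p>The cat said <q>meow</q>.</p>
<blockquote>
  <p>meow meow meow meow meow meow</p>
</blockquote>
```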

Headings

To indicate that a line of text is a top-level heading, use the <h1> tag.

For a subheading, use the <h2> tag, for a subsubheading use <h3>, etc., down to <h6>. The actual appearance will depend on the browser, and the lower-level subheadings may be indistinguishable or illegible or both.

For example, the headings above are specified by
<h1>Introduction to HTML</h1>
<h2>Headings</h2>

Lists

To create a simple list of items, use one of the following:

  • <ol> for an ordered (numbered) list
  • <ul> for an unordered (bulleted) list

Use the tag <li> for each item in the list.

This is an example of an ordered list:

  1. cat
  2. dog
  3. flea
  4. fungus
  5. medical student

This is how the list was created:

<ol>
  <li> cat
  <li> dog
  <li> flea
  <li> fungus
  <li> medical student
</ol>

Note that in HTML the </li> tag is not required at the end of each item, because each item will automatically be terminated by the next <li> tag or by the end of the list. The </li> tag is required in XHTML.
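
The same <li> tag is used inside <ul>, the element for unordered (bulleted) lists. The sketch below shows an unordered list containing a nested ordered list, with the optional </li> tags written out. The items are reused from the example above; the nesting is invented for illustration.

```html
<ul>
  <li>cat</li>
  <li>dog
    <!-- A nested list goes inside the li that it belongs to. -->
    <ol>
      <li>flea</li>
      <li>fungus</li>
    </ol>
  </li>
  <li>medical student</li>
</ul>
```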

A third type of list is the definition list, in which each list item consists of a term and a definition.

Here is an example of a definition list:

cat
owns the house
dog
lives in the house
medical student
feeds the cat

Here is how the list was defined:

<dl>
  <dt> cat
  <dd> owns the house
  <dt> dog
  <dd> lives in the house
  <dt> medical student
  <dd> feeds the cat
</dl>

Address

Every HTML file should include at the end a name and contact information for the person who created or is responsible for the file. This is indicated using the <address> tag.

For example,
<address>R. Funnell (R.Funnell@hades.he)</address>
would display

R. Funnell (R.Funnell@hades.he)

You may want to disguise your e-mail address to try to prevent harvesting by spammers. For example, you might use
R.Funnell_nospam@hades.he
or add space characters as in
R.Funnell @ hades.he

There are also fancier ways to disguise addresses.
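
One such approach, sketched here as an assumption rather than as this tutorial's recommendation, is to write some characters as numeric character references. The browser displays them as ordinary characters, but the literal address does not appear in the source. The protection is limited, since a harvesting program can decode the references too.

```html
<!-- &#64; is the numeric character reference for @,
     and &#46; is the one for a full stop. -->
<address>R. Funnell (R&#46;Funnell&#64;hades&#46;he)</address>
```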

Line break

To <br>, or not to <br>,
That is the question

The tag <br> can be used to force a line break in special circumstances.
It should not be used indiscriminately in place of the paragraph tag <p>,
and should not be used to create empty space.

This is an empty element, that is, there is no terminating </br> tag. In XHTML it would be written as <br/> to make clear that it is an empty element.
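
Verse and postal addresses are typical of such special circumstances, since the line divisions are part of the content. The sketch below marks up the two lines quoted at the top of this section; &lt; and &gt; are the character entity references needed to display the angle brackets literally.

```html
<p>
  To &lt;br&gt;, or not to &lt;br&gt;,<br>
  That is the question
</p>
```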

Horizontal rule

The tag <hr> can be used to display a horizontal line across the screen. It is often used to divide the screen into different parts.

This is another empty element, that is, there is no terminating </hr> tag.
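
A sketch of one typical use, separating the main text of a page from the closing address (the paragraph text is a placeholder):

```html
<p>Last paragraph of the main text.</p>
<hr>
<address>R. Funnell (R.Funnell@hades.he)</address>
```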

Links

To create a link to another Web page, use the anchor tag <a> with the attribute href.

For example, this code creates a link to a file named testfile.html:
<a href="testfile.html">test</a>
The link is displayed like this: test

The value of the href attribute is interpreted as a uniform resource locator (URL). In this example, the URL consists of a relative address; since only the file name is specified, the target file is assumed to be located in the same directory as the current file.

To create a link to a file in another directory on the same computer, Unix file-system conventions are used to specify a path.

The path can be relative to the current directory. For example, the code
<a href="anatomy/ear.html">ear</a>
would link to the file ear.html in the anatomy subdirectory under the current directory.

A slash (/) character at the beginning of a path specifies the top-level directory of the computer (as seen through its Web server).
A tilde (~) character indicates a username.

For example, the code
<a href="/~funnell/anatomy/ear.html">ear</a>
would link to the file ear.html in the anatomy subdirectory under Funnell's login directory.

To create a link to a file on another computer, an absolute URL must be used, specifying the protocol and the name of the computer (introduced by //) as well as the location of the file on that computer. For example, the code
<a
href="http://audilab.bme.mcgill.ca/~funnell/index.html">
Funnell's home page</a>
creates the link Funnell's home page.
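
To summarize these forms of address, the sketch below shows how each href would be resolved, assuming purely for illustration that the current page is http://audilab.bme.mcgill.ca/~funnell/index.html. The resolved addresses in the comments follow from the standard rules for relative URLs.

```html
<!-- File name only: same directory as the current page.
     Resolves to http://audilab.bme.mcgill.ca/~funnell/testfile.html -->
<a href="testfile.html">test</a>

<!-- Relative path: a subdirectory of the current directory.
     Resolves to http://audilab.bme.mcgill.ca/~funnell/anatomy/ear.html -->
<a href="anatomy/ear.html">ear</a>

<!-- Path starting with a slash: counted from the top level of the same server.
     Resolves to http://audilab.bme.mcgill.ca/~funnell/anatomy/ear.html -->
<a href="/~funnell/anatomy/ear.html">ear</a>

<!-- Absolute URL: names the protocol, the computer and the path. -->
<a href="http://audilab.bme.mcgill.ca/~funnell/index.html">Funnell's home page</a>
```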

A specific location within a file can be specified as a target for links by using the anchor tag with the attribute name.

For example, the code
<a name="section2">Cats</a>
could be used to define an anchor at the beginning of Section 2 within the file gods.html.

A link directly to that section could then be defined as
<a href="gods.html#section2">link</a>
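
Two related variations, offered as a sketch: a link to an anchor in the same file needs only the fragment part of the URL, and in HTML 4 an id attribute on an element can serve as a link target in the same way as a named anchor. The link text, the heading text and the id value section3 are invented for this illustration.

```html
<!-- Link to the named anchor section2 from within gods.html itself. -->
<a href="#section2">Jump to the section on cats</a>

<!-- An id attribute on a heading also defines a link target. -->
<h2 id="section3">Dogs</h2>

<!-- Link to that heading from another file. -->
<a href="gods.html#section3">link</a>
```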

Links to Web pages normally include (or assume) the protocol specification http: (Hypertext Transfer Protocol). Links may also specify other protocols, such as ftp:, gopher: and mailto:.

It's possible to define the address as a link, either to jump to the author's home page or to send e-mail to the author. For example,
<address>
<a href="http://audilab.bme.mcgill.ca/~funnell/">
R. Funnell</a>
</address>

or
<address>
<a href="mailto:r.funnell@hades.he">R. Funnell</a>
</address>

Use of mailto: means that the real e-mail address is exposed on the Web for spammers to find.

There are two special file-naming conventions which vary from server to server.

First, a URL which appears to refer to a user's login directory actually points to a specially named subdirectory (e.g., public_html/) of that login directory. This provides a convenient way of keeping publicly available Web pages separate from private files.

Second, if a URL specifies a directory but no file name, then the server first looks for a file with a special name (e.g., index.html or default.htm) in that directory. If it doesn't find such a file, it may present a listing of all of the files in the directory.

Images

To include an image in line in a Web page, use the <img> tag. This is an empty element, that is, there is no terminating </img> tag. The image to be included is specified by giving its URL as a src attribute.

For example, this code would include an image taken from a file living in the same directory as the current HTML file:
<img src="test.gif">

An image can also be specified by a complete URL.

For example, the code
<img src="http://audilab.bme.mcgill.ca/~funnell/mcr35.gif">
displays this image: McGill crest.

Image files come in many different formats, but not all formats can be used for in-line images. The most commonly used formats for this purpose are

In addition to the src attribute, the <img> tag also takes an alt attribute. This attribute specifies text which can be displayed by a nongraphical browser which can't display the image itself. It's recommended (required in HTML 4) that such an alternate text be included with every <img> element, for the benefit of users who can't (or don't want to) view graphics.

For example, the code
<img src="crest.gif" alt="[McGill crest]">
would include an image with the alternate text '[McGill crest]'.

The align attribute can be used to control the image's position with respect to surrounding text:

bottom
The bottom of the image should be vertically aligned with the current baseline. This is the default value.
middle
The centre of the image should be vertically aligned with the current baseline.
top
The top of the image should be vertically aligned with the top of the current text line.
left
The image should float to the current left margin; text will wrap around the image.
right
The image should float to the current right margin; text will wrap around the image.
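
A sketch of a floated image, reusing the crest.gif example from above (the paragraph text is invented for illustration). Note that HTML 4 marks the align attribute of <img> as deprecated in favour of style sheets, so this form is best read as an illustration of the attribute rather than as recommended practice.

```html
<p>
  <img src="crest.gif" alt="[McGill crest]" align="left">
  This text would normally wrap around the right-hand side of
  the image, because the image floats to the left margin.
</p>
```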

Links to images

In addition to using the <img> element to have the browser display an in-line image, one can also use the <a> element to link to a separate image.

For example, the code
<a href="pig01.gif">test</a>
will use the word test as a link to an image.

If the image being linked to is not in a format that the Web browser knows how to display, the browser can invoke a helper application to do the actual display, giving more flexibility and allowing the display of different image formats.

For example, the code
<a href="pig01.tif">test</a>
will use the word test as a link to an image which will be displayed by a helper application (if the browser is properly configured).

One can use the <img> tag to specify the contents of the <a> element, thus using one image as a link to a second image (or to anything else). This technique is often used to display a small in-line thumbnail image which can be clicked on to call up a larger version of the same image.

For example, the code
<a href="pig01.gif"><img src="box.gif"></a>
will use a thumbnail image as a link to an image.
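
Since alternate text is recommended for every <img> element, a thumbnail link might be written as in the following sketch. The alt wording is invented for illustration; for an image used as a link, it is usually most helpful when it describes where the link leads.

```html
<a href="pig01.gif"><img src="box.gif" alt="[thumbnail: link to full-size image]"></a>
```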

Links to other media

The <a> tag can also be used to link to files containing animations, video clips, audio clips, VRML models, etc.

Your browser must either be able to handle the data itself, or be configured to call up either

Embedded objects

Multimedia objects of various types can be embedded directly within a Web page using the object tag (which is more ‘correct’) or the embed tag (which is actually more likely to work with a variety of different browsers).
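
A minimal sketch of the object element as defined in HTML 4, assuming a hypothetical video file named clip.mpg. The data attribute gives the location of the object and the type attribute gives its content type. The content of the element serves as a fallback, so a browser that cannot display the object can still offer an ordinary link.

```html
<!-- clip.mpg is a hypothetical file name used for illustration. -->
<object data="clip.mpg" type="video/mpeg">
  <!-- The content of the object element is used only if the
       browser cannot display the object itself. -->
  <a href="clip.mpg">Download the video clip</a>
</object>
```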

Tables

The <table> element permits the layout of text, images, etc., in rows and columns.

For example, this code:

    <table border align=center>
      <tr>
        <td>row 1,column 1
        <td>row 1,column 2
      <tr>
        <td colspan=2 align=center>row 2
    </table>
gives this table:
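    +----------------+----------------+
    | row 1,column 1 | row 1,column 2 |
    +----------------+----------------+
    |              row 2              |
    +---------------------------------+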
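
Tables of data usually also need column headings. As a sketch, the <th> element marks a header cell and the <caption> element gives the table a title; both are part of HTML 4. The cell contents are borrowed from the definition-list example earlier in this document, and the caption and headings are invented for illustration. The optional closing tags are written out here.

```html
<table border>
  <caption>Household roles</caption>
  <tr>
    <th>Member</th>
    <th>Role</th>
  </tr>
  <tr>
    <td>cat</td>
    <td>owns the house</td>
  </tr>
  <tr>
    <td>dog</td>
    <td>lives in the house</td>
  </tr>
</table>
```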

The use of tables for fancy layout can be carried to absurd lengths. Such Web pages are hard to maintain, and are increasingly likely to behave badly in some browsers.

Frames

Frames were originally seen as making it easier for users to navigate complex Web sites, but are used less and less now.

According to the W3C ‘Frames cause lots of problems for the web model, e.g. standard URLs can only point at a frameset or a frame. Links and/or the back button can generate strange results. They also cause troubles to people with disabilities and make editing more complex. Unfortunately frames are not tagged deprecated and some people continue to use them. We recommend people to not use them.’ [ref]

Summary of elements

The following represents the hierarchical structure of some of the principal HTML elements. The label ‘character-like’ stands for characters and character-like elements.

  • html
    • head
      • title
        • text
      • meta
      • link
    • body
      • h1, h2, ...
        • character-like
      • p
        • character-like
      • ol, ul
        • li
          • character-like
      • hr
      • address
        • text
The following are ‘character-like’ elements:
  • text
  • em, strong
    • text
  • a
    • text
  • img
  • br
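
The hierarchy above can be sketched as one small, complete file. The structure follows the outline; the title, text and file names are either reused from earlier examples or invented for illustration.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"
            "http://www.w3.org/TR/REC-html40/strict.dtd">
<html lang="en">
  <head>
    <title>Household animals</title>
    <meta name="description" content="A short example page.">
  </head>
  <body>
    <h1>Household animals</h1>
    <!-- Character-like elements appear inside block elements such as p. -->
    <p>The <em>cat</em> is <strong>in charge</strong>.<br>
       See also the page on the <a href="anatomy/ear.html">ear</a>.</p>
    <ol>
      <li>cat</li>
      <li>dog</li>
    </ol>
    <hr>
    <address>R. Funnell</address>
  </body>
</html>
```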

Checking syntax

Various tools are available for checking the syntax of your HTML pages. For example, the following HTML-checking services are available:

See the Google category Computers > Data Formats > Markup Languages > HTML > Tools > Validators and Lints.

Learning more about HTML

There are many books on HTML available. For example (as of 1999 Apr):

Camelot lists the following numbers of books with HTML in the title:

    Date          Books with HTML in the title   Of which for beginners
    1999 Apr 20   191                            35
    2003 Mar 22   65                             8
    2005 Mar 20   218                            31

There is also a lot of on-line help available (and a lot of it carries really ugly ads). For example:

Style guides for creating Web pages

There are many on-line style guides available for Web-page design. For example:


This work is licensed under a Creative Commons License.
R. Funnell

Last modified: 2012-04-12 11:41:07