Chapter 5
The Internet and Applications

 5.1 The Internet
 5.2 Clients, Servers and Addressing
  5.2.1 Addresses and Ports
  5.2.2 Servers
  5.2.3 Clients
 5.3 Web Browsing
  5.3.1 HTTP Operation
  5.3.2 Web Browsing on the Command Line
 5.4 Remote Login

File: nsl/apps.tex, r1669

This chapter provides background information on the Internet and the common applications used on it. If you have already studied an introductory networking subject, then most likely there is nothing new in this chapter for you. It serves mainly as a reference: if you forget a concept mentioned in a later chapter, refer back to this chapter.

5.1 The Internet

To be completed (e.g. IP, TCP, forwarding, routing, addresses). In the meantime, see introductory networking textbooks.

5.2 Clients, Servers and Addressing

Most network applications, including web browsing, email and file downloads, are implemented as client/server applications. For example, web browsing involves a web browser (client) retrieving web pages from a web server. In the client/server model the server listens for new connections and the client initiates them. (A connection is usually needed each time we perform some operation, e.g. transfer a file, download a web page, send an email.) We use IP addresses, together with ports, to uniquely identify each connection.

5.2.1 Addresses and Ports

We know that IP addresses are used to identify computers on the Internet. This includes clients and servers. When sending data between a client and server, the source and destination IP addresses are carried in the IP datagram (see Figure A.1). These two addresses (source and destination) identify the pair of computers that are communicating.

But what about different application programs (or processes) running on the computers? If you have one web browser connecting to a web server at www.google.com and a second web browser connected also to www.google.com, then how does your computer know which IP datagrams are destined for which instance of the web browser?

Client/server applications also use port numbers to identify connections between applications. Your first web browser instance uses a different port number than your second web browser instance. So in fact every connection between client/server applications is uniquely identified by the source/destination IP addresses together with the source/destination port numbers.

For example, connection 1 between browser 1 and web server www.google.com:

  Source IP:         203.131.209.77
  Destination IP:    66.249.89.99
  Source Port:       47984
  Destination Port:  80

And connection 2 between browser 2 and www.google.com:

  Source IP:         203.131.209.77
  Destination IP:    66.249.89.99
  Source Port:       48032
  Destination Port:  80

Note that the two connections between the same computers are uniquely identified, because the source ports are different.

While the source and destination IP addresses are carried in the IP datagram header, the source and destination ports are carried in the TCP (or UDP) header (see Figures A.2 and A.3). Therefore every TCP or UDP packet we send over the Internet carries these four values. (A fifth identifier, the protocol number, is also included in the IP datagram header. For example, if TCP is the transport protocol being used, the protocol number field in the IP header has the value 6, representing Transmission Control Protocol (TCP). For a list of common protocol numbers see Appendix A.2.)
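The identifiers described above can be written down as a small record. The following Python sketch is only an illustration: the `Connection` type and its field names are hypothetical, and the addresses and ports are the ones from the two browser connections in this section. It shows that the two connections compare as different only because of their source ports.

```python
import socket
from collections import namedtuple

# Hypothetical record type for this illustration: the five values that
# together identify one connection.
Connection = namedtuple(
    "Connection", ["protocol", "src_ip", "src_port", "dst_ip", "dst_port"]
)

# The two browser connections from the example in this section.
conn1 = Connection(socket.IPPROTO_TCP, "203.131.209.77", 47984, "66.249.89.99", 80)
conn2 = Connection(socket.IPPROTO_TCP, "203.131.209.77", 48032, "66.249.89.99", 80)

# Names of the fields in which the two connections differ.
differing = [name for name, a, b in zip(Connection._fields, conn1, conn2) if a != b]

print("TCP protocol number:", int(socket.IPPROTO_TCP))
print("Connections are distinct:", conn1 != conn2)
print("Fields that differ:", differing)
```

Here `socket.IPPROTO_TCP` is the standard library's constant for the TCP protocol number, so the sketch does not hard-code the value 6.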

5.2.2 Servers

The common structure of most network server applications is as follows:

  1. The server is idle, listening (or waiting) for connections from clients on a well-known port.
  2. When the server receives and accepts a connection request (e.g. TCP SYN), it creates a child process to communicate with the client. The child process exchanges data with the client. When the exchange is finished, the child process terminates, leaving only the original parent server process.
  3. The server returns to the idle state (step 1).

In this way, a server can typically handle many connections at a time. For example, the www.google.com web server can handle connections from thousands of client hosts at a time. An important aspect is the well-known port. Since the client initiates the connection, it has to know the destination IP address and port number. The client can find the server's IP address through the Domain Name System (DNS) (e.g. www.google.com maps to 66.249.89.99). It knows the port number because most common servers use a well-known port number. Some commonly used well-known port numbers are listed in Appendix A.2.
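The server structure described in this section can be sketched in Python using the standard `socketserver` module. This is a minimal illustration under stated assumptions, not how any particular server is implemented: it hands each accepted connection to a new thread rather than a child process (a swap made to keep the sketch portable), it listens on the loopback address, and it asks the operating system for any free port instead of a well-known one. The names `EchoHandler`, `start_server` and `ask` are hypothetical.

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    """Runs once per accepted connection, in its own thread."""

    def handle(self):
        # self.request is the connected socket for this one client.
        data = self.request.recv(1024)
        self.request.sendall(b"echo: " + data)


def start_server(host="127.0.0.1", port=0):
    """Start listening and return the running server object.

    Port 0 asks the operating system for any free port; a real service
    would bind to its well-known port instead (e.g. 80 for a web server).
    """
    server = socketserver.ThreadingTCPServer((host, port), EchoHandler)
    server.daemon_threads = True
    # serve_forever() is the idle loop: wait, accept, hand off, repeat.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask(address, message):
    """Minimal client used only to exercise the server."""
    with socket.create_connection(address, timeout=5) as sock:
        sock.sendall(message)
        return sock.recv(1024)


if __name__ == "__main__":
    server = start_server()
    address = server.server_address  # (IP, port) actually in use
    print("listening on", address)
    print(ask(address, b"hello"))
    print(ask(address, b"world"))
    server.shutdown()
    server.server_close()
```

A process-per-connection variant closer to the description above would replace `ThreadingTCPServer` with `socketserver.ForkingTCPServer`, which is available on Unix-like systems only.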

5.2.3 Clients

The common structure of most network client applications is as follows:

  1. Send a connection request to a server. The client (in fact, the operating system) chooses an unused port number as the source port, and sends the connection request to the server.
  2. Once connected with the server, the client and server exchange data.

So multiple instances (or processes) of one application can communicate at the same time: they simply use different source port numbers.
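The client steps above can be observed directly on one machine. The Python sketch below is an illustration only: a plain listening socket on the loopback address stands in for a real server, and the variable names are hypothetical. Two clients connect to the same destination, and the operating system gives each one its own unused source port.

```python
import socket

# Stand-in for a server: a listening socket on the loopback address.
# Port 0 lets the operating system pick a free port for this demo.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(5)
server_address = listener.getsockname()

# Two clients connect to the same server address. Neither one picks a
# source port; the operating system assigns each an unused one.
client1 = socket.create_connection(server_address, timeout=5)
client2 = socket.create_connection(server_address, timeout=5)

for name, client in (("client 1", client1), ("client 2", client2)):
    src_ip, src_port = client.getsockname()  # local (source) end
    dst_ip, dst_port = client.getpeername()  # remote (destination) end
    print(f"{name}: {src_ip}:{src_port} -> {dst_ip}:{dst_port}")

# Collect the two source ports so they can be compared.
source_ports = {client1.getsockname()[1], client2.getsockname()[1]}

client1.close()
client2.close()
listener.close()
```

The port numbers printed vary from run to run, since the operating system chooses them, but the two source ports always differ while the destination address and port are the same.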

5.3 Web Browsing

Everyone knows how to use a web browser. But what about a web server? Chapter 12 shows you how to install, configure and use a common web server called Apache. And how does a web browser communicate with a web server? Using HyperText Transfer Protocol (HTTP). Section 5.3.1 provides background information on HTTP. As we primarily use the command line in this book, a graphical web browser like Firefox, Safari or Edge is not available. Therefore Section 5.3.2 illustrates two command line tools for web browsing.

5.3.1 HTTP Operation

To be completed. For now, consult a networking textbook on the operation of HTTP.

5.3.2 Web Browsing on the Command Line

When testing a web server it is useful to have a web browser. Similarly, creating HTTP traffic is useful for testing networks, learning about protocols and performing security operations. However, on the command line we do not have direct access to graphical web browsers such as Firefox or Safari. Therefore we have two options: use a command line program for web browsing, or use tunnelling to run a graphical web browser on another computer. Here we show how to do the former; the latter is covered elsewhere in this book.

lynx is a text-based web browser available on Linux. Pass in the URL of the web page you want to visit, e.g. http://192.168.2.21, when you start it:

$ lynx URL

Once open, you can browse pages using your keyboard. Of course images will not be displayed, and JavaScript is not executed. But you can view basic HTML pages. A quick guide to using Lynx:

Lynx provides an interactive web browser. If you only want to download a page (without interactively following links) then you can use wget. In its simplest form, wget downloads the page at a requested URL:

$ wget URL

The page is saved as a file on your computer. This can be useful for testing and automating tasks in scripts (see Chapter 6).
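When a download is one step inside a larger program rather than a shell script, the same idea can be expressed with Python's standard library. The sketch below is an assumption-laden illustration, not a replacement for wget: the function name `download` is hypothetical, and it performs a single HTTP GET without retries or any of wget's other features.

```python
import shutil
import urllib.request


def download(url, filename):
    """Fetch url with an HTTP GET and save the response body to filename."""
    with urllib.request.urlopen(url, timeout=10) as response:
        with open(filename, "wb") as out:
            # Stream the body to disk instead of holding it all in memory.
            shutil.copyfileobj(response, out)
    return filename


# Example call (the address is the sample URL used earlier in this section):
# download("http://192.168.2.21/", "index.html")
```

As with wget, the page ends up as a file on the local computer, here under the name passed as `filename`.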

Video: Linux Command Review: wget, ssh, nc (10 min; Aug 2016)

5.4 Remote Login

Secure Shell (SSH) is a protocol for securely logging in to another computer. It is a replacement for telnet, which is insecure. OpenSSH is a free implementation of an SSH client and server. Both client and server should be installed on the Ubuntu computers.

Secure shell can be run from the command line using:

$ ssh DESTINATION

where DESTINATION is the IP address or domain name of the computer you want to connect to.

Optionally, you can include the USERNAME to log in as (otherwise it will default to the current username in use on the client):

$ ssh DESTINATION -l USERNAME

You will be prompted for the password of that user on the server. (The first time you log in to a server you may also be warned that the authenticity of the host cannot be established; type yes to continue.)

Once you have logged in, you can run commands on the server, just as if you were using the command line on the server itself.

You can log out using the exit command.