Chapter 1. Introduction

The Internet's history cover nearly 50 years from its born until our days. Its an interesting story and its evolution contains important milestones at least every decade. The last decades have seen considerable technological advances in the this sector. The current stage, the IoT (Intenet of Things), is far-far away from its initial version which was prepared by the invention of the telegraph, telephone, radio, and computer. The initial goal was clear, a connection was required between machines which forms a communications network that could exist even if parts of it was incapacitated. The story begun in 1966 at DARPA (originally ARPA which states for Advanced Research Projects Agency which is changed to DARPA as Defense Advanced Research Projects Agency in 1971). They created the ARPANET, the first packet switching network for host-to-host communication. ARPANET was funded by the United States military after the cold war with the aim of having a military command and control center that could withstand nuclear attack. The point was to distribute information between geographically dispersed computers. ARPANET created a communications standard (Network Control Protocol (NCP)), which defines the basics for the data transfer on the Internet today.

This network, which became operational in 1969, was the Internet's forerunner: the original ARPANET grew into the Internet. The Internet embodies a key underlying technical idea, namely that of open-architecture networking. In this approach, the choice of any individual network technology was not dictated by a particular network architecture but could be selected freely by a provider and made to interwork with the other networks through a meta-level "Internetworking Architecture". In an open-architecture network, the individual networks may be separately designed and developed, and each may have its own unique interface which it may offer to users and/or other providers. Think of wired and wireless network solutions to get a picture of this. The original communication standard, NCP, had no ability to address networks other than the original ARPANET, so it needed to be replaced. The new protocol, eventually called the Transmission Control Protocol/Internet Protocol (TCP/IP), was first described in 1974. However, the widespread presence of the Internet dates to the mid-1980s, when the number of PCs and workstations started to grow.

A major shift occurred as a result of the increase in scale of the Internet and its associated management issues. To make it easy for people to use the network, hosts were assigned names, so that it was not necessary to remember the numeric addresses. The DNS (Domain Name System) provided a scalable, distributed mechanism for resolving hierarchical host names into Internet addresses. The increase in the size of the Internet also challenged the capabilities of the routers. New approaches for address aggregation, in particular classless inter-domain routing (CIDR), were introduced to control the size of router tables. Nowadays, after thirty years, there is still considerable research into making these algorithms better, more reliable and faster.
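The effect of CIDR aggregation can be illustrated with Python's standard ipaddress module; the prefixes below are documentation addresses chosen only for the example:

```python
import ipaddress

# Two adjacent /25 prefixes that a router would otherwise store as two entries.
nets = [ipaddress.ip_network("192.0.2.0/25"),
        ipaddress.ip_network("192.0.2.128/25")]

# CIDR aggregation: the pair collapses into one shorter routing-table entry.
aggregated = list(ipaddress.collapse_addresses(nets))
print(aggregated)  # [IPv4Network('192.0.2.0/24')]
```

Here two /25 routes merge into a single /24 entry, which is exactly how CIDR keeps router tables from growing with every individual network.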

Another important piece in this picture is the role of documentation, which established a series of notes for proposals and ideas. That was the RFC (Request for Comments), which is still the way researchers share feedback today. The key is its free and open-access nature: all the specification and protocol documents are easily accessible to everybody. The method still follows its original concept; only the way of publication has changed. At first the RFCs were printed on paper and distributed via snail mail. As the File Transfer Protocol (FTP) came into use, the RFCs were prepared as online files and accessed via FTP. Now, of course, the RFCs are easily accessed via the World Wide Web.

In the last three decades, several organizations and working groups have appeared to help the standardization of the Internet. No longer was DARPA the only major player in the funding of the Internet. This evolution can be seen in the following figure (from the www.internetsociety.org website):

Standardization of the Internet

Internet Tools and Services

The Internet covers large, international Wide Area Networks (WANs) as well as smaller Local Area Networks (LANs) and individual computers connected to the Internet worldwide. The Internet supports communication and sharing of data, and offers a vast amount of information through a variety of services and tools. The major Internet tools and services are:

Electronic mail, most commonly referred to as email or e-mail since ca. 1993, is a method of exchanging digital messages from an author to one or more recipients. E-mail clients allow you to send and receive electronic mail messages. To use e-mail on the Internet, you must first have access to the Internet and an e-mail account set up (mostly free of charge) that provides you with an e-mail address. A valid e-mail address consists of a username and a domain name separated by the @ sign.

An email message consists of three components: the message envelope, the message header, and the message body. The message header contains control information, including, minimally, an originator's email address and one or more recipient addresses. Usually descriptive information is also added, such as a subject header field and a message submission date/time stamp. Network-based email was initially exchanged on the ARPANET in extensions to the File Transfer Protocol (FTP), but is now carried by the Simple Mail Transfer Protocol (SMTP), first published as Internet standard 10 (RFC 821) in 1982. In the process of transporting email messages between systems, SMTP communicates delivery parameters using a message envelope separate from the message (header and body) itself.
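The header/body split described above can be sketched with Python's standard email library; the addresses used here are, of course, hypothetical:

```python
from email.message import EmailMessage

# Build a message with the two visible components: header fields and body.
# (The envelope is added separately by SMTP during transport.)
msg = EmailMessage()
msg["From"] = "alice@example.com"      # originator address (made up)
msg["To"] = "bob@example.org"          # recipient address (made up)
msg["Subject"] = "Hello"               # descriptive header field
msg.set_content("Just a short test message.")  # message body

print(msg["Subject"])
print(msg.get_content().strip())
```

Printing the whole object (print(msg)) would show the familiar on-the-wire form: header lines, a blank line, then the body.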

Newsgroups are often arranged into hierarchies, theoretically making it simpler to find related groups. The term top-level hierarchy refers to the hierarchy defined by the prefix before the first dot. The most commonly known hierarchies are the Usenet hierarchies. Usenet is a news exchange service similar to electronic bulletin boards. Usenet is older than the Internet, but the two are commonly associated with one another since most Usenet traffic travels over the Internet. A Usenet newsgroup is a repository, usually within the Usenet system, for messages posted by many users in different locations. The term may be confusing to some, because it is in fact a discussion group. In recent years, this form of open discussion on the Internet has lost considerable ground to browser-accessible forums and social networks such as Facebook or Twitter.

Internet Relay Chat (IRC) allows you to pass messages back and forth to other IRC users in real time, as you would on a citizens' band (CB) radio. It is mainly designed for group communication in discussion forums, called channels, but also allows one-to-one communication via private message as well as chat and data transfer. IRC is an open protocol that uses TCP. An IRC server can connect to other IRC servers to expand the IRC network. Users access IRC networks by connecting a client to a server. The standard structure of a network of IRC servers is a tree. Messages are routed along only necessary branches of the tree.
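As a sketch of what actually travels over such a TCP connection, the helper below formats client messages in the RFC 1459 framing style (a command, space-separated parameters, an optional trailing parameter, and a CRLF terminator); no real connection is made here, and the nickname and channel are invented:

```python
def irc_line(command, *params, trailing=None):
    """Format one IRC protocol line (RFC 1459 framing, CRLF-terminated)."""
    parts = [command, *params]
    if trailing is not None:
        parts.append(":" + trailing)  # trailing parameter may contain spaces
    return " ".join(parts) + "\r\n"

# A client would send lines like these over a plain TCP connection to a server.
print(irc_line("NICK", "alice"), end="")
print(irc_line("JOIN", "#python"), end="")
print(irc_line("PRIVMSG", "#python", trailing="hello, world"), end="")
```

The leading colon marks the trailing parameter, which is how IRC carries free text with spaces in an otherwise space-delimited protocol.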

Telnet allows you to log into another computer system and use that system's resources just as if they were your own. Telnet was developed in 1969 beginning with RFC 15, extended in RFC 854, and standardized as Internet Engineering Task Force (IETF) Internet Standard STD 8, one of the first Internet standards. However, because of serious security issues when using Telnet over an open network such as the Internet, its use for this purpose has waned significantly in favor of SSH (Secure Shell). SSH uses public-key cryptography to authenticate the remote computer and allow it to authenticate the user.

File Transfer Protocol (FTP) is a standard network protocol used to transfer files from one host to another host over a TCP-based network, such as the Internet. FTP is built on a client-server architecture and uses separate control and data connections between the client and the server. FTP users may authenticate themselves using a clear-text sign-in protocol, normally in the form of a username and password, but can connect anonymously if the server is configured to allow it. For secure transmission that hides (encrypts) the username and password, and encrypts the content, FTP is often secured with SSL/TLS ("FTPS"). SSH File Transfer Protocol ("SFTP") is sometimes also used instead, but is technologically different and based on the SSH-2 protocol.

The World Wide Web, usually referred to simply as the Web, is a solution for displaying, formatting and accessing multimedia information over a network such as the Internet. It is a system of interlinked hypertext documents which allows related subjects to be presented together without regard to the locations of the subject matter. Hyperlinks function as pointers to information, whether the information is located within one website or at any site throughout the world. A website is a set of files residing on a computer (usually called a server or a host). Websites do not have to be connected to the Internet. Many organizations create internal websites to enhance education, communications and collaboration within their own organizations. You access a site with software called a Web browser, which displays the files as "pages" on your screen. The pages can contain text, graphics, sounds, animation and interactive forms (almost any form of multimedia), and they can be downloaded to your computer. Webpages are written in HyperText Markup Language (HTML).
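Hyperlinks are ordinary HTML attributes, so a browser (or any program) can find them by parsing the page. A minimal sketch with Python's standard html.parser, applied to a made-up page:

```python
from html.parser import HTMLParser

# A tiny HTML page; the link targets are hypothetical.
PAGE = """
<html><body>
  <h1>Links</h1>
  <a href="https://example.com/">a remote site</a>
  <a href="/local/page.html">a page on the same site</a>
</body></html>
"""

class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(value for name, value in attrs if name == "href")

collector = LinkCollector()
collector.feed(PAGE)
print(collector.links)  # ['https://example.com/', '/local/page.html']
```

Note that one link points to another host while the other is relative to the same site, illustrating that hyperlinks work without regard to where the target actually resides.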

Recently, the Web has become the predominant form of Internet communication (with the exception of e-mail), far outstripping the use of other systems such as Gopher, newsgroups or FTP sites. It is already becoming a significant factor in many organizations' approaches to internal and external communications and marketing. The Web provides an immensely popular and accessible way to publish electronically, offer services or simply express your creativity.

The Web hides all of the underlying technology from the user. When you access a webpage, your browser locates and brings you the data. You do not have to worry about where the information is located, and the browser manages all storage, retrieval and navigation tasks automatically. The Web can handle many forms of Internet communication, such as FTP, Gopher, newsgroups and Usenet, replacing the need for many other tools for using the Internet.

However, the story does not end here. The Web is continuously changing, and new technologies are emerging. The next big step is the Semantic Web, which is currently little more than a vision: the technologies exist, but their implementation is partial. If it becomes reality, the Web will become one of the most important services of the Internet.

Summary - How does the Internet work?

To conclude this section, we can say that it starts with protocols and ends with architectures. The most dominant parts are listed below:

The Internet is a packet-switching network that uses TCP/IP as its core protocol. TCP/IP is a suite of protocols that govern network addresses and the organization and packaging of the information to be sent over the Internet:

An IP address is a unique address assigned to each computer connected to the Internet. It is used by TCP/IP to route packets of information from a sender to a location on the Internet. An IP address consists of four sets of numbers ranging from 0 to 255. As we mentioned earlier, such numbers are hard to remember if we use several locations on the Internet. The Domain Name System (DNS) allows the use of easier-to-remember domain names instead of IP addresses to locate computers on the Internet. Domain name resolvers scattered across the Internet translate domain names into IP addresses.
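The four-octet structure can be checked with Python's standard ipaddress module (the address itself is a documentation example, not a real host):

```python
import ipaddress

# An IPv4 address is four octets, each in the range 0-255.
addr = ipaddress.IPv4Address("192.0.2.17")
print(addr.packed)   # the four raw octets: b'\xc0\x00\x02\x11'
print(int(addr))     # the same address as a single 32-bit number

# Values outside 0-255 are rejected:
try:
    ipaddress.IPv4Address("192.0.2.300")
except ipaddress.AddressValueError as exc:
    print("invalid:", exc)
```

The packed form shows why the dotted notation exists: it is simply a human-readable spelling of four bytes.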

Domain names have two parts: the first part names the host computer, while the second part identifies the top-level domain. Top-level domains (TLDs) identify the type of host. It could be a generic top-level domain, such as .com, .org or .net, or a country-code top-level domain, such as .hu for Hungary.
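Pulling these parts out of a fully qualified host name is plain string processing; the host name below is made up for illustration:

```python
# A hypothetical fully qualified host name, split into its dot-separated labels.
host = "www.example.hu"
labels = host.split(".")

print(labels[0])    # the host computer's name: 'www'
print(labels[-1])   # the top-level domain: 'hu'
```

Reading the labels right to left mirrors how DNS itself resolves a name: top-level domain first, then the domain, then the individual host.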

All the other protocols are responsible for a given application and reside at a higher level of the IP stack. The most important ones are the protocols introduced above: SMTP for e-mail, FTP for file transfer, Telnet and SSH for remote login, IRC for chat, and HTTP for the World Wide Web.
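How such an application protocol rides on top of TCP/IP can be sketched with Python's standard socket module: a tiny server and client talk over the loopback interface. The "protocol" here (upper-casing the request) is invented purely for the example:

```python
import socket
import threading

def serve(listener):
    """Accept one connection, apply the toy 'protocol', and reply."""
    conn, _ = listener.accept()
    data = conn.recv(1024)
    conn.sendall(data.upper())   # the application-level rule: shout it back
    conn.close()

# TCP listener on the loopback interface; the OS picks a free port.
listener = socket.create_server(("127.0.0.1", 0))
port = listener.getsockname()[1]
threading.Thread(target=serve, args=(listener,), daemon=True).start()

# Client side: connect by IP address and port, send, receive.
with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)
print(reply)  # b'HELLO'
```

TCP/IP handles addressing, delivery and ordering; everything above the sendall/recv calls, i.e. what the bytes mean, is defined by the application protocol, which is exactly the division of labour between the IP stack's layers.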

This list shows that the Internet serves all the major functionality that a user needs. We can see from this short introduction that the field covered by the title of this subject is far greater than a single book could cover. The main focus is put on HTTP and the related technologies. We need to underline that this is not limited to serving static HTML documents; it goes far beyond the original goal of the Web. In the remaining part of this book we will discuss the story of the Web, the services it provides, and the supporting technologies and theoretical background.