An Introduction to Web Services
By Enrique Castro
Intel's Enrique Castro looks at the history of the World Wide Web and how its development impacts Web services today.
From the first Web to Web services
The first Web was proposed in 1989 and implemented the following year by Tim Berners-Lee, at that time a researcher at CERN, the European laboratory for particle physics near Geneva, Switzerland (the acronym comes from its original French name, Conseil Européen pour la Recherche Nucléaire). It was initially conceived as a method to link research documents together through the Internet. These links came in the form of Uniform Resource Locators, or URLs. The syntax of these links is familiar: URLs are displayed in the "Address" bar in a browser like Microsoft Internet Explorer.
Documents in the Web have hyperlinks, document "hot spots" usually displayed in blue text by the browser. Behind a hyperlink is a URL. A browser "understands" the syntax of a URL and is capable of mapping the URL to the TCP/IP (Transmission Control Protocol/Internet Protocol) address of the target computer on the Internet. The documents and the links form a network literally spanning the globe. Hence the moniker "World Wide Web."
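As a minimal sketch of what "understanding" a URL involves, the Python fragment below splits a URL into the pieces a browser needs before it can open a connection. The function name describe_url and the example.com address are illustrative assumptions, not part of any browser. Turning the host name into a numeric address is a separate name lookup that is only noted in a comment here, so the sketch runs without a network.

```python
from urllib.parse import urlsplit

# Well-known default ports for the two common Web schemes.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url: str) -> dict:
    """Split a URL into the parts needed to reach the target computer."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,    # which protocol to speak
        "host": parts.hostname,    # which computer to contact
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path or "/", # which document to request
    }


info = describe_url("http://www.example.com/research/paper.html")
print(info)
# A real browser would next resolve info["host"] to a numeric IP address
# (for example with socket.getaddrinfo) and open a TCP connection to it.
```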
The adoption of the Web took a meteoric leap in 1993, after Marc Andreessen, at that time an undergraduate student working at the National Center for Supercomputing Applications (NCSA) at the University of Illinois, built a browser with a graphical interface, dubbed Mosaic. The Web came of age, left its research and academic roots, and was embraced by a broad audience comprising a significant fraction of the world's population.
The popularity of the Web encouraged people and organizations to post more content, which in turn made the Web more attractive to content consumers. This is Metcalfe's law at its best. This feeding frenzy was one of the engines behind the Internet boom in the late 90s.
None but the simplest Web sites (the target of a URL) dole out static Web pages any more. Web servers have become fronts for sophisticated applications. These applications may themselves be distributed, perhaps under the classic three-tier architecture: the Web server in the front, a mid-tier running the application or business logic, and a back end storing and managing the data.
"Web pages" have become a figment of the implementer's imagination. A user can interact with a Web-enabled application through a Web browser. In a typical interaction, the server presents a form to the client built on the fly to gather user information; it then extracts the data from the form and re-packs it for consumption by the application. The output of the application is formatted on the fly and presented back to the browser for perusal by the consumer. One very common instance of this process takes place when a customer checks a bank account balance.
The "magic" and the synergy of the Web have been facilitated by the open standards for communication between the server front ends and clients: the Hypertext Transfer Protocol (HTTP) running on top of TCP/IP exchanging Hypertext Markup Language (HTML)-formatted content. Any client, running on any hardware platform, with any operating system is capable of interacting with any server, running on any hardware platform with any operating system.
It is not unusual to have old and crusty applications running on mainframes re-engineered, i.e., Web-enabled through a new front-end. Retooling existing applications for the Web is a strong medicine for extending the life of these applications and deferring investment, but these benefits do not come without side effects. Behind the attractive Web façade there may be significant ugliness from force-fitting applications that are not designed to play nice with each other. This force-fitting takes the form of significant programming effort using integration brokers. (For additional background please refer to Paul Krill's article in InfoWorld, "Driving the Enterprise Service Bus.")
In spite of the increasing sophistication in the applications behind a Web site, the greatest advantage of the Web, making an enormous amount of content available for human consumption, is also a hindrance when it comes to machine-to-machine interactions. Machine-to-machine interactions over the Web are desirable to automate certain repetitive business-to-business e-commerce transactions. Web protocols facilitate inter-company interactions over the Internet. Having these interactions program-driven relieves humans of re-keying the same information in a browser over and over again.
Technologies exist today to facilitate machine-to-machine, business-to-business exchanges, such as Electronic Data Interchange (EDI). EDI is relatively expensive to set up and maintain, requiring, in many cases, private networks (VANs or Value-Added Networks), and is not a good match for ad-hoc, casual transactions.
The desire to do machine-to-machine interactions through the Web with the same facility as humans led to Web services. Unfortunately, the same traits that make the Web a good fit for human consumption make it very difficult to use with computers. In principle it's possible to write a screen-scraping program to replace a human interacting with a Web application but in practice it's difficult to achieve. The HTML language intermixes formatting constructs with nuggets of data, making the parsing unduly difficult. The program would need to be updated every time the content of a Web site was updated.
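The brittleness of screen scraping can be illustrated with a small Python sketch. Everything in it is hypothetical: the two page layouts, the account figures, and the scraper's rule that the balance is the first bold element on the page. The point is only that a cosmetic redesign silently breaks the scraper, while the same data carried in XML can be read by name.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class BalanceScraper(HTMLParser):
    """Assumes the balance is the text of the first <b> element."""

    def __init__(self):
        super().__init__()
        self._in_bold = False
        self.balance = None

    def handle_starttag(self, tag, attrs):
        if tag == "b" and self.balance is None:
            self._in_bold = True

    def handle_endtag(self, tag):
        if tag == "b":
            self._in_bold = False

    def handle_data(self, data):
        if self._in_bold and self.balance is None:
            self.balance = data.strip()


def scrape(html_text: str):
    scraper = BalanceScraper()
    scraper.feed(html_text)
    return scraper.balance


# Hypothetical page before and after a purely visual redesign.
old_page = "<html><body><p>Balance: <b>1520.75</b></p></body></html>"
new_page = ("<html><body><p><b>Welcome back!</b></p>"
            "<p>Balance: <i>1520.75</i></p></body></html>")

print(scrape(old_page))  # the balance
print(scrape(new_page))  # the greeting: the scraper is now wrong

# The same data as XML, with no formatting mixed in.
xml_doc = "<account><id>12345</id><balance>1520.75</balance></account>"
xml_balance = ET.fromstring(xml_doc).findtext("balance")
print(xml_balance)
```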
A more definitive solution lies, not in writing a sophisticated parser, but in extending to machine-to-machine interactions the principles that made the Web successful for human browsing. The most basic standards for Web services are XML for data (data and formatting constructs are separate), SOAP as a wrapper for conveying data, WSDL (Web Services Description Language) for describing services and UDDI (Universal Description, Discovery and Integration) for finding them. They can be mapped to browser-Web constructs as explained in the XML Journal article referenced at the end of this article.
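To make the idea of SOAP as a wrapper concrete, here is a minimal Python sketch that wraps a request in a SOAP 1.1 envelope and reads it back. The envelope namespace is the standard SOAP 1.1 one. The urn:example:bank namespace and the GetBalance and AccountId elements are invented for illustration and do not describe any real service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope
BANK_NS = "urn:example:bank"                           # hypothetical service


def build_request(account_id: str) -> str:
    """Wrap a balance query in a SOAP envelope and return it as text."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{BANK_NS}}}GetBalance")
    ET.SubElement(call, f"{{{BANK_NS}}}AccountId").text = account_id
    return ET.tostring(envelope, encoding="unicode")


def read_request(xml_text: str) -> str:
    """Extract the account id on the receiving side, by element name."""
    root = ET.fromstring(xml_text)
    path = f"{{{SOAP_NS}}}Body/{{{BANK_NS}}}GetBalance/{{{BANK_NS}}}AccountId"
    return root.findtext(path)


message = build_request("12345")
print(message)
print(read_request(message))
```

The receiver locates the data by element name rather than by position on a page, which is what separates this exchange from screen scraping.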
Web services introduce a powerful abstraction mechanism: "composability," or the capability to build compound applications recursively out of more elementary building blocks through a standardized protocol paradigm similar to the one that fueled the first Web. The distinction between an application domain behind the front end, and a user domain between client computers and a Web site's front end, vanishes. The same protocols used to communicate with a client can be used to link applications together behind the front end. From an architectural perspective, this state of affairs is aesthetically pleasing because there is no distinction between a client and a server anymore. An application is implemented by a set of computers collaborating over the Internet. This gives a software architect enormous latitude in how computations are apportioned to the nodes (computers) in a distributed computing network.
The architecture of the browsing Web dictates that clients be used almost exclusively as a display engine. In contrast, under Web services, if a powerful client is available, an application can be designed to do more computations locally to minimize bandwidth or to allow work to continue in the presence of an intermittent connection. The same application, when launched from a handheld device, could defer some heavy computation and display functions to the server without need of extreme measures such as creating a special markup language.
The two main frameworks that exist today for deploying Web services-based applications are Sun Microsystems' J2EE (Java 2 Platform, Enterprise Edition) and Microsoft .Net. It is impossible to do justice to this subject in one or two paragraphs beyond gross generalizations.
Because of interoperability requirements, the two frameworks should exhibit similar semantic behaviors at the Web services invocation interface, but different performance behaviors. The .Net environment supports development in multiple languages, but currently there is one target runtime environment, the Common Language Runtime (CLR) running under the Windows family of operating systems. An Open Source development effort of parts of .Net is in progress. On the other hand, a number of Java Virtual Machines (JVMs) are available from different companies hosted in a variety of operating systems and hardware platforms. Development is usually done with the Java language. Microsoft offers a sophisticated development environment, Visual Studio .Net. Development environments are available from a number of software houses for building Java-based applications, perhaps not as sophisticated, but still very practical.
Both environments provide the means for building Web services wrappers for legacy applications.
Let's digress for the moment from recent history and go back to the beginning of electronic computing. We will see that within this context, Web services technology represents a logical evolution of dynamics that started sixty years ago. Furthermore, it is possible to gain insight into the evolution of the information industry from the observation of dynamics playing out in other industries; effectively taking us further back, to the times of the Renaissance.
A strong undercurrent in Web services is the phenomenon of deferred binding. "Binding" refers to actions for coalescing the components of a system to enable the system to become operational. There are two essential factors associated with binding: timing and process. These are the point during design or manufacturing where binding takes place, and the methods used to facilitate binding. The two factors feed on each other. Early binding is characteristic of nascent technologies or industries. Changes in the product are not possible after binding has taken place. As technology matures, processes are re-designed to allow binding to occur later and later in the product life cycle. Doing so allows more flexibility in how a product is deployed and reduces waste.
Let's go through a few examples to clarify these concepts. In the book publishing industry, "binding" literally means sewing the books together. In the days before the printing press, the only way to make a book was to literally hand copy it. Ordering a book was a long and arduous process that started with the raw materials, the fibers for making the paper and the pigments to make ink.
Printing press technology allowed the deferral of some of the decisions: making printing plates was time consuming; however, additional copies of a book could be made at a relatively low cost and on short notice from a set of plates already made.
Printing remained a centralized operation for five centuries. Books were bound at a central printing facility. Because of the early binding, books had to go through a long supply chain from warehouse to warehouse to a retail store before they actually reached the consumer. This system limited business agility. Decisions about print runs had to be made weeks or months in advance. Overestimating demand led to unsold stock that had to be disposed of at a loss. Underestimating demand led to missed revenue opportunities because it took a while to replenish the pipeline.
The advent of computers and print-on-demand (POD) technology has made it technically possible to defer binding to the very last minute before purchasing. Books can be printed at a bookstore in runs as small as one unit or downloaded to an electronic device.
Deferred binding was also pioneered with newspaper publishing. For example, Britain's The Sunday Times claims it is the first paper to be published in multi-section format, including a Sunday magazine. This innovation brought additional flexibility in the publishing process through reduction of dependencies in the composition process of single-section newspapers. Web services technology uses the same paradigm: loosely coupled software components like the sections of a newspaper make up a compound application. Without deferred binding enabled by digital printing, the nation-wide distribution of newspapers like the Wall Street Journal, USA Today or The New York Times could not happen on a timely basis. Deferred binding also allows customization for local markets by allowing the insertion of sections with local content or with advertising targeted to specific markets.
The history of programming reflects the same progression from early to late binding that was observed in the printing industry. However, the transition happened much faster: it took half a century instead of half a millennium. Deferred binding is also associated with higher levels of abstraction where the physical nature of the machine gets pushed further down the solution stack.
Programming early computers like ENIAC was not programming in a contemporary sense. Until stored program computers were developed (the Von Neumann architecture), programming involved physically routing data through the machine by the manipulation of dozens of cables and thousands of switches.
Another manifestation of early binding is application-specific computers: some machines were made for scientific and engineering calculations, while others were designed to run commercial applications. Each machine had a different instruction set and hardware design.
The first machines with stored programs were programmed in binary, a very tedious and repetitive process, but at least it made it unnecessary to re-build the machine for every application.
It did not take long for programmers to discover that it was a lot easier to write programs assigning a mnemonic symbol for each computer instruction. This process led to assembly language programming. Running an application program through another program, the assembler, bound mnemonic symbols to the actual numeric codes for the instructions. Assemblers evolved into macroassemblers; a symbol was assigned to a sequence of instructions. This sequence of instructions would fulfill a specific task and appear over and over again. It was easy to assign a name to that sequence. The symbol would get expanded by the assembler and replaced by the appropriate instructions. The macro feature took programming (i.e., delayed binding) two steps away from the target executable binary.
With the invention of high-level languages in the late 50s, binding was delayed a couple more steps: a program was written in a high-level language such as COBOL, Fortran or PL/I. Another program (the compiler) was used to translate high-level language constructs into assembly language.
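The assembler and macroassembler steps described above can be sketched in a few lines of Python. The instruction set, opcode values, symbol addresses and the INCR macro are all invented for illustration; no real machine is being modeled. The sketch shows two successive bindings: macro names are first expanded into instruction sequences, and mnemonics and symbols are then bound to numeric codes.

```python
# Hypothetical instruction set: mnemonic -> numeric opcode.
OPCODES = {"LOAD": 0x01, "ADD": 0x02, "STORE": 0x03, "HALT": 0xFF}

# Hypothetical data symbols: name -> memory address.
SYMBOLS = {"ONE": 0x10, "COUNT": 0x11}

# A macro names a recurring sequence; {0} stands for its first argument.
MACROS = {"INCR": ["LOAD {0}", "ADD ONE", "STORE {0}"]}


def expand(lines):
    """First binding step: replace macro calls with their instructions."""
    expanded = []
    for line in lines:
        name, *args = line.split()
        if name in MACROS:
            expanded.extend(t.format(*args) for t in MACROS[name])
        else:
            expanded.append(line)
    return expanded


def assemble(lines):
    """Second binding step: bind mnemonics and symbols to numbers."""
    code = []
    for line in expand(lines):
        mnemonic, *operands = line.split()
        code.append(OPCODES[mnemonic])
        code.extend(SYMBOLS[name] for name in operands)
    return bytes(code)


program = assemble(["INCR COUNT", "HALT"])
print(program.hex())
```

The programmer writes two lines; the binary that results has seven bytes, none of which had to be chosen by hand.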
Separate compilation was yet another step in the quest for deferred binding. Programmers discovered that certain routines representing the code necessary to do a particular calculation, such as a trigonometric function, could be stashed away and invoked as necessary. Compilers were enhanced to support separate compilation, allowing references to other programs to be left unresolved. A second program, called a linker was used after the compiler to resolve these references. It would take object files produced by separately compiling source files and string them together in a single executable program.
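A toy linker helps show what "resolving references" means. In the Python sketch below, an object file is modeled as a dictionary holding a list of code words and a table of exported symbols; a word written as ("ref", name) is a reference the compiler left unresolved. This layout is an assumption made for the example and does not correspond to any real object file format.

```python
def link(objects):
    """Concatenate object files and patch unresolved references."""
    # Pass 1: give every exported symbol an address in the final image.
    symbols, base = {}, 0
    for obj in objects:
        for name, offset in obj["exports"].items():
            symbols[name] = base + offset
        base += len(obj["code"])

    # Pass 2: copy the code, replacing each reference with its address.
    image = []
    for obj in objects:
        for word in obj["code"]:
            if isinstance(word, tuple):
                _, name = word
                if name not in symbols:
                    raise KeyError(f"unresolved reference: {name}")
                image.append(symbols[name])
            else:
                image.append(word)
    return image, symbols


# Two separately "compiled" units: main refers to sqrt, defined elsewhere.
main_obj = {"exports": {"main": 0}, "code": [1, ("ref", "sqrt"), 255]}
math_obj = {"exports": {"sqrt": 0}, "code": [7, 8]}

image, symbols = link([main_obj, math_obj])
print(image)
print(symbols)
```

Swapping in a different math_obj changes the executable without touching the source of main_obj, which is the flexibility separate compilation bought.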
Some routines, such as the square root or trigonometric functions, appeared so frequently that they were offered as libraries by the computer vendor. The programmer did not have to code them from scratch, only link them from a specific library.
At this point binding took place at the source code level. The program would have to be recompiled and re-linked after a change in one of the constituent components. Systems were fairly brittle; very small changes in a source file could cause the program to fail and software distribution in binary form was not practical.
Binding was pushed one step back when the ability to perform it at run time was attained in the mid 80s and early 90s with "componentized" objects. This capability was supported first locally through dynamically linked libraries (.so or shared objects in SunOS and .dll files in Microsoft Windows). The capability was eventually supported for distributed objects as well with technologies like COM/DCOM/COM+ ([Distributed] Component Object Model) in Windows-based systems and Common Object Request Broker Architecture (CORBA) in the Unix systems.
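Run-time binding can be illustrated, by analogy, with Python's own module loader. In the sketch below the name of the library and the name of the routine are ordinary strings that are only resolved when the call is made, much as a dynamically linked library is located and loaded while the program is running. The helper call_late_bound is a name invented for this example.

```python
import importlib


def call_late_bound(module_name: str, function_name: str, *args):
    """Locate a routine by name at run time, then invoke it."""
    module = importlib.import_module(module_name)  # load the "library"
    routine = getattr(module, function_name)       # look up the entry point
    return routine(*args)


# Neither "math" nor "sqrt" is bound until this line executes; both
# strings could just as well come from a configuration file.
result = call_late_bound("math", "sqrt", 16.0)
print(result)
```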
One last barrier remained: this binding could only occur within a single software "universe," e.g., a computing system had to be all Microsoft Windows-based using COM+, or all Unix using CORBA. "Cross-universe" systems could be built, albeit fairly expensively, using proprietary connectors, each job being one-of-a-kind.
The last barrier has been removed with Web services: the combination of XML messaging plus an open standards communication infrastructure makes it possible for any computer to talk to any other computer in a meaningful way, independent of architecture.
The focus for binding has been changing over the years. In the 70s it meant putting together a number of object files to build an executable program. In Web services, binding means associating an application domain to specific protocols. For instance, SOAP-wrapped XML can be conveyed over HTTP or over SMTP (Simple Mail Transfer Protocol) with semantically identical results.
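The following Python sketch shows one SOAP payload bound to two transports. Nothing is sent over a network: the HTTP binding is represented as a plain dictionary describing a POST request, and the SMTP binding as an e-mail message object. The service path, e-mail addresses, SOAPAction value and the urn:example:bank namespace are all hypothetical.

```python
from email.message import EmailMessage

# One application-level message, independent of how it travels.
SOAP_PAYLOAD = (
    '<soap:Envelope '
    'xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
    '<soap:Body>'
    '<GetBalance xmlns="urn:example:bank">'
    '<AccountId>12345</AccountId>'
    '</GetBalance>'
    '</soap:Body>'
    '</soap:Envelope>'
)


def bind_to_http(payload: str) -> dict:
    """Describe the HTTP POST that would carry the payload."""
    return {
        "method": "POST",
        "path": "/services/bank",  # hypothetical endpoint
        "headers": {
            "Content-Type": "text/xml; charset=utf-8",
            "SOAPAction": '"urn:example:bank#GetBalance"',
        },
        "body": payload,
    }


def bind_to_smtp(payload: str) -> EmailMessage:
    """Build the e-mail message that would carry the same payload."""
    msg = EmailMessage()
    msg["From"] = "client@example.com"
    msg["To"] = "bank-service@example.com"
    msg["Subject"] = "SOAP request"
    msg.set_content(payload, subtype="xml")  # sent as text/xml
    return msg


def payload_from_http(request: dict) -> str:
    return request["body"]


def payload_from_smtp(msg: EmailMessage) -> str:
    # The mail library appends a trailing newline; strip it.
    return msg.get_content().strip()


via_http = payload_from_http(bind_to_http(SOAP_PAYLOAD))
via_smtp = payload_from_smtp(bind_to_smtp(SOAP_PAYLOAD))
print(via_http == via_smtp)
```

The receiving application sees the same envelope either way; only the outer wrapping differs.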
The benefits of deferred binding are practical rather than theoretical: they derive from infrastructure re-use and the ability to work at higher layers of abstraction. While in theory it would be possible to build any piece of functionality in assembly language, in much the same way someone could cut down a tree and make a book, in practice it would take a long time and come at great expense. Contemporary requirements for business agility make this approach impractical, in much the same way that handmade automobiles or firearms would be today.
Assessing the future impact of Web services is at best an uncertain exercise from today's vantage point. There will be technical consequences, but perhaps the most profound will be its effect on society and business.
What will be the effect on IT workers, especially those in developed economies who are concerned about offshoring? Will a 90% reduction in the cost of Enterprise Application Integration projects lead to more unemployment? Or will businesses decide to tackle more ambitious projects with the extra productivity? At the same time, as the technical hurdles to attain a certain level of functionality are diminished, business considerations rise in importance by comparison. This means programmers and IT workers will need to get more involved with the business aspects and business consequences of their tasks. The need to build on business skills is a frequent theme in trade journals today. Jobs that require business skills and internal company business knowledge are much harder to outsource.
Web services may affect the rate at which companies merge. The gargantuan task and expense of combining IT departments can act as a deterrent today. Web services can eliminate this deterrent and accelerate the rate at which companies merge.
The first Web, as discussed in this article, was text based. Even URLs are text strings representing some TCP/IP address. It is a data-based Web, with data contained in Web pages linked through URLs. URLs are all-or-nothing: either they point to another page, or are just ordinary text. Under Web services, data is exchanged in XML format separate from display annotations, and the Web is generalized to a set of service points, the computers providing specific Web services, but the text paradigm does not change.
This text-centric behavior is exemplified in the way a search engine is used: queries for a search are done by entering a sub-string in the search engine. This method is not very accurate: it produces a large number of false positives, i.e., results that do not match what the user was looking for, and also false negatives, useful items that are not shown. The basic issue is that there is a lot more "meaning" in a search than can be captured in a string. In the text-based Web, the meaning is in the brain of the user. For instance, a user looking for information on pony cars may actually enter "Ford Mustang" in the search engine in a roundabout way to get to the information. Entering a query on "pony cars" will likely yield a large number of matches on ponies and cars that are irrelevant.
The Semantic Web relies on the Resource Description Framework (RDF) to represent ontologies, the elements in a specific universe of discourse. Under the Semantic Web it is possible to capture the attributes of these elements and how they relate to each other. The semantics, or meanings, of the universe of discourse are represented through these relationships.
These capabilities could lead to more accurate Web search engines. For instance a search engine could accept a hint to restrict the universe of discourse to cars. The user can now safely enter the keyword "pony" without being deluged with matches on horse breeds. Since the search engine "knows" about relationships, it could color key the results indicating the likelihood of a match to the criteria provided by the user. The results are no longer all-or-nothing.
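The difference between a string search and a search restricted to a universe of discourse can be sketched with a handful of subject-relation-object statements. The Python fragment below uses plain tuples rather than real RDF syntax, and the page texts, relation names and function names are assumptions made for the example.

```python
# Hypothetical pages, as a text-based search engine would see them.
PAGES = {
    "Ford Mustang": "The Ford Mustang is a pony car.",
    "Chevrolet Camaro": "The Chevrolet Camaro is a pony car.",
    "Shetland pony": "The Shetland pony is a small, hardy pony breed.",
}

# Relationships between elements, in the spirit of RDF statements.
TRIPLES = [
    ("Ford Mustang", "is_a", "pony car"),
    ("Chevrolet Camaro", "is_a", "pony car"),
    ("pony car", "subclass_of", "car"),
    ("Shetland pony", "is_a", "pony"),
    ("pony", "subclass_of", "horse"),
]


def ancestors(term):
    """Collect every class reachable through is_a / subclass_of links."""
    found, frontier = set(), [term]
    while frontier:
        current = frontier.pop()
        for subject, relation, obj in TRIPLES:
            if subject == current and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


def text_search(keyword):
    """Sub-string match on page text: no notion of meaning."""
    return sorted(t for t, text in PAGES.items() if keyword in text.lower())


def semantic_search(keyword, universe):
    """Keep only items that belong to the requested universe."""
    hits = []
    for title in PAGES:
        classes = ancestors(title)
        if universe in classes and any(keyword in c for c in classes):
            hits.append(title)
    return sorted(hits)


print(text_search("pony"))             # includes the horse breed
print(semantic_search("pony", "car"))  # cars only
```

A fuller version could return a score instead of a yes/no answer, which is what would allow results to be color keyed by likelihood.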
These customized searches could be resource intensive in terms of execution time and storage consumption. Complex queries could be vectored to a clustered computation server using Grid Computing protocols. Hence the adoption of the Semantic Web can potentially impact the development of other technologies.
The example above uses the browser-based Web paradigm. The Semantic Web is in an early stage of development and subject to intense research. Research is also taking place on the machine-to-machine aspects of the Semantic Web, i.e., on Semantic Web services. Semantic Web services would make possible very efficient manipulation over the Internet not only of data, but also of business processes.
The table below illustrates the relationships between the First Web, the Semantic Web, Web services and Semantic Web services. The First Web has been firmly adopted, Web services are in the initial adoption phase; the Semantic Web is still under research.
For a more detailed introduction on fundamental Web services protocols, please refer to the XML Journal article Business Systems with Web Services. For a more extended treatment on some of the issues raised in this article please refer to the WebServices.org article Web Services Readiness.
Enrique Castro is a senior architecture planning and technology strategy architect and consultant with Intel® Solution Services.
Copyright (c) 2003 Intel Corporation. All rights reserved.