Towards a Resource Oriented Future

The talk "Towards a Resource Oriented Future" was given on 2011-10-27 at the OGC Interoperability Day at SmartKorea 2011 in Seoul, Korea.

Summary

This talk introduces several hype terms that circulate in the context of Spatial Data Infrastructures (SDI). After some of the better known acronyms have been explained, they are put into an architectural context and related to the Resource Oriented Architecture (ROA). The last section maps ROA concepts to current Web and Internet technologies and gives a perspective on the evolution of SDI.

SDI

Spatial Data on the Internet

In more recent definitions, SDI translates into "Spatial Data on the Internet". The main reason for this slight shift in perspective is that there is no need to define a separate infrastructure for spatial data when a perfectly well organized infrastructure is already in place (the Internet and the Web).

Legacy Definitions

The legacy definition of a Spatial Data Infrastructure (SDI) is an infrastructure to provide interactively connected access to spatial data and metadata using software tools. Modern SDIs are typically implemented as interoperable Web services which are designed as resources. In the resource oriented paradigm the interaction occurs by exchanging stateless representations of the resources.

In broader definitions an SDI is the sum of the technology, policies, standards, human resources, and related activities necessary to acquire, process, distribute, use, maintain, and preserve spatial data. Kuhn (2005) defines an SDI as a coordinated series of agreements on technology standards, institutional arrangements, and policies that enable the discovery and use of geospatial information by users and for purposes other than those it was created for.

Application of SDI Design

SOA

SOA is the acronym for "Service Oriented Architecture", a term nowadays mostly associated with message oriented architectures based on OASIS and W3C standards. This type of architecture was originally designed to populate the Web but ended up mostly inside corporate structures. Nowadays SOA is associated with a high potential for vendor lock-in and major interoperability problems. SOA is only a peripheral aspect of the Web running on the Internet.

As an architectural concept the SOA still has validity but as a current implementation using the associated technology SOAP it is on the decline.

SOAP

SOAP was originally designed to become the "Simple Object Access Protocol". With the emergence of the Internet and of HTTP as the common protocol of the Web, SOAP was typically only used on top of HTTP. The major critique of SOAP is that it misuses HTTP as a transport layer and ignores the status codes designed for the Web (200=OK, 404=not found, etc.). Instead of using HTTP in the way it was intended, SOAP uses it only as a means to transport messages and data in separately bundled packages. Additionally, SOAP is message oriented and invites implementers to build Remote Procedure Calls (RPC). These are not by definition "bad", but they are not suitable for the broader Web running on the Internet.
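The critique can be illustrated with a small Python sketch. It is a deliberately simplified caricature of the tunnelling pattern described above, not an implementation of the actual SOAP HTTP binding. The names `tunnelled_reply`, `resource_reply` and the in-memory `FEATURES` store are assumptions made up for this example.

```python
from http import HTTPStatus

# Hypothetical in-memory store standing in for a spatial data server.
FEATURES = {"/features/1": "<feature id='1'/>"}


def tunnelled_reply(path):
    """Message oriented style: HTTP only transports the envelope.

    The status is 200 in both cases; the error is hidden in the body,
    so generic Web software cannot see that something went wrong.
    """
    if path in FEATURES:
        body = "<Envelope><Body>" + FEATURES[path] + "</Body></Envelope>"
    else:
        body = "<Envelope><Body><Fault>not found</Fault></Body></Envelope>"
    return HTTPStatus.OK, body


def resource_reply(path):
    """Resource oriented style: the HTTP status code carries the outcome."""
    if path in FEATURES:
        return HTTPStatus.OK, FEATURES[path]
    return HTTPStatus.NOT_FOUND, ""


for reply in (tunnelled_reply, resource_reply):
    status, body = reply("/features/2")
    print(reply.__name__, int(status), body)
```

In the first style a cache, proxy or monitoring tool sees a successful request and has to parse the message body to learn otherwise. In the second style the standard status code is enough.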

Adoption of SOAP

Many specialized realms (think "highly specialized types of calculation") have tended to associate Web Services with technologies such as SOAP, WSDL, UDDI and the many 'WS-*' standards. However, it is a common misunderstanding that these technologies are necessary to create Web Services.

SOAP technology could theoretically be used to implement Resource Oriented Architecture, especially since the new SOAP Version 1.2 was released. But SOAP makes it intrinsically easier to implement Web Services that break the core principles of the ROA.

Decline of SOAP Usage

In the meantime the Web has grown directly on HTTP without the need for a separate wrapper, making SOAP largely obsolete.

ESB

The "Enterprise Service Bus" is primarily a buzzword used by larger corporations intending to lure customers into their own infrastructure design. The ESB is often based on SOAP technology and comes with the promise of high interoperability, a promise that has since been proven false. Even though it is only a buzzword, it still manages to manifest itself as a core component of the emerging European Spatial Data Infrastructure INSPIRE. Wherever the term "ESB" is used within the context of INSPIRE it can safely be replaced by "Internet".

ROA and RESTful

This is a short introduction to the Resource Oriented Architecture (ROA).

It is recommended to read The Hierarchy and the Graph for a less technical introduction to the topic.

Web Services

Web services are software implementations that make resources available on the Internet (or World Wide Web). The resource can be any type of data: a simple text, an HTML page, an XML document, an image or even raw data. Human beings (or rather their software) can fetch, parse (read) and use this data for presentation but also for calculation. Read/write web services allow a user (or rather their software) to modify the data on the web server, which in many cases is held in a database. Complex Web Services can perform highly specialized types of calculation geared towards specific domains.

A stricter interpretation of the concept defines a Web Service as a Web resource that has been designed for programmable access. It allows other machines (the clients) to access and use its features in a well-defined manner.

In the early days of the Internet so-called screen scraper software was used to harvest information from web sites that were implemented in a way that only humans could "read" them. This software relied on the fact that HTML has a well known and structured syntax in order to extract information and reuse it in other contexts. This made the Web itself one big de facto Web Service. But this is a notoriously brittle solution, and therefore unsatisfactory. What distinguishes a Web Service from a Web Site is that it is purposely designed for access by client software.
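A minimal sketch of such a screen scraper, using only `html.parser` from the Python standard library, might look as follows. The class name `LinkScraper`, the helper `scrape_links` and the sample `PAGE` are assumptions for illustration.

```python
from html.parser import HTMLParser


class LinkScraper(HTMLParser):
    """Collects the href of every <a> element found in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def scrape_links(html):
    """Return all link targets of a page written for human readers."""
    scraper = LinkScraper()
    scraper.feed(html)
    scraper.close()
    return scraper.links


# Hypothetical page content.
PAGE = (
    "<html><body><p>See <a href='/maps/1'>map one</a> and "
    "<a href='/maps/2'>map two</a>.</p></body></html>"
)
print(scrape_links(PAGE))
```

The brittleness is easy to see: the scraper depends on how the page happens to be marked up today. If the site moves its links into a script or a different element, the client silently returns nothing.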

ROA

The case for Web Services designed according to a Resource-Oriented Architecture (ROA) was presented by Leonard Richardson and Sam Ruby in their book RESTful Web Services (2007, O'Reilly Media Inc.).

The fundamental idea is that the basic, well-understood, and well-known technologies of the current web (HTTP, URI and XML) should be used according to their design principles. This facilitates the design of Web Services that have simple and coherent interfaces, and which are easy to use and maintain. Such web services will also be easier to optimize for working with the existing infrastructure of the web. The design principles of the web technologies are summarized by Roy Fielding's notion of Representational State Transfer (REST) in his Ph.D. thesis Architectural Styles and the Design of Network-based Software Architectures (2000).

The Resource-Oriented Architecture (ROA) consists of four concepts:

  1. Resources
  2. Their names (URIs)
  3. Their representations
  4. The links between them (What an irony that the Wikipedia article on the ROA is tagged as an "orphan" as few or no other articles link to it).

and four properties:

  1. Addressability
  2. Statelessness
  3. Connectedness
  4. A uniform interface

Websites should be designed following these concepts and properties. All URLs (more generally called Uniform Resource Identifiers, or URIs) should be designed for clarity and consistency. The representations (HTML and XML) should be well formed and structured. HTML in particular allows for many exceptions and is frequently "misused" to achieve visual design effects with structuring elements. The design principle to follow here is to separate form and content. All representations should be strongly linked. It should always be possible to navigate through the representations of all entities just by following links, and this should be true for both the HTML and the XML representations.
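The four concepts can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `RESOURCES` table, its URIs and the functions `represent` and `reachable` are invented for this example and do not describe any particular server.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical resources: each has a name (its URI), a title and links.
RESOURCES = {
    "/layers": {"title": "Layers", "links": ["/layers/roads", "/layers/rivers"]},
    "/layers/roads": {"title": "Roads", "links": ["/layers"]},
    "/layers/rivers": {"title": "Rivers", "links": ["/layers"]},
}


def represent(uri, media_type):
    """Return one representation of the resource named by uri."""
    res = RESOURCES[uri]
    title = escape(res["title"])
    if media_type == "text/html":
        items = "".join(
            "<li><a href=%s>%s</a></li>" % (quoteattr(link), escape(link))
            for link in res["links"]
        )
        return "<html><body><h1>%s</h1><ul>%s</ul></body></html>" % (title, items)
    if media_type == "application/xml":
        items = "".join("<link href=%s/>" % quoteattr(link) for link in res["links"])
        return "<resource uri=%s><title>%s</title>%s</resource>" % (
            quoteattr(uri), title, items)
    raise ValueError("unsupported media type: " + media_type)


def reachable(start):
    """Connectedness check: which URIs can be found by following links?"""
    seen, todo = set(), [start]
    while todo:
        uri = todo.pop()
        if uri not in seen:
            seen.add(uri)
            todo.extend(RESOURCES[uri]["links"])
    return seen


print(represent("/layers/roads", "application/xml"))
print(sorted(reachable("/layers")))
```

Both representations are generated from the same resource and carry the same links, so a client can start at `/layers` and discover every other resource without prior knowledge of the URI scheme.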

One important benefit of the ROA is a low barrier of entry. Both using Web Services provided by others from the client side and implementing the server side (exposing resources to others) should need as little overhead as possible. If many different, special-purpose technologies and standards need to be mastered, then the overhead in terms of time and effort may well become prohibitive. Since ROA uses only the well-known basic technologies of the web, it is likely to be much easier to use than the SOAP approach, which has its own unique and extensive technology stack.

Isomorphic representations allow a potential user to grasp the design of the programmatic web service simply by browsing the human web site. By noting the URLs and looking at the HTML and corresponding XML documents, it becomes much simpler to write a client program to automate the analysis the user has in mind.

The Resource Perspective

The two concepts 'Resource' and 'Service' are related but come from different levels of perspective. Resources have representations, for example HTML documents. These representations are transmitted from the server to the client. The server does not have to "keep track" of the client as it will always receive a completely self contained representation back. The client does not have to know whether an earlier request reached the server (imagine that the Internet connection had a hiccup): it can repeat the request and be sure that the effect on the server is the same as if the request had been processed exactly once, without destroying what was previously sent (this concept is called idempotence). The client also does not need to keep track of which instance of a server it has talked to. This allows for highly scalable architectures.
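Idempotence can be made concrete with a small sketch. It contrasts the HTTP methods PUT (idempotent by definition) and POST (not idempotent by definition) on a hypothetical in-memory `Store`. The class, its method names and the URIs are assumptions for illustration only.

```python
class Store:
    """Hypothetical in-memory server state, keyed by URI."""

    def __init__(self):
        self.items = {}
        self.next_id = 1

    def put(self, uri, representation):
        # PUT names the target resource, so repeating the same request
        # leaves the server in the same state as sending it once.
        self.items[uri] = representation

    def post(self, collection, representation):
        # POST lets the server choose a new URI, so every repeat
        # creates one more resource.
        uri = "%s/%d" % (collection, self.next_id)
        self.next_id += 1
        self.items[uri] = representation
        return uri


store = Store()
for _ in range(3):  # the client retries because a reply got lost
    store.put("/points/7", "<point x='1' y='2'/>")
print(len(store.items))  # still a single resource

for _ in range(3):  # the same retries with POST
    store.post("/points", "<point x='1' y='2'/>")
print(len(store.items))  # three additional resources
```

A client talking to the `put` side can retry blindly after a connection problem. A client talking to the `post` side first has to find out whether its earlier request arrived.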

...to be continued

Architecture Models: RM-ODP View Points

The presentation gives a short introduction to the five viewpoints of the "Reference Model of Open Distributed Processing" (RM-ODP) and how they relate to the ROA.

The Internet and the Web

Some common misconceptions about what the Internet and the Web represent are explained in more detail in the blog post The Hierarchy and the Graph.

An image showing the connectedness of the Internet

The Internet (English Wikipedia) is a Tree. It is the global physical information network of the inhabitants (humanity) of planet earth. The Web resides on the Internet.

Current technology of the Internet is based on the Internet protocol suite (TCP/IP). The foundation of the Internet is the IP address system, a numerical addressing scheme that identifies the nodes of the network. Each node can have a tree of information which is maintained only on that node and typically not referenced elsewhere automatically. From each branch of this tree, links can point to other locations on the Internet. This structure of relations is commonly called the Web. It is a directed graph residing on the Internet trees. It is not directly searchable, mainly because the available computing power and connectivity are small compared to the size of the Internet (it is too big). This deficit makes it difficult to search and find things on the Internet. To tackle this issue a number of undertakings continuously index the Web and implement highly sophisticated search algorithms to make the data explorable (Google being one of the larger commercial entities working on this).

Internet accessibility is a requirement to be able to use the Web.

For a better understanding of the relation of the directed graph (the Web) and the hierarchically organized network (the Internet) it is recommended to read Arnulf's blog on The Hierarchy and the Graph.

An image showing a minute fraction of the Web around Wikipedia

The Web is a Graph. It is one representation of the data accessible through the global information network of the inhabitants (humanity) of planet earth - the Internet. Current technology is based on HTTP for transfer and HTML (plus numerous other formats) for content. The Web is described in the English Wikipedia as a directed graph of information and data. It consists of a practically unlimited number of often independently maintained directed relationships of subjects and objects. The subject is typically a web page, the relation is a link and the referenced object another web page, document, information, or data.
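The structure of subject, link and object can be sketched as a small directed graph. The page names in `LINKS` and the functions `build_graph` and `inbound_index` are hypothetical and serve only to illustrate why the Web has to be indexed before it can be searched.

```python
from collections import defaultdict

# Hypothetical links: (subject page, referenced object).
LINKS = [
    ("a.example/index", "b.example/data"),
    ("a.example/index", "c.example/map"),
    ("b.example/data", "c.example/map"),
]


def build_graph(links):
    """Outgoing links, as each node maintains them for itself."""
    graph = defaultdict(set)
    for subject, obj in links:
        graph[subject].add(obj)
    return graph


def inbound_index(links):
    """What an indexer has to compute: who links *to* each object."""
    index = defaultdict(set)
    for subject, obj in links:
        index[obj].add(subject)
    return index


graph = build_graph(LINKS)
index = inbound_index(LINKS)
print(sorted(graph["a.example/index"]))
print(sorted(index["c.example/map"]))
```

Because the relationships are directed and stored only at the subject, `c.example/map` has no way of knowing who refers to it. The inbound index can only be built by visiting every subject, which is what Web indexing services do at scale.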


RESTful, REST, Linked Open Data, ROA

...and how it all might come together.

TBC