Introduction

Today, the internet is all-pervasive across our working, social and domestic lives, and it is sometimes hard to remember what the world was like before its introduction and near-universal use.

The internet is the global system of interconnected computer networks that uses the Internet Protocol Suite (TCP/IP) to communicate between networks and devices. It is a “network of networks” that consists of private, public, academic, business, and government networks linked by electronic, wireless, and optical networking technologies.

The internet grew out of research into computer networking carried out by the US Department of Defense during the 1960s, culminating in its first packet-switched system, known as ARPANET. During the 1970s and 1980s, ARPANET expanded, first across the USA and then internationally.

In 1990 Tim (now Sir Tim) Berners-Lee, working at CERN, developed WorldWideWeb, the first web browser, together with associated tools including, most relevant to this article, HyperText Markup Language (HTML).

HTML is an example of a markup language: a text-encoding system consisting of a set of symbols inserted into a text document to control its structure, its formatting, or the relationships between its parts. Markup is often used to control how a document is displayed, or to enrich its content so that it can be processed automatically.

In this article we’ll discuss the markup languages HTML and XML, together with the closely related data-interchange format JSON, and how they relate to the internet.

HTML (HyperText Markup Language)

In the early 1980s, the idea that markup should focus on the structural aspects of a document and leave the visual presentation of that structure to the interpreter led to the creation of Standard Generalised Markup Language (SGML), based on earlier work at IBM. As a document markup language, SGML was originally designed to enable the sharing of machine-readable large-project documents in government, law, and industry.

As mentioned above, HTML was developed in 1990 by Sir Tim Berners-Lee at CERN as part of his WorldWideWeb implementation, and he considered it to be an application of SGML. HTML is the markup language that web browsers use to interpret and compose text, images, and other material into visible or audible web pages. From 1996, the HTML specifications were maintained by the World Wide Web Consortium (W3C), with input from commercial software vendors; since 2019, the HTML Living Standard maintained by the WHATWG (Web Hypertext Application Technology Working Group) has been the authoritative HTML specification.

HTML markup consists of several key components, including those called tags (and their attributes), character-based data types, character references and entity references. HTML tags most commonly come in pairs like:

<h1> and </h1>

although some represent empty elements and so are unpaired, for example:

<img>

The first tag in such a pair is the start tag, and the second is the end tag (they are also called opening tags and closing tags).

HTML documents imply a structure of nested HTML elements. These are indicated in the document by the HTML tags. Tags may also enclose further tag markup between the start and end, including a mixture of tags and text. This indicates further (nested) elements, as children of the parent element. Another important component is the HTML document type declaration, which triggers standards mode rendering.

The following is an example of the classic “Hello, World!” page:

<!DOCTYPE html>
<html>
  <head>
    <title>This is a title</title>
  </head>

  <body>
    <div>
      <p>Hello world!</p>
    </div>
  </body>
</html>

The text between the <html> tags describes the web page, and the text between the <body> tags is the visible page content. The markup <title>This is a title</title> defines the page title shown on browser tabs and in window title bars, and the <div> tag defines a division of the page, which makes styling easier. Within the <head> section, <meta> elements can be used to define webpage metadata.

The document type declaration <!DOCTYPE html> is the one used for HTML5. If no declaration is included, browsers fall back to rendering the page in “quirks mode”.
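To make the idea of nested elements concrete, here is a minimal sketch that walks the “Hello, World!” page above using Python’s standard-library html.parser module and records each element together with its nesting depth. It is only an illustration, not how a browser actually builds its document tree; the names PAGE, OutlineParser and VOID_ELEMENTS are introduced here for the example. Void elements such as <img> have no end tag, so the sketch does not count them as opening a new level.

```python
from html.parser import HTMLParser

# The "Hello, World!" page from above, as a string.
PAGE = """<!DOCTYPE html>
<html>
  <head>
    <title>This is a title</title>
  </head>
  <body>
    <div>
      <p>Hello world!</p>
    </div>
  </body>
</html>"""

# HTML's void (empty) elements: they never have an end tag.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "source", "track", "wbr"}


class OutlineParser(HTMLParser):
    """Records every start tag together with its nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.outline = []      # list of (depth, tag) tuples
        self.doctype = None

    def handle_decl(self, decl):
        # Called for <!DOCTYPE ...>; decl is the text between "<!" and ">".
        self.doctype = decl

    def handle_starttag(self, tag, attrs):
        self.outline.append((self.depth, tag))
        if tag not in VOID_ELEMENTS:
            self.depth += 1    # children of this element go one level deeper

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <img />: record it, but don't nest.
        self.outline.append((self.depth, tag))

    def handle_endtag(self, tag):
        if tag not in VOID_ELEMENTS:
            self.depth -= 1


parser = OutlineParser()
parser.feed(PAGE)
parser.close()

print(parser.doctype)
for depth, tag in parser.outline:
    print("  " * depth + tag)
```

For the page above, the printed outline shows head and body as children of html, title inside head, and p inside div inside body, mirroring the indentation of the source.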

XML (Extensible Markup Language)

XML is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. The World Wide Web Consortium’s XML 1.0 Specification of 1998 and several other related specifications, all of them free open standards, define XML.

XML has come into common use for the interchange of data over the Internet. Hundreds of document formats using XML syntax have been developed. Many industry data standards, such as Health Level 7, OpenTravel Alliance, XBRL, MISMO, and National Information Exchange Model, are based on XML and the rich features of the XML schema specification. In publishing, Darwin Information Typing Architecture (DITA) is an XML industry data standard. XML is used extensively to underpin various publishing formats.

XML is an application profile of SGML (ISO 8879). The versatility of SGML for dynamic information display was understood by early digital media publishers in the late 1980s prior to the rise of the Internet. By the mid-1990s, some practitioners of SGML had gained experience with the then-new World Wide Web (see above), and believed that SGML offered solutions to some problems the Web was likely to face as it grew.

XML was compiled by a working group of eleven members, supported by a (roughly) 150-member Interest Group. The design was worked on through 1996 and 1997 and XML 1.0 became a W3C Recommendation on February 10, 1998.

In web applications, XML is used to store or transport data, while HTML is used to format and display the same data. XML data is self-describing, so it can be processed and displayed in a virtually limitless number of ways.

Here is an example of the way a bookstore’s catalogue could be stored in XML:

<?xml version="1.0" encoding="UTF-8"?>
<bookstore>

  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>

  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J. K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>

  <book category="web" cover="paperback">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>

</bookstore>

As you can see, each data element is bracketed by descriptive “tags” as in HTML but, in XML, these are user-definable to match the structure of the data being stored and transmitted. This makes for great flexibility in the design of data-handling applications. As in HTML, elements can be given “attributes” to further increase flexibility.
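As a minimal sketch of how an application might consume these user-defined tags and attributes, the following Python code reads an abridged copy of the catalogue with the standard-library xml.etree.ElementTree module. The dictionary keys, and the choice to convert year and price to numbers, are assumptions made for illustration: XML itself stores all element content as text.

```python
import xml.etree.ElementTree as ET

# An abridged copy of the bookstore catalogue above (two of its three books).
CATALOGUE = """<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="web" cover="paperback">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>"""

# Parse from bytes so the encoding declaration is honoured.
root = ET.fromstring(CATALOGUE.encode("utf-8"))

books = []
for book in root.findall("book"):            # every <book> child of <bookstore>
    title = book.find("title")
    books.append({
        "category": book.get("category"),     # attribute on <book>
        "cover": book.get("cover"),           # None when the attribute is absent
        "title": title.text,
        "lang": title.get("lang"),            # attribute on <title>
        "author": book.findtext("author"),
        "year": int(book.findtext("year")),   # element text is always a string
        "price": float(book.findtext("price")),
    })

for b in books:
    print(f'{b["title"]} ({b["category"]}, {b["year"]}): {b["price"]:.2f}')
```

Only the second book carries a cover attribute, so book.get("cover") returns None for the first; reading optional attributes this way is one practical consequence of the flexibility described above.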

JSON (JavaScript Object Notation)

JSON is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute:value pairs and arrays (or other serialisable values). It is a common data format with diverse uses in electronic data interchange, including that of web applications with servers. Douglas Crockford originally specified the JSON format in the early 2000s.

JSON is commonly used for serialising and transmitting data over a network connection such as the internet, most often between a server and a web app. For example, when a user opens a weather web app, the browser makes an API request to the app’s web server, and the server sends back a response in JSON. Script code in the page (usually JavaScript) then turns this JSON data into human-readable content, with HTML defining how the data transmitted as JSON is displayed on the webpage.
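The request-and-render flow described above can be sketched as follows. In a browser this step is normally done in JavaScript; Python is used here only to keep all the examples in one language. The response body and its field names (city, temp_c, conditions) are hypothetical, and render_weather is an illustrative helper, not part of any real weather API.

```python
import html
import json

# Hypothetical body of a weather server's JSON response. The field names
# "city", "temp_c" and "conditions" are invented for this sketch.
RESPONSE_BODY = '{"city": "London", "temp_c": 14.5, "conditions": "Light rain"}'


def render_weather(body: str) -> str:
    """Decode a JSON weather response and build an HTML fragment from it."""
    data = json.loads(body)  # JSON text -> Python dict
    # Escape text values so data from the server cannot inject markup.
    city = html.escape(data["city"])
    conditions = html.escape(data["conditions"])
    temp = data["temp_c"]    # a JSON number arrives as an int or a float
    return (
        '<div class="weather">\n'
        f"  <h2>{city}</h2>\n"
        f"  <p>{temp} &deg;C, {conditions}</p>\n"
        "</div>"
    )


print(render_weather(RESPONSE_BODY))
```

The JSON carries only the data; the HTML fragment decides how it is presented, so new readings from the server need no change to the display code.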

JSON represents structured data, so it needs to handle several different types of value. Natively, JSON supports the following data types:

  • Strings. For example: "Test", "Example", "How are you?"
  • Numbers. For example: 1000, 3.141, 6.022e23
  • Booleans. For example: true or false
  • Null (absence of a value). For example: null
  • Arrays (ordered lists of any JSON values, including other arrays and objects). For example: [3, 0, 4], ["Apple", 10, true, null]
  • Objects (comma-separated key-value pairs inside curly braces, where each key is a string). For example: {"name": "Matt", "age": 40, "married": false}
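Programming languages map these native types onto their own. As an illustration, Python’s standard json module decodes objects to dict, arrays to list, strings to str, numbers to int or float, true and false to True and False, and null to None. The sketch below decodes one document containing every type and prints the Python type each value became; the key names are chosen only for the example.

```python
import json

# One JSON document containing each native JSON type.
SAMPLE = """
{
  "string": "How are you?",
  "integer": 1000,
  "float": 6.022e23,
  "boolean": true,
  "nothing": null,
  "array": ["Apple", 10, true, null],
  "object": {"name": "Matt", "age": 40, "married": false}
}
"""

data = json.loads(SAMPLE)

# Show which Python type each JSON value was decoded into.
for key, value in data.items():
    print(f"{key:8} -> {type(value).__name__}")

# Encoding goes the other way: Python values become JSON text again.
print(json.dumps(data["array"]))
```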

Here is an example of a simple JSON data file. Each data item consists of an attribute:value pair, and objects can be nested inside other objects using curly brackets:

{
  "type": "basket",
  "beans": 47,
  "apples": 7,
  "oranges": 23,
  "brand": "ConvertSimple",
  "ratio": 33.9,
  "fees": {
    "cleaning": "$4.50",
    "baking": "$27.30",
    "commission": "$93.10"
  },
  "descriptors": ["clean", "fresh", "juicy", "delicious"]
}
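Reading the basket example from a program means decoding it and then following the nesting. This sketch uses Python’s json module; the fee total and fruit count are calculations added here for illustration, not something the format itself prescribes. Because the fees are stored as strings such as "$4.50" rather than as numbers, the sketch strips the "$" and uses Decimal to avoid floating-point rounding on currency amounts.

```python
import json
from decimal import Decimal

# The basket example above, as a JSON string.
BASKET = """
{
  "type": "basket",
  "beans": 47,
  "apples": 7,
  "oranges": 23,
  "brand": "ConvertSimple",
  "ratio": 33.9,
  "fees": {
    "cleaning": "$4.50",
    "baking": "$27.30",
    "commission": "$93.10"
  },
  "descriptors": ["clean", "fresh", "juicy", "delicious"]
}
"""

basket = json.loads(BASKET)

# A nested object becomes a nested dict, so values are reached by chaining keys.
commission = basket["fees"]["commission"]

# Fees are strings, so remove the "$" before doing arithmetic on them.
total_fees = sum(Decimal(fee.lstrip("$")) for fee in basket["fees"].values())

# Numbers and arrays decode directly to int and list.
fruit = basket["apples"] + basket["oranges"]
first_descriptor = basket["descriptors"][0]

print(commission, total_fees, fruit, first_descriptor)
```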

The differences between XML, JSON and HTML

As you can see from the above, the fundamental difference between these three formats is that XML and JSON are used for the storage and transmission of data, while HTML is used to describe how that data is displayed. The main advantage of separating these two functions is that dynamically changing data can be accommodated without having to constantly modify the code used for data display.

In short, HTML is the primary building block of web development and is used to define the structure of a page. XML or JSON can transport data between servers and are often used alongside HTML or other applications.

HTML is almost universally used for web design and there is really no practical alternative. However, the obvious next question is: XML or JSON?

XML vs JSON

JSON | XML
It is JavaScript Object Notation | It is Extensible Markup Language
It is based on the JavaScript language | It is derived from SGML
It is a way of representing objects | It is a markup language and uses a tag structure to represent data items
It does not provide any support for namespaces | It supports namespaces
It supports arrays | It has no native array type (lists are represented by repeated elements)
Its files are easier to read than XML files | Its documents are comparatively difficult to read and interpret
It doesn’t use end tags | It has start and end tags
It is less secure than XML | It is more secure than JSON
It doesn’t support comments | It supports comments
It requires UTF-8 encoding for data exchanged between systems (RFC 8259) | It supports various encodings
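The comments row of the table is easy to check in practice. In this sketch (both snippets are invented for illustration), Python’s xml.etree.ElementTree accepts an XML comment and, by default, simply leaves it out of the parsed tree, while the standard json module rejects a C-style comment with a JSONDecodeError.

```python
import json
import xml.etree.ElementTree as ET

XML_WITH_COMMENT = "<book><!-- price excludes VAT --><price>30.00</price></book>"
JSON_WITH_COMMENT = '{"price": 30.00 /* price excludes VAT */}'

# XML: comments are part of the syntax. ElementTree's default parser
# accepts them and leaves them out of the resulting tree.
book = ET.fromstring(XML_WITH_COMMENT)
print("XML price:", book.findtext("price"))

# JSON: there is no comment syntax, so the parser raises an error.
try:
    json.loads(JSON_WITH_COMMENT)
    json_accepts_comments = True
except json.JSONDecodeError as err:
    json_accepts_comments = False
    print("JSON rejected the comment:", err.msg)
```

Some JSON tools accept comments as a non-standard extension, but a strict parser such as this one does not.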

XML vs JSON (Pros & Cons)

JSON advantages

  • In most scenarios, JSON is undoubtedly easier to read in its expanded form than XML.
  • JSON can have a substantially lower character count, reducing the overhead in data transfers.
  • JSON is much easier to parse, although this matters mainly to people writing parsers, which is rarely necessary today.
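The difference in character count is easy to measure. This sketch serialises one record from the bookstore example both as XML (written by hand, without indentation) and as compact JSON, then compares their lengths. The exact saving depends on the data and on design choices such as attributes versus child elements, so treat the result as indicative only.

```python
import json

# One record from the bookstore example, serialised by hand as XML ...
xml_text = (
    '<book category="web">'
    "<title>Learning XML</title>"
    "<author>Erik T. Ray</author>"
    "<year>2003</year>"
    "<price>39.95</price>"
    "</book>"
)

# ... and the same fields as compact JSON.
json_text = json.dumps(
    {"category": "web", "title": "Learning XML", "author": "Erik T. Ray",
     "year": 2003, "price": 39.95},
    separators=(",", ":"),  # drop the optional whitespace after , and :
)

print("XML characters: ", len(xml_text))
print("JSON characters:", len(json_text))
```

Most of the saving comes from JSON naming each field once, whereas XML repeats every element name in its end tag.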

XML advantages

  • XML has a mature family of standardised companion technologies (for example XML Schema, XPath and XSLT), which JSON lacks to the same degree.
  • XML has a more robust data structure than JSON.
  • XML is better suited to combining information sets from different systems than JSON.
  • XML directly supports the extension of base types which JSON doesn’t.
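  • XML and its companion specifications continue to be developed and reviewed by a standards committee (the W3C), whereas JSON is defined by a deliberately minimal, fixed grammar (ECMA-404 and RFC 8259).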
  • HTML can also be written as XML: XHTML, the XML serialisation of HTML (including HTML5), must be “well-formed” XML, so XML tools can process it directly.
  • XML supports the inclusion of comments which JSON doesn’t.
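Namespaces are one reason XML copes well with combining information sets from different systems. In this sketch, two hypothetical vocabularies both define a title element; the namespace URIs under example.com are invented for the example. Python’s xml.etree.ElementTree tells the two elements apart by their expanded names, so they never collide.

```python
import xml.etree.ElementTree as ET

# Two hypothetical vocabularies (book data and page metadata) that both
# define <title>. The namespace URIs are invented for this sketch.
DOC = """
<order xmlns:bk="http://example.com/ns/book"
       xmlns:pg="http://example.com/ns/page">
  <pg:title>Order summary</pg:title>
  <bk:title>Learning XML</bk:title>
</order>
"""

root = ET.fromstring(DOC.strip())

# Map the prefixes used in our queries to the namespace URIs.
NS = {"bk": "http://example.com/ns/book", "pg": "http://example.com/ns/page"}

# Each title is selected by namespace plus local name.
book_title = root.findtext("bk:title", namespaces=NS)
page_title = root.findtext("pg:title", namespaces=NS)
print(book_title, "|", page_title)

# Internally ElementTree stores expanded names: "{namespace-uri}local-name".
print([child.tag for child in root])
```

Because matching is done on the namespace URI rather than the prefix, the same query works even if another system uses different prefixes for the same vocabularies.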

Conclusion

I hope this article has gone some way towards explaining how JSON, XML and HTML all work together to enable sophisticated applications to run over the internet. Fundamentally, JSON and XML are alternative methods of storing and transferring data and HTML is a method for describing how this data should be displayed on the user’s device. Both JSON and XML have their advantages and disadvantages; JSON is less verbose than XML, but XML is more flexible and secure. Which is best depends on the requirements of the particular application. As an aside, when the data to be stored and/or transmitted is in the form of a document, such as a user manual, technical journal etc., one of the document-centric implementations of XML, for example DITA or DocBook, is usually the preferred choice.

For the last 20 years, we at DeltaXML have been working on technologies for the intelligent comparison and merging of XML and JSON data files. If your applications use these data formats and you need to compare different file versions, please review our range of products to see how we can help.