XML is almost always misapplied







XML was invented in 1996. Almost as soon as it appeared, people began to misunderstand what it was for, and it was not the best choice for the purposes they tried to adapt it to.



It is no exaggeration to say that the vast majority of XML schemas I have ever seen were inappropriate uses or outright misuses of XML. Moreover, these uses betrayed a fundamental misunderstanding of what XML is actually for.



XML is a markup language. It is not a data format. Most XML schemas ignore this distinction and treat XML as a data format, which means that choosing XML was itself the mistake, since what was really needed was a data format.



Without going into too much detail: XML is best suited to annotating bodies of text with structure and metadata. If your primary task is not working with a body of text, choosing XML is unlikely to be justified.



From this point of view, there is an easy way to check how well an XML schema is designed. Take an example document in the proposed schema and remove all tags and attributes from it. If what remains makes no sense (or if only an empty string remains), then either your schema is badly designed or you simply should not be using XML.
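
The check described above can be sketched in a few lines of Python using the standard library's `xml.etree.ElementTree`. This is only an illustration: the function name `strip_markup` and both sample documents are hypothetical, not taken from any real schema.

```python
import xml.etree.ElementTree as ET


def strip_markup(xml_text: str) -> str:
    """Return only the character data of an XML document.

    Tags and attributes are dropped; what is left is the text
    that the markup was annotating.
    """
    root = ET.fromstring(xml_text)
    # itertext() yields every text and tail fragment in document order.
    return "".join(root.itertext()).strip()


# Hypothetical document that really is annotated text.
prose = '<p>XML is a <em>markup</em> language, not a <term id="df">data format</term>.</p>'

# Hypothetical key-value dictionary squeezed into attributes.
attrs_only = '<root><item name="city" value="London" /></root>'

print(repr(strip_markup(prose)))       # the sentence survives intact
print(repr(strip_markup(attrs_only)))  # nothing is left
```

If the second call returns an empty string, as it does here, the document was annotating nothing, which is the warning sign the test is meant to catch.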



Below are some of the most common examples of badly designed schemas.



<root>
  <item name="name" value="John" />
  <item name="city" value="London" />
</root>





Here we see an unjustified and bizarre (yet very widespread) attempt to express a simple key-value dictionary in XML. If you remove all tags and attributes, an empty string remains. Absurd as it may sound, this document is essentially a semantic annotation of an empty string.



<root name="John" city="London" />





This is even worse: not only is it a semantic annotation of an empty string serving as an extravagant way to express a dictionary, but this time the "dictionary" is encoded directly as attributes of the root element. As a result, the set of attribute names on the element becomes open-ended and dynamic. It also shows that all the author really wanted was a simple key-value syntax, yet made the bizarre decision to use XML, with a single empty element serving merely as a prefix that enables attribute syntax. I come across schemas like this very often.



<root>
  <item key="name">John</item>
  <item key="city">London</item>
</root>





This is somewhat better, but now the keys are metadata for some reason while the values are not. A very strange view of dictionaries. If you remove all tags and attributes, half of the information is lost.



A proper expression of a dictionary in XML would look something like this:



<root>
  <item>
    <key>Name</key>
    <value>John</value>
  </item>
  <item>
    <key>City</key>
    <value>London</value>
  </item>
</root>





But if people have made the strange decision to use XML as a data format and then to express a dictionary in it, they should understand that what they are doing is inappropriate and inconvenient. Designers often choose XML for their applications by mistake. Even more often, they make matters worse by using XML pointlessly in one of the forms described above, ignoring the fact that XML is simply not suited to this.



Worst XML schema? Incidentally, the prize for the worst XML schema I have ever seen goes to the automatic provisioning configuration file format for Polycom IP phones. These files require XML request files to be downloaded via TFTP, which ... Anyway, here is an excerpt from one such file:



<softkey
  softkey.feature.directories="0"
  softkey.feature.buddies="0"
  softkey.feature.forward="0"
  softkey.feature.meetnow="0"
  softkey.feature.redial="1"
  softkey.feature.search="1"
  softkey.1.enable="1"
  softkey.1.use.idle="1"
  softkey.1.label="Foo"
  softkey.1.insert="1"
  softkey.1.action="..."
  softkey.2.enable="1"
  softkey.2.use.idle="1"
  softkey.2.label="Bar"
  softkey.2.insert="2"
  softkey.2.action="..." />





This is not some bad joke, and I did not make it up.





Documents or data. Every so often, someone does something truly bizarre and tries to compare XML with JSON, thereby showing that they understand neither. XML is a document markup language. JSON is a structured data format, so comparing the two is like comparing apples to oranges.



The distinction between documents and data helps here. XML can be thought of as a machine-readable analogue of a document. Although it is meant to be read by machines, it metaphorically belongs to the world of documents, and in that sense it is comparable to PDF documents, which are usually not machine-readable.



For example, in XML the order of elements matters, whereas in JSON the order of key-value pairs inside an object is insignificant and undefined. If what you want is an unordered dictionary of key-value pairs, the order in which the entries happen to appear in the file does not matter. But you can form many different documents from that same data, because a document always has a definite order. Metaphorically, an XML document is the analogue of a paper document, although unlike a printout or a PDF file it has no physical dimensions.



In my example of a proper dictionary representation in XML, the dictionary's elements have an order, unlike in a JSON representation. I cannot avoid specifying this order: linearity is inherent to the document model and to the XML format. Someone interpreting this XML document may choose to ignore the order, but there is no point arguing about that, since the question lies outside the format itself. Moreover, if you make the document viewable in a browser by attaching a cascading style sheet to it, you will see the dictionary's elements appear in one particular order and no other.
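
A small Python sketch may make the difference concrete. It uses only the standard `json` and `xml.etree.ElementTree` modules; the helper `keys_in_document_order` is a hypothetical name introduced for this illustration.

```python
import json
import xml.etree.ElementTree as ET

# Two JSON texts that differ only in the order of their members.
a = json.loads('{"name": "John", "city": "London"}')
b = json.loads('{"city": "London", "name": "John"}')

# As data they are the same dictionary: equality ignores member order.
same_data = (a == b)


def keys_in_document_order(xml_text):
    """List the <key> texts in the order the <item> elements appear."""
    root = ET.fromstring(xml_text)
    return [item.findtext("key") for item in root.iter("item")]


doc1 = (
    "<root>"
    "<item><key>Name</key><value>John</value></item>"
    "<item><key>City</key><value>London</value></item>"
    "</root>"
)
doc2 = (
    "<root>"
    "<item><key>City</key><value>London</value></item>"
    "<item><key>Name</key><value>John</value></item>"
    "</root>"
)

order1 = keys_in_document_order(doc1)
order2 = keys_in_document_order(doc2)

print(same_data)       # True: one dictionary
print(order1, order2)  # two different documents
```

The two XML texts carry the same pairs, yet they are distinct documents, because the parser necessarily reports the items in the sequence in which they were written.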



In other words, a dictionary (a piece of structured data) can be turned into n different possible documents (in XML, in PDF, on paper, and so on), where n is the number of possible orderings of the dictionary's elements, and that is before taking any other variables into account.
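
To illustrate the count, here is a hedged Python sketch that serializes one small dictionary in every possible order. The sample data and the helper `to_xml` are hypothetical; the element names follow the dictionary example given earlier.

```python
import itertools
import math
import xml.etree.ElementTree as ET


def to_xml(pairs):
    """Serialize a sequence of (key, value) pairs in the given order."""
    root = ET.Element("root")
    for key, value in pairs:
        item = ET.SubElement(root, "item")
        ET.SubElement(item, "key").text = key
        ET.SubElement(item, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample data: one dictionary with three entries.
data = {"name": "John", "city": "London", "country": "UK"}

# Every ordering of the entries produces a distinct XML document.
documents = {to_xml(order) for order in itertools.permutations(data.items())}

print(len(documents))  # 6 documents for 3 entries
```

For a dictionary with k entries this yields k! documents, all of which encode exactly the same data.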



It also follows that if you only want to transmit data, using a machine-readable document to do so is inefficient. It brings with it a model that is superfluous here and only gets in the way. In addition, a program has to be written to extract the original data. There is hardly any point in using XML for something that will never be formatted as a document at some stage (say, using CSS or XSLT, or both), since that is the main (if not the only) reason to stick to the document model.



Moreover, since XML has no concept of numbers (or Booleans, or other data types), every number represented in it is just more text. To extract the data, you must know the schema and how it relates to the data being expressed. You also need to know, from context, when a particular piece of text represents a number and must be converted to one, and so on.
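
The following Python sketch shows what this means in practice. The element names `server`, `port` and `tls`, and the `schema` mapping, are hypothetical and exist only for this illustration.

```python
import json
import xml.etree.ElementTree as ET

xml_doc = "<server><port>8080</port><tls>true</tls></server>"
json_doc = '{"port": 8080, "tls": true}'

root = ET.fromstring(xml_doc)

# Everything that comes out of the XML document is text.
raw_port = root.findtext("port")  # the string "8080"
raw_tls = root.findtext("tls")    # the string "true"

# Knowledge that lives outside the document: which element holds
# which type, and how to convert its text.
schema = {
    "port": int,
    "tls": lambda text: text == "true",
}
typed = {child.tag: schema[child.tag](child.text) for child in root}

# The JSON text already carries a number and a Boolean.
data = json.loads(json_doc)

print(type(raw_port).__name__, type(data["port"]).__name__)
print(typed == data)
```

The conversion table is the part that the XML document cannot supply by itself; with JSON the parser produces the typed values directly.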



Thus, extracting data from XML documents is not so different from running recognition on scanned documents that contain, for example, many pages of tables of numerical data. Yes, it can be done in principle, but it is far from the best approach and only a last resort when there are no other options at all. The sensible solution is simply to find a digital copy of the original data that is not embedded in a document model that fuses the data with one specific textual representation of it.



However, it does not surprise me at all that XML is popular in business. The reason is precisely that the (paper) document format is familiar and well understood by businesses, and they want to keep using that familiar model. For the same reason, businesses far too often use PDF documents instead of formats better suited to machine processing: they are still tied to the concept of a printed page with a particular physical size. This applies even to documents that are unlikely ever to be printed (for example, an 8,000-page PDF of registry documentation). From this point of view, the use of XML in business is essentially a manifestation of skeuomorphism. People understand the metaphor of a printed page of limited size, and they understand how to build business processes around printed documents. If that is your frame of reference, then machine-readable documents without a limited physical size, that is, XML documents, are an innovation while still being a familiar and comfortable analogue of a document. That does not stop them from being an incorrect and overly skeuomorphic way of representing data.



To date, the only XML schemas I know of that I can genuinely call proper uses of the format are XHTML and DocBook.


