The pandemic has exposed the public to a considerable amount of data and statistics, and this has sparked many netizen projects and websites that aim to make data more accessible. Daily updates from news channels can leave many feeling confused, but the release and publication of raw data is important for transparency and has set new standards for governments and public institutions. Now that we have better access to data, how can we make sense of the volume and velocity of changing data?
The value of understanding true XML and JSON change
Many software comparison tools are available for datasets where a simple line-by-line comparison is suitable, but few tools can understand change in structured data such as JSON or XML. Understanding how content or data is structured requires more sophisticated algorithms and representations, which allow actual change to be represented accurately. This is important when you receive large data sets in which change cannot be represented meaningfully without structure-aware analysis, and when data is provided as a delta, or set of changes, rather than by repeatedly issuing huge datasets.
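To make the difference concrete, here is a minimal Python sketch that compares two versions of the same JSON record in which the keys have been reordered and a single value has changed. The record contents are made up, and `structural_diff` is a hypothetical helper written for this illustration, not a DeltaXML API. A line-by-line diff reports several changed lines, while a comparison that parses the JSON reports only the one value that actually changed.

```python
import difflib
import json

# Two versions of the same record: keys reordered, one value changed (made-up data)
old_text = """{
  "area": "AREA1",
  "cases": 120,
  "deaths": 3
}"""
new_text = """{
  "deaths": 3,
  "area": "AREA1",
  "cases": 125
}"""

# Line-by-line comparison: collect every added or removed text line
line_changes = [
    line
    for line in difflib.unified_diff(old_text.splitlines(), new_text.splitlines(), lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]


def structural_diff(old, new, path=""):
    """Yield (path, old, new) for each differing value; None marks a missing key.

    Lists are compared as whole values in this sketch; a real structure-aware
    tool would also align individual list items.
    """
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(old.keys() | new.keys()):
            child = f"{path}/{key}"
            if key not in new:
                yield (child, old[key], None)
            elif key not in old:
                yield (child, None, new[key])
            else:
                yield from structural_diff(old[key], new[key], child)
    elif old != new:
        yield (path, old, new)


structural_changes = list(structural_diff(json.loads(old_text), json.loads(new_text)))
print(len(line_changes), "changed lines")
print(structural_changes)  # → [('/cases', 120, 125)]
```

Key order carries no meaning in a JSON object, so only the structural comparison reflects the real change here.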
Represent just the change
Pandemic data sets are usually very large, as they often represent national-scale information, with daily data sets for infection rates, cases, deaths, and vaccinations. Data analysts must compare each new data set with their current model to establish what has changed and update the relevant data. This processing can be time-consuming, reducing the immediacy and accuracy of reporting and delaying subsequent analysis and decision-making. A more appropriate and expedient approach is to process a delta, or difference file, that records only what changed between the previous and current data sets; this is more accurate and lets you update your data model far more quickly.
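As a rough illustration of processing only the change, the Python sketch below computes a delta between two daily snapshots and applies it to an analyst's existing model. It assumes, hypothetically, that each record is identified by a (date, area) key and holds a few numeric metrics; `make_delta` and `apply_delta` are illustrative names rather than part of any published format, and the figures are made up.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # hypothetical primary key: (date, area)
Row = Dict[str, int]   # metric name -> value, e.g. {"cases": 120}


def make_delta(previous: Dict[Key, Row], current: Dict[Key, Row]) -> dict:
    """Describe how `current` differs from `previous` as added/removed/changed records."""
    return {
        "added": {k: current[k] for k in current.keys() - previous.keys()},
        "removed": sorted(previous.keys() - current.keys()),
        "changed": {
            k: current[k]
            for k in current.keys() & previous.keys()
            if current[k] != previous[k]
        },
    }


def apply_delta(model: Dict[Key, Row], delta: dict) -> None:
    """Bring `model` up to date in place, touching only the records in the delta."""
    for key in delta["removed"]:
        model.pop(key, None)
    model.update(delta["added"])
    model.update(delta["changed"])


# Made-up snapshots for two consecutive publication days
previous = {
    ("2021-03-01", "AREA1"): {"cases": 120, "deaths": 3},
    ("2021-03-01", "AREA2"): {"cases": 80, "deaths": 1},
}
current = {
    ("2021-03-01", "AREA1"): {"cases": 125, "deaths": 3},  # revised figure
    ("2021-03-01", "AREA2"): {"cases": 80, "deaths": 1},   # unchanged
    ("2021-03-02", "AREA1"): {"cases": 90, "deaths": 2},   # new day
}

delta = make_delta(previous, current)
model = dict(previous)  # the analyst's existing copy of the data
apply_delta(model, delta)
print(sorted(delta["changed"]), sorted(delta["added"]))
# → [('2021-03-01', 'AREA1')] [('2021-03-02', 'AREA1')]
```

The consumer only touches the added, removed and changed records instead of reprocessing the whole data set; the work of comparison moves to whoever produces the delta.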
Where change matters
Publishing a delta or difference file of changes makes more sense, and should perhaps be more widely adopted, so why hasn’t this been the case? Often the answer is that producing a reliable and accurate delta file is not straightforward. There is no standard for representing change in CSV, and whilst JSON has JSON Patch (RFC 6902), it is deficient in many respects for this purpose. However, where data is structured, it can be done successfully, and DeltaXML’s patented delta file format has been doing this for several years with XML and JSON.
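For readers unfamiliar with it, JSON Patch (RFC 6902) expresses a change as a list of operations, each addressed by a JSON Pointer (RFC 6901) path. The minimal Python sketch below applies only the `add`, `remove` and `replace` operations and is an illustration, not a conforming implementation. The example shows one property of the format, offered as an illustration rather than as the author's own list of deficiencies: array elements are addressed by position, so a patch computed against one version can silently update the wrong record when applied to a slightly different one. The area names and values are made up.

```python
import copy


def _tokens(pointer):
    """Split a JSON Pointer (RFC 6901) into unescaped reference tokens."""
    if not pointer.startswith("/"):
        raise ValueError(f"unsupported pointer: {pointer!r}")
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def apply_patch(doc, patch):
    """Apply RFC 6902 add/remove/replace operations to a copy of doc.

    Other operations (move, copy, test) and whole-document paths are not handled.
    """
    doc = copy.deepcopy(doc)
    for op in patch:
        *parents, last = _tokens(op["path"])
        target = doc
        for token in parents:  # walk down to the container holding the target
            target = target[int(token)] if isinstance(target, list) else target[token]
        kind = op["op"]
        if isinstance(target, list):
            if kind == "add":  # "-" means append
                target.insert(len(target) if last == "-" else int(last), op["value"])
            elif kind == "remove":
                del target[int(last)]
            elif kind == "replace":
                target[int(last)] = op["value"]
            else:
                raise ValueError(f"unsupported op: {kind}")
        else:
            if kind in ("remove", "replace") and last not in target:
                raise KeyError(last)
            if kind in ("add", "replace"):
                target[last] = op["value"]
            elif kind == "remove":
                del target[last]
            else:
                raise ValueError(f"unsupported op: {kind}")
    return doc


v1 = {"rows": [{"area": "AREA1", "cases": 10}, {"area": "AREA2", "cases": 20}]}
patch = [{"op": "replace", "path": "/rows/1/cases", "value": 25}]

patched = apply_patch(v1, patch)
print(patched)
# → {'rows': [{'area': 'AREA1', 'cases': 10}, {'area': 'AREA2', 'cases': 25}]}

# The same patch applied to a version with an extra row at the front
v1_shifted = {"rows": [{"area": "AREA0", "cases": 5}] + v1["rows"]}
patched_shifted = apply_patch(v1_shifted, patch)
print(patched_shifted)
# → {'rows': [{'area': 'AREA0', 'cases': 5}, {'area': 'AREA1', 'cases': 25}, {'area': 'AREA2', 'cases': 20}]}
```

Nothing in the patch itself records which area the changed figure belonged to, so the mistake goes undetected.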
One significant benefit of publishing changes is the ability to publish the changes between any two arbitrary versions, so that change can be derived over extended ranges. An example is the International Organization for Standardization (ISO), which makes the latest version of a standard available with the changes from a selected previous version marked up. DeltaXML provides the technology behind this, allowing changes to be marked up on demand.
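One simple way to see how change can be derived over an extended range is to combine consecutive deltas, provided each one is exact. The Python sketch below assumes a hypothetical, simplified delta format: a dictionary mapping each record key to an (old, new) pair, with None marking an absent record. It is not DeltaXML's delta format and ignores any structure inside records; the values are made up.

```python
from functools import reduce
from typing import Dict, Hashable, Optional, Tuple

# Hypothetical delta format: key -> (old value, new value); None marks "absent"
Delta = Dict[Hashable, Tuple[Optional[object], Optional[object]]]


def compose(first: Delta, second: Delta) -> Delta:
    """Combine a v1->v2 delta with a v2->v3 delta into one v1->v3 delta."""
    result: Delta = dict(first)
    for key, (mid_value, new_value) in second.items():
        # The starting value comes from the earlier delta if it touched this key
        old_value = first[key][0] if key in first else mid_value
        if old_value == new_value:
            result.pop(key, None)  # the change was undone within the range
        else:
            result[key] = (old_value, new_value)
    return result


# Illustrative daily deltas (made-up values)
day1_to_day2 = {"AREA1": (10, 12), "AREA2": (20, 21)}
day2_to_day3 = {"AREA2": (21, 20), "AREA3": (None, 5)}
day3_to_day4 = {"AREA1": (12, 15)}

day1_to_day4 = reduce(compose, [day1_to_day2, day2_to_day3, day3_to_day4])
print(day1_to_day4)  # → {'AREA1': (10, 15), 'AREA3': (None, 5)}
```

Changes that cancel out within the range (AREA2 here) disappear from the combined delta, which is what a reader comparing the two end versions would expect to see.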
Where does government data fit in?
The pandemic has opened the Pandora’s box of government data, making it transparent, open, accessible, and usable to netizens in every country. When change happens, it matters, and tools like those from DeltaXML are helping organisations around the world to be at the forefront of understanding change in structured content and data. Now more than ever, there is a responsibility to ensure that data is not misrepresented or misunderstood, so that data sources such as public institutions retain their credibility and authority.
References
UK Government data source for daily data sets for infection rates, cases, deaths, and vaccinations – https://coronavirus.data.gov.uk/