Maybe I'm not objective because I'm Dutch myself, but from both a user-facing and technical perspective I think the Dutch dashboard is by far the best corona dashboard in the world. It's very fast, has a lot of detailed visualizations, provides a lot of context, and has a fair amount of accessibility features.
The Dutch website seems to spend a lot of its load time running the Next.js framework, which the Gov.uk variant does not. It might work quickly on fast computers, but even on modern phones it seems to visibly pause.
https://dvhn.nl/groningen/Meer-ziekenhuispati%C3%ABnten-blij...
When the hospitals feel like it, they test patients who are already in the hospital for something else to see if they have COVID. And when they don't feel like it, they don't. Any patient found to have COVID is added to the graph. So these numbers, and also derived numbers such as the R value, are statistically useless and vulnerable to manipulation.
Some example queries issued by the dashboard: https://github.com/publichealthengland/coronavirus-dashboard... https://github.com/publichealthengland/coronavirus-dashboard... https://github.com/publichealthengland/coronavirus-dashboard...
“At the time of writing, the Citus distributed database cluster adopted by the team on Azure is HA-enabled for high availability and has 12 worker nodes with a combined total of 192 vCores, ~1.5 TB of memory, and 24 TB of storage. (The Citus coordinator node has 64 vCores, 256 GB of memory, and 1 TB of storage.)”
That’s beyond overkill for something that as you say could be generated statically a couple of times a day.
E.g. 12 worker nodes and 192 vCores means they've picked 16 core nodes. 1.5TB of memory across 12 nodes means 128GB per node. 24TB of storage is just 2TB per node.
So it's 12 relatively mid sized servers/VMs.
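That back-of-the-envelope division can be sketched directly (cluster figures taken from the quoted paragraph):

```python
# Per-node breakdown of the quoted Citus worker cluster.
WORKERS = 12
TOTAL_VCORES = 192
TOTAL_MEMORY_GB = 1536    # ~1.5 TB across the cluster
TOTAL_STORAGE_TB = 24

vcores_per_node = TOTAL_VCORES // WORKERS        # 16 vCores each
memory_per_node_gb = TOTAL_MEMORY_GB // WORKERS  # 128 GB each
storage_per_node_tb = TOTAL_STORAGE_TB // WORKERS  # 2 TB each

print(vcores_per_node, memory_per_node_gb, storage_per_node_tb)
```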
They could certainly do it with much less, and I have no interest in looking up what 12 nodes of that spec would cost on Azure, but at Hetzner it'd cost less than 1500 GBP/month including substantial egress. At most cloud providers the bandwidth bill for this likely swamps the instance cost, and the developer cost to develop this is likely many times the lifetime projected hosting cost even with that much overkill.
If they happen to have someone familiar with query caching and CDNs, I'm sure they could cut it significantly very quickly, and even an entirely average developer could figure out how to trim that significantly over time. But even at (low) UK government contract rates it's not worth much time to try to trim a bill like that much vs. just picking whatever the developers who worked on it preferred.
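A first pass at that sort of caching is trivial to sketch. Below is a minimal in-process TTL cache in Python (all names and the 300-second TTL are made up for illustration); a CDN does the same thing one layer up, keyed on the URL, and for a dashboard updated a few times a day even a short TTL collapses almost all database reads:

```python
import time
from functools import wraps

def ttl_cache(seconds):
    """Cache a function's results for a fixed time window."""
    def decorator(fn):
        cache = {}
        @wraps(fn)
        def wrapper(*args):
            now = time.monotonic()
            hit = cache.get(args)
            if hit is not None and now - hit[0] < seconds:
                return hit[1]  # still fresh: skip the expensive call
            result = fn(*args)
            cache[args] = (now, result)
            return result
        return wrapper
    return decorator

calls = 0

@ttl_cache(seconds=300)
def expensive_query(area):
    """Stand-in for a database round trip."""
    global calls
    calls += 1
    return {"area": area, "cases": 123}

expensive_query("london")
expensive_query("london")  # served from cache; no second "DB" hit
```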
That would require actual work instead of selling an overpriced generic solution.
As for using the setup for other things, that seems less likely given this expensive setup.
Hell, let's do some partial evaluation: just bake the computed HTML into the source code and recompile that a few times a day. No need to even read from a file when you can fetch it from rodata.
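In Python terms (the dashboard's backend language, per its GitHub repo), that "bake it in and recompile" step could be as simple as generating a module at build time; the file name and markup below are made up for illustration:

```python
import tempfile
from pathlib import Path

def bake(html: str, out: Path) -> None:
    """Write the rendered dashboard HTML into a Python module, so
    serving it is a constant in-memory lookup with no file or DB I/O."""
    out.write_text("HTML = %r\n" % html)

# Build step, rerun a few times a day when the data updates:
module = Path(tempfile.mkdtemp()) / "baked_page.py"
bake("<h1>Daily cases: 42</h1>", module)

# Simulate what `import baked_page` would give the serving process:
namespace = {}
exec(module.read_text(), namespace)
page = namespace["HTML"]
```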
As for the reason why they did it this way, I assume it's a combination of CV-driven development along with the hackernoon-reading-junior-engineer-meets-cunning-salesperson effect which others have noted.
Alternatively, we're building https://www.polyscale.ai/ that is a good fit for this type of use case. It's a global database cache and integrates with Postgres/MySQL etc. We host PoP's globally so the database reads are offset and local to users.
Agree with the other comments in that this feels like a shiny use case to quote to other prospects, but all good :)
My guess is that this was web people who were contracted to build a read-only, daily-updated dashboard rather than an interactive web app, so they treated it as just another web app, scaled up.
I built a one-pager vanilla JS site that polls the official Johns Hopkins aggregated data daily, displays dynamically generated smoothed moving-average charts, performs curve-similarity analysis to identify similar patterns in different countries, and performs logarithmic regression to depict current doubling/halving times.
This happens entirely on the client side, with no server side component whatsoever (other than the http server to deliver the static HTML&JS that does all the work). See https://covid-19-charts.net/
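The two core calculations that comment describes, a smoothed moving average and a doubling time from exponential growth, are each a few lines; here is a rough Python equivalent of the client-side JS (the case numbers are illustrative, not real data):

```python
import math

def moving_average(values, window=7):
    """Trailing 7-day moving average, the usual smoothing for
    noisy daily case counts."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

def doubling_time(daily_growth_rate):
    """Days for cases to double at a constant daily growth rate;
    log-linear regression reduces to estimating this rate."""
    return math.log(2) / math.log(1 + daily_growth_rate)

cases = [100, 110, 121, 133, 146, 161, 177, 195]  # made-up 10%/day growth
smoothed = moving_average(cases)
```

At 10% daily growth the doubling time works out to about 7.3 days, which is why smoothed charts and doubling times together give a quick read on trajectory.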
On the other side, I have the feeling that this thing is clearly over-engineered. Just look at their data diagram... If I'm not wrong, there is one writer and multiple readers for the data, or at least multiple writers on one side and multiple readers on the other, without a need for "real time" consistency.
So this thing could probably have been split up better to avoid the need for "scaled" databases.
The article states it was written by Claire Giordano from San Francisco. Not sure where you got the UK Government official from.
To me it read like a b2b marketing piece and showcase. Kind of: We can power this, so we can power your BI dashboard as well.
Taking this into account it was a nice write up and from a data analyst's and consultant's pov interesting to read.
<<As a result, the GOV.UK Coronavirus dashboard became one of the most visited public service websites in the United Kingdom.>>
You don't expect the gov UK dashboard to be done by US consultants...
Maybe I'm naive, or not cynical enough, but I just read this as a case study of a customer using Azure to provide the general public with information in a robust fashion.
In fact, if anything, the whole article is remarkably light on pushing Azure, and quite heavy on architecture details.
The open source code (on GitHub) uses Postgres (not MSSQL) and Python (not C# or PowerShell), and in fact has a screenshot of JetBrains' PyCharm, not VS Code.
In fact it's probably quite an MS agnostic article.
Even though gov.uk is actually a really good IT company, I'm quite pleased that they're using "the cloud" rather than trying to create their own.
For anyone who's wondering, the relevant team here is GDS[0]. We hired a bunch of engineers from there at one of my previous companies - which was doing some quite gnarly technical work - and they were superb. I believe the US equivalent is 18F.
[0] The Government Digital Service in full, but no definite article for the initialism.
Also - I’ve been really impressed by the openness of the team actually doing the work - eg threads like https://twitter.com/pouriaaa/status/1476892793729654787
and in particular this analysis of debugging a problem that the dashboard encountered - which also gives a lot more background context: https://dev.to/xenatisch/cascade-of-doom-jit-and-how-a-postg...
I think the page looks inoffensive but is clearly focussed on being informative. I wish more data repositories took care and attention towards how data is represented.
https://insidegovuk.blog.gov.uk
The only downside is that they often send you to sites run by other, significantly less competent bodies (looking at you, student loans company).
All the fancy tech to get in your pockets. For everything else, go f*k yourself.
Most recently I found the DVLA license renewal was one of those ugly backwaters (albeit still fully online), but their license check code generator is great.
For real terrible stuff, check out local council websites.
I do think the UK and some other countries do a better job of presenting data compared to the CDC.
It's pretty much agreed that the rate at which unvaccinated people wind up in hospital beds is several times higher than for vaccinated people; however, all the CDC data presented is only rates. I want tallies or counts, and I cannot find them. For instance, on Ontario, Canada's site[1], the vaccinated are 74% vs. the unvaccinated's 26% of COVID hospitalizations. Most non-technical people think the unvaccinated share of COVID hospitalizations is over 90%. Because more and more people are vaccinated, even with a lower rate of hospitalization, their absolute numbers are higher. Also, it's interesting to see on the Ontario site that COVID hospitalizations consist of 56% admitted directly for COVID and 44% admitted for other reasons who then tested positive for COVID once hospitalized. The case is more telling for ICU, with 81% admitted for COVID and 19% for other reasons.
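That base-rate effect is easy to demonstrate with made-up round numbers (the 85% coverage and 4x rate ratio below are illustrative assumptions, not Ontario's actual figures):

```python
def hospitalization_split(pop, vax_coverage, rate_unvax, rate_ratio):
    """Split hospital admissions between vaccinated and unvaccinated.
    rate_ratio = how many times higher the unvaccinated rate is."""
    unvax = pop * (1 - vax_coverage) * rate_unvax
    vax = pop * vax_coverage * (rate_unvax / rate_ratio)
    total = unvax + vax
    return vax / total, unvax / total

# Illustrative only: 85% coverage, unvaccinated hospitalised 4x as often.
vax_share, unvax_share = hospitalization_split(
    pop=1_000_000, vax_coverage=0.85, rate_unvax=0.004, rate_ratio=4)
```

Even with a 4x higher per-person rate for the unvaccinated, the vaccinated end up as roughly 59% of admissions here, simply because there are so many more of them.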
I am trying to play with raw data more to refresh my munging skills than to make a point or add fodder to the COVID noise. I have been coding since 1978, played with neural nets, GAs, and GP in the late 1980s, but I don't code or do data analysis for a living right now (other than business strategy reports that require some basic analysis). There's a lot of data out there, and it can get very confusing. I am back to using R/RStudio after a brief stint using Julia/Pluto notebooks and previously Python/Jupyter notebooks. I even did a toy DSEIR model in J back in April 2020 based on previous work by a couple of people, which I plan on updating to April[2]. I am going to try to do some Lisp work, and I think I will settle on RStudio and Lisp for more genomic/bioinformatic stuff (yes, I know BioLisp has been supplanted by Python; however, Lisp is having a renaissance in symbolic-related areas of ML again, like NLP). BTW, in what language was GPT implemented? Not the API languages, but what PL(s) were used to create the code: C++, Java, Go?
I may be bad at navigating the CDC website, but I can't seem to get the dataset of numbers of hospitalizations by vaccination status, only rates or pre-filtered data. I do remember downloading raw data that seemed to have it (over 1.8gb, I think), but I can't seem to find it. I'd appreciate a link if anyone has it.
[1] https://covid-19.ontario.ca/data/hospitalizations#hospitaliz...
Joking aside, I liked the description of the dashboard, and generally speaking the UK government's Web sites are better quality, support open data more, and are easier to read and navigate than those of other European countries, from what I have seen. This includes this dashboard, which looks clean, simple and functional.
I was waiting for the big SQL Server advertising language and was positively surprised that the article is very tech agnostic. It did all seem to be rather over-engineered, but Microsoft needs to make some money and government agencies don't generally have wizards from HN working for them, so I can live with an occasionally over-engineered system as long as important systems are working and remain up.
The most mysterious part for me was why one would put JSON inside relational tables?
Cheap and easy way to permit a flexible schema for some part of the data. Performance tests probably showed that for their specific query workload, any slow down from parsing/lack of index was fine.
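A minimal illustration of that pattern (using SQLite here rather than Postgres, purely so it runs standalone; Postgres's `jsonb` type plus GIN indexes is the production-grade equivalent, and the table and column names are invented): fixed relational keys with a free-form JSON payload column.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE time_series (
        area_code TEXT,
        date      TEXT,
        payload   TEXT   -- JSON blob: metrics can vary row to row
    )
""")
# The set of metrics differs by area and date, so the JSON column
# avoids schema migrations every time a new metric is published.
conn.execute(
    "INSERT INTO time_series VALUES (?, ?, ?)",
    ("E92000001", "2021-12-01",
     json.dumps({"newCases": 42, "newDeaths": 1})),
)

row = conn.execute(
    "SELECT payload FROM time_series WHERE area_code = ?",
    ("E92000001",),
).fetchone()
metrics = json.loads(row[0])
```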
So while there's nothing wrong here with calling an MIT project open source, it's also not inconsistent with their own definition, and usable as propaganda.
[0] https://azure.microsoft.com/en-gb/services/developer-tools/d...
>Is Azure Data Studio open source?
>Yes, the source code for Azure Data Studio and its data providers is open source and available on GitHub. The source code for the front-end Azure Data Studio, which is based on Microsoft Visual Studio Code, is available under an end-user license agreement that provides rights to modify and use the software, but not to redistribute it or host it in a cloud service. The source code for the data providers is available under the MIT license.
This has been an argument for at least 25 years that I've been around this stuff.
Microsoft is defining their product, which you can't redistribute[0], as "Open Source".
[0] https://github.com/microsoft/azuredatastudio/blob/4012f26976...
According to Dominic Cummings (ex-adviser to the PM), this isn't true at all - one of their biggest failings early on was to not have the data and not see the priority in getting it.[1]
[1] https://news.sky.com/story/dominic-cummings-hearing-the-insi... : He added later that there was no data system at that point, and he needed to use his iPhone as a calculator to make predictions about the extent to which infections would spread, which he then wrote down on a whiteboard.
Tracking key health metrics and sharing those metrics with the public doesn't mean that there is modelling about the extent to which infections would spread - although we also know that the Imperial modelling was released a day later, so while he may have been using his iPhone to make predictions there were also academic teams modelling this that were collaborating with the government at the time (see https://www.imperial.ac.uk/news/196234/covid-19-imperial-res...).
It's also not clear what a 'data system' is in this context - there was clearly an effort to very quickly put something in place to capture data (because it couldn't wait a few weeks/months), but a more robust analytics system will inevitably take more than a few weeks to put in place if not already in place pre-pandemic (a lot of this is about how NHS trusts are structured in the UK, which operate fairly independently). It's not clear to me how quickly is realistic to implement what Dom thought was suitable in terms of a 'data system', particularly as I'm not particularly clear on his requirements (he seems to want an element of forecasting built into this system for instance?), so without knowing what the requirements are can we be confident that what he wanted to build was possible to build, test and implement in his expected timeline?
So I don't think there is a clear contradiction here (and in fact, I think the evidence points to the fact that the statement in the article is probably correct).
> "and to share those metrics with the public" Given stats on how many patients were in ICU didn't officially exist so you couldnt request them with an FOI request I'll let you work out how true this is. They wanted to control the data to craft a narrative to justify the report9 claims we'd have plague bodies in the streets because this is the end of days...
In early March 2020, they briefly announced that the daily COVID figures would move to a weekly cadence.
When locking down, they were still flying blind, and after that (during Hancock's 100k tests a day moonshot) there were leaks that "figures were being compiled in a notebook by calling round different labs".
On the data side, they have ~7.5 billion total records and they add ~55 million new ones a day. On the web side, they have ~1 million daily unique users and 100k concurrent users at peak ("concurrent" means "in one minute", it seems).
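For scale, those headline numbers work out to a sustained write rate in the hundreds of rows per second, and roughly four and a half months' worth of ingest in the current total:

```python
# Figures from the comment above.
TOTAL_RECORDS = 7_500_000_000
NEW_PER_DAY = 55_000_000

writes_per_second = NEW_PER_DAY / 86_400      # ~636 rows/s sustained
days_of_growth = TOTAL_RECORDS / NEW_PER_DAY  # ~136 days to reach the total
```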
I'm no expert on the web part, but I'm kind of curious why they went with the design they did for the data part. The design and the chosen technologies make me think they treated it more like a normal web app, not like a dashboard. I would expect an OLAP database, not sharded Postgres, and the data model feels very OLTP to me as well. Or maybe that's because it's mostly time series and not a traditional data model?
I'll have to go through the article in more detail.
OLTP stores are relatively bad at aggregating across a lot of data.
Analytics dashboards with many users, a lot of ever-changing data, and many different views exist in a gray area between OLAP and OLTP often referred to as real-time analytics or operational analytics. The queries are usually somewhat lighter / less ad-hoc / more indexed than in OLAP, but there can be hundreds or thousands of them per second with different filters and aggregations.
There are some specialized real-time analytics databases like Druid. Citus (used in the article) allows you to run such workloads at scale on PostgreSQL.
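The workload shape being described, many concurrent, narrowly filtered aggregations over time-series data, looks roughly like this sketch (SQLite standing in for a distributed Postgres/Citus cluster; table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE cases (
        area TEXT,
        day  TEXT,
        n    INTEGER
    )
""")
# The index is what makes this "real-time": each dashboard request
# touches one area's slice instead of scanning the whole table,
# which is the key difference from an ad-hoc OLAP scan.
conn.execute("CREATE INDEX idx_area_day ON cases (area, day)")
conn.executemany(
    "INSERT INTO cases VALUES (?, ?, ?)",
    [("london", "2021-12-01", 100),
     ("london", "2021-12-02", 150),
     ("leeds",  "2021-12-01", 30)],
)

# Typical per-request query: filtered + aggregated, lightly indexed.
(total,) = conn.execute(
    "SELECT SUM(n) FROM cases WHERE area = ? AND day >= ?",
    ("london", "2021-12-01"),
).fetchone()
```

Citus runs this same pattern, but with the table sharded by a distribution column (e.g. area) across worker nodes, so thousands of such queries per second fan out in parallel.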
I love it when websites have a simple text version.
[1] https://coronavirus.data.gov.uk/easy_read [2] https://coronavirus.data.gov.uk/
https://github.com/publichealthengland/coronavirus-dashboard...
It surprises me how much more popular F# is in Europe compared to the US. I finally got a professional F# gig in the states (\o/), but there were very few options. It makes me wonder, are universities in Europe providing a more functional-first approach to CS education, or is something else going on?
I'm all for Citus, but cmon. Overkill.
The information is presented clearly and it's easy to see what's going on, although in my case the main reason is the breakdown for Argyll & Bute, which isn't a focus area for the national ones!
Typical: this turned into a pro-cloud puff piece that frankly shows a serious amount of over-design for what should be a data filtering/processing step to any reasonable "data scientist". And if I'm having to say a data scientist could do it better, you know you got it wrong...
monsters?
As a programmer, I want to make demands about UX too, for a change ...
But then this would let you perform statistical analyses on _their_ data, I'm not sure they're such a big fan of that...
> awarded to Microsoft
Hey Europe, want to stop being several decades behind in IT compared to US/China?
One simple trick:
Ban FAANG from public procurement in Europe!
It‘s a no-brainer really.
Buy locally, ideally giving small companies and startups a chance.
You will have to do it anyway very soon if you want your privacy laws to be taken seriously.
There might be a couple of months of friction while bureaucrats have to find new procurement partners, but that's it.
And then the European tech scene will rise.
I don’t disagree with you, but if I were the CTO of a 10000+ seat organization, and a Microsoft/Google/etc told me they could provide email, storage, sharing/collaboration, office apps, security etc for a few bucks per user / month… that’s a pretty compelling deal.
Is it any better if SAP / Telekom get similar contracts (what usually happens in Germany)?
> Ban FAANG
I know exactly what you mean, but is that what we're doing now? Including Microsoft in FAANG but not changing the acronym?
It's built in-house by the government.
Those small companies and startups are going to end up using Microsoft/Amazon/Google for their hosting/cloud-services anyway so FAANG still win in the end.
We'll no doubt award yet more govt projects to the tech oligarchs of the west and praise students for using their toys...