It is referenced hundreds of times in many classic papers.
But, here's the thing. It doesn't exist.
Everyone cites Sarin, DeWitt & Rosenberg's paper, but no one has ever actually seen it. I've emailed dozens of academics, libraries, and archives - none of them has a copy.
So it blows my mind that something so influential is, effectively, a myth.
Sunil Sarin, Mark DeWitt, and Ronni Rosenberg, "Overview of SHARD: A System for Highly Available Replicated Data," Technical Report 162, Xerox Advanced Information Technology (May 1988).
EDIT:
OK, I think I get this now. I had read the Wikipedia blurb about CCA being acquired by Rocket earlier, but only just now did I keep reading further down to find this bit:
in 1984, CCA was purchased by Crowntek, a Toronto-based company.[8] Crowntek sold Computer Corporation of America's Advanced Information Technology division to Xerox Corporation in 1988.[9] The balance of CCA was acquired by Rocket Software, a Boston-based developer of enterprise infrastructure products,[2] in April 2010.
So it seems like the portion of CCA that would be of interest here, is probably the bit that sent to Xerox. Maybe somebody at Xerox could help turn up the missing document?
I doubt it will help, but I took a stab at pinging them on Twitter (now X).
I've gone ahead and sent them each a message asking if they might be able to make the paper available.
If you'd like to get in contact with them yourself and are having trouble finding their LinkedIn, shoot me an email and I'll be happy to provide you links.
> Yes, I was involved, 35 years ago! I believe it was an internal CCA paper. I don't have a copy and I have no idea how to get it. Sorry about that. It does seem to be the earliest reference to "shard" in the DB context. (The other early reference pointed to in Wikipedia is from much later, 1997.)
> Fortunately, you need not go back 35 years to read about sharding; it's easy to get current info. Cheers.
I've now sent a message to Andy Youniss, CEO of Rocket Software to see if he can help.
Of course if nobody can get their hands on a particular work, as seems to be the case here, that makes things kind of hard. But I'd expect most works you need to cite in a fast-moving field such as CS to be available at least somewhere, even if it takes a bit of effort.
It's kind of wild that LLMs and other models will be able to quickly suss out which papers have high levels of referential integrity.
https://shkspr.mobi/blog/2021/06/where-is-the-original-overv...
According to Google Scholar, it's cited a measly 11 times.
https://scholar.google.com/scholar?cites=1491448744595502026...
I've found that Google Scholar's coverage gets a little sketchy on older stuff, and since we're talking way back in the 1980's here, I don't think it's surprising that some things are missing.
Now I’m going to be bugged by this too! Great trivia, and also a heck of a mystery.
> John, I'll have to look in my hard copy archives, which are very disorganized and will take me some time. But I don't believe you need this exact paper. I can probably point you at other papers that were in fact published (and are not just company TRs).
> Furthermore, our concoction of the acronym SHARD is different from "database sharding" as currently used. We were referring to the network, not the data, being "sharded", i.e., partitioned, and ensuring high database availability (with some loss of consistency - see "The CAP Theorem").
Mark DeWitt shared:
> Hi John, I believe that was an unpublished report we wrote for our DARPA funding agency. I may still have a copy, but it's currently buried under some stuff in the garage. There is at least one published paper that describes the SHARD architecture for a partially replicated database. It's not entirely obvious because I think the SHARD acronym didn't start getting used until after the paper was published in 1985. By 1987, Sunil Sarin had published another paper with Nancy Lynch in which they refer to SHARD and reference the 1985 paper.
> Here are the two citations:
> S. K. Sarin and N. A. Lynch, "Discarding Obsolete Information in a Replicated Database System," in IEEE Transactions on Software Engineering, vol. SE-13, no. 1, pp. 39-47, Jan. 1987, doi: 10.1109/TSE.1987.232564.
> S. K. Sarin, B. T. Blaustein and C. W. Kaufman, "System architecture for partition-tolerant distributed databases," in IEEE Transactions on Computers, vol. C-34, no. 12, pp. 1158-1163, Dec. 1985, doi: 10.1109/TC.1985.6312213.
Wait, you mean people include papers they haven't even opened in their references?!
On a semi-related topic, I love mysteries like these - mystery songs, those Japanese kanji in Unicode that nobody knows what they mean or where they came from, paper towns on maps.
If anyone else has anything else to read along similar lines, please post it!
So a documented reference to sharding that's earlier than that would be interesting to see.
(Disagree? Instead of downvoting, consider posting a citation that actually resolves to a real paper.)
"SHARD" is the name of the software - it was common back then to name systems using acronyms. It's not clear whether the paper/report actually uses the term "shard" in the sense that it is now used in distributed systems, or even whether it uses it at all.
1. Not being available online doesn't mean the paper's existence is made up. It's a very bold claim that the authors cite fabricated work.
From the available information, this looks like a technical report by a probably now-defunct company back in the '80s. If that was its only form of publication, and it didn't appear in conference proceedings for example, it would only be found as a physical copy in select university libraries. But most important:
2. This paper isn't even as impactful as the parent comment states. Or if its proposed concept is, the original idea probably derives from some other paper, which is the one that is highly cited and most definitely available online.
The cumulative citation count from Google Scholar and IEEE Xplore doesn't exceed fifteen for this particular paper, though.
https://scholar.google.com/scholar?cites=1491448744595502026...
https://shkspr.mobi/blog/2021/06/where-is-the-original-overv...
I can only find the Oracle reference to Sharding, which might be the same thing or not. https://docs.oracle.com/en/database/oracle/oracle-database/1...
Along with the wikipedia reference. https://en.wikipedia.org/wiki/Shard_(database_architecture)
And a Science Direct reference. https://www.sciencedirect.com/topics/computer-science/shardi...
Along with Facebook's reference. https://engineering.fb.com/2020/08/24/production-engineering...
And Wolverhampton's reference to Oracle Sharding. http://ora-srv.wlv.ac.uk/oracle19c_doc/shard/sharding-overvi...
And Amazon's. https://aws.amazon.com/what-is/database-sharding/
So is the original paper a myth or was/is this demonstrating the closed circuit nature of the dissemination of knowledge?
How many different ways do you cut up the data?
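Two of the usual answers, as a minimal sketch (my own illustration, not tied to any of the products linked in this thread): hash sharding and range sharding.

```python
import hashlib

def hash_shard(key: str, n_shards: int) -> int:
    """Hash sharding: a stable hash spreads keys evenly across shards."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return digest % n_shards

def range_shard(key: int, boundaries: list) -> int:
    """Range sharding: shard i holds keys below boundaries[i] (sorted ascending)."""
    for i, bound in enumerate(boundaries):
        if key < bound:
            return i
    return len(boundaries)

# The same key always routes to the same shard:
assert hash_shard("user:123", 4) == hash_shard("user:123", 4)
assert range_shard(42, [10, 100, 1000]) == 1  # 10 <= 42 < 100
```

Hash sharding balances load but makes range scans expensive; range sharding keeps adjacent keys together at the cost of potential hot spots.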
I was in a similar situation before with some math paper from the 50s that's nowhere to be found (neither online nor in library indices) and you'd be surprised how many professors still use paper copies.
Very interesting either way!
My understanding of this work: A forward pass for a (fully-connected) layer of a neural network is just a dot product of the layer input with the layer weights, followed by some activation function. Both the input and the weights are vectors of the same, fixed size.
Let's imagine that the discrete values that form these vectors happen to be samples of two different continuous univariate functions. Then we can view the dot product as an approximation to the value of integrating the multiplication of the two continuous functions.
Now instead of storing the weights of our network, we store some values from which we can reconstruct a continuous function, and then sample it where we want (in this case some trainable interpolation nodes, which are convolved with a cubic kernel). This gives us the option to sample different-sized networks, but they are all performing (an approximation to) the same operation. After training with samples at different resolutions, you can freely pick your network size at inference time.
You can also take pretrained networks, reorder the weights to make the functions as smooth as possible, and then compress the network, by downsampling. In their experiments, the networks lose much less accuracy when being downsampled, compared to common pruning approaches.
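Here's a toy numpy sketch of that description (my own illustration, not the paper's code): store a handful of trainable nodes, reconstruct a continuous weight function with a cubic interpolation kernel, then sample it at whatever layer width you like.

```python
import numpy as np

def cubic_kernel(t, a=-0.5):
    # Keys' cubic convolution kernel (the same family used for bicubic resampling)
    t = np.abs(t)
    out = np.zeros_like(t)
    near, far = t <= 1, (t > 1) & (t < 2)
    out[near] = (a + 2) * t[near]**3 - (a + 3) * t[near]**2 + 1
    out[far] = a * t[far]**3 - 5 * a * t[far]**2 + 8 * a * t[far] - 4 * a
    return out

def sample_weights(nodes, n_out):
    """Reconstruct a continuous weight function from the stored interpolation
    nodes, then sample it at n_out evenly spaced points."""
    n_in = len(nodes)
    xs_nodes = np.linspace(0.0, 1.0, n_in)
    xs_out = np.linspace(0.0, 1.0, n_out)
    # distance from each output position to each node, in node-index units
    diffs = (xs_out[:, None] - xs_nodes[None, :]) * (n_in - 1)
    w = cubic_kernel(diffs)
    w /= w.sum(axis=1, keepdims=True)  # renormalize near the boundaries
    return w @ nodes

nodes = np.array([0.0, 1.0, 0.5, -0.5, 0.0])  # the only stored parameters
w16 = sample_weights(nodes, 16)  # weights for a "16-wide" layer
w64 = sample_weights(nodes, 64)  # weights for a "64-wide" layer, same function
```

Since both vectors sample the same continuous function, a dot product with an input sampled at the matching resolution (scaled by 1/n) approximates the same integral at either width.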
Paper: https://openaccess.thecvf.com/content/CVPR2023/papers/Solods...
Unfortunately I'm not deep enough into the topic to understand what their contribution to the theory part of it is. (they have some Supplementary Material in [INN Supp]). In the discussion of the Integral Neural Networks (INN) paper, there's this paragraph about an operator learning publication:
"In [24] the authors proposed deep neural networks with layers defined as functional operators. Such networks are designed for learning PDE solution operators, and its layers are continuously parameterized by MLPs only along the kernel dimensions. A re-discretization was investigated in terms of training on smaller data resolution and testing on higher input resolution. However, the proposed framework in [24] does not include continuous connections between filters and channels dimensions."
Also the weight permutation to perform the resampling on pretrained networks in INNs seems to be novel? And I guess it doesn't hurt that they're bringing new eyeballs to the topic, by providing examples of common networks and a PyTorch implementation.
[INN Supp]: https://openaccess.thecvf.com/content/CVPR2023/supplemental/...
[24]: Zongyi Li, Nikola Kovachki, et al. Neural operator: Graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485, 2020. https://arxiv.org/abs/2003.03485
* In fact, the INN concept opens up the possibility of applying differential analysis to DNN parameters. The concepts of sampling and integration can be combined with the Nyquist theorem (https://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_sampli...). Analysing the FFT of the weights allows one to define a measure of a layer's capacity. Two DNNs of different sizes can be equivalent after conversion to INNs, because the maximum frequency is the same for both networks.
* Tuning the integration grid is actually a first step toward fast knowledge extraction. We have tested INNs on a discrete EDSR (super-resolution) network and pruned it without INN training in 1 minute. We can imagine a situation where a user fine-tunes GPT-4 for a custom task just by tuning the integration grid, simultaneously reducing the number of model parameters while keeping only the important slices along filters/rows/heads etc. Because of smooth parameter sharing, the new filters/rows/heads include the "knowledge" of their neighbours.
* Another interesting application is using integral layers for fast frame interpolation, since a conv2d in an INN can produce any number of output channels, i.e. frames.
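A rough sketch of the FFT/Nyquist point above (my own reading of it, with made-up numbers): if the reconstructed weight function is band-limited, its highest significant frequency bounds how many samples a layer actually needs.

```python
import numpy as np

# A band-limited "weight function": its spectrum lives entirely at frequency 3.
w = np.cos(2 * np.pi * 3 * np.linspace(0, 1, 256, endpoint=False))

spectrum = np.abs(np.fft.rfft(w))
max_freq = int(spectrum.argmax())  # dominant frequency bin
min_samples = 2 * max_freq + 1     # Nyquist-style lower bound on layer width
```

Under this reading, two layers of different widths sampled from the same band-limited function carry the same information once both exceed `min_samples`.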
You can stay tuned and check our Medium for INN progress and applications. A new Medium article is already available: https://medium.com/@TheStage_ai/unlocking-2x-acceleration-fo...
Now we just need an iterative solver over both the structure and the "weights", and we get both architecture search and training at the same time
https://krebsonsecurity.com/2023/05/re-victimization-from-po...
Researchers bought up a bunch of seized phones from police auction sites and found about 25% of them were trivially unlockable and still held sensitive data about suspects and victims.
Recent:
"How to Hack the Simulation?" Roman Yampolskiy https://www.researchgate.net/publication/364811408_How_to_Ha...
"On the Computational Practicality of Private Information Retrieval" Radu Sion, Bogdan Carbunar https://zxr.io/research/sion2007pir.pdf
(via "Explained from scratch: private information retrieval using homomorphic encryption," https://blintzbase.com/posts/pir-and-fhe-from-scratch/ )
Mr. Bayes and Mr. Price. “An Essay towards Solving a Problem in the Doctrine of Chances. By the Late Rev. Mr. Bayes, F. R. S. Communicated by Mr. Price, in a Letter to John Canton, A. M. F. R. S.” In: Philosophical Transactions of the Royal Society of London 53.0 (1763), pp. 370–418. DOI: 10.1098/rstl.1763.0053. URL: https://doi.org/10.1098/rstl.1763.0053
Not only is this publication amazing because it is the source of conditional probability, which is the basis for Bayes's Theorem, but a dated original of the document is available. The thing I find really beautiful is that it was submitted 2 years after the author's death, by his friend who found the material when going through his things. We will never know how much was changed prior to submission, but I find it a really beautiful tribute, and the co-author makes it clear that the attribution should go to his friend; that letter still exists today (and it is a lovely read). I can't think of anything similar which has been maintained 260 years later and is still a somewhat useful academic publication. I felt privileged to cite this reference in my MSc thesis.

https://www.science.org/doi/10.1126/sciadv.adg8993
The authors show that a biological type laboratory ultracentrifuge can efficiently function as a near-universal isotope separator. Any element that can be dissolved as a salt in water -- the entire periodic table, excepting the noble gases -- can be enriched according to its relative mass. This can reduce the cost of refining certain isotopes like calcium-48 by orders of magnitude compared to the previous best techniques.
Left unsaid, but implied by its universality: the new technique is also a new approach to producing enriched fissile materials for nuclear reactors and weapons. It requires less chemical engineering sophistication than current processes which require production and handling of gaseous uranium hexafluoride.
The other even more exotic possibility is to use something like this to enrich ordinary reactor grade plutonium to weapons grade plutonium-239. The amount of mass to process with plutonium is orders of magnitude less because spent fuel plutonium is already more than 50% Pu-239, versus 0.7% U-235 in natural uranium, and a bare sphere critical mass of Pu-239 is only 10 kg vs 52 kg for U-235:
https://en.wikipedia.org/wiki/Critical_mass#Critical_mass_of...
The United States considered enriching waste fuel plutonium to weapons grade in the 1980s, when it contemplated another big nuclear weapons buildup against the USSR, but the laser based separation technology to be used was much more complicated than centrifuge separation. The project ended shortly after the USSR dissolved. It was called the Special Isotope Separation Project.
The 1988 environmental impact statement for the project gives some background information:
https://www.energy.gov/sites/prod/files/2015/06/f24/EIS-0136...
https://www.sciencedirect.com/science/article/abs/pii/S13858...
Certain bacteria can directly assimilate a mixture of hydrogen, carbon dioxide, and nitrogen to produce protein. You could consider it an alternative to bacterial nitrogen fixation in root nodules with much higher productivity. Or you could consider it an alternative to the Haber-Bosch process with much milder reaction conditions -- ambient temperature and pressure. It's a way to turn intermittent electricity into protein with simple, robust equipment. I wouldn't be surprised if this or a related development ultimately supplants much of the current demand for synthetic nitrogen fertilizers.
Bacterial protein may trigger allergic reactions in people and bacterial biomass is purine-rich which can also be a problem for people prone to gout. It's possible that cell engineering, directed evolutionary selection, or additional post-growth processing can minimize these problems.
I personally think that the more likely path is using fast-growing bacteria as feed for animal agriculture or aquaculture. Solar panels are so efficient at sunlight conversion compared to plants that you could farm salmon protein starting from bacterial pellets grown on solar derived hydrogen with per-hectare productivity comparable to conventionally farming soy beans. But the solar farm can go on saline, dry, contaminated, or otherwise agriculturally useless land. And salmon has slightly greater nutritional value than soy protein plus significantly greater market value.
We've been using the same API to communicate with our NICs since 1994. That API severely limits network throughput and latency. By simply changing the API (no new NIC) you can get 6x higher throughput in some apps and 43% lower latency.
Code runs on FPGA NIC only for now: https://github.com/crossroadsfpga/enso
Won USENIX OSDI best paper award and best artifact award.
https://dl.acm.org/doi/10.1145/3591274
Deep learning took off precisely when the ImageNet paper dropped around 2010. Before nobody believed that backprop can be GPU-accelerated.
When I was doing my master's in 2004-06, I talked to a guy whose MSc thesis was about running NNs with GPUs. My thought was: you're going to spend a TON of time fiddling with hacky systems code like CUDA, to get basically a minor 2x or 4x improvement in training time, for a type of ML algorithm that wasn't even that useful: in that era the SVM was generally considered to be superior to NNs.
So it wasn't that people thought it couldn't be done, it's that nobody saw why this would be worthwhile. Nobody was going around saying, "IF ONLY we could spend 20x more compute training our NNs, then they would be amazingly powerful".
It's easy to see in retrospect, but hard in prospect: the original paper [1] on GPU acceleration of NNs reports a measly 20x speedup. Assuming a bit of cherry-picking on the authors' side to get the paper published, the 'real-world speedup' will have been assumed by the readership to be even less. But this triggered a virtuous cycle of continuous improvements at all levels that has been dubbed "winning the hardware lottery" [2].
[1] K.-S. Oh, K. Jung, GPU implementation of neural networks.
[2] S. Hooker, The Hardware Lottery. https://arxiv.org/abs/2009.06489
Hinton also addressed the contribution of hardware performance advances to practical deep neural net applications in his talks in the mid-2000s.
By contrast, there are no known polynomial time algorithms for program synthesis and the standard approach is to search some large combinatorial space [1]. That's the case for all the classical approaches: SMT, SAT, planning and scheduling, etc. At the same time there's very powerful heuristics for all the other classical problems that can solve many problem instances efficiently.
____________
[1] The one exception to this is Inductive Logic Programming, i.e. the inductive synthesis of logic programs, for which we do know a polynomial time algorithm (but that is my work so I'm not pimping it here).
How do you reconcile the NP-completeness result in [1] about training neural networks with your claim?
[1] A. L. Blum, R. L. Rivest, Training a 3-Node Neural Network is NP-Complete. https://proceedings.neurips.cc/paper/1988/file/3def184ad8f47...
What do you mean by this? Virtually all "classic" or "shallow" ML can be GPU-accelerated, from linear regression to SVM to GBM.
Modern GPUs are GP-GPUs: where GP means "general purpose": you can run any code on GPGPUs. But if you want to gain real speed-ups you will have to program in an awkward style ("data parallel"). I am not aware of GPU acceleration of the work-horses of symbolic AI, such as Prolog, or SMT solving. There has been a lot of work on running SAT-solvers on GPUs, but I don't think this has really succeeded so far.
> Deep learning took off precisely when the ImageNet paper dropped around 2010. Before nobody believed that backprop can be GPU-accelerated.
Deep learning kicked off with RBMs because you didn't have to do backprop; there was a training algorithm called "contrastive divergence". Each layer could be trained in turn, which meant you could stack them way deeper. In ~2008-2009 I implemented Hinton's paper on GPUs, which meant I could do in hours on a GPU the same scale of thing that was taking weeks in MATLAB, and then there were lots of GPUs available on the cluster at the uni. Lots of fun CUDA hacking (I'm just glad cuBLAS was around by then). The original published learning rates etc. are wrong if I remember right; they didn't match the code.
- Specification: what are you looking for?
- Search space: where are you looking?
- Search mechanism: how are you going through the search space?
Program synthesis is simply learning where the search space is syntax. In deep learning, taking the ImageNet paper as an example, the specification is a bunch of photos with annotations, the search space is multi-variate real functions (encoded as matrices of floats), and the search mechanism is gradient descent (implemented as backprop) with a loss function.
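To make that framing concrete, here's a toy enumerative synthesizer (my own illustration, unrelated to the paper's regex engine): the specification is input/output pairs, the search space is a tiny expression grammar, and the search mechanism is exhaustive enumeration by depth.

```python
# Specification: input/output examples; we want something equivalent to 2*x + 1.
spec = [(0, 1), (1, 3), (2, 5)]

LEAVES = ["x", "1", "2"]

def expressions(depth):
    """Search space: all expressions over x, 1, 2 with + and *, up to a depth."""
    if depth == 0:
        yield from LEAVES
        return
    yield from expressions(depth - 1)
    for op in ("+", "*"):
        for a in expressions(depth - 1):
            for b in expressions(depth - 1):
                yield f"({a} {op} {b})"

def synthesize(spec, max_depth=2):
    """Search mechanism: plain enumeration, checked against the spec."""
    for expr in expressions(max_depth):
        if all(eval(expr, {"x": x}) == y for x, y in spec):
            return expr
    return None

print(synthesize(spec))  # finds an expression equivalent to 2*x + 1
```

Even in this toy, the search space grows combinatorially with depth, which is exactly the scaling problem the classical approaches fight.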
I think this paper uses regular expressions as an example of how to search fast over syntax. It claims not to be tied to regular expressions.
The paper, in a modern context and based solely on the abstract and having been in the community, is chipping at the "uninteresting" part of the problem. Around that time, program synthesis started switching to SMT (satisfiability modulo theory) methods, meaning basically a super powerful & general SAT solver for the broad search ("write a wild python program") and then, for specialized subdomains, have a good way to call out to optimized domain solvers ("write a tiny bit of floating point math here"). The paper would solve what the regex callout looks like.. which is specialized. We can argue regex is one of the most minimal viable steps towards moving to general programming on GPUs. Except as a person who does SIMD & GPU computing, optimizing compute over finite automata is not general nor representative and I don't expect to change my thinking much about the broader computational classes. To be fair to the authors... back then, synthesizing regex & sql were hard in practice even for boring cases.
Separately, nowadays synthesis has shifted to neural (copilot, gpt), and more interesting to me, neurosymbolic in R&D land. We're doing a lot of (simple) neurosymbolic in louie.ai, and I'm excited if/when we can get the SMT solver side in. Making GPT call Z3 & Coq were some of the first programs I tried with it :) Till then, there's a lot more interesting low-hanging fruit from the AI/ML side vs solvers, but feels like just a matter of time.
Here's some recent papers I liked:
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8413749/ -- Lithium is used as a mood stabilizer in bipolar and other disorders. The form of lithium used in psychiatry is lithium carbonate, but other forms also exist. As a supplement there is lithium orotate, which some people use to help them sleep, deal with stress, and so on. This paper puts forward the idea that lithium orotate is preferable to lithium carbonate because lower quantities are needed for the same therapeutic results, resulting in fewer side effects.
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1525098/ -- In bipolar disorder it's known that there are abnormalities in the presence of brain-derived neurotrophic factor (BDNF). What's interesting is that treatments for bipolar help to increase BDNF, which may be of interest to those who are into nootropics.
The papers on this site are honestly some of the best written, in-depth, and accessible works I've come across anywhere. There's enough information here to live a better life if you're willing to sift through papers. No joke.
Best content + quality LLM + quality verifications => Useful health advice.
Not only did we get a whole new type of Chess engine, it was also interesting to see how the engine thought of different openings at various stages in its training. For instance, the Caro-Kann, which is my weapon of choice, was favored quite heavily by it for several hours and then seemingly rejected (perhaps it even refuted it?!) near the end.
The super cool thing about MuZero is that it learns the dynamics of the problem, i.e. you don't have to give it the rules of the game, which makes the algorithm very general. For example, DeepMind threw MuZero at video compression and found that it can reduce video sizes by 6.28% (massive for something like YouTube)[2][3].
Curious if anyone else knows examples of MuZero being deployed outside of toy examples?
[1] https://arxiv.org/pdf/1911.08265.pdf
[2] https://arxiv.org/pdf/2202.06626.pdf
[3] https://www.deepmind.com/blog/muzeros-first-step-from-resear...
(edit s/Google/DeepMind)
To be fair, it uses MCTS, which requires many simulations of the game. For this, it needs to know which moves are valid, and when a player wins or loses the game.
So it does need to know the rules of the game, but it doesn't need any prior knowledge about which moves are better than others.
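A minimal sketch of that point (my own toy example, far simpler than MuZero): pure Monte Carlo search needs only the rules - legal moves and a terminal test - and no prior notion of which moves are good. The game here: a pile of stones, players alternate taking 1 or 2, and whoever takes the last stone wins.

```python
import random

def legal_moves(stones):
    return [m for m in (1, 2) if m <= stones]

def random_playout(stones):
    """Play uniformly random moves; return True if the player to move wins."""
    to_move_wins = False
    mover = 0  # 0 = the player whose turn it is right now
    while stones > 0:
        stones -= random.choice(legal_moves(stones))
        if stones == 0:
            to_move_wins = (mover == 0)  # whoever just moved took the last stone
        mover ^= 1
    return to_move_wins

def best_move(stones, playouts=3000):
    """Score each legal move by random playouts from the resulting position."""
    scores = {}
    for m in legal_moves(stones):
        # after we take m it is the opponent's turn, so count opponent losses
        wins = sum(not random_playout(stones - m) for _ in range(playouts))
        scores[m] = wins / playouts
    return max(scores, key=scores.get)

random.seed(0)
print(best_move(4))  # taking 1 leaves the opponent a losing pile of 3
```

MCTS adds tree selection (UCB) on top of this, and MuZero replaces the hard-coded rules with a learned dynamics model, but the division of labor is the same.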
This one is interesting, "On the Influence of Gravitation on the Propagation of Light" you can read here: https://einsteinpapers.press.princeton.edu/vol3-trans/393. This was his initial stab at GR. It was a simple approach, later abandoned, that considered light to slow down in a gravity well rather than remaining constant and specifying that mass warps spacetime. I suppose I like the idea that he pursued an idea that didn't work, and was forced to do something far more complex. It's kinda relatable.
https://www.scientificamerican.com/article/placebo-effect-gr...
The most interesting thing is that "placebo responses are rising only in the United States."
In particular last week i read their FBiP2 paper https://www.microsoft.com/en-us/research/uploads/prod/2023/0...
I am slooooowwly working on my own language that is planned to use work from both you and the Effekt team, so i may send you all an email in the future when i start having questions.
OpenAI's content moderation paper, A Holistic Approach to Undesired Content Detection in the Real World.
https://arxiv.org/pdf/2208.03274.pdf
Lots of interesting facts are strewn around the paper.
——
The first paper that squarely talked about the language resource gap in CS/ML. Before this came out, it was hard to explain just how stark the gap between English and other languages was.
Lost in Translation: Large Language Models in Non-English Content Analysis
https://cdt.org/insights/lost-in-translation-large-language-...
——
This paper gets in for the title:
“I run the world’s largest historical outreach project and it’s on a cesspool of a website.” Moderating a public scholarship site on Reddit: A case study of r/AskHistorians
https://drum.lib.umd.edu/bitstream/handle/1903/25576/CSCW_Pa...
——
This was the first paper I ended up saving on online misinformation. The early attempts to find solutions.
The Spreading of Misinformation online, https://www.pnas.org/doi/10.1073/pnas.1517441113
What I liked here was the illustration of how messages cascade differently based on the networks the message is traveling through.
Also its impact https://www.nature.com/articles/s42254-022-00483-x
I mean, we could, with infinite computing power and enough time to look into every interesting phenomenon (and to evaluate the corresponding multi-particle Schrödinger equation numerically), but there are simply too many such phenomena, Schrödinger equations are tough to solve, and quantum mechanics is also not a great level of abstraction, for the reason you mentioned.
By McPartland and Small.
Moving on from Cannabis sativa indica and Cannabis sativa sativa to Cannabis sativa indica himalayensis and Cannabis sativa indica asperrima, depending on distribution from the original location of the extinct ancient cannabis wildtype.
Following this new classification, I believe there’s a third undocumented variety in North East Asia.
If anyone else has noticed the samesameification of cannabis strains and is wondering what the path forward is, this may be illuminating.
Half&Half: Demystifying Intel’s Directional Branch Predictors for Fast, Secure Partitioned Execution
https://matthias-research.github.io/pages/publications/PBDBo...
A new way of doing real-time physics that dramatically outperforms the state of the art by simply introducing a new algorithm. No crazy AI or incremental improvements to existing approaches.
> we present theoretical justification for the claim that the optimal form of some [combinational] circuits requires cyclic topologies. We exhibit families of cyclic circuits that are optimal in the number of gates, and we prove lower bounds on the size of equivalent acyclic circuits.
http://www.mriedel.ece.umn.edu/wiki/images/7/7a/Riedel_Cycli...
Gene linked to long COVID found in analysis of thousands of patients https://www.nature.com/articles/d41586-023-02269-2
Surfactants safely take down mosquitoes without using insecticides https://newatlas.com/science/surfactants-safely-take-down-mo...
This is what our Milky Way galaxy looks like when viewed with neutrinos https://arstechnica.com/science/2023/06/ghost-particles-have...
Curious - what would happen to a spider or an ant that ate a couple of mosquitoes with that spice on top? Will they also suffocate?
https://arxiv.org/abs/2306.09299
TokenFlow: Consistent Diffusion Features for Consistent Video Editing
https://huggingface.co/papers/2307.10373
Need to see code for second one.
https://arxiv.org/pdf/1901.05086.pdf
Analyzes a claim made in the 1950s by a prominent statistician: the frequency of interstate wars follows a simple Poisson arrival process and their severity follows a simple power-law distribution.
He was right, but it’s not clear why.
The paper is interesting because it shows how a new bit of knowledge creates a large number of known unknowns from previously unknown unknowns.
It also shows statistical magic was very much possible prior to computers.
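A quick simulation in the spirit of that claim (all parameter values here are made up for illustration, not taken from the paper): wars arrive as a Poisson process and severities follow a power law.

```python
import random

random.seed(1)
rate = 0.5      # assumed: one war every ~2 years on average
alpha = 1.7     # assumed power-law exponent for severity
x_min = 1000.0  # assumed minimum severity (battle deaths)

wars, t = [], 0.0
while True:
    t += random.expovariate(rate)  # exponential gaps <=> Poisson arrivals
    if t > 500:                    # 500 simulated years
        break
    # inverse-CDF sample from a power-law tail: P(X > x) = (x / x_min)**(1 - alpha)
    severity = x_min * (1.0 - random.random()) ** (-1.0 / (alpha - 1.0))
    wars.append((t, severity))

print(len(wars))  # roughly rate * 500 = 250 simulated wars
```

With alpha below 2 the severity distribution has infinite mean, which is why a few simulated conflicts dominate the total - the same heavy-tail behavior the paper tests against the historical record.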
Older one, but still very nice work.
By among others, the great Mike Stonebraker.
Also known as the Viterbi algorithm. Every digital communication device in existence today most likely has an implementation of it.
Later proved optimal by Forney [1]
0. https://www.essrl.wustl.edu/~jao/itrg/viterbi.pdf
1. https://www2.isye.gatech.edu/~yxie77/ece587/viterbi_algorith...
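For anyone who hasn't implemented it, the algorithm is a short dynamic program. Here's a sketch on a toy HMM (the classic rainy/sunny example with made-up probabilities, not anything from the papers above):

```python
import math

states = ("Rainy", "Sunny")
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {"Rainy": {"Rainy": 0.7, "Sunny": 0.3},
           "Sunny": {"Rainy": 0.4, "Sunny": 0.6}}
emit_p = {"Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
          "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1}}

def viterbi(obs):
    """Most likely hidden state sequence for the observations, via DP."""
    # V[t][s] = log-probability of the best path that ends in state s at time t
    V = [{s: math.log(start_p[s] * emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev, best = max(
                ((p, V[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1])
            V[t][s] = best + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # trace the winning path backwards through the stored pointers
    state = max(V[-1], key=V[-1].get)
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

print(viterbi(["walk", "shop", "clean"]))  # ['Sunny', 'Rainy', 'Rainy']
```

In a modem or phone the same DP runs over convolutional-code trellis states instead of weather, usually in fixed-point hardware, but the recurrence is identical.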
On why Oppenheimer didn't get the Nobel for his black hole paper, decades before someone else got it for a rediscovery.
"0/77 were randomized studies."
https://www.medrxiv.org/content/10.1101/2023.07.07.23292338v...
Here is a pdf.
https://www.medrxiv.org/content/10.1101/2023.07.07.23292338v...
Probably the biggest thing that blows my mind is the suppression of the Minnesota Coronary Study in the early 1960s. Literally half a century of dis/misinformation from the government, pharma, and medical industries, pushing a claim that was disproven long ago. Nothing higher quality or more definitive has been done since.
Basically, the whole "limit cholesterol and saturated fat intake in favor of more grains and seed oils" advice is based on a theory that was long disproven. And it's still pushed to this day. Why? There's big money/business in pharma and agriculture (corn, soy, wheat).