Three Major Physics Discoveries and Counting

(quantamagazine.org)

99 points · tokenadult · 7y ago · 16 comments


abecedarius 7y ago

An intriguing remark near the end: "We are also very excited about the near future, because we plan to start using quantum computing to do our data analysis."

WilliamEdward 7y ago

I don't know how she could say this with any authority. One of the biggest roadblocks in quantum computing is finding a faster way to make quantum logic gates. So far they're only making them one by one, when tens of thousands are needed. We will not be using this tech anytime soon.

ape4 7y ago

Using subatomic computers to learn about subatomic particles. That will lead to better subatomic computers, which will lead to... (A virtuous cycle like Moore's Law?)

mathgenius 7y ago

It was nice to see this comment, because to me it means that she is always looking to the future & optimistic.

One of the near-term applications of quantum computing is for machine learning. For example, the D-wave chip is for optimising a binary fitness function. We will soon see if it works out.


sempron64 7y ago

Interestingly, she mentions using machine learning for analysis of the Higgs Boson data. Does anyone here know more about this?

DuskStar 7y ago

In my opinion, CERN is one of the few groups out there that can call their data set "big data" without talking out of their ass. (If it can fit in RAM, it ain't Big Data. Multiple TBs fit in RAM) Their detectors produce something on the order of a petabyte per second which is then pared back immensely to become something that's actually storable. Most of the machine learning I've heard of involving CERN is in reducing that data stream and then highlighting "interesting" things for researchers to take a look at.

frumiousirc 7y ago

Although LHC experiments are toward the top of the heap in terms of data rates, many particle and astronomical experiments produce "big data". One quarter of the DUNE experiment will acquire about 50 EB/year (that's exabyte), outputting about 10 PB/year to tape. LSST will produce data in the few PB/year range.

lokimedes 7y ago

Not to belittle your examples, but in a historical context the LHC held a specific position. Remember that the ATLAS/CMS/ALICE/LHCb experiments started recording data at 10 GB/s back in 2008. Now, ten years later it is only natural that large data rates are becoming the norm.

mmt 7y ago

> If it can fit in RAM, it ain't Big Data. Multiple TBs fit in RAM

To be fair, the price for that RAM starts to steepen at (if not before) the 4TB mark, and 1.5TB might have been the limit on frugal main memory as recently as a year and a half ago.

OTOH, SSDs are very fast even at low cost, so I'd argue if it can fit in directly-attached storage whose aggregate bandwidth compares with the RAM's bandwidth, it ain't Big Data, either. Though even a single PB might be big enough, if the CPUs are too slow.

dguest 7y ago

As others have commented, we've been using tools like neural networks and boosted decision trees for a long time. We have quite good simulation, which tells us what particles would look like in our detector, but one thing our simulation tells us is that it's often really hard to tell the difference between a Higgs Boson and some other "background" process.

So the logic goes like this: if we trust our simulation, we can simulate the Higgs, and simulate the background, and then train a neural network to tell us which is which. Then we turn the network loose on our data. If it sees lots of things that look like Higgs, yay, we discovered something!
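A minimal sketch of that logic, not the actual analysis code: it assumes a toy "simulation" that produces a single Gaussian-distributed feature per event, and it swaps the neural network for a hand-rolled logistic regression so that it runs with the standard library alone. All names and numbers here (`simulate`, the means, the event counts) are made up for illustration.

```python
import math
import random


def simulate(n, mean, rng):
    """Toy 'simulation': one discriminating feature per event (assumed Gaussian)."""
    return [rng.gauss(mean, 1.0) for _ in range(n)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=200):
    """Fit p(signal | x) = sigmoid(w * x + b) with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def accuracy(xs, ys, w, b):
    """Fraction of labelled events that the classifier gets right."""
    hits = sum(1 for x, y in zip(xs, ys) if (sigmoid(w * x + b) > 0.5) == (y == 1))
    return hits / len(xs)


rng = random.Random(42)

# Step 1: simulate signal and background separately, so the labels are known.
signal_sim = simulate(1000, 2.0, rng)
background_sim = simulate(1000, 0.0, rng)
xs = signal_sim + background_sim
ys = [1] * len(signal_sim) + [0] * len(background_sim)

# Step 2: train the classifier to tell which is which.
w, b = train_logistic(xs, ys)

# Step 3: turn it loose on "data" (here: mostly background plus a little signal).
data = simulate(950, 0.0, rng) + simulate(50, 2.0, rng)
n_signal_like = sum(1 for x in data if sigmoid(w * x + b) > 0.5)
print(f"training accuracy: {accuracy(xs, ys, w, b):.2f}")
print(f"signal-like events in data: {n_signal_like} of {len(data)}")
```

In a real analysis the count of signal-like events would then be compared with what the background-only simulation predicts, which is where the statistics tools come in.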

For machine learning tools, we had a few homegrown implementations that didn't get far beyond physics (probably because they weren't particularly user-friendly). But physicists would have referred to techniques like this as "Multivariate Analysis" (or "MVA") a few years ago.

More recently we've started to reach out more to industry and use their tools, which are actually much nicer! What she's referring to here is one particular analysis her team contributed to [1], which relied on XGBoost [2]. Beyond that we've used Keras a fair bit to identify some types of particles.

[1]: https://arxiv.org/abs/1806.00425
[2]: https://xgboost.readthedocs.io/en/latest/

my_first_acct 7y ago

Not exactly an answer to your question, but there is a Kaggle competition currently in progress in this area, sponsored by CERN: https://www.kaggle.com/c/trackml-particle-identification.

lokimedes 7y ago

We (member of the ATLAS Experiment, 2008-2016) used neural networks for trigger decisions (to record or ignore a collision), and Boosted Decision Trees were the big thing among many ATLAS physicists back around 2008-9, so those were also used quite a bit. For the experiments at the LHC you can consider the actual analysis of data a multivariate hypothesis testing exercise. The thing is a counting experiment: you have a theory that provides a prediction, you simulate its phenomenological effects, the reactions' energy deposition in the detector(s) and the electronic signal paths, and then we would run "reconstruction", basically turning electronic readouts into particle trajectories, particle energies and particle types. Under the laws of physics, these clues can be combined to measure the rest mass of initial particles that have long since decayed (like the Higgs).

Now, given the difficulty of separating the multiple particles leaving energy behind, it is quite hard to separate the "background" from known physics from the interesting "signal" from a new theory under test. Rather than having to do the analysis manually, Machine Learning is applied at multiple stages. This can be to do particle identification (is it a muon, an electron or something else) or to maximise the binary separation between two classes such as signal/background.
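To make the rest-mass step concrete, here is a small sketch assuming each reconstructed decay product is given as an (E, px, py, pz) four-vector in natural units (c = 1). The invariant mass of the decayed parent then follows from m^2 = (sum E)^2 - |sum p|^2. The function name and the two-photon example values are illustrative, not taken from any experiment's software.

```python
import math


def invariant_mass(particles):
    """Invariant mass of a set of (E, px, py, pz) four-vectors, with c = 1."""
    e = sum(p[0] for p in particles)
    px = sum(p[1] for p in particles)
    py = sum(p[2] for p in particles)
    pz = sum(p[3] for p in particles)
    m2 = e ** 2 - (px ** 2 + py ** 2 + pz ** 2)
    # Measurement noise can push m2 slightly below zero; clamp before the sqrt.
    return math.sqrt(max(m2, 0.0))


# Illustrative input: two massless photons flying back to back, 62.5 GeV each.
photons = [(62.5, 62.5, 0.0, 0.0), (62.5, -62.5, 0.0, 0.0)]
mass = invariant_mass(photons)
print(mass)  # 125.0 (GeV): total energy 125, total momentum 0
```

Filling a histogram with this quantity for many events, and looking for a bump on top of the background, is the counting experiment described above.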

In general it is wise to know that Machine Learning, Big Data and Cloud computing have been used in particle physics for decades, but with the LHC a world-wide infrastructure has been created to capture all the learnings that the mainstream is only beginning to discover now. For instance, a main paradigm in the analysis model is to move calculation to the data, rather than the other way around, due to the large amounts of data. You may call it MapReduce, we call it physics analysis (map your statistical analysis across decentralised data, reduce the output through distributed merge jobs, plot and publish). Sorry to sound like an old fart, your question is honest and relevant, but it really underlines how easily a story about how Google/Facebook/whatever invented something can rewrite history. Most of the stuff people in the IT/Tech sector are playing around with is inspired by basic or applied science, and applied in a commercial setting. This is exactly how it is supposed to work, but damned if the log analysing marketeers at Google should have all the credit for these developments :)
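A toy sketch of that map/merge pattern, assuming each site holds a list of reconstructed masses and that only small histograms travel over the network. The site names, values and bin width are invented for illustration; the real grid infrastructure is of course far more involved.

```python
from collections import Counter
from functools import reduce


def fill_histogram(events, bin_width=10.0):
    """Map step: runs where the data lives and returns only a small histogram."""
    hist = Counter()
    for mass in events:
        hist[int(mass // bin_width)] += 1
    return hist


def merge(h1, h2):
    """Reduce step: merging two histograms is bin-wise addition."""
    return h1 + h2


# Hypothetical decentralised data: each site keeps its own events.
sites = {
    "site_a": [91.2, 124.8, 125.3, 88.0],
    "site_b": [125.1, 126.0, 90.5],
    "site_c": [124.9, 45.0],
}

partials = [fill_histogram(events) for events in sites.values()]  # map
total = reduce(merge, partials, Counter())                        # reduce

for bin_index in sorted(total):
    low = bin_index * 10
    print(f"{low:>3}-{low + 10:<3} GeV: {total[bin_index]}")
```

The point of the design is that the merged result is identical to histogramming all events in one place, while the bulky event data never has to move.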

Now, with my rant over, here are a few references that may be interesting to you:

These were the tools used for physics analysis:

http://tmva.sourceforge.net
https://root.cern.ch
https://twiki.cern.ch/twiki/bin/view/RooStats/WebHome

And a few articles:

http://atlas.cern/search/node/Boosted%20Decision%20tree
https://cds.cern.ch/search?ln=en&sc=1&p=Machine+Learning&act...

Oh and a bit of gossip. We called Sau Lan Wu the "Dragon lady" (mostly behind her back), because of her awesome energy and tenacity. She really deserves the credit given in the article!

konschubert 7y ago

Sadly, the ML research in physics and in the rest of the world are extremely disconnected. To the point where one side has never heard of the tools that the other side uses all day.

I think the blame here is mostly on the science community which isn't paying much attention to ML tooling, best practices and research and instead keeps reinventing the wheel, over and over again.

guitarbill 7y ago

> I think the blame here is mostly on the science community which isn't paying much attention to ML tooling, best practices and research

While the science community doesn't have a great track record for quality software engineering, that's an awfully arrogant position.

Most ML tooling sucked, and it's only just getting better in terms of usability. But even then it's very software engineer-y in the worst kind of way, e.g. "Coming soon: PyTorch 1.0 ready for research and production" [0]. Great.

If you're doing research in one field, you don't really want to spend the time to become an expert in another one just to do some analysis. What you want is tools you can reliably (ab)use, like maths. But there often isn't a straight-forward way of getting the uncertainties on values output from many ML constructs.

Yes, the term "statistical learning" has been around since at least 2001. But it isn't widely known/talked about/understood, and most trendy ML "tutorials" gloss over it completely. Maybe this is unfair criticism. After all, most ML applications in software don't require that stricter treatment, and why should somebody playing around with ML be burdened with this rigorousness? At the same time, it's easy to come away from ML thinking "I don't understand this at all, it's a black box, it doesn't do what I need it to".

And we haven't even talked about what a pain reproducibility is in ML.

> instead keeps reinventing the wheel, over and over again.

If people keep reinventing it, maybe the problem isn't the people... yeah, physicists don't write great code (guilty), but ML tooling is full of hype and currently feels a bit Javascript-y.

[0] https://pytorch.org/2018/05/02/road-to-1.0.html

lokimedes 7y ago

That may have changed a bit now. I'm no longer part of the physics community, but it seems that the physicists at least are relying more and more on mainstream tools. The outward flow of knowledge comes more from ex-physicists like myself, who work in industry. Most of us work in Tech, Fintech or other ML/Stats driven industries, where many reimplement what they learned during their physics days.

dguest 7y ago

She's not talking about TMVA or RooStats when she says "Machine Learning": those would be "MVA" and "Statistics tools" in our jargon. She's talking about XGBoost[1].

[1]: https://xgboost.readthedocs.io/en/latest/
