Particle physicists turn to AI to cope with CERN’s collision deluge (nature.com)
132 points by okket on May 5, 2018 | 61 comments


A simplified model of the task. You throw 10,000 springs of various sizes (particle trajectories) into a box and record the intersection points (hits) with a set of (spring-penetrable) nested cylinders (detectors) in that box. You will get on average about 10 intersection points per spring. Now, given those 100,000 intersection points, reconstruct the size, position, and orientation of the 10,000 springs. Let's add the facts that all the springs are aligned along one axis and that the cylinders are nested around that axis, to be geometrically closer to reality.

The difficulty comes from possible ambiguities about which collection of springs caused the observed intersection points, imperfections in the helical shape of the springs and the cylindrical shape of the detectors, the limited resolution of the measured intersection points, and missed (detector efficiency »only« 99 %) or spurious (detector noise) intersection points.
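For the curious, the toy model above is easy to simulate. A rough Python sketch (all geometry and numbers are made up for illustration, not taken from the real detector):

```python
import math
import random

def helix_point(r, phi0, pitch, t):
    # Circle of radius r through the origin (the collision point),
    # winding along the z axis; phi0 sets the initial direction.
    x = r * math.cos(phi0) - r * math.cos(phi0 + t)
    y = r * math.sin(phi0) - r * math.sin(phi0 + t)
    z = pitch * t
    return x, y, z

def hits_for_helix(params, cylinder_radii, steps=2000, t_max=2 * math.pi):
    """Record the first crossing of each nested cylinder (detector layer)."""
    hits, crossed = [], set()
    for i in range(steps):
        t = t_max * i / steps
        x, y, z = helix_point(*params, t)
        rho = math.hypot(x, y)  # distance from the beam axis
        for R in cylinder_radii:
            if R not in crossed and rho >= R:
                crossed.add(R)
                hits.append((x, y, z))
    return hits

random.seed(0)
layers = [1.0, 2.0, 3.0, 4.0, 5.0]  # nested detector cylinders
tracks = [(random.uniform(2.0, 6.0),          # helix radius
           random.uniform(0.0, 2 * math.pi),  # initial direction
           random.uniform(0.1, 1.0))          # pitch along the axis
          for _ in range(100)]
all_hits = [h for trk in tracks for h in hits_for_helix(trk, layers)]
print(len(all_hits), "hits from", len(tracks), "tracks")
```

The inverse problem (which the competition asks for) is the hard part: given only `all_hits`, shuffled together and with noise added, recover `tracks`.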


Plus, the detector is multi-layered around the point of collision, with each layer measuring something slightly different at different accuracies. Some springs are even completely invisible, and you can only find them by looking at what is missing (neutrally charged particles).


What's the news here? Using Machine Learning in data analysis is common practice in particle physics.

Is the news that they want to use ML in the trigger selection now?

Also, which one of the four experiments is doing this?

EDIT: Ah, it's CMS.


> Is the news that they want to use ML in the trigger selection now?

Not quite (and that wouldn't be news). It's using ML for track reconstruction. Not even LHCb does this.


LHCb does use ML for trigger selections already[1].

[1] https://cds.cern.ch/record/2243560?ln=en


Yep. I meant we don't use it for track reconstruction.


I was working in UBC's ATLAS lab last summer, and this appeared to be the main focus of nearly all the PhD students' research: using ML models for track reconstruction.


Ah, cool! That's really quite interesting!


I think it's that they posted part of their data on Kaggle to see if someone can come up with a better model.


The problem with AI in particle physics is that you need to understand the efficiency of the selection very well (the rates of false-negative and false-positive classification). And this rate of course isn't uniform across all kinds of events. Thus, the uncertainty on the selection efficiency tends to grow with the complexity of the machine-learning model. This directly hurts your sensitivity, the very thing you were trying to improve by using an AI-based process.

It's a trade-off.


As already pointed out, detector efficiency is determined through extremely detailed simulations of the entire system. Those measurements are done in a completely orthogonal manner so the complexity of the model has nothing to do with the systematic uncertainties that arise from the selection efficiency.


> As already pointed out, detector efficiency is determined through extremely detailed simulations of the entire system.

It depends on the experiment; LHCb, for example, does not use simulated background.

In any case, the more complex your model gets (in number of variables), the exponentially more simulated Monte Carlo events you need to fill that multidimensional space.


> LHCb for example does not use simulated background.

That depends on the analysis. If you're looking at a partially-reconstructed decay (common in semileptonics) then you can't rely on the regular trick of choosing a sideband sample to work as your combinatorial background. Also, it's very common to model specific misidentified or partially-reconstructed backgrounds using simulation.

> In any case, the more complex your model gets (number of variables) the exponentially more simulated Monte Carlo events you need to fill that multidimensional space.

I understand this argument if you're trying to model the efficiency in nD space (through splines, histograms, moments etc), but that's usually done when you're fitting to n variables, e.g. in an amplitude fit. If you just want the efficiency of a cut on the score from an MVA algorithm, I don't think it matters. What definitely matters is that the behaviour on simulation reproduces that of real signal as faithfully as possible.
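To make the point about MVA score cuts concrete: the efficiency of such a cut on simulated signal is just a counting exercise, with a binomial uncertainty. A toy sketch (the beta-distributed scores are invented; real scores would come from the trained algorithm):

```python
import random

random.seed(2)
# Hypothetical MVA scores for simulated signal events
# (signal-like events tend toward 1, hence a right-leaning beta here)
scores = [random.betavariate(5, 2) for _ in range(100_000)]

cut = 0.5
passed = sum(s > cut for s in scores)
eff = passed / len(scores)
# Binomial uncertainty on the efficiency estimate
err = (eff * (1 - eff) / len(scores)) ** 0.5
print(f"efficiency = {eff:.3f} +/- {err:.3f}")
```

Note that this one number is only trustworthy to the extent that the simulated score distribution matches real signal, which is the point made above.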


> I understand this argument if you're trying to model the efficiency in nD space (through splines, histograms, moments etc), but that's usually done when you're fitting to n variables, e.g. in an amplitude fit. If you just want the efficiency of a cut on the score from an MVA algorithm, I don't think it matters. What definitely matters is that the behaviour on simulation reproduces that of real signal as faithfully as possible.

You might have a point there.


I'm not talking about simulating background. Every experiment uses simulation (an extremely accurate and well understood package called GEANT) to simulate the detector response to measure efficiencies.

You're absolutely right about how the amount of data needed for statistical significance grows with the size of the parameter space you're searching.


From what I remember doing this years ago, the selection efficiency is usually determined through numerical simulation rather than analytically. So why would your estimate of the efficiency get worse with more complicated models?


In general, simulation doesn't agree with data perfectly. The more the variables (and their correlations) aren't reproduced faithfully, the more the efficiency determined from simulation is likely to depart from the truth. It's even possible to accidentally train an algorithm to discriminate between simulation and data rather than signal and background.

It's common practice to check data/simulation agreements between variables before using them for training. Reweighting procedures are used to improve agreement.
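The reweighting mentioned above can be as simple as taking per-bin data/simulation ratios in one variable. A minimal sketch (the Gaussian variable and the shift are invented for illustration; real analyses use more sophisticated multi-dimensional reweighters):

```python
import random

def histogram(values, edges):
    """Simple fixed-edge 1D histogram (values outside the range are dropped)."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

random.seed(1)
# Hypothetical kinematic variable: simulation is slightly shifted vs. data
data = [random.gauss(1.0, 0.5) for _ in range(10_000)]
sim = [random.gauss(1.1, 0.5) for _ in range(10_000)]

edges = [i * 0.25 for i in range(-4, 13)]  # 16 bins from -1.0 to 3.0
h_data = histogram(data, edges)
h_sim = histogram(sim, edges)

# Per-bin weight applied to each simulated event falling in that bin
weights = [d / s if s > 0 else 1.0 for d, s in zip(h_data, h_sim)]

# By construction the reweighted simulation now matches data bin by bin
h_rw = [s * w for s, w in zip(h_sim, weights)]
print([round(w, 2) for w in weights[:4]])
```

In one dimension this matches by construction; the hard part in practice is that correlations between many variables also have to agree.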


Hard, hard problem; good luck to the competitors. The new Google-owned Kaggle is a cheapskate trap, but it's still fun to spend a couple of hours a week reading the forums and the most brilliant kernels.


It's very economical for companies to have hundreds of AI programmers work on this (for a few months) for $25k.

But surprisingly, the value they get out of it is supposedly only one team's work.


This competition (and the follow-up) is official for important conferences in the field, so I expect many teams to publish their results. In general there is lots of discussion, with tips & tricks posted online during such competitions, both in Kaggle forums and in blogs. So the value is certainly more than that of the winner alone, and most of it is available to everyone, not just the sponsors.


Can't they ask the top K teams for their solutions and offer them some smaller prize? I believe Kaggle supports that. Even if you had 10 top teams of 5 people each and gave them $300k in total, it's still damn cheap compared to hiring 4 or 5 full-time engineers working for < 1 year.


Usually on Kaggle they ask every team eligible for a prize for their solution. I believe it is as much to ensure there was no cheating/hacking as to actually make use of the solution itself.


“The top three performers of this phase hosted by Google-owned company Kaggle, will receive cash prizes of US$12,000, $8,000 and $5,000.”

“incomparably more difficult”


It's a competition, people aren't doing it for the prizes.


Yes, but many certainly are doing it for the prizes.


Some do it for the challenge - a lot of academic papers will be published about their efforts.

I'm not sure I like this approach, however. AI is not pixie dust you can sprinkle over your hard problems. Even if you are 100% sure your AI-based system perfectly matches your current systems, you simply can't guarantee it will match the cases you never tested (the never-seen-before data from never-done-before experiments). You'll still need to run the well-understood process once you've flagged the interesting data (and possibly thrown away the sets mislabeled as uninteresting).

A factor of 10, which is what's mentioned in the article, is what you expect to get with Moore's law in three or four years. With current off-the-shelf advances, I'd expect more than a 10-fold improvement in the next three years at the leading edge HPC world. Maybe CERN's problem is one that specialised compute units could solve better.


I am not sure; that sounds like a bad idea to me. Your chances of being one of the three winning teams are not terribly high, and even if you manage to win, $12k for three months of work is not that much, even if you work alone. Assuming there will be someone submitting a solution obtained using state-of-the-art algorithms, it would probably be pretty naive to assume you have any significant chance of winning by just spending a couple of hours throwing some generic machine-learning algorithms at the problem.

It immediately looked like a really interesting challenge to me, but after reading a bit about the state of the art, it seems like three months is a pretty short time to come up with a meaningful result even if you could work on it full time. Many people have already invested a lot of time in this problem, and existing solutions are quite sophisticated and good. The material actually mentions that you will be expected to take into account things like adjacent detectors overlapping by a few pixels, or how particles may light up several pixels if they hit the detector at a very shallow angle and cross several pixels as they pass through.

The first thing someone probably considers is something like a Hough transformation, and it turns out the creators of the challenge mention that in the material and submitted a solution based on it as a benchmark, which achieves a score of about 20 %. If I read the related documents correctly, a meaningful result will require a score of at least about 90 %, and the state of the art is probably somewhere around 95 % to 98 %. The current leader is at 26.48 %; admittedly, the challenge is only 5 days old. I am really curious where the scores will be at the end.


Well, the scenario I can imagine for someone doing these kinds of competitions is the super-smart, ultra-rare Turing-level thinkers; it wouldn't really be for the prize money, because they'd likely enjoy doing it, though the money might be enough incentive to temporarily redirect their attention. It's a nice thank-you prize anyhow.


The US dollar is a very sought-after currency in third-world countries. $12k might be pocket change to you, but it's 10 years of wages for a smart kid in another country.


If you are able to win this competition, then why wouldn't you get a job in that area and get a comparable amount of money every month instead of just once?


Not nearly desirable enough to motivate someone to invest the time and resources (which I assume would be non-trivial). $12k for 3 months of work amounts to $4k per month, which is a low salary for someone with the kind of skills this requires.


Even $5k for three months is comparable to the low end of the range for PhD stipends, and $12k is well above the high end.


no they don't


Someone posted what looks like a very nice summary of domain knowledge:

https://www.kaggle.com/pranav84/beginner-s-guide-to-cern-s-p...


Here is a very to-the-point summary of particle tracking: http://www.physics.iitm.ac.in/~sercehep2013/track2_Gagan_Moh... - link found in Kaggle forums


The main thing of interest here is whether a system that knows about physics (traditional approaches) can be beaten by a system that has no a priori knowledge of physics (out-of-the-box deep learning), or whether someone will find a way to integrate physics knowledge into a DL approach.


Curious to see what kind of techniques end up doing best on this. I worked on using GPUs in this area back in 2010 for the ALICE project, which was a pretty straight port of an existing algorithm for finding certain phenomena given tracks. I believe the track reconstruction then used a multistage process ending with a Kalman filter.
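For readers who haven't met it: a Kalman filter refines the track state one measurement at a time, alternating predict and update steps. A toy 1D constant-velocity version (real trackers filter a 5-parameter helix state across detector layers; this is not the ALICE code):

```python
import random

def kalman_1d(measurements, dt=1.0, q=1e-3, r=0.25):
    """Constant-velocity Kalman filter: state (position, velocity),
    with an explicit symmetric 2x2 covariance (p00, p01, p11)."""
    x, v = measurements[0], 0.0
    p00, p01, p11 = 1.0, 0.0, 1.0
    for z in measurements[1:]:
        # Predict: x' = F x with F = [[1, dt], [0, 1]], P' = F P F^T + Q
        # (process noise q applied to the velocity term only, for simplicity)
        x += v * dt
        p00 += 2 * dt * p01 + dt * dt * p11
        p01 += dt * p11
        p11 += q
        # Update with a position measurement z of variance r
        s = p00 + r
        k0, k1 = p00 / s, p01 / s
        resid = z - x
        x += k0 * resid
        v += k1 * resid
        p00, p01, p11 = (1 - k0) * p00, (1 - k0) * p01, p11 - k1 * p01
    return x, v

random.seed(3)
true_v = 0.5
# Noisy position measurements of a point moving at constant velocity
zs = [true_v * t + random.gauss(0.0, 0.5) for t in range(50)]
x_est, v_est = kalman_1d(zs)
print(round(x_est, 2), round(v_est, 2))
```

The filtered estimates converge on the true trajectory even though each individual measurement is quite noisy, which is why the filter works well as the final smoothing stage after candidate hits have been assigned to a track.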


I am sure the people at CERN have reasons for choosing machine learning for this purpose, but I don't see it as ideal for this case. Machine learning might be able to give you your results faster, but generally with much lower accuracy. And even supposing the teams reach really good accuracy, it's still only for the known and expected. In particle physics research, or any research for that matter, you cannot simply give it a shot, because your system might miss very interesting collisions while you think it's working perfectly fine. I think quantum computing could be really quite useful here, given that the analysis can benefit from parallel processing.

Should ML be necessary, I recommend developing it in a way that its decisions are understandable. (I'm not sure they'll even read this comment to get the recommendation, but let's give it a try.) Explainable AI is becoming really significant these days, and some big companies and organisations are working on it. If I'm not mistaken, DARPA is working on such a project, so it's really not too far off.


>"In the new problem, she says, you have to find in the 100,000 points something like 10,000 arcs of ellipse."

Does anyone know how long this currently takes?


That certainly depends on the amount of computing power you can throw at it and what level of accuracy you are trying to achieve. But the most naive attempt using a Hough transformation will probably take a couple of milliseconds on your average computer. However, the accuracy will be far from good enough with such a simple approach.

I have no idea what amount of time state of the art algorithms will use, but I could well imagine that it is essentially a tunable parameter that is chosen to get the best accuracy given the amount of data you have to process and the time available to complete the task.


Hough transform is basically useless for precision tracking in a high multiplicity environment. Tracks have 5 degrees of freedom, so the memory costs make it infeasible. I think ATLAS uses it in the trigger, where you don't need to actually reconstruct all tracks, just find out if there are a couple passing certain criteria.


You don't have to accumulate the transformation result into an actual five-dimensional array; you can just transform the points, keep them in a list, a quad tree, or whatever you like, and then run a clustering algorithm on the transformed points. This is probably complicated by the fact that every point can vote for several parameter combinations, so that you are not actually clustering points but something like lines or planes associated with the points.

That also seems to be more or less what the creators of the challenge implemented and submitted as a benchmark implementation (though I did not look at the code), admittedly with the expected poor performance score of only about 20 %.
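A bare-bones illustration of the voting mechanics, for straight lines in 2D rather than the 5-parameter helices of real tracking (which is where the memory cost explodes):

```python
import math

def hough_lines(points, n_theta=180, n_rho=100, rho_max=20.0):
    """Vote in (theta, rho) cells for the line x*cos(theta) + y*sin(theta) = rho."""
    acc = {}
    for x, y in points:
        for it in range(n_theta):
            theta = math.pi * it / n_theta
            rho = x * math.cos(theta) + y * math.sin(theta)
            if -rho_max <= rho < rho_max:
                ir = int((rho + rho_max) / (2 * rho_max) * n_rho)
                acc[(it, ir)] = acc.get((it, ir), 0) + 1
    # The most-voted cell approximates the dominant line's parameters
    return max(acc.items(), key=lambda kv: kv[1])

# Ten collinear hits on y = 2x + 1 should concentrate their votes in one cell
pts = [(x, 2 * x + 1) for x in range(10)]
best_cell, votes = hough_lines(pts)
print(best_cell, votes)
```

Each point votes once per theta bin, so a cell's count is bounded by the number of collinear hits; precision is limited by the grid resolution, which is exactly the memory/precision trade-off being debated here.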


If you know a way to find clusters of intersections of hyperplanes, I'm pretty sure you can get a highly acclaimed paper out of it, but that is not what a Hough transform is. The Hough transform is an approximate solution to that problem which works by sampling the hyperplanes and then polling them. There's no way to perform a Hough transform without having many more transformed points than input points, and the more precision you need, the more points you need.


I see; they actually did two different implementations, one clustering approach based on DBSCAN and one based on a Hough transformation, and submitted the former as the benchmark.

Notwithstanding that, I am not yet convinced that a Hough transformation combined with something similar to a quad tree could not work. More specifically, I am thinking of delaying the creation of votes. Roughly, the first point just becomes a node in the tree corresponding to the bounding box of its entire possible parameter space. Only when we encounter a second point whose possible parameter space overlaps with that of the first do we split the two volumes up into one volume for which both points vote and a few volumes for which only one of the points votes.

This obviously requires that the possible parameter spaces do not have terrible shapes that are hard to bound, and I could also see nearly perfectly overlapping volumes causing issues due to the generation of many small volumes for the imperfection in the overlap. There are probably more issues and possibly even showstoppers, but without picking up a pencil and really thinking about it, I cannot really tell whether or not it could work out. But, as said, I am also unable to see immediately why this could never work.


Depends exactly how you limit the problem, but going from detector hits to abstract arcs takes a few seconds.


Out of curiosity, what fraction of the Bitcoin network's computing power would it take to overcome this computational wall?


Zero. All the Bitcoin network can do is compute SHA-256 hashes. It's literally useless for anything else.


I think they might mean the raw computational power (excluding ASICs).


Dunno, how much memory and storage does the network have?

The CERN grid currently[1] has 1 EB of storage and 750k CPU cores, and it pushes out 2 million jobs per day. From my own experience at one of the 162 sites, you have something like 6 GB of RAM per CPU core, and jobs often need more than that, so in practice you are memory bound.

[1]: https://indico.cern.ch/event/466934/contributions/2524828/at...


They also have 0 GPUs. (I visited there a couple of weeks ago and talked to them, among other things, about this challenge).


Maybe they could issue Cerncoins, one issued for each particle traced. Then if they went up in value the miners could switch to that.


Is there even a bound on the computational power we'd need to make an advancement here?


> Although these techniques would be able to work out the paths even after the upgrades, “the problem is, they are too slow”, says Cécile Germain, a computer scientist at the University of Paris South in Orsay. Without major investment in new detector technologies, LHC physicists estimate that the collision rates will exceed the current capabilities by at least a factor of 10.

Sounds like the LHC’s physicists have an idea.


No, but CERN is known to be careful with their PR. Presumably, and this is just my intuition speaking, a big enough cluster of computers would solve this, but they're taking the opportunity to experiment with different techniques and methods, and that's pretty much it.

If CERN had an unlimited budget, I suspect they'd do it however they did it before.


This is not a problem that can be solved just by throwing more compute resources at it. It's not simply that there is too much data to process; the real issue is that each detector has a time resolution that goes down to about a nanosecond. If you got one collision per nanosecond, it would be pretty straightforward to associate every one of the (possibly) million detector hits with a single event and reconstruct it accordingly. The issue arises when you have more than one event (i.e. collision) within each nanosecond window: you end up with the detector readings for the events overlapping each other, without a simple way of disambiguating them. This is called "pile-up".

When I last worked there in 2015, a typical pile-up situation was about 50 collisions per detector reading. It is no simple problem to simultaneously reconstruct 50 collisions from the same set of overlapping detector measurements.


From what I've heard, the amount of pile-up ATLAS and CMS can handle is limited by the CPU time it takes to reconstruct events, which can be alleviated by throwing more resources at it, but it is much better to develop quicker reconstruction algorithms.

Towards the end of last year, they had to start levelling the instantaneous luminosity to 75% of what they could achieve,† primarily to reduce the load on the grid.

† Edit: the maximum peak luminosity is still 200% of the design value, so the performance is beyond initial expectations.


To be fair, 50 PU is above the design peak luminosity, let alone the mean. And I'm sure I've seen plots from both ATLAS and CMS at the end of LS1 showing improvements in the processing time at 100 PU by factors of roughly 10.


Just to insert the usual analogy: if a single collision is the proverbial smashing of two clocks and then trying to discern their makeup from the wreckage, this is akin to smashing dozens of clocks at the same time. How can you tell where one gear fragment was meant to fit when it's such a mess?


One thing that might help is that the collision vertex is slightly different each time. If you already had all of the tracks reconstructed, you'd find that you could point them together into n different centers of mass from which they originated.


Vertex finding is the first step in most track-reconstruction algorithms, but it's still very difficult. The number of trajectories that can be formed from discrete points blows up combinatorially. When you have hundreds of thousands of points, you usually don't end up with a few easy-to-find vertices.
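The blow-up is easy to quantify: if track seeds are naively built from hit triplets, the candidate count before any geometric pruning grows as C(n, 3) (real seeders restrict combinations to compatible layer ranges, but the trend is the same):

```python
from math import comb

# Naive number of triplet seed candidates from n hits
for n in (100, 1_000, 100_000):
    print(f"{n:>7} hits -> {comb(n, 3):,} candidate triplets")
```

Going from a hundred hits to a hundred thousand multiplies the naive candidate count by roughly a factor of a billion, which is why pruning heuristics dominate the design of these algorithms.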


This is already what happens. The problem is that forming tracks and vertices becomes much harder as the number increases.



