D0 SingleT events:

Evidence for single top by D0 December 12, 2006, posted by dorigo at dorigo.wordpress.com/

Ok, on this one, D0 appears to beat us. 

I recently discussed the complex situation of single top production searches in CDF in http://dorigo.wordpress.com/2006/11/07/the-elusive-single-top/ (and see http://dorigo.wordpress.com/2006/11/20/a-low-mass-top-in-single-top-events/ for an exotic interpretation of those results). To summarize here, despite a huge effort by CDF, no clear indication of the signal is present in the data analyzed so far - one analysis finds a 2.6-sigma excess over backgrounds, but another study based on the same data sees no signal at all; and the 2.something-sigma effect aroused suspicion in some that there might be something unexpected in the data.

Things in D0 are brighter: they gave a wine and cheese seminar at Fermilab four days ago, where Dugan O'Neil showed the results of three different analysis methods, all consistently showing clear evidence for a Standard Model signal of single top production. You can find Dugan's slides at http://www-d0.fnal.gov/Run2Physics/WWW/results/prelim/TOP/T39/wine_and_cheese.pdf.

The a-priori best measurement of the set provides a cross section of 4.9+-1.4 picobarns, while 2.9+-0.3 pb is the next-to-leading-order theory prediction. This cannot be dubbed an "observation" yet (a word reserved for 5-sigma effects in physics jargon), but it comes close to it, and it would be very strange if the 3.6-sigma excess of D0 did not grow to observation level in the next few months, as more data are fed to the analysis.
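
To put the numbers above in perspective, here is a minimal sketch that converts a significance in sigmas into a one-sided Gaussian tail probability, and that compares the measured and predicted cross sections in units of their combined uncertainty. It assumes symmetric, uncorrelated Gaussian errors, which is a simplification and not necessarily how D0 computed its significance; the function names are illustrative.

```python
import math

def one_sided_p_value(z):
    """Tail probability of a unit Gaussian above z standard deviations."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def pull(measured, err_measured, predicted, err_predicted):
    """Difference of two values in units of their combined (quadrature) error."""
    return (measured - predicted) / math.hypot(err_measured, err_predicted)

p_evidence = one_sided_p_value(3.6)      # the D0 excess quoted in the post
p_observation = one_sided_p_value(5.0)   # the conventional "observation" level
compat = pull(4.9, 1.4, 2.9, 0.3)        # measurement vs. NLO prediction, in pb

print(f"p-value at 3.6 sigma: {p_evidence:.2e}")
print(f"p-value at 5.0 sigma: {p_observation:.2e}")
print(f"measured vs. predicted cross section: {compat:.2f} sigma apart")
```

Under these assumptions the measured cross section sits roughly 1.4 standard deviations above the prediction, so the excess over the Standard Model expectation is itself unremarkable.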

What's more, D0's excess appears to cluster at the right top mass value, and not at low mass as some of CDF's - an indication that things are going fine and that the standard model still rulez. Indeed, single top production is a purely electroweak process - at least in one of the two production channels - and surprises there would be twice as puzzling. You can see some of the D0 signal in the plot ...

... where the reconstructed top quark mass is plotted for the data and compared to backgrounds (in grey, green, and red) and signal (the blue stuff).

If you are curious about the details of the analyses by D0, I encourage you to have a look at the slides linked above. If you are lazy, I offer below a poor man's description of the whole thing…

Events triggered by the presence of a high-momentum electron or muon are collected in a 0.9 inverse-femtobarn dataset; significant missing transverse energy and two or more jets are also required. A neural-network b-tagger very effectively finds jets which are likely to have originated from b-quark hadronization, thus enriching the data in the single top production signal, which should nominally yield a W (giving the lepton and missing Et) and two b-quark jets (plus an additional light-quark jet in some cases). The data are then studied by a decision tree, which uses many kinematical variables to discriminate the signal from all known background sources. A cut on the decision tree output enriches the surviving data in signal to the level that an excess is observed. The most discriminating kinematical variables (such as the one shown in the plot above) are then studied to verify that the excess sits where it is expected from single top production. From the excess a cross section is computed by taking into account the amount of data analyzed and the selection efficiencies.
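
The last step of that chain, turning an excess into a cross section, can be sketched as a simple counting formula: excess divided by (efficiency times integrated luminosity). Only the 0.9 inverse-femtobarn luminosity comes from the text; the event counts and the efficiency below are made-up placeholders, not D0 numbers, and the real analysis is considerably more elaborate than a single counting experiment.

```python
import math

def cross_section_pb(n_observed, n_background, efficiency, luminosity_inv_pb):
    """Counting-experiment cross section and its statistical error, in pb.

    sigma = (N_obs - N_bkg) / (efficiency * integrated luminosity)
    The error uses sqrt(N_obs) only and ignores all systematic effects.
    """
    denominator = efficiency * luminosity_inv_pb
    sigma = (n_observed - n_background) / denominator
    stat_error = math.sqrt(n_observed) / denominator
    return sigma, stat_error

LUMINOSITY_INV_PB = 900.0   # 0.9 fb^-1, as quoted in the post

# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
sigma, stat_error = cross_section_pb(
    n_observed=100, n_background=73.0, efficiency=0.01,
    luminosity_inv_pb=LUMINOSITY_INV_PB)

print(f"sigma = {sigma:.2f} +- {stat_error:.2f} pb (stat. only)")
```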

Still curious? Go to the talk!


Date: Wed, 13 Dec 2006 03:17:27 +0100 (CET)
From: Tommaso Dorigo 

(TD reply text in normal type)

To: Tony Smith
(TS original message text in preformatted type)

Subject: Re: D0 singleT

Hi Tony,

as usual, you're welcome, and as usual my answers have a fair chance of being only partial answers to your questions. However I will try to do my homework.

> It may be that my questions are too naive to be useful
> because I don't have much intuition about what DT means physically,
> so please feel free to tell me if that is the case,
> and in that case just ignore the questions asked below in this message.
> On the other hand, if you think that the questions might be useful,
> feel free to post this message including images on your blog entry.

I don't know D0's decision tree well enough myself, but I know the list of variables which are fed into the trees, from slide 24.

There, you can see that they put in the "best" top mass as a kinematical selection variable. Moreover, many other variables which are directly correlated with the top mass itself are fed into the DT. It is a perfectly legal thing to do, but once you do it, you have to be careful in interpreting the results. In particular, a single top production process with a mass different from that with which you built your trees (your "signal") will be treated as a background if the mass difference is large enough to make the branches split regular top and low-mass top differently.

What would tell us if that is really the case would be the relative weight that the final trees give to each of the variables. If the top mass is one of the vars which is given most weight in determining how to classify the event, then any top signal with mass significantly different from 175 GeV would be washed out.

Be careful, though: "weight" is not a very well defined quantity here. Some decision tree algorithms have a built-in way to determine a posteriori (i.e. once the trees are built) what weight a variable had in separating signal from backgrounds. Others don't. I would not be surprised if, by asking D0 what weight the "best" top mass has in their DT, you got a perplexed look in return, or worse, a layman's explanation that the DT is not a neural network. But they might also answer with a number straight away :)
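
To make the two points above concrete, here is a toy sketch in pure Python, and emphatically not D0's decision tree. It grows a small tree on two invented variables (a reconstructed mass and a weakly discriminating second variable), records one common a-posteriori "weight" (the summed, sample-weighted Gini decrease per variable), and then checks how often a hypothetical 120 GeV signal is classified as signal by a tree trained on a 175 GeV one. All distributions, sample sizes and the 120 GeV value are assumptions made for illustration.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(rows, labels):
    """Return (gain, feature, threshold) for the cut that lowers Gini most."""
    parent, n = gini(labels), len(rows)
    best = (0.0, None, None)
    for f in range(len(rows[0])):
        values = [r[f] for r in rows]
        lo, hi = min(values), max(values)
        for k in range(1, 20):                       # 19 trial thresholds
            thr = lo + (hi - lo) * k / 20
            left = [y for r, y in zip(rows, labels) if r[f] < thr]
            right = [y for r, y in zip(rows, labels) if r[f] >= thr]
            if not left or not right:
                continue
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            if parent - child > best[0]:
                best = (parent - child, f, thr)
    return best

def grow(rows, labels, depth, importance, n_total):
    """Grow a tree; leaves hold the signal purity of their training events."""
    gain, f, thr = best_split(rows, labels) if depth > 0 else (0.0, None, None)
    if f is None:
        return sum(labels) / len(labels)
    importance[f] += gain * len(rows) / n_total      # weighted Gini decrease
    lo_side = [(r, y) for r, y in zip(rows, labels) if r[f] < thr]
    hi_side = [(r, y) for r, y in zip(rows, labels) if r[f] >= thr]
    kids = [grow([r for r, _ in side], [y for _, y in side],
                 depth - 1, importance, n_total) for side in (lo_side, hi_side)]
    return (f, thr, kids[0], kids[1])

def score(tree, row):
    """Follow the cuts down to a leaf and return its signal purity."""
    while isinstance(tree, tuple):
        f, thr, below, above = tree
        tree = below if row[f] < thr else above
    return tree

rng = random.Random(1)

def signal(n, mass):      # columns: [reconstructed mass in GeV, second variable]
    return [[rng.gauss(mass, 15.0), rng.gauss(0.5, 1.0)] for _ in range(n)]

def background(n):
    return [[rng.uniform(50.0, 300.0), rng.gauss(0.0, 1.0)] for _ in range(n)]

train = signal(500, 175.0) + background(500)
truth = [1] * 500 + [0] * 500
importance = [0.0, 0.0]
tree = grow(train, truth, 3, importance, len(train))

def pass_fraction(events):
    return sum(score(tree, e) > 0.5 for e in events) / len(events)

frac_nominal = pass_fraction(signal(1000, 175.0))
frac_low = pass_fraction(signal(1000, 120.0))
print("importance [mass, other]:", [round(v, 3) for v in importance])
print("175 GeV signal kept:", frac_nominal, " 120 GeV signal kept:", frac_low)
```

In this toy the mass carries most of the importance, and most of the low-mass signal ends up on the background side of the cut, which is the effect described above.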

In any event, I have the answer myself. If you look at the plots, they speak to you. The three distributions at DT<0.3,

intermediate,

and >0.65

are VERY different in the "best" top mass. AND, the high-DT data have a perfectly coincident distribution for all backgrounds and for single top. That is to say, that variable has been totally "squeezed" for its discriminating power by the classifier. In other words, what one can gather from that plot is only the relative normalization of the expected contributions to the data points, since shapes will be coincident. A point of relevance: the relative normalization of the various colors

tells you indeed that the high-DT data favors the SM single top with respect to backgrounds, as it should. But it does so based on the top mass itself, and therefore that variable is no longer a very good one to display the final result! In fact, one would prefer to keep the most discriminant variable aside, and train one's classifier with the others, being careful to avoid variables that are correlated with the most powerful one: that way, one would retain discriminating power in the best variable _after_ a cut on the classifier's output. That is the strategy adopted for Higgs searches at low mass in CDF, where the Higgs mass is left aside, being very discriminant by itself.
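
The "squeezing" described above can be reproduced with a toy: if the classifier output depends on the mass, a cut on that output forces the background to look like the signal in mass. The sketch below uses an invented stand-in score that is simply a function of the mass, plus invented signal and background shapes; it is not the D0 decision tree, and the 0.65 cut is borrowed from the plots only as a convenient number.

```python
import math
import random
from statistics import mean

rng = random.Random(3)
TOP_MASS = 175.0   # GeV, the mass the toy classifier was "trained" on

def toy_score(mass):
    """Stand-in classifier output: 1 at the nominal mass, falling off away from it."""
    return math.exp(-0.5 * ((mass - TOP_MASS) / 20.0) ** 2)

# Invented shapes: a Gaussian signal peak and a steeply falling background.
signal = [rng.gauss(TOP_MASS, 20.0) for _ in range(5000)]
background = [80.0 + rng.expovariate(1.0 / 60.0) for _ in range(20000)]

def after_cut(masses, cut=0.65):
    """Keep only the events whose toy classifier output exceeds the cut."""
    return [m for m in masses if toy_score(m) > cut]

gap_before = mean(signal) - mean(background)
gap_after = mean(after_cut(signal)) - mean(after_cut(background))

print(f"signal - background mean mass before the cut: {gap_before:.1f} GeV")
print(f"signal - background mean mass after the cut:  {gap_after:.1f} GeV")
```

Before the cut the two mean masses differ by tens of GeV; after it they nearly coincide, so the mass plot can only show relative normalizations.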

 So, to summarize:

... 

> Attached image D0TqDTs.jpg shows Decision Tree output that 
> seems to me to be shown in more detail in the images from slide 47.
>
> Looking at the attached image D0TqDTlt3.jpg showing Tquark mass
> for DT less than 0.3
> it seems to me that the high data points for singleT events
> are in the bins for 100-125 GeV and 150-175 GeV.
> However,
> I guess that low DT might mean that not many singleT events
> are expected, because the low DT histogram shows very little of the blue
> or cyan colors that correspond to expectation of singleT events,
> so
> maybe the low DT data is not very significant ?

Not necessarily. Low DT means low probability of a 175 GeV top, given a lot of final state four-momenta. So a lower mass top quark might get a low grade and end up there. By the way, have you noticed the tell-tale dip at 175 in the W+jets background? That is the sign that events with that mass are preferentially high-DT ones, if there are no more striking characteristics telling them apart from the single top hypothesis - for instance, ttbar does not get such a void at 175 because there are more useful variables to discriminate it from single top, and it clusters at 175 anyway...

>
> Looking at the attached image D0TqDTgt5.jpg showing Tquark mass
> for DT greater than 0.55
> it seems to me that the high data points for singleT events
> are in the bins for 175-200 GeV and 225-250 GeV,
> and
> that the data point for the 150-175 GeV bin is a bit low.

All good - but we are discussing very insignificant flukes here. The error bars are generally larger than any discrepancy...

> Looking at the attached image D0TqDTgt6.jpg showing Tquark mass
> for DT greater than 0.65, which I think is the image on your blog entry,
> it seems to me that the high data points for singleT events
> are in the bins for 125-150 GeV and 175-200 GeV and 225-250 GeV,
> with the 175-200 GeV bin having the highest data point,
> and
> that the data points for the 150-175 GeV and 200-225 GeV bins are low.

Ok.

> Since the DT greater than 0.65 histogram shows the largest
> amount of blue and cyan singleT contributions, I guess that
> it might be considered the most physically significant histogram
> with respect to seeing singleT events.

Again, not so given what I wrote above.

> It is interesting to me that for DT greater than 0.65 the
> three bins with high data points correspond to the three
> peaks around 125-150 GeV and 175-200 GeV and 225-250 GeV
> that were present back in the early semileptonic histograms
> of the 1990s at CDF and D0.

Ok, but remember we are talking about very few events here. At 225-250 there is ONE with a background of 0.3...
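
As a quick check of how unremarkable that is, the chance of seeing at least one event when 0.3 are expected follows from the Poisson distribution. The sketch below treats the 0.3 as exactly known, which ignores its own uncertainty.

```python
import math

def poisson_tail(n_observed, expected):
    """P(N >= n_observed) for a Poisson distribution with mean `expected`."""
    below = sum(math.exp(-expected) * expected ** k / math.factorial(k)
                for k in range(n_observed))
    return 1.0 - below

p_one_or_more = poisson_tail(1, 0.3)
print(f"{p_one_or_more:.3f}")   # prints 0.259
```

So a lone event in that bin happens by chance about one time in four.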

> From the Decision Tree output graph of attached image D0TqDTs.jpg
> it seems to me that:
> there are high data points for DT less than 0.3;
> there is a very high data point for DT between 0.45 and 0.5;
> and
> there is a high data point for DT between 0.55 and 0.6.

Careful, they are plotting only single-tag, =2 jet events here. They have 36 subsets of data, which pass through as many different decision trees. One would need to examine all of them to draw any conclusion, and by eye you could anyway draw only qualitative ones...

>
> I don't have a good intuitive understanding of the physical significance
> of the various values of DT,
> so
> just looking at the various bins for Tquark mass in slide 47,
> with most emphasis on DT greater than 0.65,
> it seems to me that maybe D0 might be seeing a significant number
> of singleT events in low-mass regions such as 125-150 GeV,
> although most of the events seem to be in the 175-200 GeV region.
>
> Is there a good paper for me to read that would explain more
> about the physical significance of DT ?

I bet not... You need to go hat in hand to the D0 folks. Even if they do issue a paper on the analysis, the gory details will not be so clear as you would want them to be.

> Is there some physical reason that low DT sees events in 150-175 GeV,
> while the higher DT sees a deficiency of events in 150-175 GeV ?

Not necessarily physical - statistical probably. If systematic, then maybe it is connected to their way of training trees with so many variables correlated to each other. Decision trees may get "overtrained" in such circumstances, and a way to avoid that is to do a random sampling of the variables used at each branching, and grow a huge number of trees rather than a single one, then ask the trees to "vote" for a hypothesis. The random forests algorithm is such a delicious thing; I have a post about it which links to an informative site on that particular algorithm if you want more information. The post is from the beginning of June I think, so you could dig in my blog and find it. If you don't find it, and if you want it, let me know...
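
The voting idea can be sketched in a few lines. This is a deliberately stripped-down stand-in for the random forests algorithm: each "tree" is a single cut (a stump) trained on a bootstrap resample of the events, the variables it may use at its one branching are drawn at random, and the forest classifies by majority vote. The four toy variables and all the numbers are invented for illustration.

```python
import random

rng = random.Random(7)
N_FEATURES = 4

def make_events(n, shift):
    """Toy events: N_FEATURES independent Gaussian variables centred at `shift`."""
    return [[rng.gauss(shift, 1.0) for _ in range(N_FEATURES)] for _ in range(n)]

def train_stump(rows, labels, features):
    """Best single cut 'x[f] > thr means signal' among the allowed features."""
    best = (-1, None, None)
    for f in features:
        for thr in [-1.0 + 0.25 * k for k in range(13)]:    # -1.0 ... 2.0
            correct = sum((r[f] > thr) == bool(y) for r, y in zip(rows, labels))
            if correct > best[0]:
                best = (correct, f, thr)
    return best[1], best[2]

def train_forest(rows, labels, n_trees=50, n_try=2):
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]       # bootstrap resample
        feats = rng.sample(range(N_FEATURES), n_try)         # random variable subset
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest

def vote(forest, row):
    """Majority vote of all stumps: 1 for signal, 0 for background."""
    yes = sum(row[f] > thr for f, thr in forest)
    return 1 if 2 * yes > len(forest) else 0

train_rows = make_events(400, 1.0) + make_events(400, 0.0)
train_labels = [1] * 400 + [0] * 400
forest = train_forest(train_rows, train_labels)

test_rows = make_events(1000, 1.0) + make_events(1000, 0.0)
test_labels = [1] * 1000 + [0] * 1000
accuracy = sum(vote(forest, r) == y
               for r, y in zip(test_rows, test_labels)) / len(test_rows)
print(f"forest accuracy on independent toy events: {accuracy:.3f}")
```

A real random forest grows deep trees and resamples the variables at every branching; the stump version only keeps the two ingredients mentioned above, random variable choice and voting.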

> Are there more detailed analyses of the D0 event data that are
> expressed in terms of Tquark mass ?

...

> Does the D0 analysis explain why the CDF data seemed to see
> singleT events at low Tquark mass,
> or
> is it the opinion of the D0 people that the CDF low Tquark mass data
> is an anomalous fluctuation that will go away with more CDF data ?

D0 data explains nothing... It is consistent with a SM single top, but not inconsistent with other satellite hypotheses IMO.

> Is it reasonable to expect that more data at both CDF and D0 will
> answer these questions ?

I think more data always helps, provided you are willing to let a good hypothesis go if the data disprove it. But it is good to be stubborn for a while longer, especially since nobody has really done a search focused on low-mass single tops...

Cheers,

T.

> Tony
>
>

 


Date: Thu, 14 Dec 2006 15:33:46 +0100 (CET)

From: Tommaso Dorigo <tommaso.dorigo@pd.infn.it>

(TD reply text in normal type)

To: Tony Smith <f75m17h@mindspring.com>
(TS original message text in preformatted type)

Subject: Re: thanks for speculations blog entry 

Hi Tony,

I will try to answer some questions below.

...

> ... However,
> I do have another suggestion that might be unlikely to be accepted:
>
> It seems to me that if you already have a pretty good idea of what
> you are looking for, then things like Decision Trees, Neural Networks,
> etc., that are trainable would be useful in analyzing data with
> a large number of events,
> but
> the problem is (as you said in your blog entry) "... training ... with
> so many variables correlated to each other ...",
> which may not be very bad if you really do know what you are looking for.
> For example,
> a highly trained thing might be very good for looking at the 175 GeV
> Tquark peak (which is clearly known to exist) in great detail.
>
> However, if you are looking for something that you don't know,
> for example the Higgs mass, then you should be careful (again, as
> you said in your blog entry) to "... keep the most discriminant variable
> aside, and train one's classifier with the others, being careful to avoid
> variables that are correlated with the most powerful one ...".
>

I think there is already an attempt in CDF to look for the unknown without any preconception besides the existence of what is already proven to exist. These are "model-independent" searches. Not blessed yet, but they look for some thousands of permutations of interesting final states involving electrons, muons, missing Et, photons, jets, b-tags, and the like. All are fit with SM processes, and compared to expectations.
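
One thing such a broad scan has to account for is that with thousands of final states some will fluctuate by chance. A minimal sketch of that trials factor follows, assuming the final states are statistically independent (they are not, strictly) and using 1000 as a hypothetical number of final states; it is not a description of the CDF procedure.

```python
import math

def local_p_value(z):
    """One-sided Gaussian tail probability for a z-sigma excess in one place."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def global_p_value(z, n_trials):
    """Chance that at least one of n_trials independent tests exceeds z sigma."""
    # log1p/expm1 keep precision when the local p-value is tiny.
    return -math.expm1(n_trials * math.log1p(-local_p_value(z)))

def expected_false_alarms(z, n_trials):
    """Average number of tests that exceed z sigma by chance alone."""
    return n_trials * local_p_value(z)

N_FINAL_STATES = 1000   # hypothetical, for illustration only

p_any_3sigma = global_p_value(3.0, N_FINAL_STATES)
n_3sigma = expected_false_alarms(3.0, N_FINAL_STATES)
print(f"P(at least one 3-sigma excess somewhere): {p_any_3sigma:.2f}")
print(f"expected number of chance 3-sigma excesses: {n_3sigma:.2f}")
```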

If on the other hand one looks for individual events, one can indeed find something weird-looking from time to time, but with billion-event size datasets this is a very dangerous thing to do from an experimental standpoint. We understand our data only from a statistical standpoint, while nobody on earth can say with certainty if an electron candidate is a true electron or calorimeter noise or a pizero traveling close to a pi-plus or what. I am afraid, in other words, that what you propose is a way to do science that we cannot do any longer in present day experiments. It is meaningful in neutrino scattering experiments, on the other hand.

> Back then there were only a few dilepton events and really not-so-many
> semileptonic events, and there were papers like the 1997 UC Berkeley thesis
> of Erich Varnes that was back then available to the public on the web at
> http://www-d0.fnal.gov/publications_talks/thesis/thesis.html
> that described individual candidate events in great detail.
>
> Although my initial idea of 3 Tquark mass peaks came from
> seeing the 1994 CDF semileptonic histogram of only 2 or 3 dozen events
> as shown in the original evidence paper,
> I would have rejected my idea a long time ago
> if 3 similar peaks had not been seen in a similar independent D0 histogram,
> and
> if the details of events as disclosed by Erich Varnes's thesis and
> similar papers had not seemed to me to be consistent with my speculation.
>
> Anyhow,
> now that luminosities are much higher and the raw number of candidate
> events is very large,
> it seems that there are two ways to go:
>
> 1 - look at all the large number of events statistically, which probably
> requires some training and may be good for detailed study of expected
> stuff, but may overlook unexpected stuff;
 
> 2 - simulate the good-old-days of just a few candidate events by taking
> really random small samples and then looking at those individually
> and in great detail, thus possibly seeing something totally unexpected.
> If something unexpected is indicated, then train a statistical thing
> to look closely to see whether it is real or only a fluctuation.

As I said, this can be done but I would bet it is very hard to convince an experimenter to part with his most powerful weapon, that is Monte Carlo simulations and comparisons to them. If one does what you propose on Monte Carlo, one will likely find very unexpected things too... Even if the simulated process is QCD dijet production. I say this from experience!

> I think that the large collaborations of today love the purely statistical
> approach, because it is in fact very useful for known or expected stuff,
> and is likely to produce consensus results without unpleasant dissent,
> something that bureaucracies (both of the collaboration and the funding
> agencies) like and are comfortable with.

I agree with you, but I think the alternative is not what you propose, but rather a signature-based search.

>
> However, I wish that the small-random-sample-in-detail approach were
> also employed to some degree, on the (maybe unlikely but who knows)
> chance of actually seeing something really unexpected.
>
> Tony
>
> PS - Thanks again very much for your very patient and clear
> answers to my speculative questions.

You're welcome...

Cheers,

T.

 

