Improving WH NN Analysis Using Looser Isolated Tracks with 5.7 fb^-1, Q and A page

Contact person : Weiming Yao for WH NN group

CDF Note 10184 V2.0 last updated July 1 2010.

Questions from Ben and Eric at Preblessing (6/18/2010)

Q 1. Can you double-check overlap between tight isotracks ?
A. We checked the overlap in data and in the wh115 sample and found no overlap, as expected.

Q 2. Do you have clean up on high Pt tracks ? What is your cleanup ?
A. Yes, we require the standard isolated track selections listed below, which are also given in Table 1 of CDF Note 10184.

Variable	Cut
Pt	>20
z0	<60cm
d0	<0.2 or 0.02 (w/SI)
Track Isolation	>0.9
Cot Hits	>=24 (Axial), >=20 (Stereo)
chisq (data only)	>10^-8
Number of Silicon Hits	>=3 as expected
Matching to a jet	dR<0.4
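
As an illustration of how the cuts in the table combine, here is a minimal Python sketch. It is not the analysis code: the Track record and its field names are hypothetical, and several readings of the table are assumptions (the z0 and d0 cuts are applied to absolute values, the tighter d0 cut applies when the track has silicon hits, "chisq" is treated as a chi-square probability, and the silicon-hit cut applies only when hits are expected).

```python
import math
from dataclasses import dataclass


@dataclass
class Track:
    """Hypothetical track record; field names are illustrative only."""
    pt: float               # transverse momentum
    z0: float               # cm
    d0: float               # impact parameter, same units as the table cut
    isolation: float        # track isolation
    cot_axial_hits: int
    cot_stereo_hits: int
    chi2_prob: float        # assumed meaning of the "chisq" row
    si_hits: int
    si_hits_expected: bool
    eta: float
    phi: float


def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance, with the phi difference wrapped into [-pi, pi]."""
    dphi = math.remainder(phi1 - phi2, 2.0 * math.pi)
    return math.hypot(eta1 - eta2, dphi)


def passes_isotrack_cuts(trk, jets, is_data):
    """Apply the table cuts; jets is a list of (eta, phi) pairs."""
    # Assumption: the tighter d0 cut is used when the track has silicon hits.
    d0_max = 0.02 if trk.si_hits > 0 else 0.2
    checks = [
        trk.pt > 20.0,
        abs(trk.z0) < 60.0,                       # assumption: cut on |z0|
        abs(trk.d0) < d0_max,                     # assumption: cut on |d0|
        trk.isolation > 0.9,
        trk.cot_axial_hits >= 24,
        trk.cot_stereo_hits >= 20,
        (not is_data) or trk.chi2_prob > 1e-8,    # applied to data only
        (not trk.si_hits_expected) or trk.si_hits >= 3,
        any(delta_r(trk.eta, trk.phi, je, jp) < 0.4 for je, jp in jets),
    ]
    return all(checks)
```

Listing the cuts as one boolean per table row keeps the sketch easy to compare against the table line by line.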

Q 3. For hadiso modeling, there is a difference in same-sign fakes and opposite-sign fakes because charge correlation between lepton charge from W and charge of lepton in jet. This is a rate, but may be a shape too. How do you know your shape is correct for the hadiso fake shape ?
A. The hadiso shape should be independent of the sign of the isolated track in the jet. In fact, we compared the hadiso modeling from three different kinds of fakes:

the same-sign events from W+jets data,
the jets with a muon stub that failed at least two standard Muon ID cuts but passed the isolated track selection,
the same-sign vs. opposite-sign events from W+jets MC.

The hadiso shapes seem consistent with each other, as shown in the bottom of the plot below. For completeness, we also compared the same-sign and opposite-sign fakes from the W+jets Monte Carlo; the comparison of hadiso is shown in the plot below and seems independent of the sign of the track. It is interesting to note that a larger fraction of events seems to peak near Hadiso=0 in the MC than in the fake shape from data, but the difference is well within the 40% uncertainty on the non-W fraction.

[Plots: Fakes in Data; Fakes in W+Jets MC]

Q 4. Lepton Q*Eta modeling plot is poor in the central region. Consider removing Q*Eta from NN. The model is just using muons -0.6 to 0.6. But you need to get muons in forward regions and less in central region. Could this be because of the scale factor of the leptons vetoed ? I think in Justin's analysis, when they vetoed lepton types, they needed a reverse scale factor.
A. We tried to improve the Q*Eta of the non-W model by including all loose muon types. The new distribution (right) does not seem to improve much compared to the original (left). We also looked at the contributions (1-SF) from the lepton veto, but the effect is negligible. To understand the Q*Eta MC modeling, we checked the Q*Eta of the isolated track in well-selected DY events by requiring an opposite-sign pair of a tight lepton and an isolated track (ISOTKh). There seem to be more events near eta=0 in MC than in data, which is consistent with the feature seen in the W+2 jets data. To see the impact, we reweighted the MC Q*Eta to match the data and recomputed the limits shown in the table. The effect is small. We also tried rescaling the non-W fraction by 40%, and the changes in the limits are small.

[Plots: with loose cmu/cmp (left); including all loose muons (right)]
[Plots: Q*Eta from DY; after rescaling]
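
A shape reweighting of the kind described in this answer can be sketched bin by bin as follows. This is only an illustration under assumptions: the function names are hypothetical, the weight is taken as the ratio of unit-normalized data and MC histograms, and the bin contents in the usage lines are toy numbers, not values from the analysis.

```python
def shape_weights(data_counts, mc_counts):
    """Per-bin weights that morph the MC shape into the data shape.

    Both histograms are normalized to unit area first, so the weights
    change only the shape and not the overall MC normalization.
    """
    data_total = float(sum(data_counts))
    mc_total = float(sum(mc_counts))
    weights = []
    for d, m in zip(data_counts, mc_counts):
        if m > 0:
            weights.append((d / data_total) / (m / mc_total))
        else:
            weights.append(1.0)  # leave empty MC bins untouched
    return weights


def apply_weights(mc_counts, weights):
    """Return the reweighted MC histogram."""
    return [m * w for m, w in zip(mc_counts, weights)]


# Toy Q*Eta histograms (made-up numbers): the MC is more peaked near 0.
toy_data = [10, 20, 40, 20, 10]
toy_mc = [16, 36, 96, 36, 16]
toy_weights = shape_weights(toy_data, toy_mc)
toy_reweighted = apply_weights(toy_mc, toy_weights)
```

In the analysis the weights would be derived from the DY control sample and then applied event by event to the MC before the NN is evaluated; the sketch only shows the histogram-level arithmetic.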

Mass	100	105	110	115	120	125	130	135	140	145	150
Nominal (expected)	10.6933	12	13.8	14.94	18.48	21.19	28.53	37.58	53.97	74.08	131.18
Rescaling Q*Eta (expected)	11.1733	11.9667	13.395	14.92	18.45	21.64	27.38	38.64	54.53	76.1	129.46
Δ/Nominal (expected)	4.48879%	-0.2775%	-2.93478%	-0.133869%	-0.162338%	2.12364%	-4.03084%	2.82065%	1.03761%	2.72678%	-1.31118%
Nominal (observed)	5.41579	6.41314	7.53659	9.87374	13.8178	18.1148	25.4492	34.2344	57.394	76.5213	134.994
Rescaling Q*Eta (observed)	5.39833	6.36549	7.42979	9.8048	13.4051	17.9015	24.4312	33.5025	55.9079	75.3543	132.823
Δ/Nominal (observed)	-0.322391%	-0.743006%	-1.41709%	-0.698216%	-2.98673%	-1.17749%	-4.00013%	-2.13791%	-2.5893%	-1.52507%	-1.60822%
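
The relative-change rows of the table can be reproduced from the limit rows with a few lines of Python. The numbers are copied from the table; the function name is hypothetical, and the change is assumed to be defined as (rescaled - nominal) / nominal.

```python
nominal_expected = [10.6933, 12, 13.8, 14.94, 18.48, 21.19,
                    28.53, 37.58, 53.97, 74.08, 131.18]
rescaled_expected = [11.1733, 11.9667, 13.395, 14.92, 18.45, 21.64,
                     27.38, 38.64, 54.53, 76.1, 129.46]
nominal_observed = [5.41579, 6.41314, 7.53659, 9.87374, 13.8178, 18.1148,
                    25.4492, 34.2344, 57.394, 76.5213, 134.994]
rescaled_observed = [5.39833, 6.36549, 7.42979, 9.8048, 13.4051, 17.9015,
                     24.4312, 33.5025, 55.9079, 75.3543, 132.823]


def relative_change_percent(nominal, rescaled):
    """Per-mass-point change of the limit, in percent of the nominal."""
    return [100.0 * (r - n) / n for n, r in zip(nominal, rescaled)]


change_expected = relative_change_percent(nominal_expected, rescaled_expected)
change_observed = relative_change_percent(nominal_observed, rescaled_observed)

# Largest shift in magnitude over all mass points, for each row.
max_shift_expected = max(abs(c) for c in change_expected)
max_shift_observed = max(abs(c) for c in change_observed)
```

Both maxima stay below 5%, which is the quantitative content of "the effect is small" in the answer above.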

Q 5. Make macros compatible with main WH analysis.
A. Done and updated in the cdfnote.

Q 6. p.13, MET distribution, looks like 4 out of 7 data points are 2-sigma low.
A. Since the MET distribution in the pretag sample is in reasonable agreement, we suspect this could be a statistical fluctuation.

Q 7. Would be good to see log scale on MET and lepton PT distributions for pre-tag and single-tag. To see if tail is okay, and make sure there are no anomalously high PT muons.
A. We remade the plots in both linear and log scales, see CDF Note 10184 V2.0 for more details.

Q 8. KIT in ST looks a bit low in mis-tag side. Perhaps the same issue as Yoshikazu's analysis. ME analysis had a nice KIT distribution, and they were using a different W+LF model. Could WH ME analysis make plot of KIT for loose muons ?
A. For the plot below, we counted the number of single-tagged events with kit<0 and kit>0; the counts seem consistent with the expectation within statistics.

Kitnn	<0	>0
Data	130	132
Expected	170 ± 28	141 ± 23
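
One simple way to quantify "consistent within the statistics" is a pull for each side of the KIT distribution. The sketch below is an illustration, not the procedure used in the note: the function name is hypothetical, and the pull is assumed to combine the quoted uncertainty on the expectation with a Poisson variance equal to the expected count.

```python
import math


def pull(observed, expected, expected_err):
    """(observed - expected) in units of the combined uncertainty.

    Assumption: Poisson variance taken as the expected count, added in
    quadrature to the quoted uncertainty on the expectation.
    """
    return (observed - expected) / math.sqrt(expected + expected_err ** 2)


# Numbers taken from the table above.
pull_negative = pull(130, 170.0, 28.0)   # kit < 0
pull_positive = pull(132, 141.0, 23.0)   # kit > 0
```

Under these assumptions the kit<0 side comes out roughly 1.3 standard deviations low and the kit>0 side roughly 0.3 low.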

Q 9. We can't see signal on any NN output plots. Please re-scale plots to see signal.
A. The plots are updated in both linear and log scale.

More Questions from Ben and Eric on Jun 30

Q 10. We believe that it is not valid to build a non-W model based on a modified class of leptons which have a significantly different fiducial coverage than the class of leptons that they are attempting to model. This is clearly the case here since the loose muons (CMU and CMP) have much different fiducial coverage than those selected in the loose isotrk analysis. Personally, I still have questions about why non-isolated muons would be a good model for non-W loose isotrk events (see below), but at a bare minimum for blessing we believe that it is at least necessary to switch to a non-W model based on non-isolated muon candidates selected over the entire accessible eta-phi range (including CMUP, CMX, and CMIOs).
A. The CMUP and CMX muons have been included, as you requested at the preblessing meeting; see Q 4 above. CMIOs would make it, since a tight jet matching is required.

Q 11. We do not think it is valid to include poorly modeled variables as inputs to the NN. The fact that the expected and observed limits do not change significantly when the variable is removed from the NN is not a valid reason for keeping it. So, another minimum requirement that we would have for blessing is that the Q*eta variable should be removed as a NN input (unless a way was found to improve the modeling in the mean time).
A. We have both results, with and without Q*Eta as an input. We checked Q*Eta in DY events and it seems to show the same feature; see the plots in Q 4. We could fix it by a simple reweighting, which has a negligible effect on the final limit.

Q 12. On loose isotrk selection, a couple of questions.
Q 12.1 Is the requirement that the met can not point within dR = 0.4 of any jet also applied to the jet matched to the isolated track (lepton candidate)?
A. The jet matched to the isolated track is not considered, but there is a QCD veto (mtw cut at 10 GeV), so the MET is in effect required not to point at the lepton jet.
Q 12.2 Is the jet that is matched with the lepton required to be |eta_det| < 2.0 and Et > 20 GeV or are these only requirements for the other (b-candidate) jets in the event.
A. Yes, it must be a tight jet like any other (Et cut at 20 GeV at level 5 and eta cut at 2).
Q 12.3 If the requirements for the jet matched to the isolated track are different, what are they?
A. The requirements are the same.
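
The argument in the answer to Q 12.1 can be made concrete with the usual definition of the transverse mass, M_T = sqrt(2 * pT * MET * (1 - cos(dphi))), where dphi is the azimuthal angle between the lepton candidate and the MET. The Python sketch below uses a hypothetical function name and made-up kinematic values; it only illustrates why a lower cut on mtw disfavors MET aligned with the lepton candidate.

```python
import math


def transverse_mass(lepton_pt, met, dphi):
    """Transverse mass of the lepton + MET system (same units as inputs)."""
    return math.sqrt(2.0 * lepton_pt * met * (1.0 - math.cos(dphi)))


# Made-up example: 25 GeV lepton candidate and 25 GeV of MET.
mt_aligned = transverse_mass(25.0, 25.0, 0.1)          # MET along the lepton
mt_back_to_back = transverse_mass(25.0, 25.0, math.pi)  # MET opposite

passes_aligned = mt_aligned > 10.0            # fails a 10 GeV mtw cut
passes_back_to_back = mt_back_to_back > 10.0  # passes
```

Since M_T goes to zero as dphi goes to zero, events with the MET pointing along the lepton candidate (and hence along its matched jet) tend to fall below the 10 GeV threshold unless the momenta are large.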

Q 13. More generally, we feel like we need to see some plots that illustrate how well the loose isotrk selection cuts are modeled in MC. Another way of asking this is do we need a scale factor to account for differences between selection cut efficiencies in data/MC? For the electron part of the acceptance this could probably be done directly using your sample of Z tight electron plus track candidates. For the tau component of the acceptance, this is obviously trickier. However, we would at least like to see MC versus data plots for various variables related to lepton selection such as matched jet Et, matched jet eta, deltaR between the track and matched jet, track isolation, angle between track candidate and met, etc...
A. Yes, these are good ideas. In fact, we have used the DY events extensively to check the data and MC modeling. The agreement looks good in matched jet Et, dR, and delta phi but, as expected, Q*Eta does not match well. Hadiso seems to be modeled well in MC, so we use the same SF as for the tight isolated tracks, since the remaining cuts are the same apart from the jet requirement and the Hadiso cut.

[Plots: Matched Jet Hadiso]

Q 14. We need to understand exactly what selection criteria are applied to select the non-isolated muons that are used to model non-W loose isotrks. Aside from the additional isolation criteria, are the other selection cuts exactly the same as those used for the loose isotrk candidates. In particular, do you still require track isolation and a matching jet?
A. These muons are required to match a tight jet, fail at least two standard muon cuts, pass the loose isotrk selection, and be non-isolated in the calorimeter. We also checked the non-W model without the calorimeter non-isolation requirement as part of the systematics, and it is now our default non-W model. In addition, we checked the same-sign events from W+jets, which do not have the isolation requirement. The fits with the different shapes give an uncertainty of about 40%.

Q 15. What is the purpose of making the extra calorimeter- based isolation cut for picking the non-W model events? Wouldn't this directly bias the had-iso distribution which you use to fit for the non-W fraction? Also wouldn't a reverse track isolation cut be less biased in terms of the had-iso variable?
A. Yes, the extra calorimeter-based isolation cut is there to make sure these are non-W events, and we agree that it introduces some bias on hadiso, which is part of the systematics. For this reason, we switched to the non-W model without the isolation requirement, which seems to fit the data better.

Q 16. Is it possible to break down signal contributions from electrons and taus? Are these both contained within the same MC sample or are different samples used?
A. Yes, we can. Using the wh115 signal sample, we require the isotkh candidate to match a HEPG electron, muon, or tau from the W within a cone of 0.4. The majority of events are from W->ele (80.56%), with the rest from muon (5.8%), tau (13.2%), and unknown (0.4%).

Q 17. With respect to the Q*eta variable, what eta is used (lead jet eta, isotrk eta, or something else)?
A. The isotrack eta. The reason is to take advantage of the charge asymmetry of W+jets production, which WH production does not have. Yoshikazu optimized the choice of variables and found Q*Eta quite sensitive for rejecting W+jets. I am using the same BNN as Yoshikazu, and therefore the same Q*Eta.

Weiming Yao
