IJazZ
Derive Scale and Smearing
ijazz_sas config/sas_config.yaml
where the config file is config/sas_config.yaml:
Example YAML Configuration
file_dt: data/cms/2022/higgs_dna_2022preEE.pho.data.TimeCorr.parquet
file_mc: data/cms/2022/higgs_dna_2022preEE.pho.mc.parquet
dir_results: results/2022/EtaR9
cset_name: "EtaR9" # name of the correction
cat_latex: # LaTeX labels used in the parameter plots
  ScEta: SuperCluster $\eta$
  AbsScEta: SuperCluster $|\eta|$
  r9: Seed Cluster R9
  pt: $p_T$ (GeV)
dset_name: 2022preEE # dataset name
scale_flat_syst: 0.5e-3 # added in quadrature to the full list of scale systematics
smear_flat_syst: 0 # added in quadrature to the full list of smear systematics
syst: # list of systematics to be computed
  win_mll: # name of systematic
    # parameters to be overwritten
    fitter:
      win_z_mc: [65, 115]
      win_z_dt: [70, 110]
  cut_variation: # name of systematic
    # parameters to be overwritten
    sas:
      cut: pt1 > 30 and pt2 > 30
corrlib:
  cset_description: "EM object scale and smearing vs eta / r9"
  cset_version: 1
sas:
  use_rpt: true # - when categories include pt, fit with the relative pt
  hess: numerical # - hessian matrix: null, numerical, analytical (not advised)
  learning_rate: 1.0e-3 # - learning rate of the Keras optimizer
  name_pt_var: pt # - name of the pt variable, if pt is used in the categorisation
  err_mc: true # - compute the uncertainty due to limited MC statistics
  correct_data: true # - correct the data
  correct_mc: true # - smear the MC
  categories: # - binning of the categories
    ScEta: [-3.0, -2.0, -1.49, -1.0, 0.0, 1.0, 1.49, 2.0, 3.0]
    r9: [-.inf, 0.97, .inf]
  # - cut applied to the input dataframes
  cut: pt1 > 25 and pt2 > 25
fitter:
  win_z_dt: [80, 100] # - range of the dilepton mass to fit
  win_z_mc: [70, 110] # - larger mass range to consider MC events
  min_nevt_region_mc: 100 # - minimum number of MC events per category
  min_nevt_region_dt: 20 # - minimum number of data events per category
  bin_width_dt: 'Q' # - binning width for the data, 'Q' for quantile binning
  bin_width_mc: 0.1 # - binning width for the MC
  name_cat: cat # - name of the category variable
  name_weights: weight_central # - name of the weights variable for MC
  name_mll: mass # - name of the dilepton mass variable
minimizer:
  dnll_tol: 0.01 # - tolerance on the change in -2logL used to determine convergence
  max_epochs: 500 # - maximum number of epochs for the optimization
  init_rand: false # - if true, initialize the variables (resp, reso) randomly
  nepoch_print: 100 # - print the loss every nepoch_print epochs
  batch_size: 200 # - size of the batch for the likelihood computation
  batch_training: true # - use batch training
  device: GPU # - device to use for the minimizer
  minimizer: Adam # - optimization method, either 'Adam' or a SciPy minimizer (e.g., 'TNC')
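Each entry under syst names a systematic and lists the parameters to be overwritten. One plausible reading, sketched below in Python, is that each entry is merged recursively into the nominal configuration to give one full configuration per variation. The helper apply_override is hypothetical and is not part of the IJazZ API; the dictionaries copy a subset of the example above.

```python
import copy


def apply_override(nominal, override):
    """Return a copy of `nominal` with `override` merged in recursively.

    Hypothetical helper (not an IJazZ function): nested dicts are merged
    key by key, any other value replaces the nominal one.
    """
    merged = copy.deepcopy(nominal)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_override(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Subset of the example configuration, written as plain dicts.
nominal = {
    "sas": {"cut": "pt1 > 25 and pt2 > 25", "correct_data": True},
    "fitter": {"win_z_mc": [70, 110], "win_z_dt": [80, 100], "name_mll": "mass"},
}
syst = {
    "win_mll": {"fitter": {"win_z_mc": [65, 115], "win_z_dt": [70, 110]}},
    "cut_variation": {"sas": {"cut": "pt1 > 30 and pt2 > 30"}},
}

# One full configuration per systematic variation; `nominal` is left untouched.
variations = {name: apply_override(nominal, over) for name, over in syst.items()}
```

Under this reading, win_mll changes only the two mass windows and keeps every other fitter setting, while cut_variation changes only the selection cut.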
Input Ntuples
Input files must be parquet files containing a column for the dilepton mass (its name is set by name_mll) and, for each per-electron variable var, two columns var1 and var2, one per electron. MC weights are read from the column set by name_weights.
For example:
- mass, weight_central, ScEta1, ScEta2, r91, r92, pt1, and pt2.
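The expected column names can be derived from the configuration before running the fit. The Python sketch below assumes that the per-electron columns are the category variable names suffixed with 1 and 2, as in the example list, and that name_mll and name_weights sit in the fitter section. The helper required_columns is hypothetical, not an IJazZ function, and columns that appear only in cut (here pt1 and pt2) are passed explicitly rather than parsed from the cut string.

```python
def required_columns(config, extra=()):
    """List the columns an input file is expected to provide.

    Hypothetical helper: mass and weight columns come from the fitter
    section, per-electron columns from the category names plus "1"/"2".
    """
    fitter = config["fitter"]
    columns = [fitter["name_mll"], fitter["name_weights"]]
    for var in config["sas"]["categories"]:
        columns += [f"{var}1", f"{var}2"]
    for name in extra:
        if name not in columns:
            columns.append(name)
    return columns


# Subset of the example configuration, written as a plain dict.
config = {
    "sas": {
        "categories": {
            "ScEta": [-3.0, -2.0, -1.49, -1.0, 0.0, 1.0, 1.49, 2.0, 3.0],
            "r9": [float("-inf"), 0.97, float("inf")],
        },
    },
    "fitter": {"name_mll": "mass", "name_weights": "weight_central"},
}

expected = required_columns(config, extra=("pt1", "pt2"))

# Column names as they would be read from an MC input file (illustrative list).
available = ["mass", "weight_central", "ScEta1", "ScEta2", "r91", "r92", "pt1", "pt2"]
missing = [name for name in expected if name not in available]
```

The weight column applies to MC only, so a check on a data file would drop name_weights from the expected list.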
A reader to convert Higgs DNA output files to IJazZ input files is provided in this package. It is also used in the law_ijazz workflow.