IJazZ

Derive Scale and Smearing

ijazz_sas config/sas_config.yaml

Here config/sas_config.yaml is the YAML configuration file described below.

Example YAML Configuration

file_dt: data/cms/2022/higgs_dna_2022preEE.pho.data.TimeCorr.parquet
file_mc: data/cms/2022/higgs_dna_2022preEE.pho.mc.parquet
dir_results: results/2022/EtaR9
cset_name: "EtaR9"  # name of the correction 
cat_latex: # latex names for parameters plots
  ScEta: SuperCluster $\eta$
  AbsScEta: SuperCluster $|\eta|$
  r9: Seed Cluster R9
  pt: $p_T$ (GeV)
dset_name: 2022preEE     # dataset name
scale_flat_syst: 0.5e-3  # added in quadrature to the full list of scale systematics
smear_flat_syst: 0       # added in quadrature to the full list of smear systematics

syst: # list of systematics to be computed
  win_mll: # name of systematic
    # parameters to be overwritten
    fitter:
      win_z_mc: [65, 115]
      win_z_dt: [70, 110]
  cut_variation: # name of systematic
    # parameters to be overwritten
    sas:
      cut: pt1 > 30 and pt2 > 30
corrlib:
  cset_description: "EM object scale and smearing vs eta / r9"
  cset_version: 1
  
sas:
  use_rpt: true         # - when categories include pt, fit with the relative pt 
  hess: numerical       # - hessian matrix: null, numerical, analytical (not advised)
  learning_rate: 1.0e-3 # - learning rate for the Keras optimizer
  name_pt_var: pt       # - name of the pt variable in case used in categorisation
  err_mc: true          # - compute the uncertainty due to limited MC statistics
  correct_data: true    # - correct the data
  correct_mc: true      # - smear the MC
  categories:           # - binning of the categories
      ScEta: [-3.0, -2.0, -1.49, -1.0, 0.0, 1.0, 1.49, 2.0, 3.0]
      r9: [-.inf, 0.97, .inf]
  # - cut applied to the input dataframes
  cut: pt1 > 25 and pt2 > 25
fitter:
  win_z_mc: [70, 110]            # - larger mass range used to select MC events
  win_z_dt: [80, 100]            # - range of the dilepton mass used in the fit
  min_nevt_region_mc: 100        # - minimum number of mc events per category
  min_nevt_region_dt: 20         # - minimum number of data events per category
  bin_width_dt: 'Q'              # - binning width for the data, 'Q' for quantile binning
  bin_width_mc: 0.1              # - binning width for the MC
  name_cat: cat                  # - name of the category variable
  name_weights: weight_central   # - name of the weights variable for MC
  name_mll: mass                 # - name of the di-lepton mass variable  
minimizer:      
  dnll_tol: 0.01                 # - tolerance for the change in -2logL to determine convergence
  max_epochs: 500                # - maximum number of epochs for optimization
  init_rand: False               # - if True, initializes variables (resp, reso) randomly.
  nepoch_print: 100              # - print the loss every nepoch_print epochs
  batch_size: 200                # - size of the batch for likelihood computation
  batch_training: True           # - use batch training
  device: GPU                    # - device to use for the minimizer
  minimizer: Adam                # - optimization method, either 'Adam' or a SciPy minimizer (e.g., 'TNC').

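The syst block lists, for each systematic, only the parameters that differ from the nominal configuration. A minimal sketch of how such overrides could be applied, assuming a recursive dictionary merge; the helper apply_overrides is hypothetical and not part of ijazz, and the package's actual merging logic may differ:

```python
import copy


def apply_overrides(nominal: dict, overrides: dict) -> dict:
    """Return a copy of `nominal` with `overrides` merged in recursively.

    Hypothetical helper for illustration; not an ijazz function.
    """
    merged = copy.deepcopy(nominal)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Descend into nested sections such as `sas` or `fitter`.
            merged[key] = apply_overrides(merged[key], value)
        else:
            # Scalars and lists replace the nominal value outright.
            merged[key] = copy.deepcopy(value)
    return merged


# Trimmed-down copy of the nominal settings from the example configuration.
nominal = {
    "sas": {"cut": "pt1 > 25 and pt2 > 25", "use_rpt": True},
    "fitter": {"win_z_mc": [70, 110], "win_z_dt": [80, 100]},
}

# The two systematics declared in the `syst` block.
syst = {
    "win_mll": {"fitter": {"win_z_mc": [65, 115], "win_z_dt": [70, 110]}},
    "cut_variation": {"sas": {"cut": "pt1 > 30 and pt2 > 30"}},
}

# One full configuration per systematic; the nominal dict is left untouched.
configs = {name: apply_overrides(nominal, ov) for name, ov in syst.items()}

for name, cfg in configs.items():
    print(name, cfg["sas"]["cut"], cfg["fitter"]["win_z_dt"])
```

With this reading, cut_variation changes only sas.cut and keeps the nominal mass windows, while win_mll changes only the two fit windows.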
Command-line tools

This package installs the following entry points:

ijazz_sas: run the main Scale and Smearing fit from a YAML config.
ijazz_sas_mmg: run the Photon->Electron scale workflow (MMG-specific).
ijazz_sas_smoothing: smooth a JSON result vs pT and optionally build correctionlib output.
ijazz_plot: plot IJazZ JSON results and comparisons.

Run any command with -h to see available options.

Outputs

By default, running ijazz_sas writes JSON results and plots under dir_results. Typical outputs include:

SAS<dset_name>_syst-Nominal.json for the nominal fit.
SAS<dset_name>_syst-<syst>.json for each systematic variation.
SAS<dset_name>_syst-NominalWithSyst.json (and ...Nominal2GWithSyst.json when 2G is enabled).
Plot images alongside the JSON files (e.g. .fig*.jpg).
Correctionlib JSON: EGMScalesSmearing_<dset_name>.v<cset_version>.json.

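As a quick sanity check after a run, the file names listed above can be assembled from the configuration values. The sketch below is illustrative only: expected_outputs is a hypothetical helper, not part of ijazz, and it assumes that all files sit directly in dir_results, which may not hold for every version of the package.

```python
from pathlib import Path


def expected_outputs(dir_results, dset_name, syst_names, cset_version):
    """List the JSON files a run is expected to write (hypothetical helper)."""
    base = Path(dir_results)
    names = [f"SAS{dset_name}_syst-Nominal.json"]
    # One result file per systematic variation declared in `syst`.
    names += [f"SAS{dset_name}_syst-{syst}.json" for syst in syst_names]
    names.append(f"SAS{dset_name}_syst-NominalWithSyst.json")
    # Correctionlib output, versioned with corrlib.cset_version.
    names.append(f"EGMScalesSmearing_{dset_name}.v{cset_version}.json")
    return [base / name for name in names]


paths = expected_outputs(
    dir_results="results/2022/EtaR9",
    dset_name="2022preEE",
    syst_names=["win_mll", "cut_variation"],
    cset_version=1,
)

for path in paths:
    # `path.exists()` tells whether the run actually produced the file.
    print(path.as_posix(), path.exists())
```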
ijazz_sas_smoothing can also produce a smoothed correctionlib JSON named EGMScalesSmearing_<dset_name>_SMOOTH.v<cset_version>.json.

Input Ntuples

Input files must be Parquet files with a column for the dilepton mass (fitter.name_mll) and per-electron columns carrying a 1 or 2 suffix (e.g. ScEta1, ScEta2, r91, r92, pt1, pt2). MC weights can be supplied via fitter.name_weights. For example, a file with the columns mass, weight_central, ScEta1, ScEta2, r91, r92, pt1, and pt2 matches the example configuration.

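The column requirements above can be checked before launching a fit. The following sketch uses hypothetical helpers (required_columns and missing_columns are not part of ijazz) that derive the expected column names from the per-lepton variables and from the fitter.name_mll and fitter.name_weights settings. It assumes every per-lepton variable follows the 1/2 suffix convention.

```python
def required_columns(lepton_vars, name_mll="mass", name_weights=None):
    """Build the list of column names expected in an input file."""
    columns = [name_mll]
    if name_weights:
        # Weights are only needed for MC files.
        columns.append(name_weights)
    for var in lepton_vars:
        # Each per-lepton variable appears once per leg of the pair.
        columns += [f"{var}1", f"{var}2"]
    return columns


def missing_columns(available, required):
    """Return the required columns absent from `available`, in order."""
    present = set(available)
    return [col for col in required if col not in present]


# Variables used by the example configuration: the two categorisation
# variables (ScEta, r9) plus pt, which enters through the `cut` string.
needed = required_columns(["ScEta", "r9", "pt"], "mass", "weight_central")
print(needed)

# `available` would typically be the column list of the input dataframe.
available = ["mass", "ScEta1", "ScEta2", "r91", "r92", "pt1", "pt2"]
print(missing_columns(available, needed))
```

Here the second print reports weight_central as missing, which is expected for a data file and a problem for an MC file.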
A reader to convert Higgs DNA output files to IJazZ input files is provided in this package. It is also used in the law_ijazz workflow.

Python API (quick reference)

The following symbols are exported from ijazz:

RegionalFitter
IJazZSAS, compute_sas
parameters_from_json, parameters_to_json, ijazz_shape
plot_results_from_json

See the inline docstrings for details.

Plot the SaS Results

By default, plots are generated automatically. You can customize the plot settings or compare two JSON files, either by overlaying their plots or by displaying the ratio between them.

Plot Nominal Results

To plot nominal results from a JSON file:

ijazz_plot FineEtaR9/SAS2024_syst-Nominal.json -d r9 AbsScEta --resp 0.98 1.03 --reso 0 0.06 -o FineEtaR9/SAS2024_syst-Nominal.jpg --latex higgsdna_cat_latex.json --ncol 2 --leg_fontsize small

where higgsdna_cat_latex.json is a JSON file containing a dictionary that maps variable names to LaTeX labels for plotting:

{
  "ScEta": "$\\eta_{SC}$",
  "AbsScEta": "$|\\eta_{SC}|$",
  "r9": "SC R9",
  "pt": "$p_T$ (GeV)",
  "seedGain": "SC Gain"
}

The option -d r9 AbsScEta defines the axis order for plotting. Here, r9 is on the x-axis, and results as a function of AbsScEta are overlaid.

You can also generate the plot in Python:

import ijazz.plotting

var_latex = {
    "ScEta": "$\\eta_{SC}$",
    "AbsScEta": "$|\\eta_{SC}|$",
    "r9": "Seed Cl. R9",
    "pt": "$p_T$ (GeV)",
    "seedGain": "Seed Cl. Gain"
}

figs, ax = ijazz.plotting.plot_results_from_json(
    'Pho/FineEtaR9/SAS2024_syst-Nominal.json',
    resp_range=[0.98, 1.03],
    reso_range=[0.0, 0.06],
    cat_latex=var_latex,
    dim=['r9', 'AbsScEta'],
    leg_fontsize='small',
    leg_ncols=2
)

Compare SaS JSON Files

You can compare JSON files by calculating the ratio or overlaying their plots.

Ratio Comparison

To plot the ratio of scale and smearing between electron and photon regression:

import ijazz.plotting

var_latex = {
    "ScEta": "$\\eta_{SC}$",
    "AbsScEta": "$|\\eta_{SC}|$",
    "r9": "Seed Cl. R9",
    "pt": "$p_T$ (GeV)",
    "seedGain": "Seed Cl. Gain"
}

jsonEle = 'Ele/FineEtaR9/SAS2024_syst-Nominal.json'
jsonPho = 'Pho/FineEtaR9/SAS2024_syst-Nominal.json'

figs, ax = ijazz.plotting.plot_results_from_json(
    {'syst': jsonEle, 'nominal': jsonPho},
    jsons_mode='ratio',
    resp_range=[0.995, 1.008],
    reso_range=[0.50, 1.5],
    cat_latex=var_latex,
    leg_fontsize='small',
    leg_ncols=2
)

Overlay Comparison

To overlay scale and smearing results from electron and photon regression:

import ijazz.plotting

var_latex = {
    "ScEta": "$\\eta_{SC}$",
    "AbsScEta": "$|\\eta_{SC}|$",
    "r9": "Seed Cl. R9",
    "pt": "$p_T$ (GeV)",
    "seedGain": "Seed Cl. Gain"
}

jsonEle = 'Ele/FineEtaR9/SAS2024_syst-Nominal.json'
jsonPho = 'Pho/FineEtaR9/SAS2024_syst-Nominal.json'

# Plot scale and smearing if SaS are functions of 1 or 2 variables
figs, ax = ijazz.plotting.plot_results_from_json(
    {'Electron': jsonEle, 'Photon': jsonPho},
    jsons_mode='compare',
    resp_range=[0.95, 1.08],
    reso_range=[0.0, 0.08],
    cat_latex=var_latex
)

# Plot scale only if SaS are functions of 3 variables
figs, ax = ijazz.plotting.plot_results_from_json(
    {'Electron': jsonEle, 'Photon': jsonPho},
    jsons_mode='compare',
    resp_range=[0.95, 1.08],
    reso_range=[0.0, 0.08],
    cat_latex=var_latex,
    param_to_plot='resp'
)

# Plot smearing only if SaS are functions of 3 variables
figs, ax = ijazz.plotting.plot_results_from_json(
    {'Electron': jsonEle, 'Photon': jsonPho},
    jsons_mode='compare',
    resp_range=[0.95, 1.08],
    reso_range=[0.0, 0.08],
    cat_latex=var_latex,
    param_to_plot='reso'
)

Note

If the SaS are functions of three variables, the second dimension must have length 2 (e.g., low and high R9).

It also works with more than two input JSONs:

ijazz.plotting.plot_results_from_json({'2022': json22, '2023': json23, '2024': json24}, jsons_mode='compare')
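
The note on three-variable results can be checked up front. The sketch below is a hypothetical helper, not an ijazz function. It assumes that "length 2" means two bins (three bin edges) and that the bin edges are available as a dictionary listed in plotting order, unless an explicit order is passed.

```python
INF = float("inf")


def check_plot_dims(categories, dim=None):
    """Validate the binning against the three-variable plotting constraint.

    `categories` maps variable names to bin edges; `dim` optionally gives
    the plotting order. Returns the order that was checked.
    """
    order = list(dim) if dim else list(categories)
    if len(order) == 3:
        second = order[1]
        n_bins = len(categories[second]) - 1  # edges -> number of bins
        if n_bins != 2:
            raise ValueError(
                f"second dimension '{second}' has {n_bins} bins, expected 2"
            )
    return order


# Three variables with r9 (low / high) as the second dimension: accepted.
cats = {
    "pt": [25, 40, 60, INF],
    "r9": [-INF, 0.97, INF],
    "AbsScEta": [0.0, 1.0, 1.49, 2.0, 3.0],
}
print(check_plot_dims(cats))

# Putting pt (three bins) second violates the constraint.
try:
    check_plot_dims(cats, dim=["r9", "pt", "AbsScEta"])
except ValueError as err:
    print("rejected:", err)
```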