IJazZ

Derive Scale and Smearing

ijazz_sas config/sas_config.yaml

where the config file is config/sas_config.yaml:

Example YAML Configuration
file_dt: data/cms/2022/higgs_dna_2022preEE.pho.data.TimeCorr.parquet
file_mc: data/cms/2022/higgs_dna_2022preEE.pho.mc.parquet
dir_results: results/2022/EtaR9
cset_name: "EtaR9"  # name of the correction 
cat_latex: # latex names for parameters plots
  ScEta: SuperCluster $\eta$
  AbsScEta: SuperCluster $|\eta|$
  r9: Seed Cluster R9
  pt: $p_T$ (GeV)
dset_name: 2022preEE     # dataset name
scale_flat_syst: 0.5e-3  # added in quadrature to the full list of scale systematics
smear_flat_syst: 0       # added in quadrature to the full list of smear systematics

syst: # list of systematics to be computed
  win_mll: # name of systematic
    # parameters to be overwritten
    fitter:
      win_z_mc: [65, 115]
      win_z_dt: [70, 110]
  cut_variation: # name of systematic
    # parameters to be overwritten
    sas:
      cut: pt1 > 30 and pt2 > 30
corrlib:
  cset_description: "EM object scale and smearing vs eta / r9"
  cset_version: 1
  
sas:
  use_rpt: true         # - when categories include pt, fit with the relative pt 
  hess: numerical       # - hessian matrix: null, numerical, analytical (not advised)
  learning_rate: 1.0e-3 # - learning rate for the Keras optimizer
  name_pt_var: pt       # - name of the pt variable in case used in categorisation
  err_mc: true          # - compute the uncertainty due to limited MC statistics
  correct_data: true    # - correct the data
  correct_mc: true      # - smear the MC
  categories:           # - binning of the categories
      ScEta: [-3.0, -2.0, -1.49, -1.0, 0.0, 1.0, 1.49, 2.0, 3.0]
      r9: [-.Inf, 0.97, .inf]
  # - cut to apply in the input dataframes
  cut: pt1 > 25 and pt2 > 25
fitter:
  win_z_mc: [70, 110]            # - larger mass range to consider MC events
  win_z_dt: [80, 100]            # - mass range of the dilepton mass to fit
  min_nevt_region_mc: 100        # - minimum number of mc events per category
  min_nevt_region_dt: 20         # - minimum number of data events per category
  bin_width_dt: 'Q'              # - binning width for the data, 'Q' for quantile binning
  bin_width_mc: 0.1              # - binning width for the MC
  name_cat: cat                  # - name of the category variable
  name_weights: weight_central   # - name of the weights variable for MC
  name_mll: mass                 # - name of the di-lepton mass variable  
minimizer:      
  dnll_tol: 0.01                 # - tolerance for the change in -2logL to determine convergence
  max_epochs: 500                # - maximum number of epochs for optimization
  init_rand: False               # - if True, initializes variables (resp, reso) randomly.
  nepoch_print: 100              # - print the loss every nepoch_print epochs
  batch_size: 200                # - size of the batch for likelihood computation
  batch_training: True           # - use batch training
  device: GPU                    # - device to use for the minimizer
  minimizer: Adam                # - optimization method, either 'Adam' or a SciPy minimizer (e.g., 'TNC').
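
Each entry under syst lists only the parameters that differ from the nominal configuration. The sketch below illustrates how such overrides could be merged into the nominal settings with a recursive dictionary update. The helper name apply_overrides is hypothetical and the code is not taken from ijazz_sas, whose internal handling may differ.

```python
import copy


def apply_overrides(nominal, overrides):
    """Return a copy of `nominal` with `overrides` merged in recursively."""
    merged = copy.deepcopy(nominal)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested section (e.g. "fitter"): merge key by key
            merged[key] = apply_overrides(merged[key], value)
        else:
            # Leaf value (e.g. a cut string or a mass window): replace it
            merged[key] = copy.deepcopy(value)
    return merged


# Reduced version of the example configuration
nominal = {
    "sas": {"cut": "pt1 > 25 and pt2 > 25", "use_rpt": True},
    "fitter": {"win_z_mc": [70, 110], "win_z_dt": [80, 100]},
}
syst = {
    "win_mll": {"fitter": {"win_z_mc": [65, 115], "win_z_dt": [70, 110]}},
    "cut_variation": {"sas": {"cut": "pt1 > 30 and pt2 > 30"}},
}

# One full configuration per systematic; the nominal one is left untouched
configs = {name: apply_overrides(nominal, ov) for name, ov in syst.items()}
print(configs["cut_variation"]["sas"]["cut"])
print(configs["win_mll"]["fitter"]["win_z_dt"])
```

With this kind of merge, cut_variation changes only sas.cut and keeps the nominal fit windows, while win_mll changes only the two mass windows.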

Command-line tools

This package installs the following entry points:

  • ijazz_sas: run the main Scale and Smearing fit from a YAML config.

  • ijazz_sas_mmg: run the Photon->Electron scale workflow (MMG-specific).

  • ijazz_sas_smoothing: smooth a JSON result vs pT and optionally build correctionlib output.

  • ijazz_plot: plot IJazZ JSON results and comparisons.

Run any command with -h to see available options.

Outputs

By default, running ijazz_sas writes JSON results and plots under dir_results. Typical outputs include:

  • SAS<dset_name>_syst-Nominal.json for the nominal fit.

  • SAS<dset_name>_syst-<syst>.json for each systematic variation.

  • SAS<dset_name>_syst-NominalWithSyst.json (and ...Nominal2GWithSyst.json when 2G is enabled).

  • Plot images alongside the JSON files (e.g. .fig*.jpg).

  • Correctionlib JSON: EGMScalesSmearing_<dset_name>.v<cset_version>.json.
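
The file names in the list above follow a fixed pattern, so the expected outputs of a run can be listed from the configuration values. The following is a minimal sketch using only the patterns given in this list; the function expected_outputs is hypothetical and not part of the package.

```python
from pathlib import Path


def expected_outputs(dir_results, dset_name, cset_version, syst_names):
    """Build the output paths following the naming patterns listed above."""
    names = [f"SAS{dset_name}_syst-Nominal.json"]
    names += [f"SAS{dset_name}_syst-{syst}.json" for syst in syst_names]
    names.append(f"SAS{dset_name}_syst-NominalWithSyst.json")
    names.append(f"EGMScalesSmearing_{dset_name}.v{cset_version}.json")
    return [Path(dir_results) / name for name in names]


# Values taken from the example YAML configuration
paths = expected_outputs(
    dir_results="results/2022/EtaR9",
    dset_name="2022preEE",
    cset_version=1,
    syst_names=["win_mll", "cut_variation"],
)

# Report which files are not (yet) on disk
missing = [p for p in paths if not p.exists()]
for p in paths:
    status = "missing" if p in missing else "found"
    print(f"{status:8s}{p.as_posix()}")
```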

ijazz_sas_smoothing can also produce a smoothed correctionlib JSON named EGMScalesSmearing_<dset_name>_SMOOTH.v<cset_version>.json.

Input Ntuples

Input files must be Parquet files containing a column for the dilepton mass (named by fitter.name_mll) and, for each per-electron variable, two columns carrying the suffixes 1 and 2 (e.g. ScEta1, ScEta2, r91, r92, pt1, pt2). MC weights are read from the column named by fitter.name_weights. For example: mass, weight_central, ScEta1, ScEta2, r91, r92, pt1, and pt2.
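
The column convention can be checked before launching a fit. The sketch below derives the expected column names from the configuration and compares them with the columns available in a file. Both helper functions are hypothetical. The list of available columns is hard-coded here; in practice it could come from df.columns after pandas.read_parquet.

```python
def required_columns(categories, name_mll="mass", name_weights=None, extra_vars=()):
    """List the columns implied by the configuration.

    `extra_vars` holds per-electron variables used outside the categories,
    for instance `pt` when it only appears in the cut.
    """
    cols = [name_mll]
    if name_weights:
        cols.append(name_weights)
    for var in list(categories) + list(extra_vars):
        for suffix in ("1", "2"):  # one column per electron
            col = f"{var}{suffix}"
            if col not in cols:
                cols.append(col)
    return cols


def missing_columns(available, required):
    """Return the required columns that are absent, in the required order."""
    available = set(available)
    return [col for col in required if col not in available]


# Categories and names from the example YAML configuration
inf = float("inf")
categories = {
    "ScEta": [-3.0, -2.0, -1.49, -1.0, 0.0, 1.0, 1.49, 2.0, 3.0],
    "r9": [-inf, 0.97, inf],
}
required = required_columns(categories, "mass", "weight_central", extra_vars=["pt"])
print(required)

# Hypothetical file lacking the weight column and pt2
available = ["mass", "ScEta1", "ScEta2", "r91", "r92", "pt1"]
missing = missing_columns(available, required)
print(missing)
```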

A reader to convert Higgs DNA output files to IJazZ input files is provided in this package. It is also used in the law_ijazz workflow.

Python API (quick reference)

The following symbols are exported from ijazz:

  • RegionalFitter

  • IJazZSAS, compute_sas

  • parameters_from_json, parameters_to_json, ijazz_shape

  • plot_results_from_json

See the inline docstrings for details.

Plot the SaS Results

By default, plots are generated automatically. You can customize the plot settings or compare JSON files, either by overlaying their plots or by displaying the ratio between them.

Plot Nominal Results

To plot nominal results from a JSON file:

ijazz_plot FineEtaR9/SAS2024_syst-Nominal.json -d r9 AbsScEta --resp 0.98 1.03 --reso 0 0.06 -o FineEtaR9/SAS2024_syst-Nominal.jpg --latex higgsdna_cat_latex.json --ncol 2 --leg_fontsize small

where higgsdna_cat_latex.json is a dictionary mapping variable names to LaTeX names for plotting:

{
  "ScEta": "$\\eta_{SC}$",
  "AbsScEta": "$|\\eta_{SC}|$",
  "r9": "SC R9",
  "pt": "$p_T$ (GeV)",
  "seedGain": "SC Gain"
}
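
Backslashes in the LaTeX strings must be doubled in JSON, which is easy to get wrong when editing the file by hand. One way to avoid this is to write the file from Python with the standard json module, as sketched below. The file is written to a temporary directory here, which is an arbitrary choice for the example; in practice it would be written wherever the --latex option points.

```python
import json
import tempfile
from pathlib import Path

# Raw strings keep a single backslash in Python; json adds the escaping
cat_latex = {
    "ScEta": r"$\eta_{SC}$",
    "AbsScEta": r"$|\eta_{SC}|$",
    "r9": "SC R9",
    "pt": "$p_T$ (GeV)",
    "seedGain": "SC Gain",
}

out_dir = Path(tempfile.mkdtemp())
path = out_dir / "higgsdna_cat_latex.json"
path.write_text(json.dumps(cat_latex, indent=2))

# The file on disk contains the doubled backslashes expected by JSON
text = path.read_text()
print(text)

# Reading it back restores the original strings
assert json.loads(text) == cat_latex
```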

The option -d r9 AbsScEta defines the axis order for plotting. Here, r9 is on the x-axis, and results as a function of AbsScEta are overlaid.

You can also generate the plot in Python:

import ijazz.plotting

var_latex = {
    "ScEta": "$\\eta_{SC}$",
    "AbsScEta": "$|\\eta_{SC}|$",
    "r9": "Seed Cl. R9",
    "pt": "$p_T$ (GeV)",
    "seedGain": "Seed Cl. Gain"
}

figs, ax = ijazz.plotting.plot_results_from_json(
    'Pho/FineEtaR9/SAS2024_syst-Nominal.json',
    resp_range=[0.98, 1.03],
    reso_range=[0.0, 0.06],
    cat_latex=var_latex,
    dim=['r9', 'AbsScEta'],
    leg_fontsize='small',
    leg_ncols=2
)

Compare SaS JSON Files

You can compare JSON files by calculating the ratio or overlaying their plots.

Ratio Comparison

To plot the ratio of scale and smearing between electron and photon regression:

import ijazz.plotting

var_latex = {
    "ScEta": "$\\eta_{SC}$",
    "AbsScEta": "$|\\eta_{SC}|$",
    "r9": "Seed Cl. R9",
    "pt": "$p_T$ (GeV)",
    "seedGain": "Seed Cl. Gain"
}

jsonEle = 'Ele/FineEtaR9/SAS2024_syst-Nominal.json'
jsonPho = 'Pho/FineEtaR9/SAS2024_syst-Nominal.json'

figs, ax = ijazz.plotting.plot_results_from_json(
    {'syst': jsonEle, 'nominal': jsonPho},
    jsons_mode='ratio',
    resp_range=[0.995, 1.008],
    reso_range=[0.50, 1.5],
    cat_latex=var_latex,
    leg_fontsize='small',
    leg_ncols=2
)

Overlay Comparison

To overlay scale and smearing results from electron and photon regression:

import ijazz.plotting

var_latex = {
    "ScEta": "$\\eta_{SC}$",
    "AbsScEta": "$|\\eta_{SC}|$",
    "r9": "Seed Cl. R9",
    "pt": "$p_T$ (GeV)",
    "seedGain": "Seed Cl. Gain"
}

jsonEle = 'Ele/FineEtaR9/SAS2024_syst-Nominal.json'
jsonPho = 'Pho/FineEtaR9/SAS2024_syst-Nominal.json'

# Plot scale and smearing if SaS are functions of 1 or 2 variables
figs, ax = ijazz.plotting.plot_results_from_json(
    {'Electron': jsonEle, 'Photon': jsonPho},
    jsons_mode='compare',
    resp_range=[0.95, 1.08],
    reso_range=[0.0, 0.08],
    cat_latex=var_latex
)

# If SaS are functions of 3 variables, plot the scale only
figs, ax = ijazz.plotting.plot_results_from_json(
    {'Electron': jsonEle, 'Photon': jsonPho},
    jsons_mode='compare',
    resp_range=[0.95, 1.08],
    reso_range=[0.0, 0.08],
    cat_latex=var_latex,
    param_to_plot='resp'
)

# If SaS are functions of 3 variables, plot the smearing only
figs, ax = ijazz.plotting.plot_results_from_json(
    {'Electron': jsonEle, 'Photon': jsonPho},
    jsons_mode='compare',
    resp_range=[0.95, 1.08],
    reso_range=[0.0, 0.08],
    cat_latex=var_latex,
    param_to_plot='reso'
)

Note

If the SaS are functions of three variables, the second dimension must have length 2 (e.g., low and high R9).

It also works with more than two input JSONs:

# json22, json23 and json24 are the paths to the SaS JSON result files of each year
ijazz.plotting.plot_results_from_json({'2022': json22, '2023': json23, '2024': json24}, jsons_mode='compare')
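
The constraint stated in the note on three-variable results can be checked from the category binning before plotting. The sketch below assumes that the "second dimension" is the second entry of the dim list and that its length is the number of bins, i.e. the number of bin edges minus one. The function names and the pt bin edges are hypothetical.

```python
def bins_per_dim(categories, dim):
    """Number of bins along each plotted dimension (bin edges minus one)."""
    return [len(categories[name]) - 1 for name in dim]


def check_three_var_layout(categories, dim):
    """Raise if a 3-variable result has a second dimension with != 2 bins."""
    n_bins = bins_per_dim(categories, dim)
    if len(dim) == 3 and n_bins[1] != 2:
        raise ValueError(
            f"second dimension '{dim[1]}' has {n_bins[1]} bins, expected 2"
        )
    return n_bins


inf = float("inf")
categories = {
    "pt": [25, 40, 60, inf],        # hypothetical pt binning
    "r9": [-inf, 0.97, inf],        # low and high R9
    "AbsScEta": [0.0, 1.0, 1.49, 2.0, 3.0],
}

n_bins = check_three_var_layout(categories, ["pt", "r9", "AbsScEta"])
print(n_bins)  # [3, 2, 4]
```

Placing AbsScEta in second position instead would raise a ValueError with this binning, since it has four bins.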