Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLUE algorithm and Phoenix visualiser #1416

Open
wants to merge 41 commits into
base: trunk
Choose a base branch
from
Open

Conversation

Lysarina
Copy link

@Lysarina Lysarina commented Aug 28, 2024

I am updating ldmx-sw, here are the details.

What are the issues that this addresses?

This resolves #1411

I have worked with ECal clustering this summer and implemented the CLUE algorithm from CMS. The main goal was to use this for electron clustering so I've strived for number of clusters == number of electrons, and also working to get a high energy purity (i.e. how much of the energy in the cluster comes from the same initial electron). It works pretty well! I've also added a simple reclustering clause to try and handle merged clusters, which results in a more prominent peak at number of clusters == number of electrons, but also introduces some overcounting, so it is currently an optional parameter.

This also partly resolves #1394 in getting the cluster producer to work

Additionally, I have added a Phoenix-based visualiser LDMX-VIS, located in EventDisplay/ldmx-vis. I have also made an analyzer VisGenerator in DQM (maybe not the right folder for this) that writes event data into JSON files that can be visualised by Phoenix.

Fixes

While I feel mostly happy with the code, I created what is basically a copy of WorkingCluster.h in Ecal called WorkingEcalCluster, that stores EcalHits as normal objects instead of pointers, as when I started I did not really know how to handle pointers :^) I never had time to fix so that CLUE would use WorkingCluster instead but this would be a nice fix. WorkingEcalCluster does contain some other improvements such as a parameterless constructor and eliminating EcalGeometry as it is not actually needed in that file.

Check List

  • I successfully compiled ldmx-sw with my developments
  • I ran my developments and the following shows that they are successful.

Below are histograms produced by EcalClusterAnalyzer, an analyzer I created to analyze the performance of the clustering

Below are some graphs for unmodified CLUE
2 electrons
number_of_clusters_2e
energy_purity_2e
clusterless_percentage_2e
3 electrons
number_of_clusters_3e
energy_purity_3e
clusterless_percentage_3e

versus the initial algorithm (TemplatedClusterFinder)
2 electrons
number_of_electrons_2e
energy_purity_2e
clusterless_percentage_2e
3 electrons
number_of_electrons_3e
energy_purity_3e
clusterless_percentage_3e

Reclustering with CLUE
2e
number_of_clusters_2e_reclustering
3e
number_of_clusters_3e_reclustering

Visualisation examples
3e_clue_32
layers_7

Here is the (slightly messy) config I use for ldmx fire

from LDMX.Framework import ldmxcfg
p = ldmxcfg.Process('sim')
events = 100
p.maxEvents = events
p.termLogLevel = 0
p.logFrequency = 1

nbrOfElectrons = 2

# Initial algo
seedThresh = 350. 
cutoff = 10.

# CLUE
CLUE = True
debug = False
recluster = False
layers = 1
dc = 0.
rhoc = 550.
deltac = 10.
deltao = 40.

inputFiles = []
for i in range(1, 31):
    inputFiles.append(f"data/{nbrOfElectrons}e/sim_{i}_{nbrOfElectrons}e.root")
p.inputFiles = inputFiles
p.outputFiles = [
 "all.root"
 ]
p.histogramFile = "clusters.root"

import LDMX.Ecal.EcalGeometry # geometry required by sim

# import chip / geometry conditions
import LDMX.Ecal.ecal_hardcoded_conditions
import LDMX.Ecal.digi as ecal_digi
import LDMX.DQM.dqm as dqm
import LDMX.Ecal.ecalClusters as cl
import LDMX.Ecal.ecal_trig_digi as etrigdigi
import LDMX.Trigger.trigger_energy_sums as etrig

json = dqm.VisGenerator()
json.filename = "vis.json"
json.originIdAvailable = True
json.nbrOfElectrons = nbrOfElectrons

cluster = cl.EcalClusterProducer()
cluster.seedThreshold = seedThresh
cluster.cutoff = cutoff
# CLUE
cluster.CLUE = CLUE
cluster.dc = dc
cluster.rhoc = rhoc
cluster.deltac = deltac
cluster.deltao = deltao
cluster.debug = debug
cluster.nbrOfLayers = layers
cluster.reclustering = recluster

clan = cl.EcalClusterAnalyzer()
clan.nbrOfElectrons = nbrOfElectrons

p.logPerformance = True

p.sequence.extend([
    ecal_digi.EcalDigiProducer(),
    ecal_digi.EcalRecProducer(),
    cluster,
    clan,
    json,
    ])

Only non-empty clusters above seed threshold are saved
Shows incident IDs and PDG IDs that contributed to hit

Needs to be cleaned a bit

Added option for ground truth
Generates a bunch of histograms to evaluate how good clusters are
Copy link
Member

@tomeichlersmith tomeichlersmith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for creating this PR! I'm wondering if its possible to separate the visualization stuff from the clustering stuff. They have rather separate purposes and so separating them would allow them to be reviewed at different rates.

No worries if not! Just wanted to check.

Copy link
Member

@tvami tvami left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second the request two break this up to two PRs, also it seems there are some files that were not supposed to be pushed, like the .vscode directory and .yarn, but also I dont think we need the CMS and ATLAS logos, etc. :)

@bryngemark
Copy link
Contributor

You might be right @tomeichlersmith that visualization and clustering are conceptually separate... even if they were used in tandem for this project. Do you have some handy github wizardry that you could share with @Lysarina for how to cherry pick the different parts into two separate PRs, to help this move along?

Also, any good ideas for where to keep the visualizer analyzer? In EventDisplay or DQM?

@tomeichlersmith
Copy link
Member

In this specific situation, I would just make new branches and copy over your updates to those new branches. This also give you the opportunity to make branch names that reference the issue number. You can use git checkout to do this copying by specifying the files you want to get from that branch after --. (Note: tab-complete probably won't work since these files don't exist on trunk.)

git switch trunk
git pull
git switch -c 1411-ecal-clue-clustering
# just an example, you'll need to do more of these to get all of the files
git checkout ella-dev-clustering -- Ecal/include/Ecal/CLUE.h

You can git commit whenever you want. The files you've checkouted this way will already be git added. I would suggest many small commits so that the commit messages reference what you are adding but to each their own.

The same procedure can be done for the vis branch (make sure to switch back to trunk before creating the new branch), but I have some cleanup notes that may be helpful at this stage.

  • I think the processors that produce JSON for Phoenix should reside in the EventDisplay submodule. You can delete everything that is currently there - it is broken and no-one has used it in a long time. Use git mv after you git checkout a file so that git registers the move.
  • I do not want a full copy of nlohmann json.hpp in our source repo. I think adding it as a submodule git submodule add https://github.com/nlohmann/json.git EventDisplay/json makes the most sense and then using add_subdirectory within EventDisplay/CMakeLists.txt. https://json.nlohmann.me/integration/cmake/#embedded
  • I would like to avoid copying a pile of Typescript into ldmx-sw so if we can use Phoenix another way, I would prefer that; however, if we need to create our own Typescript application in order to properly use Phoenix and all its features, then I second @tvami 's points about removing the extra files pertaining to other experiments. (Maybe we put the Typescript app in some other repository? Does it even run from within the container? These are the types of details I would iron out on an event display-specific PR)

@EinarElen
Copy link
Contributor

* I do not want a full copy of nlohmann json.hpp in our source repo. I think adding it as a submodule `git submodule add https://github.com/nlohmann/json.git EventDisplay/json` makes the most sense and then using `add_subdirectory` within `EventDisplay/CMakeLists.txt`. https://json.nlohmann.me/integration/cmake/#embedded

Is there a reason for not wanting the full header? I think that is a super common way to use it

@tomeichlersmith
Copy link
Member

I don't have a good reason, I just like that its easier to find the original project (and thus its documentation). I also want to point out that acts also uses nlohmann/json so we could do an install into the image for both to use in the future.

Probably should have worded that comment less strongly - I would like to move it to a submodule but its not necessary. If we move to putting it in the image, then we can just as easily remove the header here as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update Ecal clustering with the CLUE algorithm Tune clustering within the ECal
5 participants