* Basic readme updates * First config to get sx going (does not work yet!) * Ignore vscode settings for now * Ignore dumb error * The sx yaml file * Basics of how we can do extraction from a datasample * Added text * Updated readme * Apply Alex's comments
1 parent fa97782, commit 49c6a55
Showing 5 changed files with 313 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -161,3 +161,8 @@ cython_debug/ | |
|
||
# custom | ||
*.root.* | ||
|
||
# vscode | ||
.vscode/ | ||
|
||
servicex.yaml |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
jupyterlab | ||
servicex | ||
awkward | ||
hist[dask] | ||
# Necessary due to bug in uproot/dask-awkward that prevents | ||
# ak.concat working. | ||
dask_awkward==2024.2.0 | ||
uproot | ||
# Get the version with PHYSLITE support sort-of built in | ||
func_adl_servicex_xaodr21>=2.0a1 | ||
ipywidgets |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,270 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Exploring The Data\n", | ||
"\n", | ||
"Looking at the data to see how to access enough columns to make this relevant." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stdout", | ||
"output_type": "stream", | ||
"text": [ | ||
"Using release 21.2.231\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"from func_adl_servicex_xaodr21 import atlas_release\n", | ||
"# TODO: Update to use R22/23 or whatever.\n", | ||
"from func_adl_servicex_xaodr21 import SXDSAtlasxAODR21\n", | ||
"\n", | ||
"from hist.dask import Hist\n", | ||
"import dask_awkward as dak\n", | ||
"\n", | ||
"print(f'Using release {atlas_release}')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Set up the dataset we will use for testing." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"ttbar_all_rucio_dataset_name = \"mc23_13p6TeV.601229.PhPy8EG_A14_ttbar_hdamp258p75_SingleLep.deriv.DAOD_PHYSLITE.e8514_s4162_r14622_p6026\"\n", | ||
"ttbar_all = f\"rucio://{ttbar_all_rucio_dataset_name}?files=1\"\n", | ||
"ds = SXDSAtlasxAODR21(ttbar_all, backend='atlasr22')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## ServiceX Query\n", | ||
"\n", | ||
"Do an event-level query - so lists of jets, met, etc, all at the top level." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 3, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"name": "stderr", | ||
"output_type": "stream", | ||
"text": [ | ||
"WARNING:root:Fetched the default calibration configuration for a query. It should have been intentionally configured - using configuration for data format PHYS\n" | ||
] | ||
} | ||
], | ||
"source": [ | ||
"# TODO: The EventInfo argument should default correctly (that may just be a matter of using func_adl xaod r22)\n", | ||
"# TODO: dataclass should be supported so as not to lose type-following!\n", | ||
"query = (ds\n", | ||
" .Select(lambda e: {\n", | ||
" 'evt': e.EventInfo(\"EventInfo\"),\n", | ||
" 'jet': e.Jets(\"AnalysisJets\", calibrate=False)\n", | ||
" })\n", | ||
" .Select(lambda ei: {\n", | ||
" 'event_number': ei.evt.eventNumber(),\n", | ||
" 'run_number': ei.evt.runNumber(),\n", | ||
" 'jet_pt': ei.jet.Select(lambda j: j.pt()/1000)\n", | ||
" })\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"We do not have tight integration into `dask_awkward` until there is extra code working, so let's grab all the data." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 4, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Start by grabbing the data as an awkward array\n", | ||
"# TODO: Files should remain in the S3 cache and be read directly from there\n", | ||
"data = query.AsAwkwardArray().value()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Plots\n", | ||
"\n", | ||
"Next, let's make plots of everything." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 5, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Quick construction, no other imports needed:\n", | ||
"h = (\n", | ||
" Hist.new.Reg(20, 0, 100000000, name=\"x\", label=\"x-axis\")\n", | ||
" .Int64()\n", | ||
")\n", | ||
"r1 = h.fill(data.event_number)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 6, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"# Quick construction, no other imports needed:\n", | ||
"h = (\n", | ||
" Hist.new.Reg(20, 0, 200, name=\"x\", label=\"Jet $p_T$\")\n", | ||
" .Int64()\n", | ||
")\n", | ||
"r2 = h.fill(dak.flatten(data.jet_pt))" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 7, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/html": [ | ||
"<html>\n", | ||
"<div style=\"display:flex; align-items:center;\">\n", | ||
"<div style=\"width:290px;\">\n", | ||
"<svg xmlns=\"http://www.w3.org/2000/svg\" viewBox=\"-10 -105 270 120\">\n", | ||
"<line x1=\"-5\" y1=\"0\" x2=\"255\" y2=\"0\" style=\"fill:none;stroke-width:2;stroke:currentColor\"/>\n", | ||
"<text text-anchor=\"middle\" x=\"0\" y=\"15\" style=\"fill:currentColor;\">\n", | ||
"0\n", | ||
"</text>\n", | ||
"<text text-anchor=\"middle\" x=\"250\" y=\"15\" style=\"fill:currentColor;\">\n", | ||
"1e+08\n", | ||
"</text>\n", | ||
"<text text-anchor=\"middle\" x=\"125.0\" y=\"15\" style=\"fill:currentColor;\">\n", | ||
"x-axis\n", | ||
"</text>\n", | ||
"<polyline points=\" 0,0 0,-0 12.5,-0 12.5,-0 25,-0 25,-0 37.5,-0 37.5,-0 50,-0 50,-0 62.5,-0 62.5,-0 75,-0 75,-0 87.5,-0 87.5,-0 100,-0 100,-0 112.5,-0 112.5,-0 125,-0 125,-0 137.5,-0 137.5,-0 150,-0 150,-0 162.5,-0 162.5,-0 175,-0 175,-0 187.5,-0 187.5,-100 200,-100 200,-0 212.5,-0 212.5,-0 225,-0 225,-0 237.5,-0 237.5,-0 250,-0 250,0\" style=\"fill:none; stroke:currentColor;\"/>\n", | ||
"</svg>\n", | ||
"</div>\n", | ||
"<div style=\"flex=grow:1;\">\n", | ||
"Regular(20, 0, 1e+08, name='x', label='x-axis')<br/>\n", | ||
"<hr style=\"margin-top:.2em; margin-bottom:.2em;\"/>\n", | ||
"Int64() Σ=150000.0\n", | ||
"\n", | ||
"</div>\n", | ||
"</div>\n", | ||
"</html>" | ||
], | ||
"text/plain": [ | ||
"Hist(Regular(20, 0, 1e+08, name='x', label='x-axis'), storage=Int64()) # Sum: 150000.0" | ||
] | ||
}, | ||
"execution_count": 7, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"r1.compute()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 8, | ||
"metadata": {}, | ||
"outputs": [ | ||
{ | ||
"data": { | ||
"text/html": [ | ||
"<html>\n", | ||
"<div style=\"display:flex; align-items:center;\">\n", | ||
"<div style=\"width:290px;\">\n", | ||
"<svg xmlns=\"http://www.w3.org/2000/svg\" viewBox=\"-10 -105 270 120\">\n", | ||
"<line x1=\"-5\" y1=\"0\" x2=\"255\" y2=\"0\" style=\"fill:none;stroke-width:2;stroke:currentColor\"/>\n", | ||
"<text text-anchor=\"middle\" x=\"0\" y=\"15\" style=\"fill:currentColor;\">\n", | ||
"0\n", | ||
"</text>\n", | ||
"<text text-anchor=\"middle\" x=\"250\" y=\"15\" style=\"fill:currentColor;\">\n", | ||
"200\n", | ||
"</text>\n", | ||
"<text text-anchor=\"middle\" x=\"125.0\" y=\"15\" style=\"fill:currentColor;\">\n", | ||
"Jet $p_T$\n", | ||
"</text>\n", | ||
"<polyline points=\" 0,0 0,-2.02 12.5,-2.02 12.5,-100 25,-100 25,-59.4 37.5,-59.4 37.5,-31.2 50,-31.2 50,-22.1 62.5,-22.1 62.5,-17.1 75,-17.1 75,-13.7 87.5,-13.7 87.5,-10.6 100,-10.6 100,-8.31 112.5,-8.31 112.5,-6.49 125,-6.49 125,-5.12 137.5,-5.12 137.5,-4 150,-4 150,-3.04 162.5,-3.04 162.5,-2.39 175,-2.39 175,-1.88 187.5,-1.88 187.5,-1.45 200,-1.45 200,-1.16 212.5,-1.16 212.5,-0.881 225,-0.881 225,-0.726 237.5,-0.726 237.5,-0.596 250,-0.596 250,0\" style=\"fill:none; stroke:currentColor;\"/>\n", | ||
"</svg>\n", | ||
"</div>\n", | ||
"<div style=\"flex=grow:1;\">\n", | ||
"Regular(20, 0, 200, name='x', label='Jet $p_T$')<br/>\n", | ||
"<hr style=\"margin-top:.2em; margin-bottom:.2em;\"/>\n", | ||
"Int64() Σ=1435200.0 <em>(1450989.0 with flow)</em>\n", | ||
"\n", | ||
"</div>\n", | ||
"</div>\n", | ||
"</html>" | ||
], | ||
"text/plain": [ | ||
"Hist(Regular(20, 0, 200, name='x', label='Jet $p_T$'), storage=Int64()) # Sum: 1435200.0 (1450989.0 with flow)" | ||
] | ||
}, | ||
"execution_count": 8, | ||
"metadata": {}, | ||
"output_type": "execute_result" | ||
} | ||
], | ||
"source": [ | ||
"r2.compute()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": ".venv", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.10" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# Introduction | ||
|
||
This directory contains scripts and notebooks that fetch the data locally using ServiceX. | ||
|
||
The default `servicex.yaml` file from the UChicago AF was used. | ||
|
||
Install from `requirements.txt`: it pins `dask_awkward`, because a bug in that package means this code can't run on the most recent version. | ||
|
||
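Since the pin matters, it can help to check which version is actually installed before running the notebook. The following is only a minimal sketch using the standard library: the helper names `installed_version` and `matches_pin` are made up for illustration, the pinned value is copied from `requirements.txt`, and the distribution name `dask-awkward` is an assumption about how the package is registered in your environment.

```python
from importlib import metadata
from typing import Optional

# Pin copied from requirements.txt in this directory.
PINNED_DASK_AWKWARD = "2024.2.0"


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_pin(package: str, pinned: str) -> bool:
    """True only when the installed version is exactly the pinned one."""
    return installed_version(package) == pinned


if __name__ == "__main__":
    # Assumption: the distribution is registered as "dask-awkward".
    found = installed_version("dask-awkward")
    if matches_pin("dask-awkward", PINNED_DASK_AWKWARD):
        print(f"dask-awkward {found} matches the pin")
    else:
        print(f"dask-awkward is {found}, expected {PINNED_DASK_AWKWARD}")
```

An exact comparison is used on purpose, because the requirement is an exact `==` pin rather than a range.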
## Files | ||
|
||
| File | Description | | ||
|------|-------------| | ||
| 00-exploring-the-data | Outlines the raw ServiceX code that we can use. We'll need to develop libraries that hide much of this code, given how many branches we'll need to load. This notebook can't run on the most recent version of `dask_awkward` until [this bug](https://github.com/dask-contrib/dask-awkward/issues/456) is fixed. |
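
The notebook flattens the per-event jet $p_T$ lists and fills a 20-bin histogram over 0 to 200, and its output reports both an in-range sum and a larger sum "with flow". The sketch below illustrates that idea in plain Python, with no ServiceX, `hist`, or `dask_awkward` involved. It is a stand-in, not how `hist` is implemented: the helpers `flatten` and `fill_regular` and the sample values are hypothetical, and the division by 1000 mirrors the query's `j.pt()/1000` (assumed here to convert MeV to GeV).

```python
from itertools import chain


def flatten(jagged):
    """Flatten one level of nesting: per-event lists become one flat list."""
    return list(chain.from_iterable(jagged))


def fill_regular(values, nbins, lo, hi):
    """Count values into `nbins` equal-width bins on [lo, hi).

    Returns (counts, underflow, overflow).
    """
    counts = [0] * nbins
    underflow = overflow = 0
    width = (hi - lo) / nbins
    for v in values:
        if v < lo:
            underflow += 1
        elif v >= hi:
            overflow += 1
        else:
            # min() guards against floating-point rounding at the top edge.
            counts[min(int((v - lo) / width), nbins - 1)] += 1
    return counts, underflow, overflow


# Hypothetical jet pT values, one inner list per event (the middle event has no jets).
jet_pt_mev = [[45_000.0, 32_500.0], [], [250_000.0, 12_000.0, 8_000.0]]
# Same scaling as the query's j.pt()/1000.
jet_pt_gev = [[pt / 1000 for pt in event] for event in jet_pt_mev]

counts, under, over = fill_regular(flatten(jet_pt_gev), nbins=20, lo=0, hi=200)
in_range = sum(counts)                # entries inside [0, 200)
with_flow = in_range + under + over   # plus underflow and overflow entries
print(in_range, with_flow)
```

With these sample values, four jets land inside the axis range and one (250) falls in overflow, which is the same kind of gap the notebook shows between its two sums.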