View on GitHub

MCPL

Monte Carlo Particle Lists

Download this project as a zip file Download this project as a tar.gz file

The stat:sum convention

WARNING: This page is under construction or being updated

WARNING: This page concerns documentation for a future release of MCPL and is still being edited

The stat:sum convention for statistics in MCPL headers, allows the addition of custom statistics to the headers of MCPL files.

These values are encoded into MCPL header comment fields in the form stat:sum:<key>:<value>, and the values are combined through simple addition when two or more MCPL files are merged. The present page provides both a brief introduction to these stat:sum entries, and after an introduction includes detailed descriptions of the syntax, software support, and guidelines for how to deal with stat:sum values in various scenarios where MCPL files are filtered, merged, split, or otherwise edited.

Introduction

In short, the stat:sum convention brings an often-requested feature to MCPL: Statistics in the header which are automatically combined when files are merged!

More specifically, stat:sum values are to be combined via simple addition when files are merged. Examples of such statistics could for instance be “number of seed particles in simulation run”, “number of seconds of beam-time simulated”, or “number of proton collisions modelled”. Note that this is not the same as the number of actual particles in the MCPL file, which could be either larger or smaller than any particular stat:sum parameter value. But in general the number of particles in the MCPL file is expected to scale roughly linearly with the values of the stat:sum entries – at least in the limit of large statistics.

When inspecting MCPL files, the stat:sum entries will show up as human-readable comments where the encoded value will usually be a human-readable string representation of a non-negative and finite floating point number. In some cases you might also see the special value -1 which means Not Available. Here is how it might look when using the mcpltool to inspect such a file:

  Custom meta data
    Source             : "my_cool_program_name"
    Number of comments : 2
          -> comment 1 : "stat:sum:my_custom_source_stat:                 1.2345e6"
          -> comment 2 : "Some other comment."
    Number of blobs    : 0

Note that depending on your font, you might have to scroll to the right to see the actual value on this website, as for technical reasons values are always padded with spaces to take up exactly 24 characters. Now imagine a second (otherwise compatible) file with:

  Custom meta data
    Source             : "my_cool_program_name"
    Number of comments : 2
          -> comment 1 : "stat:sum:my_custom_source_stat:                    2.0e6"
          -> comment 2 : "Some other comment."
    Number of blobs    : 0

Before the MCPL software release version 2.1.0, the two files would not have been possible to merge with the usual tools (mcpltool --merge or the MCPL C API), but now they are not only mergeable, the values are properly combined via addition to produce the new post-merge value. The resulting file thus ends up with all the particles from both files and the header:

  Custom meta data
    Source             : "my_cool_program_name"
    Number of comments : 2
          -> comment 1 : "stat:sum:my_custom_source_stat:                 3.2345e6"
          -> comment 2 : "Some other comment."
    Number of blobs    : 0

If desired, users or developers interacting with MCPL files through the MCPL software can interact with these stat:sum:<key>:<value> entries programmatically through the C or Python APIs, and likewise the mcpltool --merge and mcpltool --extract modes have been updated to handle stat:sum entries.

It is highly recommended that anyone implementing custom code for merging, splitting, or filtering MCPL files familiarise themselves with the full details of the convention described below.

The syntax

The syntax of the comments defining a statistics with both a key and a value is: "stat:sum:<key>:<value>" with the following additional constraints:

Software support

The MCPL software release supports the stat:sum convention starting with release 2.1.0. Earlier releases will simply treat the stat:sum entries as any other comment in the MCPL header, which will often be OK, but might not be (e.g. when merging files). For consistency, it is recommended that applications modify their software dependency list to require at least version 2.1.0 of the MCPL software release.

When using MCPL release 2.1.0 or later, built-in operations like file merging automatically update stat:sum values when it is unambiguous how they should be updated. In case of ambiguities (e.g. when splitting files), values are set to -1 to be conservative. Additionally, C and Python APIs allow interaction with stat:sum values through keys and values directly, rather than having to manually encode or decode stat:sum:<key>:<value> comments. In the following, the stat:sum support in the MCPL software APIs will be briefly discussed.

Python API

The MCPLFile objects in the Python API now have .stat_sum properties, which are dictionaries of (key,value) pairs:

>>> import mcpl
>>> f = mcpl.MCPLFile('somefile.mcpl')
>>> print( f.stat_sum )
{'my_custom_source_stat': 3.2345e6}

Note that values of -1.0 in the actual stat:sum:<key>:<value> comments in the data, are translated into None in the Python interface.

C API

The C API gains three new functions:

double mcpl_hdr_stat_sum( mcpl_file_t, const char * key );
void mcpl_hdr_add_stat_sum( mcpl_outfile_t, const char * key, double value );
void mcpl_hdr_scale_stat_sums( mcpl_outfile_t, double scale );

The mcpl_hdr_stat_sum function is used to extract stat:sum: values from an existing file, by a particular key, returning -2 in case the key is not present in the file.

The mcpl_hdr_add_stat_sum function is used to add values into files. Of course, this could also be done through careful usage of the mcpl_hdr_add_comment function, but this is not recommended. In particular because the mcpl_hdr_add_stat_sum function can also be called after the first particle has been written to disk, as long as a stat:sum entry with the key in question was also added before the first particle was written. For this initial reservation of space in the on-disk header one can ideally use a value of -1. This is of instance useful if the size of a given Monte Carlo simulation is not initially known – perhaps because of a dynamic criteria for ending the job like “keep running until I have registered a million entries in my detector tally”.

Finally, the function mcpl_hdr_scale_stat_sums is intended for usage by people who might be implementing custom editing, filtering or splitting of files via the C API (normally in conjunction with the mcpl_transfer_metadata function). If simply filtering based upon particle properties (perhaps picking only certain particle types or energies), this is typically not needed. But if somehow splitting a file or otherwise selecting particles based on their position in the file (perhaps producing a file containing the first N particles of another file), the stat:sum values in the new file should probably be reduced somehow, and this can be done with the mcpl_hdr_scale_stat_sums function. If in doubt, invoking mcpl_hdr_scale_stat_sums with a value of -1 ensures that the output file will get a value of -1 for any stat:sum entries, thus avoiding misleading values.

Naturally, other relevant functions in the C API, such as for merging or repairing files, have been updated to handle stat:sum entries appropriately.

Command-line API

As the new stat:sum values are encoded in MCPL comments, it is trivially true that any tool displaying such comments will display the stat:sum values as well. This of course includes the mcpltool and any other code in the MCPL software releases, old or new.

The mcpltool --extract mode has been updated to properly deal with stat:sum entries. Specifically, it will leave them unaltered if only using the -p flag to extract particles of a particular type. However, if using the -l and -s flags to extract particles based on position in the file, the mcpltool code will conservatively set all stat:sum values in the resulting file to -1 (meaning Not Available). This is done since the generic MCPL code can not possibly know with certainty how to calculate the new values.

Finally, other modes, like mcpltool --merge and mcpltool --repair, have also been updated to handle stat:sum entries appropriately.

Guidelines for custom code

Depending on the scenario, custom code modifying MCPL files may or may not have to be updated in order to properly deal with the new stat:sum entries: