Test Data Policies

Overview

Each environment (e.g. the summit, TTS, BTS and USDF) requires its own locally hosted data sets for running functionality tests. The data must be managed so that disk space remains adequate, and organized by test so that it can be easily maintained in the future. Generally, there will be registers that relate tests to datasets, so that if tests are added or removed the data can be modified accordingly. It is anticipated that these registers will eventually be in a computer-readable format (e.g. a YAML file) that directly feeds the data curation process, but this has yet to be implemented.
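As a rough illustration of what such a register might look like once it exists, the sketch below models it as a mapping from test names to the datasets they need, and works out which datasets no remaining test depends on after a test is removed. The test names, dataset names, and helper functions are all hypothetical; no register format has been decided.

```python
# Hypothetical register: test name -> datasets that test depends on.
# In practice this might be loaded from a YAML file; no such file exists yet.
REGISTER = {
    "test_a": ["dataset_1", "dataset_2"],
    "test_b": ["dataset_2", "dataset_3"],
}


def required_datasets(register):
    """Return the set of all datasets referenced by at least one test."""
    return {name for datasets in register.values() for name in datasets}


def orphaned_datasets(register, removed_test):
    """Return datasets that no remaining test needs once removed_test is dropped."""
    remaining = {test: ds for test, ds in register.items() if test != removed_test}
    return required_datasets(register) - required_datasets(remaining)


# dataset_2 is still needed by test_b, so only dataset_1 is orphaned.
print(sorted(orphaned_datasets(REGISTER, "test_a")))
```

Datasets reported as orphaned would only be candidates for removal; under the policy on this page, removing permanent test data would still need approval.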

This page explains the policies for adding and removing test data sets. In short, no permanent test data should be added to or removed from the summit without first obtaining permission from one of the following people:

  1. Michael Reuter

  2. Tiago Ribeiro

  3. Brian Stalder

  4. Erik Dennihy

These people are tasked with ensuring that the request is reasonable and that the suggested execution and/or datasets are appropriate. Any datasets or butler collections that have not been cleared by a member of this group and documented accordingly can be deleted at any time, although raw and derived data products will usually be kept for 30 days at the summit. Disk space at the summit is the most precious and will be subject to the most scrutiny. Data collections at USDF are far less constrained, so requests there are much easier to accommodate. The TTS and BTS have space limitations, but remain flexible.
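The retention rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Collection` record, its fields, and the `cleanup_candidates` helper are hypothetical and are not part of the Butler or any existing tooling. The sketch models the usual 30-day practice; by policy, unprotected data may be deleted sooner.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Usual retention at the summit; the policy allows earlier deletion.
RETENTION = timedelta(days=30)


@dataclass
class Collection:
    """Hypothetical record describing one dataset or butler collection."""
    name: str
    created: date
    approved: bool = False    # cleared by a member of the approval group
    documented: bool = False  # recorded as permanent test data


def is_protected(collection):
    """Only collections that are both approved and documented are protected."""
    return collection.approved and collection.documented


def cleanup_candidates(collections, today):
    """Return names of unprotected collections older than the usual retention."""
    return [
        c.name
        for c in collections
        if not is_protected(c) and today - c.created > RETENTION
    ]
```

For example, an unapproved collection created 60 days ago would be listed, while one created 10 days ago, or one that is both approved and documented, would not.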

Test Data for Camera Playlists

The camera playlists use raw FITS files. To date, these have been managed primarily via Confluence pages, and Michael Reuter has been the primary curator of these datasets.

Note that new images created by running through a playlist are not protected and are subject to deletion when required (generally after 30 days).

Test Data Curation in the Butler

The process to add permanent test data to each environment is currently manual.

Brandon White (from Fermilab) is working on a mechanism that will use Rucio to manage the datasets that have been approved by the personnel listed above. Until then, Steve Pietrowicz has offered to help add data following the same procedure that has been documented for adding LATISS test collections.

Current Collections

The list of collections is to be hosted outside this documentation and will evolve with time. Until that list (and mechanism) can be linked from here, the current collections and datasets are listed on a dedicated Confluence page.