# Predicting model visibilities with crystalball

Crystalball is a Python package that uses dask to accelerate the prediction of model visibilities. Since it is a well-supported and typed Python module, it is listed as a flint dependency.

## Model specification

The model to be predicted is described by a text file using a BBS-style source representation. This is the format written by the `wsclean -save-source-list` option. A complete description of the format is available at the above link. A subset of the format is shown below (purely for clarity):

Format = Name, Type, Ra, Dec, I, SpectralIndex, LogarithmicSI, ReferenceFrequency='743990740.7407408', MajorAxis, MinorAxis, Orientation
174748-312315,GAUSSIAN,17:47:48.619992,-31.23.15.20016,0.19411106571231185,[-3.806350913533059,-4.135453173684361],true,743990740.7407408,68.5999984741211,68.5999984741211,59.79999923706055,

crystalball is designed specifically to use the model created by `wsclean -save-source-list`, which writes this BBS-style format.
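To make the layout of the format concrete, the following is a minimal sketch of how the example above could be read into a dictionary. It is not the parser crystalball itself uses; the helper names `split_fields`, `parse_header` and `parse_source` are hypothetical and exist only for illustration. The one subtlety it shows is that the `SpectralIndex` column holds a bracketed, comma-separated list, so a plain split on commas would break the row apart incorrectly.

```python
HEADER = (
    "Format = Name, Type, Ra, Dec, I, SpectralIndex, LogarithmicSI, "
    "ReferenceFrequency='743990740.7407408', MajorAxis, MinorAxis, Orientation"
)
ROW = (
    "174748-312315,GAUSSIAN,17:47:48.619992,-31.23.15.20016,"
    "0.19411106571231185,[-3.806350913533059,-4.135453173684361],true,"
    "743990740.7407408,68.5999984741211,68.5999984741211,59.79999923706055,"
)


def split_fields(line):
    """Split on commas, keeping bracketed lists such as [a,b] in one field."""
    fields, current, depth = [], [], 0
    for char in line:
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
        if char == "," and depth == 0:
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(char)
    fields.append("".join(current).strip())
    return fields


def parse_header(header):
    """Return the column names of a 'Format = ...' line, dropping defaults."""
    _, _, columns = header.partition("=")
    return [column.split("=")[0].strip() for column in split_fields(columns)]


def parse_source(header, row):
    """Pair each column name with the matching (string) value of one row."""
    # zip stops at the shorter input, so the empty field produced by the
    # trailing comma of the row is ignored.
    return dict(zip(parse_header(header), split_fields(row)))


source = parse_source(HEADER, ROW)
spectral_index = [float(term) for term in source["SpectralIndex"].strip("[]").split(",")]
print(source["Name"], source["Type"], float(source["I"]), spectral_index)
```

Values are kept as strings so that nothing is assumed about units or coordinate conventions; converting them is left to the caller, as done for the flux and spectral index terms in the last lines.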

## Implementation details

crystalball uses dask to accelerate the prediction of the model visibilities. Under the hood, dask-ms reads a measurement set in chunks that are sized appropriately for the available memory pool. The prediction can then be spread across CPUs using the chunked dask mappings. The abstraction used by dask is also easily parallelised across nodes, provided an appropriate dask cluster has been configured. Thankfully, this is already the case in how prefect is being used.
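The chunk-then-map idea can be illustrated with a small, self-contained sketch. It deliberately does not use dask, dask-ms or crystalball: a standard-library thread pool stands in for the dask workers, and a single point source stands in for the sky model. All function names are hypothetical, `u`, `v`, `w` are assumed to be in wavelengths, and the sign of the phase term follows one common convention that may differ from the one used in practice.

```python
import cmath
import math
from concurrent.futures import ThreadPoolExecutor


def chunk_rows(n_rows, chunk_size):
    """Yield (start, stop) row ranges that together cover n_rows in order."""
    for start in range(0, n_rows, chunk_size):
        yield start, min(start + chunk_size, n_rows)


def predict_chunk(uvw_chunk, flux, l_cosine, m_cosine):
    """Predict point-source visibilities for one chunk of (u, v, w) rows."""
    n_cosine = math.sqrt(1.0 - l_cosine**2 - m_cosine**2)
    return [
        flux * cmath.exp(-2j * math.pi * (u * l_cosine + v * m_cosine + w * (n_cosine - 1.0)))
        for u, v, w in uvw_chunk
    ]


def predict_all(uvw, flux, l_cosine, m_cosine, chunk_size=2, max_workers=2):
    """Map predict_chunk over row chunks and stitch the results back together."""
    ranges = list(chunk_rows(len(uvw), chunk_size))

    def work(row_range):
        start, stop = row_range
        return predict_chunk(uvw[start:stop], flux, l_cosine, m_cosine)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the order of the input chunks.
        parts = list(pool.map(work, ranges))
    return [vis for part in parts for vis in part]


uvw = [(10.0, 0.0, 0.0), (0.0, 25.0, 1.0), (3.0, 4.0, 0.5), (7.0, -2.0, 0.1), (1.0, 1.0, 1.0)]
# A source at the phase centre has zero phase on every baseline.
vis = predict_all(uvw, flux=2.0, l_cosine=0.0, m_cosine=0.0)
print(vis)
```

Because each chunk is independent of the others, the chunk size only controls how much data is held in memory at once, while the number of workers controls how many chunks are processed at the same time.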

The more cores (or dask workers) available in the dask cluster, the faster the model prediction will be. The individual compute resources can be small (e.g. 2 CPUs and 8 GB per dask worker), but through extreme horizontal scaling (upwards of 1000 dask worker instances) model prediction with crystalball can be quicker than addmodel. On clusters managed by SLURM, many small allocations of this kind may be very easy to schedule.

## Stability

It has been noted that under some conditions the dask.distributed scheduler managing the crystalball prediction can stall, at which point all processing essentially stops. The work can resume once the scheduler recognises that a worker has died, which when running on SLURM often happens when the job lifetime is reached.

Through experimentation we found the following dask configuration settings to be useful:


```python
import dask

# These improve the stability of the distributed dask cluster,
# particularly around the usage of crystalball prediction
dask.config.set({"distributed.comm.retry.count": 20})
dask.config.set({"distributed.comm.timeouts.connect": 30})
dask.config.set({"distributed.worker.memory.terminate": False})
```


These are set when using the `subtract_cube_pipeline`, but are not applied by the `flint_crystalball` program. In this instance consider setting them through the environment (see Dask documentation).
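As a rough guide to what setting these through the environment looks like, the sketch below derives environment variable names for the three settings listed above. It assumes dask's documented convention that variables prefixed with `DASK_` are read as configuration, with a double underscore marking each level of nesting. The helper `dask_env_var` is hypothetical and is not part of dask or flint.

```python
def dask_env_var(key):
    """Map a dotted dask config key to the matching DASK_* variable name."""
    return "DASK_" + key.upper().replace(".", "__")


SETTINGS = {
    "distributed.comm.retry.count": 20,
    "distributed.comm.timeouts.connect": 30,
    "distributed.worker.memory.terminate": False,
}

# Environment variables are always strings, so each value is converted.
ENV = {dask_env_var(key): str(value) for key, value in SETTINGS.items()}

# Print shell lines that could be placed in a job script.
for name, value in ENV.items():
    print(f"export {name}={value}")
```

The printed `export` lines would need to be in effect before the program starts, since the environment has to be in place when the configuration is loaded.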

## Accessing via the CLI

The primary entry point for visibility prediction with `crystalball` in `flint` is `flint_crystalball`:

```{argparse}
:ref: flint.predict.crystalball.get_parser
:prog: flint_crystalball