# Predicting model visibilities with `crystalball` `Crystalball` [is a python package](https://github.com/caracal-pipeline/crystalball) that uses `dask` to accelerate the prediction of model visibilities. Since it is a well supported and typed `python` module it is listed as a `flint` dependency. ## Model specification The model predicted is described by [a text file with a BBS style source representation](https://support.astron.nl/LOFARImagingCookbook/bbs.html). It is this format that is used by the `wsclean --save-source-list` option. A complete descip[tion of the format is available at the above link. A subset example of the format is below (purely for clarity): ```bash Format = Name, Type, Ra, Dec, I, SpectralIndex, LogarithmicSI, ReferenceFrequency='743990740.7407408', MajorAxis, MinorAxis, Orientation 174748-312315,GAUSSIAN,17:47:48.619992,-31.23.15.20016,0.19411106571231185,[-3.806350913533059,-4.135453173684361],true,743990740.7407408,68.5999984741211,68.5999984741211,59.79999923706055, ``` `crystalball` is designed specifically to use the model created by `wsclean -save-source-list`, which uses this `bbs` model style. ## Implementation details `crystalball` uses `dask` to accelerate the prediction of the model visibilities. Under the hood `dask-ms` reads in chunks of a measurement set appropriately sized for the available memory pool. Subsequently the prediction process may be spread across CPUs using the chunked `dask` mappings. The abstraction model used by `dask` also is easily parallelisable across nodes provided an appropriate `dask` cluster has been configured. Thankfully, this is the case in how `prefect` is being used. The more cores (or dask workers) available in the `dask` cluster the faster the model prediction will be. The individual compute resources can be small (e.g. 2 CPUs and 8GB per `dask-worker`) but through extreme horizontal scale (upwards of 1000 `dask-worker` instances) the model prediction with `crystalball` can be quicker than `addmodel`. On systems such as `SLURM` this resource configuration may be very easy to schedule. ## Stability It has been noted that under some conditions it is possible for the `dask.distributed` scheular managing the `crystalball` prediction can stall, wherein all processing essentially stops. The work can be resumed after the schedular recognises a worker has died, which when running on `SLURM` often happens when the job lifetime is reached. Through experimentation we found the following `dask` configuration settings to be useful: ``` ``` # These improve the stability of the distributed dask cluster, particularly around # the usage of crystalball prediction dask.config.set({"distributed.comm.retry.count": 20}) dask.config.set({"distributed.comm.timeouts.connect": 30}) dask.config.set({"distributed.worker.memory.terminate": False}) ``` These are set when using the `subtract_cube_pipeline`, but are not ;averaged for the `flint_crystalball` program. In this instance consider setting these through the environment (see Dask documentation). ## Accessing via the CLI The primary entry point for the visibility prediction with `addmodel` in `flint` is with `flint_addmodel`: ```{argparse} :ref: flint.predict.crystalball.get_parser :prog: flint_crystalball ```