Identifying the Problem

The first step is to identify which in which part of FDR your performance problems are occurring. There are three main stages that FDR goes through when verifying any property (refinement, deadlock etc.):

  1. Evaluation of the CSPM processes. During Evaluation, FDR evaluates the CSPM into a pure CSP process. If this stage is the bottleneck, the graphical user interface will show a task titled “Evaluating process” taking a long time to complete (see The Task List for details on where to locate the task list). See Optimising Evaluation for tips on how to optimise this stage.
  2. Compilation of the pure CSP process. During compilation, FDR converts the CSP processes into labelled-transition systems (LTSs) and will be indicated by a task named “Constructing machines” in the task list. See Optimising Compilation for tips on how to optimise this stage.
  3. Checking of the compiled LTS. During this stage FDR is actually verifying the property in question. This is indicated by a task named “Checking Refinement”. See Optimising Checking for tips on how to optimise this stage.

Optimising Evaluation

If evaluation is the problem it generally indicates that the CSPm you have constructed is too complex for FDR to efficiently represent. Firstly, it is worth checking that all processes are finite state, since any infinite state process (such as the standard CSP processes COUNT(n)) will cause FDR to spin on this stage. The other common reason is using sets that are too large for channels or datatypes. For example, a channel such as channel c : Int means that FDR has to expore \(2^{32}\) different branches whenever it sees a statement of the form c?x. Using small finite sets is critical to ensure that FDR can efficiently evaluate the model.

Note that results about infinite types can sometimes be established by small finite state checks by using the theory of data independence. See, for example, Understanding Concurrent Systems for further details.

Optimising Compilation

Optimising compilation is difficult since, generally, the causes of this are difficult to determine. One common cause is using too many events, so reducing the size of channel types as far as possible can help. Further, if too much renaming is be performed (particularly renaming a single event to many different copies of it), this can also cause performance to decrease.

Another way to improve performance is by using a carefully placed application of explicate: generally the best place to use it is in an argument P of a parallel composition, where P has a very complex definition, but actually has a relatively small state space (at most a million states or so). Also, it can be worth experimenting by disabling compiler.recursive_high_level as this can occasionally cause an increase in compilation time.

Optimising Checking

If the bottleneck is in the checking stage, there are essentially two different mitigations: either more computing resources can be directed at the problem, or the model can be optimised to attempt to reduce its size, or increase the speed, at which it can be verified. Often a combination of both of these is required. We review both of these techniques below, in turn.

Using more Computing Resources

There are several ways in which FDR can make use of more computing: a larger machine can be utilised, on-disk storage can be used, or a cluster of computers can be utilised. These are all reviewed below.

Since FDR is able to linearly scale to machines with many cores (we have tested with machines of up to 40 cores) on many problems, using a larger machine is the first step. Note that it is also possible to rent such machines from providers such as Amazon Web Services, or Google Compute Cloud for a small amount each hour.

If the problem is lack of memory, FDR is also able to make use of on-disk storage (although we would only recommend SSDs, particularly on large machine). We have been able to verify problems where the on-disk storage is a factor of 4 larger than the amount of RAM without a noticeable slowdown. In order for this to be effective, the machine needs to be configured correctly. In general, if the machine in question has multiple drives, we strongly recommend the use of no RAID (i.e. each drive is independently mounted by the operating system). Further, numerous small drives are better than a small number of large drives.

Once the system has been properly configured with disk storage, FDR can be configured be setting the option: refinement.storage.file.path to a comma separated list of paths. For example, passing the command line flag --refinement-storage-file-path=/scratch/disk0/,/scratch/disk1/,/scratch/disk2/ to the command line version will cause FDR to store temporary data in the three different directories. By default, FDR also uses 90% of the free memory at the time the check starts to store data in-memory. This value can be changed (see refinement.storage.file.cache_size), but generally should not need to be.

The other solution is to use FDR on a cluster instead. For a wide variety of problems, FDR is capable of scaling linearly as the cluster size increases providing sufficient network bandwidth is available (we achieved linear scaling on a cluster of 64 computes connected by a full-bisection 10 gigabit network on Amazon’s EC2 compute cloud). Full details on how to use the cluster version are in Using a Cluster.

Optimising the Model

Firstly, check to see if the semantic model that the check is being performed in is necessary. Checks in the failures-divergences model are slower than those in the failures model, which are in turn slower than those in the traces model. Further, the failures-divergences model is actually particularly slow and does not scale as well as the number of cores increases (which will reduce the benefit of more computing power). If it is possible to statically determine that divergence is impossible, or if divergence is not relevant to the property in question, then a potentially large performance boost can be obtained by moving to the failures or traces model. Clearly, care must be taken when choosing a different denotational model, since this clearly remove violations of the specification.

The most powerful technique that FDR has for optimising models is compression. Compression takes a LTS and returns an equivalent LTS (according to the relevant denotational model) that, hopefully has fewer states. Compression is a complex technique to utilise effectively, but is by far the most powerful method of reducing the number of states that have to be verified. Compression is described in Compression.

Another possibility for optimising models is to use partial order reduction. This attempts to remove redundant reoderings of events (for example, if a process can perform both a and b in any order, then partial-order reduction would only consider one ordering). Thus, this is similar to compression, but can be applied much more easily. Equally, partial-order reduction is unable to reduce some systems that compression can reduce. Since it is trivial to enable partial-order reduction (see the instructions at partial order reduction), it is often worth enabling it just to see if it helps.

Lastly, since CSP is compositional, it may sometimes be possible to decompose a check of a large system into a series of smaller checks. In particular, CSP is compositional in the sense that if \(P \refinedby Q\), then for any CSP context \(C\), \(C[P] \refinedby C[Q]\). For example, suppose a system contained a complex model of a finite state buffer. Separately, a verification could be done that showed that the complex finite state buffer satisfies the standard CSP finite-state buffer specification. Then, the original system could be verified with the standard CSP finite-state buffer specification taking the place of the complex buffer. This should reduce the number of states that need to be verified.