The first step is to identify in which part of FDR your performance problems are occurring. There are three main stages that FDR goes through when verifying any property (refinement, deadlock, etc.): evaluation, compilation, and checking, each of which is discussed below.
If evaluation is the problem, it generally indicates that the CSPm you have constructed is too complex for FDR to represent efficiently. Firstly, it is worth checking that all processes are finite state, since any infinite-state process (such as the standard CSP process COUNT(n)) will cause FDR to spin on this stage. The other common cause is using sets that are too large for channels or datatypes. For example, a channel such as channel c : Int means that FDR has to explore \(2^{32}\) different branches whenever it sees a statement of the form c?x. Using small finite sets is critical to ensuring that FDR can evaluate the model efficiently.
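For instance, the following sketch (the channel and process names are purely illustrative) contrasts a channel declared over the whole of Int with one declared over a small set containing only the values the model actually needs:

    -- Too large: with channel c : Int, FDR must consider every machine
    -- integer whenever it resolves an input of the form c?x.
    -- channel c : Int

    -- Better: restrict the channel to the small set of values actually used.
    nametype Value = {0..3}
    channel c : Value

    P = c?x -> c!x -> P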
Note that results about infinite types can sometimes be established by small finite-state checks using the theory of data independence. See, for example, Understanding Concurrent Systems for further details.
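As a rough sketch of the idea (the process and type below are purely illustrative), a process that is data independent in a type, i.e. one that passes values of the type around without inspecting them, can often be checked over a small concrete instantiation, with the data-independence theorems then lifting the result to larger or infinite instantiations:

    -- COPY never inspects the values it passes on, so it is data independent
    -- in T; under the appropriate data-independence theorems, a check over
    -- this small instantiation can establish the result for larger types.
    nametype T = {0, 1}
    channel left, right : T

    COPY = left?x -> right!x -> COPY

    assert COPY :[deadlock free]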
Optimising compilation is difficult since the causes of slow compilation are generally hard to determine. One common cause is using too many events, so reducing the size of channel types as far as possible can help. Further, if too much renaming is performed (particularly renaming a single event to many different copies of it), this can also cause performance to decrease.
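For instance, a relational renaming that maps a single event to many copies (the names below are illustrative) multiplies the number of transitions FDR must construct:

    channel a
    channel b : {0..9}

    P = a -> P

    -- Every occurrence of a is renamed to each of b.0 .. b.9, so the
    -- compiled machine has ten transitions where P had one.
    Q = P [[ a <- b.x | x <- {0..9} ]]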
Another way to improve performance is a carefully placed application of explicate: generally the best place to use it is on an argument P of a parallel composition, where P has a very complex definition but actually has a relatively small state space (at most a million states or so). Also, it can be worth experimenting with disabling compiler.recursive_high_level, as this option can occasionally cause an increase in compilation time.
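As an illustrative sketch (the processes here are trivial placeholders for a genuinely complex component), explicate is declared as a transparent function and applied to the complex, but small, argument of the parallel composition:

    transparent explicate

    channel a, b, sync

    -- Stand-in for a process with a complicated definition but a small
    -- state space; explicating it makes FDR build its transition system
    -- explicitly before composing it in parallel.
    Complex = a -> sync -> Complex

    Rest = b -> sync -> Rest

    System = explicate(Complex) [| {sync} |] Rest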
If the bottleneck is in the checking stage, there are essentially two mitigations: either more computing resources can be directed at the problem, or the model can be optimised to reduce its size or increase the speed at which it can be verified. Often a combination of the two is required. We review both techniques below, in turn.
There are several ways in which FDR can make use of more computing power: a larger machine can be used, on-disk storage can be used, or a cluster of computers can be employed. These are all reviewed below.
Since FDR is able to scale linearly on many problems to machines with many cores (we have tested machines with up to 40 cores), using a larger machine is the first step. Note that it is also possible to rent such machines from providers such as Amazon Web Services or Google Compute Cloud for a small amount each hour.
If the problem is lack of memory, FDR is also able to make use of on-disk storage (although we would only recommend SSDs, particularly on large machines). We have been able to verify problems where the on-disk storage is a factor of 4 larger than the amount of RAM without a noticeable slowdown. In order for this to be effective, the machine needs to be configured correctly. In general, if the machine in question has multiple drives, we strongly recommend not using RAID (i.e. each drive should be independently mounted by the operating system). Further, numerous small drives are better than a small number of large drives.
Once the system has been properly configured with disk storage, FDR can be configured by setting the option refinement.storage.file.path to a comma-separated list of paths. For example, passing the command-line flag --refinement-storage-file-path=/scratch/disk0/,/scratch/disk1/,/scratch/disk2/ to the command-line version will cause FDR to store temporary data in the three different directories. By default, FDR also uses 90% of the memory that is free at the time the check starts to store data in memory. This value can be changed (see refinement.storage.file.cache_size), but generally should not need to be.
The other solution is to use FDR on a cluster instead. For a wide variety of problems, FDR is capable of scaling linearly as the cluster size increases, providing sufficient network bandwidth is available (we achieved linear scaling on a cluster of 64 computers connected by a full-bisection 10-gigabit network on Amazon's EC2 compute cloud). Full details on how to use the cluster version are in Using a Cluster.
Firstly, check whether the semantic model in which the check is being performed is actually necessary. Checks in the failures-divergences model are slower than those in the failures model, which are in turn slower than those in the traces model. Further, the failures-divergences model is particularly slow and does not scale as well as the number of cores increases (which reduces the benefit of more computing power). If it is possible to statically determine that divergence is impossible, or if divergence is not relevant to the property in question, then a potentially large performance boost can be obtained by moving to the failures or traces model. Care must be taken when choosing a weaker denotational model, however, since it may no longer detect certain violations of the specification.
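For example, given a specification and implementation (the processes below are trivial placeholders), the three assertions are listed from cheapest to most expensive; the traces or failures forms are preferable whenever divergence is known to be impossible or irrelevant:

    channel a, b

    SPEC = a -> b -> SPEC

    IMPL = a -> b -> IMPL

    -- Traces model: checks safety (no disallowed traces) only.
    assert SPEC [T= IMPL

    -- Failures model: additionally checks what IMPL may refuse.
    assert SPEC [F= IMPL

    -- Failures-divergences model: additionally checks for divergence;
    -- typically the slowest of the three.
    assert SPEC [FD= IMPL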
The most powerful technique that FDR has for optimising models is compression. Compression takes an LTS and returns an equivalent LTS (according to the relevant denotational model) that, hopefully, has fewer states. Compression is a complex technique to use effectively, but is by far the most powerful method of reducing the number of states that have to be verified. Compression is described in Compression.
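As an illustrative sketch of how a compression function is applied (the processes are placeholders, and the choice of function matters; see Compression for guidance):

    transparent sbisim, diamond

    channel a, b, sync

    Left  = a -> sync -> Left
    Right = b -> sync -> Right

    -- Compress a component before composing it, so that the parallel
    -- composition is built from (hopefully) smaller transition systems.
    System = sbisim(diamond(Left)) [| {sync} |] Right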
Another possibility for optimising models is to use partial order reduction. This attempts to remove redundant reorderings of events (for example, if a process can perform both a and b in either order, then partial-order reduction would only consider one ordering). Thus it is similar to compression, but can be applied much more easily. Equally, partial-order reduction is unable to reduce some systems that compression can reduce. Since it is trivial to enable partial-order reduction (see the instructions at partial order reduction), it is often worth enabling it just to see if it helps.
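As a hedged sketch, assuming the per-assertion annotation described under partial order reduction, enabling it can be as simple as annotating an existing check (the processes below are illustrative):

    channel a, b

    SPEC = a -> b -> SPEC |~| b -> a -> SPEC

    IMPL = (a -> SKIP ||| b -> SKIP) ; IMPL

    -- The same refinement check, with partial-order reduction requested.
    assert SPEC [T= IMPL :[partial order reduce]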
Lastly, since CSP is compositional, it may sometimes be possible to decompose a check of a large system into a series of smaller checks. In particular, CSP refinement is compositional in the sense that if \(P \refinedby Q\), then for any CSP context \(C\), \(C[P] \refinedby C[Q]\). For example, suppose a system contains a complex model of a finite-state buffer. Separately, a check could establish that this complex buffer refines the standard CSP finite-state buffer specification. The original system could then be verified with the standard buffer specification taking the place of the complex buffer, which should reduce the number of states that need to be verified.
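As an illustrative sketch (all names are placeholders): first check that the complex component refines a simple buffer specification, and then verify the overall system with the simple specification substituted for the component; by compositionality and transitivity of refinement, the result carries over to the original system:

    nametype Data = {0, 1}
    channel left, right : Data

    -- Simple one-place buffer specification.
    BUFF1 = left?x -> right!x -> BUFF1

    -- Stand-in for the complex buffer implementation used in the real model.
    ComplexBuffer = left?x -> right!x -> ComplexBuffer

    -- The rest of the system, parameterised by whichever buffer is used.
    Client = left!0 -> right?x -> Client
    System(Buffer) = Buffer [| {| left, right |} |] Client

    -- Step 1: the complex buffer meets the simple buffer specification.
    assert BUFF1 [F= ComplexBuffer

    -- Step 2: verify the system with the specification in place of the
    -- buffer; the result then also holds of System(ComplexBuffer).
    assert System(BUFF1) :[deadlock free [F]]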