Count observation model
Related resources on modeling count data in Monolix:
Columns used to define observations: formatting of count data in the MonolixSuite.
Count data model library : detailed description of the library of count models integrated within Monolix.
Count data models: examples of count data models from the Monolix demos.
On the current page, we explain the different models for count data that can be used in Monolix and their syntax.
Use of count data
Longitudinal count data is a special type of longitudinal data that can take only nonnegative integer values {0, 1, 2, …} that come from counting something, e.g., the number of seizures, hemorrhages or lesions in each given time period. In this context, data from individual i is the sequence where is the number of events observed in the jth time interval .
Count data models can also be used for modeling other types of data such as the number of trials required for completing a given task or the number of successes (or failures) during some exercise. Here, is either the number of trials or successes (or failures) for subject i at time . For any of these data types we will then model as a sequence of random variables that take their values in {0, 1, 2, …}. If we assume that they are independent, then the model is completely defined by the probability mass functions for and . Here, we will consider only parametric distributions for count data.
Observation model syntax
Considering the observations as a sequence of conditionally independent random variables, the model is completely defined by the probability mass functions . An observation variable for count data, with name Y
for instance, is defined using the following syntax:
DEFINITION:
Y = {type=count, P(Y=k) = ...}
type=count: indicates the data type
P(Y=k): probability of a given count value
k
, for the observation namedY
. The observation name is free but must be the same at the beginning of the line and for the probability definition.k
is a reserved keyword and represents a positive integer. k supersedes in this scope any variable k defined previously. The probability must be in [0,1].
A transformed probability can also be provided. The transformation can be log, logit, or probit. For instance with a log-transformation:
DEFINITION:
Y = {type=count, log(P(Y=k)) = ...}
As k
is only recognized within the probability definition, it is not possible to define the probability using k
in an EQUATION block above. However, it is possible to use if/else statements within the probability definition:
DEFINITION:
Y = {type=count,
if k==0
Pk = ...
else
Pk = ...
end
P(Y=k) = Pk}
Common mathematical functions to define count distributions are factorial(a)
, factln(a)
(logarithm of factorial) and gammaln(a)
(logarithm of gamma function). They can be used with a
any positive numerical value (not only integers). Note that factorials grow very rapidly and can be considered as “+infinity” in a computer, even when the probability is defined as a ratio of two factorials which stays with reasonable values on paper. It is thus convenient to works with logarithms of factorials, which grow much slower (see examples).
Examples
Example 1: Poisson distribution with time evolution
In this example, the Poisson distribution is used for defining the distribution of :
where the Poisson intensity is function of time . This model is implemented as follows
[LONGITUDINAL]
input = {a,b}
EQUATION:
lambda = a+b*t
DEFINITION:
y = {type=count, P(y=k) = exp(-lambda)*(lambda^k)/factorial(k)}
Example 2: Binomial distribution
We consider n
Bernouilli trials, each having a probability of success p
. The probability of having k successes is:
To avoid that be so large that it will be considered as NaN by a computer, it is good practice to define the log of the probability to convert the ratios of large number into a sum of smaller numbers:
The corresponding Mlxtran model is:
[LONGITUDINAL]
input = {n, p}
DEFINITION:
CountNumber = {type=count, log(P(CountNumber=k)) = gammaln(n+1) - factln(k) - gammaln(n-k+1) + k*log(p) + (n-k)*log(1-p)}
OUTPUT:
output = Y
Example 3: Poisson distribution with zero inflation
Zero-inflations can be encoded using if/else statements:
[LONGITUDINAL]
input = {lambda, f}
DEFINITION:
CountNumber = {type=count,
if k==0
Pk = exp(-lambda)*(1-f) + f
else
Pk = exp(k*log(lambda) - lambda - factln(k))*(1-f)
end
P(CountNumber=k) = Pk}
OUTPUT:
output = CountNumber
Library of count models
The MonolixSuite library of models includes many pre-written count data models: Count library.