# Categorical observation model

Related resources on modeling categorical data in Monolix:

Columns used to define observations: formatting of categorical data in the MonolixSuite.

Categorical data models: examples of categorical data models from the Monolix demos.

On this page, we detail the different models for categorical data that can be used in Monolix, and their syntax.

## Observation model for categorical ordinal data

*Use of categorical data*

Assume now that the observed data $y_{ij}$ takes its values in a fixed and finite set of nominal categories $\{c_1, c_2, \ldots, c_K\}$. Considering the observations $(y_{ij},\ 1 \le j \le n_i)$ for any individual $i$ as a sequence of conditionally independent random variables, the model is completely defined by the probability mass functions $\mathbb{P}(y_{ij} = c_k)$ for $k = 1, \ldots, K$ and $1 \le j \le n_i$. For a given $(i,j)$, the sum of the $K$ probabilities is 1, so in fact only $K-1$ of them need to be defined. In the most general way possible, any model can be considered so long as it defines a probability distribution, i.e., for each $k$, $\mathbb{P}(y_{ij} = c_k) \ge 0$, and $\sum_{k=1}^{K} \mathbb{P}(y_{ij} = c_k) = 1$. Ordinal data further assume that the categories are ordered, i.e., there exists an order $\prec$ such that

$$c_1 \prec c_2 \prec \cdots \prec c_K .$$

We can think, for instance, of levels of pain (low, moderate, severe) or scores on a discrete scale, e.g., from 1 to 10. Instead of defining the probabilities of each category, it may be convenient to define the cumulative probabilities $\mathbb{P}(y_{ij} \le c_k)$ for $k = 1, \ldots, K-1$, or in the other direction: $\mathbb{P}(y_{ij} \ge c_k)$ for $k = 2, \ldots, K$. Any model is possible as long as it defines a probability distribution, i.e., it satisfies

$$0 \le \mathbb{P}(y_{ij} \le c_1) \le \mathbb{P}(y_{ij} \le c_2) \le \cdots \le \mathbb{P}(y_{ij} \le c_K) = 1 .$$
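To make the relationship between cumulative and per-category probabilities concrete, here is a small Python sketch (not part of the Mlxtran syntax; the function name and the numeric values are purely illustrative):

```python
def category_probs(cumulative):
    """Given increasing cumulative probabilities P(y <= c_k) for k = 1..K-1,
    return the K per-category probabilities P(y = c_k)."""
    full = list(cumulative) + [1.0]          # P(y <= c_K) is always 1
    probs = [full[0]] + [full[k] - full[k - 1] for k in range(1, len(full))]
    assert all(p >= 0 for p in probs), "cumulative probabilities must be increasing"
    return probs

probs = category_probs([0.2, 0.5, 0.9])      # K = 4 categories
print([round(p, 3) for p in probs])          # [0.2, 0.3, 0.4, 0.1]
print(round(sum(probs), 12))                 # 1.0
```

The monotonicity condition above is exactly what guarantees that the differences of successive cumulative probabilities are valid (nonnegative) category probabilities.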

It is possible to introduce dependence between observations from the same individual by assuming that $(y_{ij},\ j = 1, 2, \ldots, n_i)$ forms a Markov chain. For instance, a Markov chain with memory 1 assumes that all that is required from the past to determine the distribution of $y_{ij}$ is the value of the previous observation $y_{i,j-1}$, i.e., for all $k = 1, \ldots, K$,

$$\mathbb{P}(y_{ij} = c_k \mid y_{i,j-1}, y_{i,j-2}, \ldots, y_{i,1}) = \mathbb{P}(y_{ij} = c_k \mid y_{i,j-1}) .$$

*Observation model syntax*

Considering the observations $(y_j)$ as a sequence of conditionally independent random variables, the model is again completely defined by the probability mass functions $\mathbb{P}(y_j = c_k)$ for each category. For a given $j$, the sum of the $K$ probabilities is 1, so in fact only $K-1$ of them need to be defined. The distribution of ordered categorical data can be defined in the block DEFINITION: of the section [LONGITUDINAL] using the probability mass functions, the cumulative probabilities $\mathbb{P}(y_j \le c_k)$ for $k$ from 1 to $K-1$, or the cumulative logits $\text{logit}(\mathbb{P}(y_j \le c_k))$ for $k$ from 1 to $K-1$. Ordinal data further assume that the categories are ordered: $c_1 \prec c_2 \prec \cdots \prec c_K$.

An observation variable for ordered categorical data is defined using the type categorical. Its additional fields are:

categories: List of the available ordered categories. They are represented by increasing successive integers.

P(Y=i): Probability of a given category integer *i*, for the observation named Y. A transformed probability can be provided instead of a direct one. The transformation can be log, logit, or probit. The probabilities are defined following the order of their categories. They can be provided for events where the category is a boundary instead of an exact match; all boundaries must be of the same kind. Such an event is denoted using a comparison operator, e.g., P(Y<=i). When the value of a probability can be deduced from the others, its definition can be omitted.

*Example*

In the proposed example, we use 4 categories and the model is implemented as follows:

```
[LONGITUDINAL]
input = {th1, th2, th3}
DEFINITION:
level = {type = categorical, categories = {0, 1, 2, 3},
logit(P(level <=0)) = th1
logit(P(level <=1)) = th1 + th2
logit(P(level <=2)) = th1 + th2 + th3}
```

Using that definition, the distributions associated with the parameters are:

Normal for th1. A lognormal distribution would imply th1 > 0, i.e., logit(P(level <=0)) > 0 and thus P(level <=0) > 0.5, which is an unnecessary restriction.

Lognormal for th2 and th3, to make sure that P(level <=1) > P(level <=0) and P(level <=2) > P(level <=1), respectively.
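The effect of these parameter constraints can be checked numerically. The following Python sketch (outside the Mlxtran syntax) maps illustrative values of th1, th2, th3 — not fitted estimates — through the inverse logit to the four category probabilities:

```python
import math

def expit(x):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative parameter values: th1 may be any real number, while th2 and th3
# are positive, as a lognormal distribution would enforce.
th1, th2, th3 = -1.0, 1.5, 2.0

cum = [expit(th1), expit(th1 + th2), expit(th1 + th2 + th3), 1.0]
# Positivity of th2 and th3 makes the cumulative probabilities increasing.
assert cum[0] <= cum[1] <= cum[2] <= cum[3]

# Per-category probabilities P(level = k) for k = 0..3
probs = [cum[0]] + [cum[k] - cum[k - 1] for k in range(1, 4)]
print([round(p, 4) for p in probs])   # [0.2689, 0.3535, 0.3017, 0.0759]
print(round(sum(probs), 12))          # 1.0
```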

## Observation model for categorical data modeled as a discrete Markov chain

*Use of categorical data modeled as a Markov chain*

In the previous categorical model, the observations for an individual $i$ were considered conditionally independent. It is however possible to introduce dependence between observations from the same individual by assuming that $(y_{ij},\ j = 1, 2, \ldots, n_i)$ forms a Markov chain. If observation times are regularly spaced (constant length of time between successive observations), we can consider the observations to be a discrete-time Markov chain.

*Observation model syntax*

An observation variable for ordered categorical data modeled as a discrete Markov chain is defined using the type categorical, along with the dependence definition Markov. Its additional fields are:

categories: List of the available ordered categories. They are represented by increasing successive integers. It is defined right after type.

P(Y_1=i): Initial probability of a given category integer i, for the observation named Y. This probability applies to the first observed value. A transformed probability can be provided instead of a direct one. The transformation can be log, logit, or probit. The probabilities are defined following the order of their categories. They can be provided for events where the category is a boundary instead of an exact match; all boundaries must be of the same kind. Such an event is denoted using a comparison operator. When the value of a probability can be deduced from the others, its definition can be omitted. The initial probabilities are optional as a whole, and the default initial law is uniform.

P(Y=j|Y_p=i): Probability of transition to a given category integer *j* from a previous category *i*, for the observation named Y. A transformed probability can be provided instead of a direct one. The transformation can be log, logit, or probit. The probabilities are grouped by law of transition for each previous category *i*. Each law of transition provides the various transition probabilities of reaching *j*. They can be provided for events where the reached category *j* is a boundary instead of an exact match; all boundaries must be of the same kind for a given law. Such an event is denoted using a comparison operator. When the value of a transition probability can be deduced from the others within its law, its definition can be omitted.

*Example*

An example where we define an observation model for this case is proposed here:

```
[LONGITUDINAL]
input = {a1, a2, a11, a12, a21, a22, a31, a32}
DEFINITION:
State = {type = categorical, categories = {1,2,3}, dependence = Markov
P(State_1=1) = a1
P(State_1=2) = a2
logit(P(State <=1|State_p=1)) = a11
logit(P(State <=2|State_p=1)) = a11+a12
logit(P(State <=1|State_p=2)) = a21
logit(P(State <=2|State_p=2)) = a21+a22
logit(P(State <=1|State_p=3)) = a31
logit(P(State <=2|State_p=3)) = a31+a32}
```

Using that definition, the distributions associated with the parameters are:

Logitnormal for a1 and a2, to make sure that the initial probabilities are well defined (each in [0, 1]).

Normal for a11, a21, and a31, since the logit transformation already guarantees that the corresponding probabilities lie in [0, 1].

Lognormal for a12, a22, and a32, to make sure that the cumulative probabilities are increasing.
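To see how these cumulative logits translate into a transition matrix, here is a Python sketch (outside the Mlxtran syntax) with illustrative parameter values — not fitted estimates — where each row of the matrix is a valid probability law:

```python
import math

def expit(x):
    """Inverse of the logit transformation."""
    return 1.0 / (1.0 + math.exp(-x))

# Illustrative values: the a_k1 may be any real numbers, the a_k2 are positive,
# as a lognormal distribution would enforce.
a11, a12 = -0.5, 1.0
a21, a22 = 0.0, 1.5
a31, a32 = 0.5, 2.0

def transition_row(b1, b2):
    """Transition probabilities (to states 1, 2, 3) from cumulative logits."""
    c1, c2 = expit(b1), expit(b1 + b2)   # P(State<=1|.), P(State<=2|.)
    return [c1, c2 - c1, 1.0 - c2]

P = [transition_row(a11, a12),   # transitions from state 1
     transition_row(a21, a22),   # transitions from state 2
     transition_row(a31, a32)]   # transitions from state 3

for row in P:
    assert abs(sum(row) - 1.0) < 1e-12   # each row is a probability law

print([round(x, 4) for x in P[0]])       # [0.3775, 0.2449, 0.3775]
```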

## Observation model for categorical data modeled as a continuous Markov chain

*Use of categorical data modeled as a continuous Markov chain*

The previous situation can be extended to the case where time intervals between observations are irregular by modeling the sequence of states as a *continuous-time Markov process*. The difference is that rather than transitioning to a new (or the same) state at each time step, the system remains in the current state for some random amount of time before transitioning. This process is now characterized by *transition rates* $\rho_{k\ell}(t)$ from the state $c_k$ to the state $c_\ell$. Given that the system is in state $c_k$ at time $t$, the probability to be in state $c_\ell$ ($\ell \ne k$) after a small time $\delta$ is:

$$\mathbb{P}(y(t+\delta) = c_\ell \mid y(t) = c_k) = \delta\, \rho_{k\ell}(t) + o(\delta) .$$

The probability that no transition happens between $t$ and $t+\delta$ is

$$\mathbb{P}(y(t+\delta) = c_k \mid y(t) = c_k) = 1 - \delta \sum_{\ell \ne k} \rho_{k\ell}(t) + o(\delta) .$$

Furthermore, for any individual $i$ and time $t$, the transition rates satisfy, for any $k$,

$$\sum_{\ell=1}^{K} \rho_{k\ell}(t) = 0 ,$$

where the diagonal rate is defined as $\rho_{kk}(t) = -\sum_{\ell \ne k} \rho_{k\ell}(t)$.

Constructing a model therefore means defining parametric functions of time that satisfy this condition.
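As a concrete illustration of this condition, the following Python sketch (outside the Mlxtran syntax, with purely illustrative rate values) builds a valid transition-rate matrix for three states and checks the small-time approximation:

```python
# Off-diagonal rates are nonnegative; each diagonal entry is minus the sum of
# the other rates in its row, so every row sums to zero.
K = 3
off_diag = {(0, 1): 0.4, (0, 2): 0.1,
            (1, 0): 0.3, (1, 2): 0.2,
            (2, 0): 0.05, (2, 1): 0.15}

Q = [[0.0] * K for _ in range(K)]
for (i, j), rate in off_diag.items():
    Q[i][j] = rate
for i in range(K):
    Q[i][i] = -sum(Q[i][j] for j in range(K) if j != i)

for row in Q:
    assert abs(sum(row)) < 1e-12   # the rates in each row sum to zero

# Over a small time delta, P(state j at t+delta | state i at t) is
# approximately delta * Q[i][j] for j != i, and 1 + delta * Q[i][i] for j == i.
delta = 0.01
P_small = [[(1.0 if i == j else 0.0) + delta * Q[i][j] for j in range(K)]
           for i in range(K)]
for row in P_small:
    assert abs(sum(row) - 1.0) < 1e-12   # each row is a probability law
```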

*Observation model syntax*

An observation variable for ordered categorical data modeled as a continuous Markov chain is also defined using the type categorical, along with the dependence definition Markov. But here transition rates are defined instead of transition probabilities. Its additional fields are:

categories: List of the available ordered categories. They are represented by increasing successive integers. It is defined right after type.

P(Y_1=i): Initial probability of a given category integer *i*, for the observation named Y. This probability applies to the first observed value. A transformed probability can be provided instead of a direct one. The transformation can be log, logit, or probit. The probabilities are defined following the order of their categories. They can be provided for events where the category is a boundary instead of an exact match; all boundaries must be of the same kind. Such an event is denoted using a comparison operator. When the value of a probability can be deduced from the others, its definition can be omitted. The initial probabilities are optional as a whole, and the default initial law is uniform.

transitionRate(i,j): Transition rate departing from a given category integer *i* and arriving at a category *j*. The rates are grouped by law of transition for each departure category *i*. One transition rate definition can be omitted per law of transition, as the rates of each law must sum to zero.

*Example*

An example where we define an observation model for this case is proposed here:

```
[LONGITUDINAL]
input={p1, q12, q21}
DEFINITION:
State = {type = categorical, categories = {1,2}, dependence = Markov
P(State_1=1) = p1
transitionRate(1,2) = q12
transitionRate(2,1) = q21}
```
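For a two-state chain like this one, the transition probabilities over any time interval have a simple closed form, which can be used to check intuition about the model. The Python sketch below (outside the Mlxtran syntax) uses illustrative rate values for q12 and q21 — not fitted estimates:

```python
import math

# Illustrative positive rate values.
q12, q21 = 0.8, 0.2
s = q12 + q21

def p12(t):
    """P(State(t) = 2 | State(0) = 1): closed-form solution of the
    Kolmogorov forward equation for the two-state chain."""
    return q12 / s * (1.0 - math.exp(-s * t))

# Cross-check against a small-step discretization of dP/dt = q12*(1-P) - q21*P
p, dt = 0.0, 1e-4
for _ in range(int(2.0 / dt)):
    p += dt * (q12 * (1.0 - p) - q21 * p)
assert abs(p - p12(2.0)) < 1e-3

print(round(p12(100.0), 6))   # 0.8 = q12 / (q12 + q21), the stationary probability
```

As t grows, the process forgets its initial state and the probability of being in state 2 converges to the stationary value q12 / (q12 + q21).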