# Calculation rules

## Detection of design

PKanalix detects the design automatically based on the dataset columns. If no OCCASION column is present, the design is detected as “parallel.” On the other hand, if one or several columns have been tagged as OCCASION, the design is detected as “crossover” (repeated or non-repeated). The detected design is displayed in the bioequivalence settings for the user's information and cannot be changed. Depending on the detected design, the factors selected by default for the linear model are different (see below).

## Linear model

The linear model can only include fixed effects, as recommended by the FDA for parallel and non-repeated crossover designs (see *Statistical Approaches to Establishing Bioequivalence*, page 10) and by the EMA for parallel, non-repeated crossover and repeated crossover designs (see *Guideline On The Investigation Of Bioequivalence*, page 15). In addition, “id” is automatically considered as nested in “sequence” and no additional nesting can be defined. Interaction terms and random effects are not supported.

According to the regulatory guidelines, the default models are:

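**parallel design**:

$$\mathrm{NCAparam} \sim \mathrm{FORM}$$

which can also be written as

$$\mathrm{NCAparam} = \beta_0 + \beta_{\mathrm{FORM}}\,[\text{if FORM = test}] + \varepsilon$$

in the case of two formulations “ref” and “test”, with *ε* a normal random variable (i.e., the residual error).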

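**crossover design**:

$$\mathrm{NCAparam} \sim \mathrm{SEQ} + \mathrm{ID} + \mathrm{PERIOD} + \mathrm{FORM}$$

which can also be written as

$$\mathrm{NCAparam} = \beta_0 + \beta_{\mathrm{SEQ}}\,[\text{if SEQ = TR}] + \beta_{\mathrm{ID},2}\,[\text{if ID = id2}] + \beta_{\mathrm{ID},3}\,[\text{if ID = id3}] + \dots + \beta_{\mathrm{PERIOD}}\,[\text{if PERIOD = 2}] + \beta_{\mathrm{FORM}}\,[\text{if FORM = test}] + \varepsilon$$

in the case of multiple individuals (id1, id2, etc.), two sequences (“RT” and “TR”), two periods (“1” and “2”) and two formulations (“ref” and “test”).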

The model parameters are calculated using QR factorization.
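As an illustration of a least-squares fit through QR factorization, here is a minimal Python sketch for the parallel-design model. It uses modified Gram-Schmidt, one textbook way to obtain a QR factorization, which is not necessarily the algorithm used in PKanalix. The function name `qr_least_squares` and the toy data are hypothetical.

```python
import math

def qr_least_squares(X, y):
    """Solve min ||X b - y|| with a modified Gram-Schmidt QR factorization.

    X is a list of rows and must have full column rank.
    """
    n, p = len(X), len(X[0])
    # Store the columns of X; they are orthonormalized in place to form Q.
    Q = [[X[i][j] for i in range(n)] for j in range(p)]
    R = [[0.0] * p for _ in range(p)]
    for j in range(p):
        R[j][j] = math.sqrt(sum(v * v for v in Q[j]))
        Q[j] = [v / R[j][j] for v in Q[j]]
        for k in range(j + 1, p):
            R[j][k] = sum(a * b for a, b in zip(Q[j], Q[k]))
            Q[k] = [b - R[j][k] * a for a, b in zip(Q[j], Q[k])]
    # Solve the triangular system R b = Q^T y by back-substitution.
    qty = [sum(a * b for a, b in zip(Q[j], y)) for j in range(p)]
    b = [0.0] * p
    for j in reversed(range(p)):
        tail = sum(R[j][k] * b[k] for k in range(j + 1, p))
        b[j] = (qty[j] - tail) / R[j][j]
    return b

# Toy parallel design: one NCA parameter value per individual (made-up numbers).
forms = ["ref", "ref", "ref", "test", "test", "test"]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Design matrix: intercept column and indicator "formulation is test".
X = [[1.0, 1.0 if f == "test" else 0.0] for f in forms]
beta0, beta_form = qr_least_squares(X, values)
```

In this toy case the intercept equals the mean of the reference values and the formulation coefficient equals the difference between the test and reference means.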

The parameters are the coefficients $\beta$. They are saved in the output file **estimatedCoefficients_XXX.txt**. The coefficient of interest representing the formulation effect is $\beta_{\mathrm{FORM}}$ (the coefficient of the formulation factor), which is also called the point estimate.

## Difference and ratio

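In the Results > Confidence intervals table, the column “Difference” corresponds to the point estimate $\beta_{\mathrm{FORM}}$, i.e., the coefficient of the formulation effect (see above). The column “Ratio” depends on the parameter log-transformation choice:

without log-transformation: $\mathrm{Ratio} = \dfrac{LSM_{test}}{LSM_{ref}}$ where $LSM$ is the least square mean (also called adjusted mean, see below)

with log-transformation: $\mathrm{Ratio} = \exp(\beta_{\mathrm{FORM}})$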

## Confidence intervals

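The confidence interval is first calculated for the difference ($CI_{diff}$); the confidence interval for the ratio ($CI_{ratio}$) is then derived from it. For the difference:

$$CI_{diff} = \beta_{\mathrm{FORM}} \pm t_{1-\alpha/2,\,df} \times SE$$

with $\beta_{\mathrm{FORM}}$ the point estimate (“Difference” column, see above), $t_{1-\alpha/2,\,df}$ the quantile of a Student t-distribution at the level $1-\alpha/2$ and $df$ degrees of freedom, and $SE$ the standard error of the point estimate. Here $1-\alpha$ denotes the level of the confidence interval (e.g., $\alpha = 0.1$ for a 90% interval).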

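In case of a parallel design with the **Welch-Satterthwaite correction** (see Bioequivalence settings), the standard error of the point estimate is calculated as:

$$SE = \sqrt{\frac{s_{ref}^2}{n_{ref}} + \frac{s_{test}^2}{n_{test}}}$$

where $s_i$ is the sample standard deviation of the NCA parameter over the individuals having received formulation $i$, and $n_i$ is the number of those individuals. A correction is also applied to the degrees of freedom, which are calculated as:

$$df = \frac{\left(\dfrac{s_{ref}^2}{n_{ref}} + \dfrac{s_{test}^2}{n_{test}}\right)^2}{\dfrac{\left(s_{ref}^2/n_{ref}\right)^2}{n_{ref}-1} + \dfrac{\left(s_{test}^2/n_{test}\right)^2}{n_{test}-1}}$$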
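The Welch-Satterthwaite correction can be sketched in a few lines of Python. The snippet assumes the standard textbook expressions for the standard error and the corrected degrees of freedom. The function name `welch_se_df` and the input values are hypothetical.

```python
import math
import statistics

def welch_se_df(ref_values, test_values):
    """Return (SE, df) using the standard Welch-Satterthwaite expressions."""
    n_ref, n_test = len(ref_values), len(test_values)
    # Sample variances (n - 1 denominator), divided by the group sizes.
    v_ref = statistics.variance(ref_values) / n_ref
    v_test = statistics.variance(test_values) / n_test
    se = math.sqrt(v_ref + v_test)
    df = (v_ref + v_test) ** 2 / (
        v_ref ** 2 / (n_ref - 1) + v_test ** 2 / (n_test - 1)
    )
    return se, df

# Made-up NCA parameter values for two groups of a parallel design.
se, df = welch_se_df([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

With equal variances and equal group sizes, as in this toy input, the corrected degrees of freedom coincide with the usual n_ref + n_test - 2. They become smaller as the groups get more unbalanced.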

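The confidence interval for the ratio is then calculated from the bounds of the confidence interval for the difference, $CI_{diff}$, using the following formula:

without log-transformation: $CI_{ratio} = 1 + \dfrac{CI_{diff}}{LSM_{ref}}$ where $LSM_{ref}$ is the least square mean for the reference formulation (see below)

with log-transformation: $CI_{ratio} = \exp(CI_{diff})$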
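The two steps (interval for the difference, then for the ratio) can be sketched as follows. This is a minimal illustration that assumes the interval has the usual form "point estimate ± t quantile × standard error", that the ratio bounds are `exp(bound)` with log-transformation and `1 + bound / LSM_ref` without, and that `LSM_ref` is positive. The Student t quantile is passed in as a number (for instance read from a t-table) because the Python standard library does not provide it. All function names and numeric values are hypothetical.

```python
import math

def ci_difference(point_estimate, se, t_quantile):
    """Confidence interval for the difference: estimate +/- t * SE."""
    half_width = t_quantile * se
    return (point_estimate - half_width, point_estimate + half_width)

def ci_ratio(ci_diff, log_transformed, lsm_ref=None):
    """Convert the bounds of the difference interval into ratio bounds."""
    lower, upper = ci_diff
    if log_transformed:
        # Back-transform the bounds from the log scale.
        return (math.exp(lower), math.exp(upper))
    # Untransformed data: express the bounds relative to the reference LSM.
    return (1 + lower / lsm_ref, 1 + upper / lsm_ref)

# 0.95 quantile of a Student t-distribution with 10 degrees of freedom
# (t-table value, about 1.812), used here for a 90% interval.
T_QUANTILE = 1.8125

log_ci = ci_ratio(ci_difference(0.05, 0.04, T_QUANTILE), log_transformed=True)
lin_ci = ci_ratio(ci_difference(5.0, 4.0, T_QUANTILE),
                  log_transformed=False, lsm_ref=100.0)
```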

## Adjusted means (least square means)

With the model NCAparam ~ SEQ + ID + PERIOD + FORM, one can calculate the expected value (mathematical expectation) for any value of SEQ, ID, PERIOD and FORM. The **least square mean (LSM)** for the reference represents the expected value for the reference formulation leaving the value for the other factors undefined. To calculate it, we average the model-predicted values across the levels of ID, PERIOD and SEQ.

The weights assigned to each combination of levels depend on whether the factors are nested or not. A factor A is said to be nested in factor B if each level of A appears in only one level of B. By design, in crossover bioequivalence studies, ID is nested in SEQUENCE (i.e., each individual belongs to only one sequence). Thus, in PKanalix, when ID and SEQUENCE are defined as factors in the linear model, **we assume that ID is nested in SEQUENCE when calculating the least square means**. No further nesting is assumed, nor can any be specified. Note that the nesting definition only affects the adjusted means and not the point estimate (ratio) and its confidence interval.
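To make the averaging concrete, here is a minimal Python sketch of a least square mean with ID nested in SEQUENCE. It assumes equal weights at each level: predictions are averaged over periods, then over the individuals within each sequence, then over sequences. The exact weighting used by PKanalix is not spelled out here, so this is an assumption, and the coefficient values, the dictionary layout and the function names are hypothetical.

```python
def predict(coef, seq, ind, period, form):
    """Model prediction; reference levels have no coefficient (treated as 0)."""
    return (coef["intercept"]
            + coef["seq"].get(seq, 0.0)
            + coef["id"].get(ind, 0.0)
            + coef["period"].get(period, 0.0)
            + coef["form"].get(form, 0.0))

def least_square_mean(form, coef, ids_by_seq, periods):
    """Average predictions for `form` over PERIOD, ID within SEQ, then SEQ."""
    seq_means = []
    for seq, ids in ids_by_seq.items():
        id_means = []
        for ind in ids:
            preds = [predict(coef, seq, ind, per, form) for per in periods]
            id_means.append(sum(preds) / len(preds))
        # Each individual is only averaged within its own sequence (nesting).
        seq_means.append(sum(id_means) / len(id_means))
    return sum(seq_means) / len(seq_means)

# Made-up coefficients for a 2x2 crossover with four individuals.
coef = {
    "intercept": 10.0,
    "seq": {"TR": 1.0},
    "id": {"id2": 0.5, "id4": -0.5},
    "period": {"2": 0.2},
    "form": {"test": 0.3},
}
ids_by_seq = {"RT": ["id1", "id2"], "TR": ["id3", "id4"]}
periods = ["1", "2"]

lsm_ref = least_square_mean("ref", coef, ids_by_seq, periods)
lsm_test = least_square_mean("test", coef, ids_by_seq, periods)
```

Whatever the weights, the difference `lsm_test - lsm_ref` equals the formulation coefficient, which is why the nesting choice changes the adjusted means but not the point estimate.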

## Excluded individuals

**In the case of a parallel design**, individuals for which the NCA parameter cannot be calculated are excluded.

**In the case of a crossover design, incomplete individuals are excluded**. Incomplete individuals are those that do not have a value for each period, either because there were no concentration data for this period (period missing in the data set) or because the NCA parameter could not be calculated for this period (e.g., because the data were insufficient to calculate the terminal slope). In the case of a **non-repeated crossover design**, whether the individuals with missing data are excluded or not has no impact on the calculation of the point estimate and confidence interval. In the case of a **repeated crossover design** (i.e., when the individuals receive the same formulation multiple times), excluding the individuals for which the NCA parameter is missing for one or more periods has an impact on the calculation of the point estimate and confidence interval. The common practice is to exclude incomplete individuals (see also the EMA *Guideline On The Investigation Of Bioequivalence*, page 14).

Which individuals are excluded is calculated for each NCA parameter separately, as some individuals may have values for all periods for Cmax but not AUCINF_obs for instance.

In the Results > Confidence intervals table, the number of individuals “N” does not count the excluded individuals. Thus, in case of a crossover design, the number of individuals contributing to ref and to test is the same. **In the BE plots, only the individuals included in the bioequivalence analysis are shown**.
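The exclusion rule for crossover designs can be sketched as a simple filter applied to one NCA parameter at a time. The snippet assumes that the set of expected periods is the set of periods present in the data for that parameter, and that a parameter that could not be calculated is stored as `None`. The data layout and the function name `complete_individuals` are hypothetical.

```python
def complete_individuals(values):
    """Return the ids that have a usable value for every period.

    `values` maps (id, period) to the NCA parameter value, or to None
    when the parameter could not be calculated for that period.
    """
    periods = {period for _, period in values}
    ids = {ind for ind, _ in values}
    return sorted(
        ind for ind in ids
        if all(values.get((ind, period)) is not None for period in periods)
    )

# Made-up values: id2 has no data in period 2; AUCINF_obs could not be
# calculated for id1 in period 2 (e.g., terminal slope not estimable).
cmax = {("id1", 1): 5.1, ("id1", 2): 6.0, ("id2", 1): 4.2,
        ("id3", 1): 5.5, ("id3", 2): 5.8}
aucinf = {("id1", 1): 40.0, ("id1", 2): None, ("id2", 1): 35.0,
          ("id3", 1): 42.0, ("id3", 2): 44.0}

included_cmax = complete_individuals(cmax)
included_aucinf = complete_individuals(aucinf)
```

Running the filter per parameter reflects that an individual can be kept for Cmax while being excluded for AUCINF_obs.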

## ANOVA

The sums of squares presented in the ANOVA table of the Results tab are **type-I sequential sums of squares**. In case of an unbalanced design (i.e., different numbers of individuals receiving RT versus TR), the type-I sum of squares depends on the order of the included factors. In PKanalix, **the enforced order is SEQUENCE + ID + PERIOD + FORMULATION** (+ ADDITIONAL), following the SAS example code provided by the FDA (see *Statistical Approaches to Establishing Bioequivalence, 2001, Appendix E*) and EMA (see *Questions & Answers: positions on specific questions addressed to the Pharmacokinetics Working Party, 2015, question 8*).

## Coefficient of variation

In the Results > Coefficients of variation table, SD corresponds to the standard deviation of the residuals, i.e., to the standard deviation of the normal random variable *ε* in the linear model ($\mathrm{NCAparam} = \beta_0 + \beta_{\mathrm{FORM}}\,[\text{if FORM = test}] + \varepsilon$ for the typical model with a parallel design, for instance).

The coefficient of variation is then calculated as:

without log-transformation: $CV = \dfrac{SD}{LSM_{ref}}$ (where $LSM_{ref}$ is the least square mean for the reference formulation, see above)

with log-transformation: $CV = \sqrt{\exp(SD^2) - 1}$
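As a small numerical illustration, the sketch below computes the coefficient of variation in both cases. It assumes the usual definitions: SD divided by the reference least square mean without log-transformation, and the log-normal expression `sqrt(exp(SD^2) - 1)` with log-transformation. Results are returned as fractions (multiply by 100 for a percentage). The function name and the input values are hypothetical.

```python
import math

def coefficient_of_variation(sd, log_transformed, lsm_ref=None):
    """CV from the residual standard deviation SD (returned as a fraction)."""
    if log_transformed:
        # CV of a log-normal variable whose log has standard deviation sd.
        return math.sqrt(math.exp(sd ** 2) - 1)
    # Untransformed data: SD relative to the reference least square mean.
    return sd / lsm_ref

cv_log = coefficient_of_variation(0.2, log_transformed=True)
cv_lin = coefficient_of_variation(5.0, log_transformed=False, lsm_ref=100.0)
```

For small SD on the log scale the two notions nearly coincide: with SD = 0.2 the log-normal CV is only slightly above 0.2.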