The package `FENmlm`

estimates maximum likelihood (ML)
models with fixed-effects. The function `femlm`

is the
workhorse of the package: it performs efficient ML estimations with
*any number* of fixed-effects and also allows for non-linear in
parameters right-hand-sides. Four likelihood models are supported:
Poisson, Negative Binomial, Gaussian (equivalent to OLS) and Logit.

The standard-errors of the estimates can be very easily clustered (up to four-way).

Two specific functions are implemented to seamlessly export the
results of multiple estimations into either a data.frame (function
`res2table`

), or a Latex table of “article-like” quality
(function `res2tex`

).

The main features of the package are illustrated in this vignette.
For the theory behind this method, see Berge (2018), “*Efficient
estimation of maximum likelihood models with multiple fixed-effects: the
R package FENmlm.*” CREA Discussion Papers, 13 (https://github.com/lrberge/fixest/blob/master/_DOCS/FENmlm_paper.pdf).

**UPDATE: this package is deprecated, please use fixest now.**

This example deals with international trade, which is a setup that usually requires performing estimations with many fixed-effects. We estimate a very simple gravity model in which we are interested in finding out the negative effect of geographic distance on trade. The sample data consists of European trade extracted from Eurostat. Let’s load the data contained in the package:

```
library(FENmlm)
#> The package 'FENmlm' is not maintained any more. Please use package 'fixest' instead (which greatly extends its functionalities).
data(trade)
```

This data is a sample of bilateral importations between EU15 countries from 2007 and 2016. The data is further broken down according to 20 product categories. Here is a sample of the data:

Destination | Origin | Product | Year | dist_km | Euros |
---|---|---|---|---|---|

LU | BE | 1 | 2007 | 139.6199 | 2.966697 |

BE | LU | 1 | 2007 | 139.6199 | 6.755030 |

LU | BE | 2 | 2007 | 139.6199 | 57.078782 |

BE | LU | 2 | 2007 | 139.6199 | 7.117406 |

LU | BE | 3 | 2007 | 139.6199 | 17.379821 |

BE | LU | 3 | 2007 | 139.6199 | 2.622254 |

The dependent variable of the estimation will be the level of trade between two countries while the independent variable is the geographic distance between the two countries. To obtain the elasticity of geographic distance net of the effects of the four clusters, we estimate the following:

\(E\left(Trade_{i,j,p,t}\right)=\gamma_{i}^{Exporter}\times\gamma_{j}^{Importer}\times\gamma_{p}^{Product}\times\gamma_{t}^{Year}\times Distance_{ij}^{\beta}\),

where the subscripts \(i\), \(j\), \(p\) and \(t\) stand respectively for the exporting country, the importing country, the type of product and the year, and the \(\gamma_{v}^{c}\) are fixed-effects for these groups. Here \(\beta\) is the elasticity of interest.

Note that when you use the Poisson/Negative Binomial families, this
relationship is in fact linear because the right hand side is
exponentialized to avoid negative values for the Poisson parameter. This
leads to the equivalent relation:^{1}

\(E\left(Trade_{i,j,p,t}\right)=\exp\left(\gamma_{i}^{Exporter}+\gamma_{j}^{Importer}+\gamma_{p}^{Product}+\gamma_{t}^{Year}+\beta\times \ln Distance_{ij}\right)\).

The estimation of this model using a Poisson likelihood is as follows:

`gravity_results <- femlm(Euros ~ log(dist_km), trade, family = "poisson", cluster = c("Origin", "Destination", "Product", "Year"))`

Note that alternatively you could have done the same estimation
without the argument `cluster`

, by inserting the
fixed-effects directly in the formula with a pipe. Further, you can
avoid the argument `family`

since the Poisson model is the
default:

`gravity_results <- femlm(Euros ~ log(dist_km)|Origin+Destination+Product+Year, trade)`

The results can be shown directly with the `print`

method:

```
print(gravity_results)
#> ML estimation, family = Poisson, Dep. Var.: Euros
#> Observations: 38,325
#> Cluster sizes: Origin: 15, Destination: 15, Product: 20, Year: 10
#> Standard-errors type: Standard
#> Estimate Std. Error z value Pr(>|z|)
#> log(dist_km) -1.5276 0.001925 -793.65 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> BIC: -768,493.77 Pseudo-R2: 0.74743
#> Log-likelihood: 1,538,211.80 Squared Cor.: 0.61201
```

The `print`

reports the coefficient estimates and
standard-errors as well as some other information. Among the quality of
fit information, the squared-correlation corresponds to the correlation
between the dependent variable and the expected predictor; it reflects
somehow to the idea of R-square in OLS estimations.

The last line reports the convergence status of the optimization
algorithm (the function `nlminb`

from the package
`stats`

). In the, rare, event of no convergence, see the
troubleshooting section of this Vignette to explore the possible causes
and solutions.

To cluster the standard-errors, we can simply use the argument
`se`

of the `summary`

method. Let’s say we want to
cluster the standard-errors according to the first two clusters
(i.e. the *Origin* and *Destination* variables). Then we
just have to do:

```
summary(gravity_results, se = "twoway")
#> ML estimation, family = Poisson, Dep. Var.: Euros
#> Observations: 38,325
#> Cluster sizes: Origin: 15, Destination: 15, Product: 20, Year: 10
#> Standard-errors type: Two-way (Origin & Destination)
#> Estimate Std. Error z value Pr(>|z|)
#> log(dist_km) -1.5276 0.12627 -12.098 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> BIC: -768,493.77 Pseudo-R2: 0.74743
#> Log-likelihood: 1,538,211.80 Squared Cor.: 0.61201
```

The clustering can be done on one (`se="cluster"`

), two
(`se="twoway"`

), three (`se="threeway"`

) or up to
four (`se="fourway"`

) variables. If the estimation includes
fixed-effects, then by default the clustering will be done using these
fixed-effects, in the original order. This is why the *Origin*
and *Destination* variables were used for the two-way clustering
in the previous example. If, instead, you wanted to perform one-way
clustering on the *Product* variable, you need to use the
argument `cluster`

:

```
# Equivalent ways of clustering the SEs:
# - using the vector:
summary(gravity_results, se = "cluster", cluster = trade$Product)
#> ML estimation, family = Poisson, Dep. Var.: Euros
#> Observations: 38,325
#> Cluster sizes: Origin: 15, Destination: 15, Product: 20, Year: 10
#> Standard-errors type: Clustered
#> Estimate Std. Error z value Pr(>|z|)
#> log(dist_km) -1.5276 0.095772 -15.951 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> BIC: -768,493.77 Pseudo-R2: 0.74743
#> Log-likelihood: 1,538,211.80 Squared Cor.: 0.61201
# - by reference (only possible because Product has been used as a cluster in the estimation):
summary(gravity_results, se = "cluster", cluster = "Product")
#> ML estimation, family = Poisson, Dep. Var.: Euros
#> Observations: 38,325
#> Cluster sizes: Origin: 15, Destination: 15, Product: 20, Year: 10
#> Standard-errors type: Clustered (Product)
#> Estimate Std. Error z value Pr(>|z|)
#> log(dist_km) -1.5276 0.095772 -15.951 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> BIC: -768,493.77 Pseudo-R2: 0.74743
#> Log-likelihood: 1,538,211.80 Squared Cor.: 0.61201
```

Note that you can always cluster the standard-errors, even when the
estimation contained no fixed-effect. Buth then you need to use the
argument `cluster`

and give explicitely the data to cluster
with:

```
gravity_simple = femlm(Euros ~ log(dist_km), trade)
summary(gravity_simple, se = "twoway", cluster = trade[, c("Origin", "Destination")])
#> ML estimation, family = Poisson, Dep. Var.: Euros
#> Observations: 38,325
#> Standard-errors type: Two-way
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 10.8940 1.086600 10.0250 < 2.2e-16 ***
#> log(dist_km) -1.0289 0.152658 -6.7401 1.58e-11 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> BIC: -2,492,169.10 Pseudo-R2: 0.18101
#> Log-likelihood: 4,984,380.42 Squared Cor.: 0.05511
```

To use other likelihood models, you simply have to use the argument
`family`

. The fixed-effects Negative Binomial estimation
is:

`gravity_results_negbin <- femlm(Euros ~ log(dist_km)|Origin+Destination+Product+Year, trade, family = "negbin")`

Now to estimate the same relationship with the Gaussian likelihood, we need to put the left hand side in logarithm (since the right-hand-side is not exponentialized):

`gravity_results_gaussian <- femlm(log(Euros) ~ log(dist_km)|Origin+Destination+Product+Year, trade, family = "gaussian")`

Note that the estimates with the Gaussian likelihood are equivalent to the estimates with Ordinary Least Squares.

Now let’s say that we want a compact overview of the results of
several estimations. The best way is to use the function
`res2table`

. This function summarizes the results of several
`femlm`

estimations into a data.frame. To see the
fixed-effects results with the three different likelihoods, we just have
to type:

`res2table(gravity_results, gravity_results_negbin, gravity_results_gaussian, se = "twoway", subtitles = c("Poisson", "Negative Binomial", "Gaussian"))`

Poisson | Negative Binomial | Gaussian | |
---|---|---|---|

Dependent Var.: | Euros | Euros | log(Euros) |

log(dist_km) | -1.5276*** (0.1263) | -1.7276*** (0.1431) | -2.1697*** (0.1655) |

Overdispersion: | 0.845552 | ||

Fixed-Effects: | ——————- | ——————- | ——————- |

Origin | Yes | Yes | Yes |

Destination | Yes | Yes | Yes |

Product | Yes | Yes | Yes |

Year | Yes | Yes | Yes |

___________________ | ___________________ | ___________________ | ___________________ |

Family: | Poisson | Neg. Bin. | Gaussian |

Observations | 38,325 | 38,325 | 38,325 |

Squared-Correlation | 0.612 | 0.474 | 0.706 |

Adj-pseudo R2 | 0.74743 | 0.1622 | 0.23583 |

Log-Likelihood | -768,493.77 | -130,359.17 | -75,682.39 |

BIC | 1,537,599.67 | 261,341.03 | 151,976.92 |

We added the argument `se="twoway"`

to cluster the
standard-errors for all estimations. As can be seen this function gives
an overview of the estimates and standard-errors, as well as some
quality of fit measures. The argument `subtitles`

is used to
add information on each estimation column.

In the previous example, we directly added the estimation results as
arguments of the function `res2table`

. But the function also
accepts lists of estimations. Let’s give an example. Say you want to see
the influence of the introduction of fixed-effects on the estimate of
the elasticity of distance. You can do it with the following code:

```
gravity_subcluster = list()
all_clusters = c("Year", "Destination", "Origin", "Product")
for(i in 1:4){
gravity_subcluster[[i]] = femlm(Euros ~ log(dist_km), trade, cluster = all_clusters[1:i])
}
```

The previous code performs 4 estimations with an increasing number of
fixed-effects and store their results into the list named
`gravity_subcluster`

. To show the results of all 4
estimations, it’s easy:

`res2table(gravity_subcluster, se = "twoway", cluster = trade[, c("Origin", "Destination")])`

model 1 | model 2 | model 3 | model 4 | |
---|---|---|---|---|

log(dist_km) | -1.0293*** (0.1527) | -1.2256*** (0.1975) | -1.5173*** (0.1238) | -1.5276*** (0.1263) |

Fixed-Effects: | ——————- | ——————- | ——————- | ——————- |

Year | Yes | Yes | Yes | Yes |

Destination | No | Yes | Yes | Yes |

Origin | No | No | Yes | Yes |

Product | No | No | No | Yes |

___________________ | ___________________ | ___________________ | ___________________ | ___________________ |

Observations | 38,325 | 38,325 | 38,325 | 38,325 |

Squared-Correlation | 0.057 | 0.164 | 0.385 | 0.612 |

Adj-pseudo R2 | 0.18423 | 0.35047 | 0.58023 | 0.74743 |

Log-Likelihood | -2,482,334.64 | -1,976,479.31 | -1,277,291.27 | -768,493.77 |

BIC | 4,964,785.39 | 3,953,222.48 | 2,554,994.15 | 1,537,599.67 |

We have a view of the 5 estimations, all reporting two-way clustered
standard-errors thanks to the use of the arguments `se`

and
`cluster`

.

So far we have seen how to report the results of multiple estimations
on the R console. Now, with the function `res2tex`

, we can
export the results to high quality Latex tables. The function
`res2tex`

works exactly as the function
`res2table`

, it takes any number of `femlm`

estimations. By default, it reports Latex code on the R console:

```
res2tex(gravity_subcluster, se = "twoway", cluster = trade[, c("Origin", "Destination")])
#> \global\long\def\sym#1{\ifmmode^{#1}\else\ensuremath{^{#1}}\fi}
#> \begin{table}[htbp]\centering
#> \caption{no title}
#> \begin{tabular}{lcccc}
#> & & & & \tabularnewline
#> \hline
#> \hline
#> Dependent Variable:&\multicolumn{4}{c}{Euros}\\
#> Model:&(1)&(2)&(3)&(4)\\
#> \hline
#> \emph{Variables}\tabularnewline
#> log(dist_km)&-1\sym{***}&-1\sym{***}&-2\sym{***}&-2\sym{***}\\
#> &(0.1527)&(0.1975)&(0.1238)&(0.1263)\\
#> \hline
#> \emph{Fixed-Effects}& & & & \\
#> Year&Yes&Yes&Yes&Yes\\
#> Destination&No&Yes&Yes&Yes\\
#> Origin&No&No&Yes&Yes\\
#> Product&No&No&No&Yes\\
#> \hline
#> \emph{Fit statistics}& & & & \\
#> Observations& 38,325&38,325&38,325&38,325\\
#> Adj-pseudo $R^2$ &0.18423&0.35047&0.58023&0.74743\\
#> Log-Likelihood & -2,482,334.64&-1,976,479.31&-1,277,291.27&-768,493.77\\
#> BIC & 4,964,785.39&3,953,222.48&2,554,994.15&1,537,599.67\\
#> \hline
#> \hline
#> \multicolumn{5}{l}{\emph{Two-way standard-errors in parenthesis. Signif Codes: ***: 0.01, **: 0.05, *: 0.1}}\\
#> \end{tabular}
#> \end{table}
```

This function has many optional arguments. The user can export the
Latex table directly into a file (argument `file`

), add a
title (arg. `title`

) and a label to the table (arg.
`label`

).

The coefficients can be renamed easily (arg. `dict`

), some
can be dropped (arg. `drop`

) and they can be easily reordered
with regular expressions (arg. `order`

).

The significance codes can easily be changed (arg.
`signifCode`

) and all quality of fit information can be
customized. Among others, the number of fixed-effect per cluster can
also be displayed using the argument `showClusterSize`

.

Consider the following example of the exportation of two tables:

```
# we set the dictionary once and for all
myDict = c("log(dist_km)" = "$\\ln (Distance)$", "(Intercept)" = "Constant")
# 1st export: we change the signif code and drop the intercept
res2tex(gravity_subcluster, signifCode = c("a" = 0.01, "b" = 0.05), drop = "Int", dict = myDict, file = "Estimation Table.tex", append = FALSE, title = "First export -- normal Standard-errors")
# 2nd export: clustered S-E + distance as the first coefficient
res2tex(gravity_subcluster, se = "cluster", cluster = trade$Product, order = "dist", dict = myDict, file = "Estimation Table.tex", title = "Second export -- clustered standard-errors (on Product variable)")
```

In this example, two tables containing the results of the 5
estimations are directly exported in the file “Estimation Table.tex”.
The file is re-created in the first exportation thanks to the argument
`append=FALSE`

.

To change the variable names in the Latex table, we use the argument
`dict`

. The variable `myDict`

is the dictionary we
use to rename the variables, it is simply a named vector. The original
name of the variables correspond to the names of `myDict`

while the new names of the variables are the values of this vector. Any
variable that matches the names of `myDict`

will be replaced
by its value. Thus we do not care of the order of appearance of the
variables in the estimation results.

In the first export, the coefficient of the intercept is dropped by
using `drop = "Int"`

(could be anything such that
`grepl(drop[1], "(Intercept)")`

is TRUE). In the second, the
coefficient of the distance is put before the intercept (which is kept).
Note that the actions performed by the arguments `drop`

or
`order`

are performed **before** the renaming
takes place with the argument `dict`

.

To obtain the fixed-effects of the estimation, the function
`getFE`

must be performed on the results. This function
returns a list containing the fixed-effects for each cluster. The
`summary`

method helps to have a quick overview:

```
fixedEffects <- getFE(gravity_results)
summary(fixedEffects)
#> Fixed-effects coefficients.
#> Origin Destination Product Year
#> Number of fixed-effects 15 15 20 10
#> Number of references 0 1 1 1
#> Mean 9.53 3.09 0.0129 0.157
#> Variance 1.63 1.23 1.86 0.0129
#>
#> COEFFICIENTS:
#> Origin: BE LU NL DE GB
#> 9.744 6.414 10.62 10.89 9.934 ... 10 remaining
#> -----
#> Destination: LU BE NL DE IE
#> 0 2.696 3.23 4.322 2.588 ... 10 remaining
#> -----
#> Product: 1 2 3 4 5
#> 0 1.414 0.6562 1.449 -1.521 ... 15 remaining
#> -----
#> Year: 2007 2008 2009 2010 2011
#> 0 0.06912 0.005226 0.07331 0.163 ... 5 remaining
```

We can see that the fixed-effects are balanced across clusters.
Indeed, apart from the first cluster, only one fixed-effect per cluster
needs to be set as reference (i.e. fixed to 0) to avoid collinearity
across the fixed-effects of the different clusters. This ensures that
the fixed-effects coefficients can be compared within cluster. Had there
be strictly more than one reference per cluster, their interpretation
would have not been possible at all. If this was the case though, a
warning message would have been prompted. Note that the mean values are
meaningless per se, but give a reference points to which compare the
fixed-effects within a cluster. Let’s look specifically at the
`Year`

fixed-effects:

```
fixedEffects$Year
#> 2007 2008 2009 2010 2011 2012
#> 0.000000000 0.069122369 0.005225534 0.073308290 0.163013422 0.192605196
#> 2013 2014 2015 2016
#> 0.230629312 0.242605390 0.282800746 0.310325679
```

Finally, the `plot`

method helps to distinguish the most
notable fixed-effects:

`plot(fixedEffects)`

For each cluster, the fixed-effects are first centered, then sorted, and finally the most notable (i.e. highest and lowest) are reported. The exponential of the coefficient is reported in the right hand side to simplify the interpretation for models with log-link (as the Poisson model). As we can see from the country of destination cluster, trade involving France (FR), Italy (IT) and Germany (DE) as destination countries is more than 2.7 times higher than the EU15 average. Further, the highest heterogeneity come from the product category, where trade in product 4 (dairy products) is roughly 2.7 times the average while product 14 (vegetable plaiting materials) represents a negligible fraction of the average.

Note however that the interpretation of the fixed-effects must be taken with extra care. In particular, here the fixed-effects can be interpreted only because they are perfectly balanced.

Now we present some other features of the package. First the possibility for non-linear in parameter estimation. The use of parallelism to accelerate the estimation. And finally some troobleshooting in case of problems.

The function `femlm`

also allows to have non-linear in
parameters right-hand-sides (RHS). First an example without
fixed-effects, the one with fixed-effects is given later. Let’s say we
want to estimate the following relation with a Poisson model:

\(E\left(z_i\right) = a\times x_i + b\times y_i\).

In fact, this type of model is non-linear in the context of a Poisson model because the sum is embedded within the log:

\(E\left(z_i\right) = \exp\left(\log\left(a\times x_i + b\times y_i\right)\right)\).

So let’s estimate such a relation. First we generate the data:

```
# Generating data:
n = 1000
# x and y: two positive random variables
x = rnorm(n, 1, 5)**2
y = rnorm(n, -1, 5)**2
# E(z) = 2*x + 3*y and some noise
z = rpois(n, 2*x + 3*y) + rpois(n, 1)
base = data.frame(x, y, z)
```

To estimate the non-linear relationship, we need to use the argument
`NL.fml`

where we put the non-linear part. We also have to
provide starting values with the argument `NL.start`

.
Finally, to ensure the RHS can be evaluated in any situation, we add
lower bounds for the parameters with the argument
`lower`

.

`result_NL = femlm(z~0, base, NL.fml = ~ log(a*x + b*y), NL.start = list(a=1, b=1), lower = list(a=0, b=0))`

Note that the arguments `NL.start`

and `lower`

are named lists. Setting `lower = list(a=0, b=0)`

means that
the optimization algorithm will never explores parameters for \(a\) and \(b\) that are lower than 0. The results
obtained can be interpreted similarly to results with linear RHS. We can
see them with a print:

```
print(result_NL)
#> Non-linear ML estimation, family = Poisson, Dep. Var.: z
#> Observations: 1,000
#> Standard-errors type: Standard
#> Estimate Std. Error z value Pr(>|z|)
#> a 2.0366 0.011331 179.74 < 2.2e-16 ***
#> b 3.0011 0.012809 234.31 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> BIC: -3,600.90 Pseudo-R2: 0.93929
#> Log-likelihood: 7,229.43 Squared Cor.: 0.99296
```

We can see that we obtain coefficients close to the generating values.

Adding fixed-effects is identical to the linear case. The user must
only be well aware of the functional form. Indeed, the fixed-effects
must enter the estimation **linearly**. This means that the
previous equation with one set of fixed-effects writes:

\(E\left(z_i\right) = \gamma_{id_i} \left( a\times x_i + b\times y_i \right)\),

where \(id_i\) is the class of observation \(i\) and \(\gamma\) is the vector of fixed-effects. Here the fixed-effects are in fact linear because in the context of the Poisson model we estimate:

\(E\left(z_i\right) = \exp\left(\gamma_{id_i}+\log\left(a\times x_i + b\times y_i\right)\right)\).

Further, remark that there exists an infinity of values of \(\gamma^{\prime}\), \(a^{\prime}\) and \(b^{\prime}\) such that:

\(\gamma_{k} \left( a\times x_i + b\times y_i \right) = \gamma_{k}^{\prime} \left(a^{\prime}\times x_i + b^{\prime}\times y_i \right),\forall i,k\).

An example is \(\gamma^{\prime}_{k} = 2\times \gamma_k\), \(a^{\prime} = a/2\) and \(b^{\prime} = b/2\). Thus estimating this relation directly will lead to a problem to uniquely identify the coefficients. To circumvent this problem, we just have to fix one of the coefficient, this will ensure that we uniquely identify them.

Let’s generate this relation:

```
# the class of each observation
id = sample(20, n, replace = TRUE)
base$id = id
# the vector of fixed-effects
gamma = rnorm(20)**2
# the new vector z_bis
z_bis = rpois(n, gamma[id] * (2*x + 3*y)) + rpois(n, 1)
base$z_bis = z_bis
```

Now we estimate it with the fixed-effects while fixing one of the coefficients (we fix \(a\) to its true value but it could be any value):

```
# we add the fixed-effect in the formula
result_NL_fe = femlm(z_bis~0|id, base, NL.fml = ~ log(2*x + b*y), NL.start = list(b=1), lower = list(b=0))
# The coef should be around 3
coef(result_NL_fe)
#> b
#> 3.000355
# the gamma and the exponential of the fixed-effects should be similar
rbind(gamma, exp(getFE(result_NL_fe)$id))
#> 1 2 3 4 5 6 7
#> gamma 0.3600610 3.002025 0.1480541 2.822553 0.00248734 5.573194 0.08504622
#> 0.3709867 3.017451 0.1654450 2.832735 0.01171078 5.651468 0.08863135
#> 8 9 10 11 12 13 14
#> gamma 1.063722 0.1497472 0.0991036 0.6983288 1.174996 2.360846 2.729491
#> 1.049059 0.1604504 0.1131981 0.7017970 1.204165 2.388689 2.725657
#> 15 16 17 18 19 20
#> gamma 1.807777 0.0231669 2.528759 0.1136854 1.979827 8.607081
#> 1.820352 0.0285860 2.579466 0.1236219 1.982471 8.658664
```

As we can see, we obtain the “right” estimates.

The package FENmlm integrates multi-platform parallelism to hasten
the estimation process. To use the multi-core facility, just use the
argument `cores`

:

```
# Sample of results:
# 1 core: 2.7s
system.time(femlm(Euros ~ log(dist_km)|Origin+Destination+Product+Year, trade, family = "negbin", cores = 1))
# 2 cores: 1.74s
system.time(femlm(Euros ~ log(dist_km)|Origin+Destination+Product+Year, trade, family = "negbin", cores = 2))
# 4 cores: 1.31s
system.time(femlm(Euros ~ log(dist_km)|Origin+Destination+Product+Year, trade, family = "negbin", cores = 4))
```

As you can see, the efficiency of increasing the number of cores is not 1 to 1. Two cores do not divide the computing time by 2, nor four cores by 4. However it still reduces significantly the computing time, which might be valuable for large sample estimations.

Although parallelism has been implemented for all likelihood models,
the only ones for which it is really efficient are the *Negative
Binomial* and the *Logit* models.

Indeed, the underlying `femlm`

code alternate between R
and C and the parallelism has been implemented with the C library
OpenMP. Setting the parallelism with OpenMP incurs “fixed-costs”, and
since there is many back and forth between R and C, these fixed-costs
add up significantly. Thus to be worth, the C-code sections with
parallelism must be “complex”.

Because the *Negative Binomial* and *Logit* models are
the ones with the most computationnaly intensive C-code chunks, they are
the ones that benefit the most from the parallelism.

This section deals with possible error/warning messages stemming from the estimation. There are two identified – possible – problems: collinearity and precision.

The user ought to estimate the coefficient of variables that are
**not** collinear: neither among each other, neither with
the fixed-effects. Estimation with collinear variables either leads the
algorithm to not converge, either leads to a non invertible Hessian
(leading to the absence of Variance-Covariance matrix for the
coefficients). In such cases, `femlm`

will raise a warning
and suggest to use the function `diagnostic`

to spot the
problem.

Let’s take an example in which we want to estimate the coefficient of
a variable which is constant. Of course it makes no sense, so a warning
will be raised suggesting to use the function `diagnostic`

to
figure out what is wrong.

```
base_coll = trade
base_coll$constant_variable = 1
res <- femlm(Euros ~ log(dist_km) + constant_variable|Origin+Destination+Product+Year, base_coll)
#> Warning in femlm(Euros ~ log(dist_km) + constant_variable | Origin +
#> Destination + : [femlm]: The optimization algorithm did not converge, the
#> results are not reliable. The information matrix is singular (likely presence
#> of collinearity). Use function diagnostic() to pinpoint collinearity problems.
diagnostic(res)
#> [1] "Variable 'constant_variable' is collinear with cluster Origin."
```

As we can see, the function `diagnostic`

spots the
collinear variables and name them. Even in more elaborate cases of
collinearity, the algorithm tries to find out the culprit and informs
the user accordingly.

When there is more than one fixed-effect, the “optimal” fixed-effects
are computed according to a fixed-point algorithm. This algorithm has a
stopping criterion which is equal to the argument
`precision.cluster`

. By default, it stops when the maximum
absolute difference between two iterations is lower than \(10^{-5}\).

Further, the algorithm implements a dynamic setting of the precision.
This means that it increases the precision (i.e. sets
`precision.cluster`

to a *lower* value) when some
condition occur in order to unblock the optimization algorithm. Thus,
thanks to the dynamic handling of the precision, the default value
`precision.cluster`

should be enough for most situations.

However, in some very *very* specific cases (for large data
sets where small errors can lead to large differences, or with very
“similar” fixed-effects (ie fixed-effects with many overlap)), the
algorithm may not converge because the fixed-effects are not computed
with enough precision (although they’ll be very close to optimality).
Said differently, in such situations the dynamic handling of the
precision is not enough because the initial rounding errors when
obtaining the cluster coefficients led the optimization algorithm
astray.

In such cases, setting the argument `precision.cluster`

to
a lower value may help the algorithm to converge. Again, such situation
should be rare and the default value should work for the large majority
of cases. Finally, note that increasing the precision leads to
significantly increase the running time of the algorithm.

Since the \(\gamma\) are parameters, I omit to put them in logarithmic form.↩︎