# Data
There are multiple ways to categorize data. For example,
- Qualitative vs. Quantitative:

| Qualitative | Quantitative |
|----------------------------------------|--------------------------------|
| In-depth interviews, documents, focus groups, case studies, ethnography, open-ended questions, observations recorded in words | Experiments, observations recorded as numbers, surveys with closed-ended questions, structured interviews |
| Language-based, descriptive | Quantities, numbers |
| Text-based | Numbers-based |
| Subjective | Objective |
## Cross-Sectional
## Time Series
$$
y_t = \beta_0 + x_{t1}\beta_1 + x_{t2}\beta_2 + ... + x_{t(k-1)}\beta_{k-1} + \epsilon_t
$$
Examples
- Static Model
    - $y_t=\beta_0 + x_{t1}\beta_1 + x_{t2}\beta_2 + x_{t3}\beta_3 + \epsilon_t$
- Finite Distributed Lag model
- $y_t=\beta_0 + pe_t\delta_0 + pe_{t-1}\delta_1 +pe_{t-2}\delta_2 + \epsilon_t$
- **Long Run Propensity (LRP)** is $LRP = \delta_0 + \delta_1 + \delta_2$
- Dynamic Model
    - $GDP_t = \beta_0 + \beta_1GDP_{t-1} + \epsilon_t$
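A minimal simulated sketch (not from the text; the coefficient values are illustrative) of fitting a finite distributed lag model and recovering the LRP as the sum of the lag coefficients:

```{r}
# Simulate y_t = 2 + 1*pe_t + 0.5*pe_{t-1} + 0.25*pe_{t-2} + e_t, so true LRP = 1.75
set.seed(1)
T <- 500
pe <- rnorm(T)
pe_l1 <- c(NA, pe[-T])                  # pe_{t-1}
pe_l2 <- c(NA, NA, pe[-c(T - 1, T)])    # pe_{t-2}
y <- 2 + pe + 0.5 * pe_l1 + 0.25 * pe_l2 + rnorm(T, sd = 0.1)
fit <- lm(y ~ pe + pe_l1 + pe_l2)       # lm() drops the first two rows with NA lags
sum(coef(fit)[c("pe", "pe_l1", "pe_l2")])   # estimated LRP, close to 1.75
```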
[Finite Sample Properties] for [Time Series]:
- A1-A3: OLS is unbiased
- A1-A4: usual standard errors are consistent and [Gauss-Markov Theorem] holds (OLS is BLUE)
- A1-A6: Finite Sample [Wald Test] (t-test and F-test) are valid
[A3][A3 Exogeneity of Independent Variables] might not hold under time series setting
- Spurious Time Trend - solvable
- [Strict][A3 Exogeneity of Independent Variables] vs Contemporaneous Exogeneity - not solvable
In time series data, there are many processes:
- Autoregressive model of order p: AR(p)
- Moving average model of order q: MA(q)
- Autoregressive model of order p and moving average model of order q: ARMA(p,q)
- Autoregressive conditional heteroskedasticity model of order p: ARCH(p)
- Generalized autoregressive conditional heteroskedasticity model of orders p and q: GARCH(p,q)
### Deterministic Time trend
Both the dependent and independent variables are trending over time
**Spurious Time Series Regression**
$$
y_t = \alpha_0 + t\alpha_1 + v_t
$$
and x takes the form
$$
x_t = \lambda_0 + t\lambda_1 + u_t
$$
- $\alpha_1 \neq 0$ and $\lambda_1 \neq 0$
- $v_t$ and $u_t$ are independent
- there is no relationship between $y_t$ and $x_t$
If we estimate the regression,
$$
y_t = \beta_0 + x_t\beta_1 + \epsilon_t
$$
so the true $\beta_1=0$
- Inconsistent: $plim(\hat{\beta}_1)=\frac{\alpha_1}{\lambda_1}$
- Invalid Inference: $|t| \overset{d}{\to} \infty$ under $H_0: \beta_1=0$; we will always reject the null as $n \to \infty$
- Uninformative $R^2$: $plim(R^2) = 1$ will be able to perfectly predict as $n \to \infty$
We can rewrite the equation as
$$
\begin{aligned}
y_t &=\beta_0 + \beta_1x_t+\epsilon_t \\
\epsilon_t &= \alpha_1t + v_t
\end{aligned}
$$
where $\beta_0 = \alpha_0$ and $\beta_1=0$. Since $x_t$ is a deterministic function of time, $\epsilon_t$ is correlated with $x_t$ and we have the usual omitted variable bias.\
Even when $y_t$ and $x_t$ are related ($\beta_1 \neq 0$) but they are both trending over time, we still get spurious results with the simple regression on $y_t$ on $x_t$
**Solutions to Spurious Trend**
1. Include time trend $t$ as an additional control
- consistent parameter estimates and valid inference
2. Detrend both dependent and independent variables and then regress the detrended outcome on detrended independent variables (i.e., regress residuals $\hat{u}_t$ on residuals $\hat{v}_t$)
- Detrending is the same as partialing out in the [Frisch-Waugh-Lovell Theorem]
- Could allow for non-linear time trends by including $t$, $t^2$, and $\exp(t)$
- Allow for seasonality by including indicators for relevant "seasons" (quarters, months, weeks).
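A minimal simulated sketch (not from the text) of the spurious trend problem and the first solution above: two unrelated trending series give a "significant" slope near $\alpha_1/\lambda_1$, and including $t$ as a control restores a slope near zero.

```{r}
set.seed(2)
T <- 200
t <- 1:T
y <- 1 + 0.5 * t + rnorm(T)   # alpha_1 = 0.5
x <- 2 + 0.3 * t + rnorm(T)   # lambda_1 = 0.3; v_t and u_t are independent
coef(lm(y ~ x))["x"]          # spurious slope, near 0.5 / 0.3
coef(lm(y ~ x + t))["x"]      # with the time trend included, slope near 0
```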
[A3][A3 Exogeneity of Independent Variables] does not hold under:
- [Feedback Effect]
- $\epsilon_t$ influences next period's independent variables
- [Dynamic Specification]
- include last time period outcome as an explanatory variable
- [Dynamically Complete]
    - For a finite distributed lag model, the number of lags needs to be exactly right.
### Feedback Effect
$$
y_t = \beta_0 + x_t\beta_1 + \epsilon_t
$$
[A3][A3 Exogeneity of Independent Variables]
$$
E(\epsilon_t|\mathbf{X})= E(\epsilon_t| x_1,x_2, ...,x_t,x_{t+1},...,x_T)
$$
will not equal 0, because $y_t$ will likely influence $x_{t+1},...,x_T$
- [A3][A3 Exogeneity of Independent Variables] is violated because we require the error to be uncorrelated with all time observation of the independent regressors (**strict exogeneity**)
### Dynamic Specification
$$
y_t = \beta_0 + y_{t-1}\beta_1 + \epsilon_t
$$
$$
E(\epsilon_t|\mathbf{X})= E(\epsilon_t| y_1,y_2, ...,y_t,y_{t+1},...,y_T)
$$
will not equal 0, because $y_t$ and $\epsilon_t$ are inherently correlated
- [A3][A3 Exogeneity of Independent Variables] is violated because we require the error to be uncorrelated with all time observation of the independent regressors (**strict exogeneity**)
- [Dynamic Specification] is not allowed under [A3][A3 Exogeneity of Independent Variables]
### Dynamically Complete
$$
y_t = \beta_0 + x_t\delta_0 + x_{t-1}\delta_1 + \epsilon_t
$$
$$
E(\epsilon_t|\mathbf{X})= E(\epsilon_t| x_1,x_2, ...,x_t,x_{t+1},...,x_T)
$$
will not equal 0, because if we did not include enough lags, $x_{t-2}$ and $\epsilon_t$ are correlated
- [A3][A3 Exogeneity of Independent Variables] is violated because we require the error to be uncorrelated with all time observation of the independent regressors (strict exogeneity)
- Can be corrected by including more lags (but when do we stop?)
Without [A3][A3 Exogeneity of Independent Variables]
- OLS is biased
- [Gauss-Markov Theorem] does not hold
- [Finite Sample Properties] are invalid
then, we can
- Focus on [Large Sample Properties]
- Can use [A3a] instead of [A3][A3 Exogeneity of Independent Variables]
[A3a] in time series become
$$
A3a: E(\mathbf{x}_t'\epsilon_t)= 0
$$
only the regressors in this time period need to be independent from the error in this time period (**Contemporaneous Exogeneity**)
- $\epsilon_t$ can be correlated with $...,x_{t-2},x_{t-1},x_{t+1}, x_{t+2},...$
- can have a dynamic specification $y_t = \beta_0 + y_{t-1}\beta_1 + \epsilon_t$
Deriving [Large Sample Properties] for Time Series
- Assumptions [A1][A1 Linearity], [A2][A2 Full rank], [A3a]
- [Weak Law] and [Central Limit Theorem] depend on [A5][A5 Data Generation (random Sampling)]
- $x_t$ and $\epsilon_t$ are dependent over t
    - without the [Weak Law] and [Central Limit Theorem] (which depend on [A5][A5 Data Generation (random Sampling)]), we cannot have [Large Sample Properties] for [OLS][Ordinary Least Squares]
- Instead of [A5][A5 Data Generation (random Sampling)], we consider [A5a]
- Derivation of the Asymptotic Variance depends on [A4][A4 Homoskedasticity]
- time series setting introduces **Serial Correlation**: $Cov(\epsilon_t, \epsilon_s) \neq 0$
under [A1][A1 Linearity], [A2][A2 Full rank], [A3a], and [A5a], [OLS estimator][Ordinary Least Squares] is **consistent**, and **asymptotically normal**
### Highly Persistent Data
If $y_t, \mathbf{x}_t$ are not weakly dependent stationary processes
- $y_t$ and $y_{t-h}$ are not almost independent for large h
- [A5a] does not hold and OLS is not **consistent** and does not have a limiting distribution.
- Examples
  - Random walk: $y_t = y_{t-1} + u_t$
  - Random walk with a drift: $y_t = \alpha + y_{t-1} + u_t$
**Solution** First difference is a stationary process
$$
y_t - y_{t-1} = u_t
$$
- If $u_t$ is a weakly dependent process (also called integrated of order 0), then $y_t$ is said to be a difference-stationary process (integrated of order 1)
- For regression, if $\{y_t, \mathbf{x}_t \}$ are random walks (integrated of order 1), we can consistently estimate the first-difference equation
$$
\begin{aligned}
y_t - y_{t-1} &= (\mathbf{x}_t - \mathbf{x}_{t-1})\beta + \epsilon_t - \epsilon_{t-1} \\
\Delta y_t &= \Delta \mathbf{x}_t\beta + \Delta \epsilon_t
\end{aligned}
$$
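A one-line simulated sketch (not from the text) of why first-differencing works: a random walk built by accumulating shocks is nonstationary, but its first difference is exactly the stationary shock series.

```{r}
set.seed(3)
u <- rnorm(500)
y <- cumsum(u)          # random walk: y_t = y_{t-1} + u_t
dy <- diff(y)           # Delta y_t = u_t, a stationary (weakly dependent) process
all.equal(dy, u[-1])    # TRUE: differencing removes the unit root
```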
**Unit Root Test**
$$
y_t = \alpha + \rho y_{t-1} + u_t
$$
tests if $\rho=1$ (integrated of order 1)\
- Under the null $H_0: \rho = 1$, OLS is not consistent or asymptotically normal.
- Under the alternative $H_a: \rho < 1$, OLS is consistent and asymptotically normal.
- usual t-test is not valid, will need to use the transformed equation to produce a valid test.
**Dickey-Fuller Test** $$
\Delta y_t= \alpha + \theta y_{t-1} + v_t
$$ where $\theta = \rho -1$\
- $H_0: \theta = 0$ and $H_a: \theta < 0$
- Under the null, $\Delta y_t$ is weakly dependent but $y_{t-1}$ is not.
- Dickey and Fuller derived the non-normal asymptotic distribution. If you reject the null then $y_t$ is not a random walk.
Concerns with the standard Dickey Fuller Test\
1. Only considers a fairly simplistic dynamic relationship
$$
\Delta y_t = \alpha + \theta y_{t-1} + \gamma_1 \Delta y_{t-1} + ... + \gamma_p \Delta y_{t-p} + v_t
$$
- with one additional lag, under the null $\Delta y_t$ is an AR(1) process, and under the alternative $y_t$ is an AR(2) process.
- Solution: include lags of $\Delta y_t$ as controls.
2. Does not allow for time trend $$
\Delta y_t = \alpha + \theta y_{t-1} + \delta t + v_t
$$
- allows $y_t$ to have a quadratic relationship with $t$
- Solution: include time trend (changes the critical values).
**Augmented Dickey-Fuller Test** $$
\Delta y_t = \alpha + \theta y_{t-1} + \delta t + \gamma_1 \Delta y_{t-1} + ... + \gamma_p \Delta y_{t-p} + v_t
$$ where $\theta = \rho - 1$\
- $H_0: \theta = 0$ and $H_a: \theta < 0$
- Under the null, $\Delta y_t$ is weakly dependent but $y_{t-1}$ is not
- Critical values are different with the time trend, if you reject the null then $y_t$ is not a random walk.
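A minimal simulated sketch (not from the text) of running the Dickey-Fuller regression by hand with `lm()` on a random walk with drift (true $\rho = 1$, so $\theta = 0$):

```{r}
set.seed(4)
T <- 300
y <- cumsum(0.1 + rnorm(T))           # random walk with drift
df_fit <- lm(diff(y) ~ head(y, -1))   # Delta y_t = alpha + theta * y_{t-1} + v_t
coef(summary(df_fit))
# Note: the t-statistic on y_{t-1} must be compared with Dickey-Fuller
# critical values, not the usual t table.
```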
##### Newey West Standard Errors
If [A4][A4 Homoskedasticity] does not hold, we can use Newey-West standard errors (HAC: Heteroskedasticity and Autocorrelation Consistent)
$$
\hat{B} = T^{-1} \sum_{t=1}^{T} e_t^2 \mathbf{x'_tx_t} + \sum_{h=1}^{g}(1-\frac{h}{g+1})T^{-1}\sum_{t=h+1}^{T} e_t e_{t-h}(\mathbf{x_t'x_{t-h}+ x_{t-h}'x_t})
$$
- estimates the covariances up to a distance of g periods apart
- downweights longer distances to ensure $\hat{B}$ is positive semi-definite (PSD)
- How to choose g:
- For yearly data: $g = 1$ or 2 is likely to account for most of the correlation
    - For quarterly or monthly data: g should be larger ($g = 4$ or $8$ for quarterly and $g = 12$ or $14$ for monthly)
- can also take integer part of $4(T/100)^{2/9}$ or integer part of $T^{1/4}$
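A sketch of computing Newey-West standard errors in R; this assumes the `sandwich` and `lmtest` packages (not used elsewhere in this chapter) are installed, and uses the rule-of-thumb bandwidth above:

```{r}
library(sandwich)
library(lmtest)
set.seed(5)
T <- 200
x <- rnorm(T)
e <- as.numeric(arima.sim(list(ar = 0.6), n = T))  # serially correlated errors
y <- 1 + 2 * x + e
fit <- lm(y ~ x)
g <- floor(4 * (T / 100)^(2/9))                    # rule-of-thumb bandwidth
coeftest(fit, vcov = NeweyWest(fit, lag = g, prewhite = FALSE))
```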
**Testing for Serial Correlation**
1. Run OLS regression of $y_t$ on $\mathbf{x_t}$ and obtain residuals $e_t$
2. Run OLS regression of $e_t$ on $\mathbf{x}_t, e_{t-1}$ and test whether coefficient on $e_{t-1}$ is significant.
3. Reject the null of no serial correlation if the coefficient is significant at the 5% level.
- Test using heteroskedastic robust standard errors
- can include $e_{t-2},e_{t-3},..$ in step 2 to test for higher order serial correlation (t-test would now be an F-test of joint significance)
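A minimal simulated sketch (not from the text) of the two-step residual regression described above:

```{r}
set.seed(6)
T <- 200
x <- rnorm(T)
e <- as.numeric(arima.sim(list(ar = 0.5), n = T))  # serially correlated errors
y <- 1 + x + e
e_hat <- resid(lm(y ~ x))                # step 1: OLS residuals
e_lag <- c(NA, e_hat[-T])                # e_{t-1}
step2 <- lm(e_hat ~ x + e_lag)           # step 2: regress e_t on x_t and e_{t-1}
coef(summary(step2))["e_lag", ]          # large t-stat -> reject no serial correlation
```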
## Repeated Cross Sections
For each time point (day, month, year, etc.), a set of data is sampled. This set of data can be different among different time points.
For example, you can sample different groups of students each time you survey.
Allowing for a structural change in intercepts in a pooled cross section
$$
y_i = \mathbf{x}_i \beta + \delta_2 y_2 + ... + \delta_T y_T + \epsilon_i
$$
where $y_2, \dots, y_T$ are dummy variables for all but one time period
- allows different intercept for each time period
- allows outcome to change on average for each time period
Allowing for structural change in pooled cross section
$$
y_i = \mathbf{x}_i \beta + \mathbf{x}_i y_2 \gamma_2 + ... + \mathbf{x}_i y_T \gamma_T + \delta_2 y_2 + ...+ \delta_T y_T + \epsilon_i
$$
Interact $x_i$ with time period dummy variables
- allows different slopes for each time period
- allows effects to change based on time period (**structural break**)
- Interacting all time period dummies with $x_i$ can produce many variables - use hypothesis testing to determine which structural breaks are needed.
## Panel Data
Detail notes in R can be found [here](https://cran.r-project.org/web/packages/plm/vignettes/plmPackage.html#robust)
Follows an individual over T time periods.
Panel data structure is like having n samples of time series data
**Characteristics**
- Information both across individuals and over time (cross-sectional and time-series)
- N individuals and T time periods
- Data can be either
- Balanced: all individuals are observed in all time periods
    - Unbalanced: some individuals are not observed in all time periods.
- Assume correlation (clustering) over time for a given individual, with independence over individuals.
**Types**
- Short panel: many individuals and few time periods.
- Long panel: many time periods and few individuals
- Both: many time periods and many individuals
**Time Trends and Time Effects**
- Nonlinear
- Seasonality
- Discontinuous shocks
**Regressors**
- Time-invariant regressors $x_{it}=x_i$ for all t (e.g., gender, race, education) have zero within variation
- Individual-invariant regressors $x_{it}=x_{t}$ for all i (e.g., time trend, economy trends) have zero between variation
**Variation for the dependent variable and regressors**
- Overall variation: variation over time and individuals.
- Between variation: variation between individuals
- Within variation: variation within individuals (over time).
| Estimate | Formula |
|--------------|---------------------------------------------------------|
| Individual mean | $\bar{x_i}= \frac{1}{T} \sum_{t}x_{it}$ |
| Overall mean | $\bar{x}=\frac{1}{NT} \sum_{i} \sum_t x_{it}$ |
| Overall Variance | $s _O^2 = \frac{1}{NT-1} \sum_i \sum_t (x_{it} - \bar{x})^2$ |
| Between variance | $s_B^2 = \frac{1}{N-1} \sum_i (\bar{x_i} -\bar{x})^2$ |
| Within variance | $s_W^2= \frac{1}{NT-1} \sum_i \sum_t (x_{it} - \bar{x_i})^2 = \frac{1}{NT-1} \sum_i \sum_t (x_{it} - \bar{x_i} +\bar{x})^2$ |
**Note**: $s_O^2 \approx s_B^2 + s_W^2$
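A minimal simulated sketch (not from the text) of the table above: computing the overall, between, and within variation by hand for a balanced panel, and checking the approximate decomposition.

```{r}
set.seed(7)
N <- 50; T <- 10
id <- rep(1:N, each = T)
x <- rnorm(N)[id] + rnorm(N * T)              # individual part + idiosyncratic part
x_bar_i <- ave(x, id)                         # individual means
s2_O <- sum((x - mean(x))^2) / (N * T - 1)    # overall variance
s2_B <- sum((tapply(x, id, mean) - mean(x))^2) / (N - 1)   # between variance
s2_W <- sum((x - x_bar_i)^2) / (N * T - 1)                 # within variance
c(overall = s2_O, between_plus_within = s2_B + s2_W)       # approximately equal
```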
Since we have n observations for each time period t, we can control for each time effect separately by including time dummies (time effects)
$$
y_{it}=\mathbf{x_{it}\beta} + d_1\delta_1+...+d_{T-1}\delta_{T-1} + \epsilon_{it}
$$
**Note**: we cannot use these many time dummies in time series data because in time series data, our n is 1. Hence, there is no variation, and sometimes not enough data compared to variables to estimate coefficients.
**Unobserved Effects Model** Similar to group clustering, assume that there is a random effect that captures differences across individuals but is constant in time.
$$
y_{it}=\mathbf{x_{it}\beta} + d_1\delta_1+...+d_{T-1}\delta_{T-1} + c_i + u_{it}
$$
where
- $c_i + u_{it} = \epsilon_{it}$
- $c_i$ unobserved individual heterogeneity (effect)
- $u_{it}$ idiosyncratic shock
- $\epsilon_{it}$ unobserved error term.
### Pooled OLS Estimator
If $c_i$ is uncorrelated with $x_{it}$
$$
E(\mathbf{x_{it}'}(c_i+u_{it})) = 0
$$
then [A3a] still holds, and pooled OLS is consistent.
If [A4][A4 Homoskedasticity] does not hold, OLS is still consistent, but not efficient, and we need cluster robust SE.
For [A3a] to hold, it is sufficient that
- **Exogeneity** of $u_{it}$, the time-varying error ([A3a] contemporaneous exogeneity): $E(\mathbf{x_{it}'}u_{it})=0$
- **Random Effects Assumption** (time-constant error): $E(\mathbf{x_{it}'}c_{i})=0$
Pooled OLS will give you consistent coefficient estimates under [A1][A1 Linearity], [A2][A2 Full rank], [A3a] (for both $u_{it}$ and RE assumption), and [A5][A5 Data Generation (random Sampling)] (randomly sampling across i).
### Individual-specific effects model
- If we believe there is unobserved heterogeneity across individuals (e.g., unobserved ability of an individual affects $y$), we model it with individual-specific effects. If the individual-specific effects are correlated with the regressors, we use the [Fixed Effects Estimator]; if they are uncorrelated, we use the [Random Effects Estimator](#random-effects-estimator).
#### Random Effects Estimator {#random-effects-estimator}
Random Effects estimator is the Feasible GLS estimator that assumes $u_{it}$ is serially uncorrelated and homoskedastic
- Under [A1][A1 Linearity], [A2][A2 Full rank], [A3a] (for both $u_{it}$ and RE assumption) and [A5][A5 Data Generation (random Sampling)] (randomly sampling across i), RE estimator is consistent.
- If [A4][A4 Homoskedasticity] holds for $u_{it}$, RE is the most efficient estimator
- If [A4][A4 Homoskedasticity] fails to hold (may be heteroskedasticity across i, and serial correlation over t), then RE is not the most efficient, but still more efficient than pooled OLS.
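As a sketch of the estimators discussed here, `plm` can fit both the RE (FGLS) and pooled models; this uses the Grunfeld data loaded later in this chapter:

```{r}
library(plm)
data("Grunfeld", package = "plm")
re     <- plm(inv ~ value + capital, data = Grunfeld, model = "random")
pooled <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling")
cbind(random = coef(re), pooled = coef(pooled))  # compare the two sets of estimates
```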
#### Fixed Effects Estimator
also known as the **Within Estimator** because it uses within variation (over time)
If the **RE assumption** does not hold ($E(\mathbf{x_{it}'}c_i) \neq 0$), then [A3a] does not hold ($E(\mathbf{x_{it}'}\epsilon_{it}) \neq 0$).
Hence, pooled OLS and RE are inconsistent/biased (because of omitted variable bias).
However, FE can only remove bias due to time-invariant factors (observable or unobservable) correlated with treatment, not bias from time-varying factors correlated with the treatment.
The traditional FE technique is flawed when lagged dependent variables are included in the model. [@nickell1981biases] [@narayanan2013estimating]
With measurement error in the independent variables, FE will exacerbate the errors-in-variables bias.
##### Demean Approach
To deal with violation in $c_i$, we have
$$
y_{it}= \mathbf{x_{it} \beta} + c_i + u_{it}
$$
$$
\bar{y_i}=\bar{\mathbf{x_i}} \beta + c_i + \bar{u_i}
$$
where the second equation is the time averaged equation
using **within transformation**, we have
$$
y_{it} - \bar{y_i} = \mathbf{(x_{it} - \bar{x_i})}\beta + u_{it} - \bar{u_i}
$$
because $c_i$ is time constant.
The Fixed Effects estimator uses POLS on the transformed equation
$$
y_{it} - \bar{y_i} = \mathbf{(x_{it} - \bar{x_i})} \beta + d_1\delta_1 + ... + d_{T-2}\delta_{T-2} + u_{it} - \bar{u_i}
$$
- we need [A3][A3 Exogeneity of Independent Variables] (strict exogeneity), $E((\mathbf{x_{it}-\bar{x_i}})'(u_{it}-\bar{u_i}))=0$, for FE to be consistent.
- Variables that are time constant will be absorbed into $c_i$. Hence we cannot make inference on time constant independent variables.
- If you are interested in the effects of time-invariant variables, you could consider the OLS or **between estimator**
- It's recommended that you should still use cluster robust standard errors.
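A minimal sketch of the equivalence between the within transformation and POLS on demeaned data, using the Grunfeld data loaded later in this chapter:

```{r}
library(plm)
data("Grunfeld", package = "plm")
fe <- plm(inv ~ value + capital, data = Grunfeld, model = "within")
# manual within transformation: subtract each firm's time average
dm <- transform(Grunfeld,
                inv_d     = inv     - ave(inv, firm),
                value_d   = value   - ave(value, firm),
                capital_d = capital - ave(capital, firm))
ols <- lm(inv_d ~ value_d + capital_d - 1, data = dm)   # no intercept after demeaning
cbind(within = coef(fe), demeaned_pols = coef(ols))     # identical slope estimates
```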
##### Dummy Approach
Equivalent to the within transformation (i.e., mathematically the same as [Demean Approach]), the fixed effects estimator can also be obtained from the dummy regression
$$
y_{it} = x_{it}\beta + d_1\delta_1 + ... + d_{T-2}\delta_{T-2} + c_1\gamma_1 + ... + c_{n-1}\gamma_{n-1} + u_{it}
$$
where
$$
c_i
=
\begin{cases}
1 &\text{if observation is i} \\
0 &\text{otherwise} \\
\end{cases}
$$
- The standard error is incorrectly calculated unless the degrees of freedom account for the estimated individual effects.
- The FE within transformation controls for any time-constant difference across individuals, which is allowed to be correlated with the observables.
##### First-difference Approach
Economists typically use this approach
$$
y_{it} - y_{i (t-1)} = (\mathbf{x}_{it} - \mathbf{x}_{i(t-1)}) \beta + (u_{it} - u_{i(t-1)})
$$
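A sketch of the first-difference estimator via `plm` (`model = "fd"`), compared with the within estimator on the same Grunfeld data; with two periods the two coincide, while in longer panels they generally differ:

```{r}
library(plm)
data("Grunfeld", package = "plm")
fd <- plm(inv ~ value + capital, data = Grunfeld, model = "fd")
fe <- plm(inv ~ value + capital, data = Grunfeld, model = "within")
rbind(first_difference = coef(fd)[c("value", "capital")],
      within           = coef(fe)[c("value", "capital")])
```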
##### Fixed Effects Summary
- The three approaches are **almost** equivalent.
- [Demean Approach] is mathematically equivalent to [Dummy Approach]
    - If you have only two periods, all three approaches are the same.
- Since fixed effect is a within estimator, only **status changes** can contribute to $\beta$ variation.
- Hence, with a small number of changes then the standard error for $\beta$ will explode
    - Status changes mean subjects move from (1) control to treatment or (2) treatment to control. Subjects whose status changes are called **switchers**.
- Treatment effect is typically **non-directional**.
- You can give a parameter for the direction if needed.
- Issues:
    - There could be fundamental differences between switchers and non-switchers. Although we cannot definitively test this, providing descriptive statistics on switchers and non-switchers can give us confidence in our conclusions.
- Because fixed effects focus on bias reduction, you might have larger variance (typically, with fixed effects you will have less df)
- If the true model is a [random effect](#random-effects-estimator), i.e., $c_i$ is random and $c_i \perp x_{it}$ (the RE assumption that it is unrelated to $x_{it}$), economists typically don't mind still using FE, because RE would not correct any bias; it only improves efficiency over OLS.
- You can estimate FE for different units (not just individuals).
- FE removes bias from time-invariant factors, but not without cost: it uses only within variation, which imposes the strict exogeneity assumption on $u_{it}$: $E[(x_{it} - \bar{x}_{i})'(u_{it} - \bar{u}_{i})]=0$
Recall
$$
Y_{it} = \beta_0 + X_{it}\beta_1 + \alpha_i + u_{it}
$$
where $\epsilon_{it} = \alpha_i + u_{it}$
$$
\hat{\sigma}^2_\epsilon = \frac{SSR_{OLS}}{NT - K}
$$
$$
\hat{\sigma}^2_u = \frac{SSR_{FE}}{NT - (N+K)} = \frac{SSR_{FE}}{N(T-1)-K}
$$
It's ambiguous whether your variance of error changes up or down because SSR can increase while the denominator decreases.
FE can be unbiased yet inconsistent (i.e., it need not converge to the true effect)
##### FE Examples
##### @blau1999
- Intergenerational mobility
- If we transfer resources to low income family, can we generate upward mobility (increase ability)?
Mechanisms for intergenerational mobility
1. Genetic (policy can't affect) (i.e., ability endowment)
2. Environmental indirect
3. Environmental direct
   $$
   \frac{\% \Delta \text{Human capital}}{\% \Delta \text{income}}
   $$
4. Financial transfer
Income measures:
1. Total household income
2. Wage income
3. Non-wage income
4. Annual versus permanent income
Core control variables:
**Bad controls are those jointly determined with dependent variable**
Control by mother = choice by mother
Uncontrolled by mothers:
- mother race
- location of birth
- education of parents
- household structure at age 14
$$
Y_{ijt} = X_{jt} \beta_i + I_{jt} \alpha_i + \epsilon_{ijt}
$$
where
- $i$ = test
- $j$ = individual (child)
- $t$ = time
Grandmother's model
Since child is nested within mother and mother nested within grandmother, the fixed effect of child is included in the fixed effect of mother, which is included in the fixed-effect of grandmother
$$
Y_{ijgmt} = X_{it} \beta_{i} + I_{jt} \alpha_i + \gamma_g + u_{ijgmt}
$$
where
- $i$ = test, $j$ = kid, $m$ = mother, $g$ = grandmother
- where $\gamma_g$ nests $\gamma_m$, which in turn nests $\gamma_j$
Grandma fixed-effect
Pros:
- control for some genetics + fixed characteristics of how mothers were raised
- can estimate the effect of parental income
Con:
- Might not be a sufficient control
Common to cluster at the fixed-effect level (common correlated component)
**Fixed effect exaggerates attenuation bias**
The error rate on the survey can help you fix this (plug in the number only, but not the uncertainty associated with that number).
##### @babcock2010
$$
T_{ijct} = \alpha_0 + S_{jct} \alpha_1 + X_{ijct} \alpha_2 + u_{ijct}
$$
where
- $S_{jct}$ is the average class expectation
- $X_{ijct}\alpha_2$ is the individual characteristics
- $i$ student
- $j$ instructor
- $c$ course
- $t$ time
$$
T_{ijct} = \beta_0+ S_{jct} \beta_1+ X_{ijct} \beta_2 +\mu_{jc} + \epsilon_{ijct}
$$
where $\mu_{jc}$ is instructor by course fixed effect (unique id), which is different from $(\theta_j + \delta_c)$
1. Decrease course shopping because conditioned on available information ($\mu_{jc}$) (class grade and instructor's info).
2. Grade expectation change even though class materials stay the same
Identification strategy is
- account for unobserved (fixed or time-varying) factors that could bias the coefficient (simultaneity)
$$
Y_{ijt} = X_{it} \beta_1 + \text{Teacher Experience}_{jt} \beta_2 + \text{Teacher education}_{jt} \beta_3 + \text{Teacher score}_{it}\beta_4 + \dots + \epsilon_{ijt}
$$
Drop teacher characteristics, and include teacher dummy effect
$$
Y_{ijt} = X_{it} \alpha + \Gamma_{it} \theta_j + u_{ijt}
$$
where $\alpha$ is the within teacher (conditional on teacher fixed effect) and $j = 1 \to (J-1)$
Nuisance in the sense that we don't care about the interpretation of $\alpha$
The least we can say about $\theta_j$ is the teacher effect conditional on student test score.
$$
Y_{ijt} = X_{it} \gamma + \epsilon_{ijt}
$$
$\gamma$ is the unconditional effect (mixing between and within variation) and $e_{ijt}$ is the prediction error
$$
e_{ijt} = T_{it} \delta_j + \tilde{e}_{ijt}
$$
where $\delta_j$ is the mean for each group
$$
Y_{ijkt} = Y_{ijk(t-1)}\alpha + X_{it} \beta + T_{it} \tau_j + (W_i + P_k + \epsilon_{ijkt})
$$
where
- $Y_{ijkt-1}$ = lag control
- $\tau_j$ = teacher fixed time
- $W_i$ is the student fixed effect
- $P_k$ is the school fixed effect
- $u_{ijkt} = W_i + P_k + \epsilon_{ijkt}$
And we worry about selection on class and school
Bias in $\tau$ (for 1 teacher) is
$$
\frac{1}{N_j} \sum_{i = 1}^N (W_i + P_k + \epsilon_{ijkt})
$$
where $N_j$ = the number of students in the class of teacher $j$
then the bias can be written as $P_k + \frac{1}{N_j} \sum_{i = 1}^{N_j} (W_i + \epsilon_{ijkt})$
Shocks from small class can bias $\tau$
$$
\frac{1}{N_j} \sum_{i = 1}^{N_j} \epsilon_{ijkt} \neq 0
$$
which will inflate the teacher fixed effect
Even if we create a random teacher fixed effect and put it in the model, it still contains the bias mentioned above, which can still bias $\tau$ (but we do not know in which direction, more positive or negative).
If teachers switch schools, then we can estimate both teacher and school fixed effect (**mobility web** thin vs. thick)
Mobility web refers to the web of switchers (i.e., from one status to another).
$$
Y_{ijkt} = Y_{ijk(t-1)} \alpha + X_{it}\beta + T_{it} \tau + P_k + \epsilon_{ijkt}
$$
If we demean (fixed-effect), $\tau$ (teacher fixed effect) will go away
If you want to examine teacher fixed effect, we have to include teacher fixed effect
Control for school, the article argues that there is no selection bias
For $\frac{1}{N_j} \sum_{i =1}^{N_j} \epsilon_{ijkt}$ (teacher-level average residuals), $var(\tau)$ does not change with $N_j$ (Figure 2 in the paper). In words, the quality of teachers is not a function of the number of students
If $var(\tau) =0$ it means that teacher quality does not matter
Spin-off of [Measurement Error]: Sampling error or estimation error
$$
\hat{\tau}_j = \tau_j + \lambda_j
$$
$$
var(\hat{\tau}) = var(\tau + \lambda)
$$
Assume $cov(\tau_j, \lambda_j)=0$ (reasonable). In words, the randomness in which children a teacher gets does not correlate with teacher quality.
Hence,
$$
\begin{aligned}
var(\hat{\tau}) &= var(\tau) + var(\lambda) \\
var(\tau) &= var(\hat{\tau}) - var(\lambda) \\
\end{aligned}
$$
We have $var(\hat{\tau})$ and we need to estimate $var(\lambda)$
$$
var(\lambda) = \frac{1}{J} \sum_{j=1}^J \hat{\sigma}^2_j
$$ where $\hat{\sigma}^2_j$ is the squared standard error of $\hat{\tau}_j$ (a function of $n$)
Hence,
$$
\frac{var(\tau)}{var(\hat{\tau})} = \text{reliability}
$$ i.e., the share of the variance of $\hat{\tau}$ that is true signal, and
$$
1 - \frac{var(\tau)}{var(\hat{\tau})} = \text{noise}
$$
Even in cases where the true relationship is that $\tau$ is a function of $N_j$, our recovery method for $\lambda$ is still unaffected
To examine our assumption
$$
\hat{\tau}_j = \beta_0 + X_j \beta_1 + \epsilon_j
$$
Regressing the teacher fixed effects on teacher characteristics should give us an $R^2$ close to 0, because teacher characteristics cannot predict sampling error ($\hat{\tau}_j$ contains sampling error)
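A simulated sketch (not from the paper; all values are illustrative) of the shrinkage logic above: recover the reliability of estimated teacher effects by subtracting the average squared standard error from the variance of the estimates.

```{r}
set.seed(10)
J <- 500
tau <- rnorm(J, sd = 1)              # true teacher effects, var(tau) = 1
se_j <- rep(0.5, J)                  # sampling error sd of each tau_hat_j
tau_hat <- tau + rnorm(J, sd = se_j) # estimates contaminated by sampling error
var_lambda <- mean(se_j^2)           # estimate of var(lambda)
reliability <- (var(tau_hat) - var_lambda) / var(tau_hat)
reliability                          # roughly 1 / 1.25 = 0.8
```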
### Tests for Assumptions
We typically don't test for heteroskedasticity because we will use robust covariance matrix estimation anyway.
Dataset
```{r}
library("plm")
data("EmplUK", package="plm")
data("Produc", package="plm")
data("Grunfeld", package="plm")
data("Wages", package="plm")
```
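As an aside to the heteroskedasticity note above: a sketch of how a robust covariance matrix is typically supplied in `plm` (the model and the `type` choice here are illustrative):

```{r}
# within model on Grunfeld (loaded above), with heteroskedasticity-robust SEs
gw <- plm(inv ~ value + capital, data = Grunfeld, model = "within")
lmtest::coeftest(gw, vcov = vcovHC(gw, type = "HC3"))
```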
#### Poolability
also known as an F test of stability (or Chow test) for the coefficients
$H_0$: All individuals have the same coefficients (i.e., equal coefficients for all individuals).
$H_a$: Different individuals have different coefficients.
Notes:
- Under a within (i.e., fixed-effects) model, different intercepts for each individual are assumed
- Under a random-effects model, the same intercept is assumed
```{r}
library(plm)
plm::pooltest(inv~value+capital, data=Grunfeld, model="within")
```
Here, we reject the null hypothesis that the coefficients are stable across individuals, so the data are not poolable; a model allowing individual-specific coefficients may be more appropriate.
#### Individual and time effects
Use the Lagrange multiplier test (via `plmtest`) or the F test (via `pFtest`, below) for the presence of individual effects, time effects, or both.
Types:
- `honda`: [@honda1985testing] Default
- `bp`: [@Breusch_1980] for unbalanced panels
- `kw`: [@King_1997] unbalanced panels, and two-way effects
- `ghm`: [@gourieroux1982likelihood]: two-way effects
```{r}
pFtest(inv~value+capital, data=Grunfeld, effect="twoways")
pFtest(inv~value+capital, data=Grunfeld, effect="individual")
pFtest(inv~value+capital, data=Grunfeld, effect="time")
```
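The `type` options listed above belong to `plmtest` (the LM test); for example, Honda's test for two-way effects on the same data:

```{r}
plmtest(inv ~ value + capital, data = Grunfeld, effect = "twoways", type = "honda")
```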
#### Cross-sectional dependence/contemporaneous correlation
- Null hypothesis: residuals across entities are not correlated.
##### Global cross-sectional dependence
```{r}
pcdtest(inv~value+capital, data=Grunfeld, model="within")
```
##### Local cross-sectional dependence
Use the same command, but supply a proximity matrix to the `w` argument.
```{r}
# illustrative proximity matrix (not from the source): adjacent firm IDs are "neighbors"
w <- matrix(0, 10, 10); w[abs(row(w) - col(w)) == 1] <- 1
pcdtest(inv ~ value + capital, data = Grunfeld, model = "within", w = w)
```
#### Serial Correlation
- Null hypothesis: there is no serial correlation
- usually a concern in macro panels with long time series (large $T$); rarely an issue in micro panels (small $T$, large $N$)
- Serial correlation can arise from individual effects (i.e., the time-invariant error component) or from the idiosyncratic error term (e.g., an AR(1) process). But typically, when we refer to serial correlation, we refer to the latter.
- Can be
- **marginal** test: tests only one of the two dependence structures above (but can be biased towards rejection)
- **joint** test: tests both dependencies (but doesn't tell which one is causing the problem)
- **conditional** test: assuming one dependence structure is correctly specified, tests whether the other departure is present.
##### Unobserved effect test
- semi-parametric test (the test statistic $W \dot{\sim} N$ regardless of the distribution of the errors) with $H_0: \sigma^2_\mu = 0$ (i.e., no unobserved effects in the residuals), favors pooled OLS.
- Under the null, covariance matrix of the residuals = its diagonal (off-diagonal = 0)
- It is robust against both **unobserved effects** that are constant within every group, and any kind of **serial correlation**.
```{r}
pwtest(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc)
```
Here, we reject the null hypothesis of no unobserved effects in the residuals. Hence, we rule out pooled OLS.
##### Locally robust tests for random effects and serial correlation
- A joint LM test for **random effects** and **serial correlation**, assuming normality and homoskedasticity of the idiosyncratic errors [@baltagi1991joint; @baltagi1995testing]
```{r}
pbsytest(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
data = Produc,
test = "j")
```
Here, we reject the joint null of no **serial correlation** and no **random effects**. But we still do not know whether the rejection is due to serial correlation, to random effects, or to both.
To pinpoint the departure from the null, we can use @bera2001tests's tests for first-order serial correlation or random effects (both under normality and homoskedasticity of the errors).
BSY for serial correlation
```{r}
pbsytest(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
data = Produc)
```
BSY for random effects
```{r}
pbsytest(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp,
data=Produc,
test="re")
```
Since the BSY test is only locally robust, if you "know" there is no serial correlation, this LM-based test is superior:
```{r}
plmtest(inv ~ value + capital, data = Grunfeld,
type = "honda")
```
On the other hand, if you know there are no random effects, test for serial correlation with the [@breusch1978testing]-[@godfrey1978testing] test:
```{r}
# Breusch-Godfrey test on the pooled regression
lmtest::bgtest(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc)
```
If you "know" there are random effects, use [@baltagi1995testing]'s test for serial correlation under both AR(1) and MA(1) processes.
$H_0$: Uncorrelated errors.
Note:
- the one-sided version has power only against positive serial correlation.
- applicable only to balanced panels.
```{r}
pbltest(
log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp,
data = Produc,
alternative = "onesided"
)
```
General serial correlation tests