{smcl}
{hline}
{hi:help simulate2}{right: v. 1.03 - 21. January 2022}
{hi:help psimulate2}{right: v. 1.09 - 31. October 2023}
{hline}
{title:Title}
{p 5 4}{cmd:simulate2} - enhanced functions for {help simulate}.{p_end}
{p 4 4}{cmd:psimulate2} - running {cmd:simulate2} in parallel.{p_end}
{title:Syntax}
{p 9 17 2}
{cmd:simulate2}
[{it:{help exp_list}}]
{cmd:,} {opt r:eps(#)} [{it:options1 options2}]
{cmd::} {it:command}
{p_end}
{p 8 17 2}
{cmd:psimulate2} [{it:{help exp_list}}]
{cmd:,} {opt r:eps(#)} {cmdab:p:arallel(}{it:#2, options3}{cmd:)} [{it:options2}]
{cmd::} {it:command}
{synoptset 25}{...}
{synopt:{it:options1}}Description{p_end}
{synoptline}
{synopt:{opt nodots}}suppress replication dots{p_end}
{synopt :{opt dots(#)}}display dots every {it:#} replications{p_end}
{synopt:{opt noi:sily}}display any output from {it:command}{p_end}
{synopt:{opt tr:ace}}trace {it:command}{p_end}
{synopt:{opt nol:egend}}suppress table legend{p_end}
{synopt:{opt v:erbose}}display the full table legend{p_end}
{synoptline}
{p2colreset}{...}
{synoptset 25}{...}
{synopt:{it:options2}}Description{p_end}
{synoptline}
{synopt:{help prefix_saving_option:{bf:{ul:sa}ving(}{it:filename}{bf:, ...)}}}save
results to {it:filename}{p_end}
{synopt:{opt seed(options)}}control of seed, see {help simulate2##optionsSeed: seed options}{p_end}
{synopt:{opt seeds:ave(options)}}saves the used seeds, see {help simulate2##SeedSaving: saving seeds}{p_end}
{synopt:{opt seedstream(integer)}}starting seedstream, only {cmd:psimulate2}{p_end}
{synopt:{opt nocl:s}}do not refresh window (only {cmd:psimulate2}){p_end}
{synopt:{opt onlydots}}display dots rather than output window. Recommended for server use.{p_end}
{synopt:{opt docmd(string)}}alternative command to call do-files{p_end}
{synopt:{opt globalid(string)}}Sets id for simulation run. Necessary if multiple instances of {cmd:psimulate2} are run on the same machine.{p_end}
{synoptline}
{p2colreset}{...}
{synoptset 25}{...}
{synopt:{it:options3}}Description{p_end}
{synoptline}
{synopt:{opt exe(string)}}sets the path to the Stata.exe{p_end}
{synopt:{opt temppath(string)}}alternative path for temporary files{p_end}
{synopt:{opt proc:essors(string)}}max number of processors, only for Stata MP{p_end}
{synopt:{opt simulate}}use {help simulate} rather than {cmd:simulate2}.
If {cmd:psimulate2} is run on Stata 15 then {help simulate} is automatically used. {p_end}
{synoptline}
{p2colreset}{...}
{p 4 6 2}
All weight types supported by {it:command} are allowed; see {help weight}.
{opt simulate2} uses {help frames} and requires Stata 16 or higher.
{opt psimulate2} requires Stata 15 or higher.
{opt psimulate2} works on MacOS, Microsoft Windows and Unix systems.
{p_end}
{p 4 6 2}
{cmd:#} is the number of repetitions and {cmd:#2} the number of parallel Stata instances.
{p_end}
{marker description}{...}
{title:Description}
{pstd}
{opt simulate2} eases the programming task of performing Monte Carlo-type
simulations. Typing
{pin}
{cmd:. simulate2} {it:{help exp_list}}{cmd:, reps(}{it:#}{cmd:)} {cmd::} {it:command}
{pstd}
runs {it:command} for {it:#} replications and collects the results in
{it:exp_list}.
{pstd}
{it:command} defines the command that performs one simulation.
Most Stata commands and user-written programs can be used with
{opt simulate2}, as long as they follow {help language:standard Stata syntax}.
The {opt by} prefix may not be part of {it:command}.
{pstd}
{it:{help exp_list}} specifies the expression to be calculated from the
execution of {it:command}.
If no expressions are given, {it:exp_list} assumes a default, depending upon
whether {it:command} changes results in {opt e()} or {opt r()}. If
{it:command} changes results in {opt e()}, the default is {opt _b}. If
{it:command} changes results in {opt r()} (but not {opt e()}), the default is
all the scalars posted to {opt r()}. It is an error not to specify an
expression in {it:exp_list} otherwise.
{pstd}
{opt simulate2} is an extension (or "hack") of the Stata built-in {help simulate} command.
It extends the command by allowing programs to return macros ({it:strings}) to
{opt e()} or {opt r()}.
To do so it uses {help frame post} rather than {help postfile}.
The computational costs to return strings are small and
{opt simulate2} is only marginally slower than {help simulate}.
{pstd}
{cmd:simulate2} and {cmd:psimulate2} can save results to frames
instead of .dta files. Results can also be appended to existing frames and .dta files.
{pstd}
{opt simulate2} has advanced options to assign seeds and random-number generator states to specific draws and to save them.
The {help rngstate:rngstate} (or seed), the {help rng:random number generator}
and the {help rngstream:seed stream} can be saved in a separate {it:frame} or
{it:dataset}.
{pstd}
For an introduction to drawing pseudo-random numbers see {help set_rng:set rng}, {help seed} and {help rngstream}.
{pstd}
{cmd:psimulate2} is a parallel version of {cmd:simulate2} speeding up simulations. Typing
{pin}
{cmd:. psimulate2} {it:{help exp_list}}{cmd:, reps(}{it:#}{cmd:) parallel(}{it:#2}{cmd:) :} {it:command}
{pstd}
runs {it:command} for {cmd:#} replications on {cmd:#2} parallel Stata instances and collects
the results in {it:exp_list}.
{pstd}
{cmd:psimulate2} splits the number of replications into equal blocks and each block
is run on a separate Stata instance.
To do so {cmd:psimulate2} creates a do file and a batch file.
The batch file is then used to start a new Stata instance running the corresponding do file.
The running instance acts as the parent instance; the others are child instances.
The output of {cmd:psimulate2} differs from that of {cmd:simulate} or {cmd:simulate2}.
It shows the percentage completed, the elapsed time, the expected time left and the expected finishing time.
{pstd}
Before a new instance is started {cmd:psimulate2} saves the current data set in memory.
This ensures that all Stata instances start with the same dataset.
It is also able to move all programs in memory which are not saved in an ado directory,
such as programs defined in the do file before calling {cmd:psimulate2}.
Any Mata-defined functions (see {stata mata mata memory}, {help mata mata memory:help file})
will be moved from the parent to the child instance.
{cmd:psimulate2} will create a new {help mata lmbuild:mlib} file and store it
in the temp folder and then set the temp folder as a new ado path.
Mata matrices are moved from the parent to the child instance and
are saved in the temp folder.
Globals and ado paths that are not permanently set are moved as well.
It {bf:{ul:does not}} move frames, locals, matrices (only Stata), scalars (only Stata)
or values saved in e(), r() or s() from the parent to the child instances.
{pstd}
{cmd:psimulate2} uses {help seedstream:seedstreams} if no {cmd:seedstream} is
defined in option {cmd:seed()}.
In this case each child instance is assigned its own {cmd:seedstream}.
This ensures that random number draws do not overlap.
Parallel use of {cmd:psimulate2} is possible with different seedstreams for
each machine.
The option {cmd:seedstream()} sets the seedstream for the first instance.
{marker psimLoop}{pstd}
Care is required if {cmd:psimulate2} is used in loops.
If no seed options are set, {cmd:psimulate2} will {ul:{bf:always}} use the
current Stata seed.
However after {cmd:psimulate2} is completed it {ul:{bf:does not}}
(and cannot) set the seed to the last seed from the simulation.
Therefore random draws will be the same across iterations of the loop.
To avoid this behaviour, {cmd:seed(}{it:_current}{cmd:)} saves the last used
seed in a global. In each subsequent iteration of the loop, the global is
picked up and used to update the seed for the {cmd:simulate2} runs;
after the runs finish, the global is updated again.
This ensures that draws differ across iterations of the loop.
{marker options}{...}
{title:Options}
{phang}
{opt reps(#)} is required -- it specifies the number of replications to
be performed.
INCLUDE help dots
{phang}
{opt noisily} requests that any output from {it:command} be displayed.
This option implies the {opt nodots} option.
{phang}
{opt trace} causes a trace of the execution of {it:command} to be displayed.
This option implies the {opt noisily} option.
{phang}
{cmd:saving(}{help filename}{cmd: [, }{it:suboptions}{cmd: frame append])} creates a Stata data file (.dta file) consisting of (for each statistic in exp_list) a variable containing the replicates.
{pmore}
See {it:{help prefix_saving_option}} for details about {it:suboptions}.
{cmd:simulate2} and {cmd:psimulate2} can save to frames if option {cmd:frame} is used.
It appends to an existing .dta file or frame if option {cmd:append} is used.
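{pmore}
A minimal sketch, assuming the program {it:testsimul} from the {help simulate2##examples:examples} and a frame name {it:results} chosen purely for illustration:
the first call saves 100 draws to the frame and the second adds another 100 to it.
{pmore2}
{cmd:. simulate2 b = r(b), reps(100) saving(results, frame): testsimul 100}{break}
{cmd:. simulate2 b = r(b), reps(100) saving(results, frame append): testsimul 100}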
{phang}
{opt nolegend} suppresses display of the table legend. The table
legend identifies the rows of the table with the expressions they represent.
{phang}
{opt nocls} prevents {cmd:psimulate2} from refreshing the window. By default the window is refreshed every time {cmd:psimulate2} scans for progress.
The option is intended for use on servers.
{phang}
{opt verbose} requests that the full table legend be displayed. By default,
coefficients and standard errors are not displayed.
{phang}
{opt simulate} requests {cmd:psimulate2} to use {help simulate} rather than {cmd:simulate2}.
If {cmd:psimulate2} is run on Stata 15 then {help simulate} is automatically used.
If option {cmd:simulate} is used each instance is assigned its own seed stream.
{marker optionsSeed}{phang}
{cmd: seed(}{it:options}{cmd:)} controls the random-number seed.
It is possible to set a seed, the random number generator and seed stream or
to load all three from either a frame or saved Stata dataset.
Options are:
{pmore}
{cmd:seed(}{bf:{it:#}}{cmd:)} sets the random-number seed. Specifying this option is
equivalent to typing the following command before calling {opt simulate}:
{pmore2}
{cmd:. set seed} {it:#}
{pmore2}
or
{pmore2}
{cmd:. simulate ... , ... seed(#): ...}
{pmore}
If {help simulate} is used in combination with {cmd:psimulate2},
then only {cmd:seed(#)} can be set. Seed streams are automatically assigned.
{pmore}
{cmd:seed(}{it:integer1 [string integer3]}{cmd:)} sets the {help seed} ({it:integer1}),
the {help rng:random-number generator} ({it:string})
and the {help seedstream} ({it:integer3}).
The default for {it:string} is the default random-number generator and for {it:integer3} seedstream number 1.
{pmore2}
{cmd:. simulate2 ..., seed(123 mt64s 6): ...}
{pmore2}
sets the seed to {it:123}, the {help rng:random-number generator} to {it:mt64s}
and the {help seedstream} to stream number {it:6}. Typing this is equivalent to:
{pmore2}
{cmd:. set rng mt64s}{break}
{cmd:. set rngstream 6}{break}
{cmd:. set seed 123}
{pmore}
{cmd:seed({help frame}|{it:filepath} {it: varname} [{it:varlist}], frame|dta [start(#)])}{break}
uses either a frame or Stata dataset to load the seeds.
The name of the frame is specified with {it:frame}, the path and name of the dataset
with {it:filepath}.
The options {cmd:frame} and {cmd:dta} indicate whether the first argument is
a filepath or frame.
{it:varname} is the variable name containing the seed.
The optional argument {it:varlist} contains the names of the variables containing
the random-number generator and the seedstream.{break}
{cmd:start(#)} starts with the #th seed in the frame or dataset.{break}
{cmd:seed(,frame|dta)} requires at least as many seeds as repetitions set by {cmd:reps(#)}.
{cmd:seed(,frame|dta)} assumes that the data in the frame or dataset is ordered
according to the draws.
This is important if {cmd:simulate2} is applied to a subset of draws and
results are being compared.
Options {cmd:frame} and {cmd:dta} cannot be combined.
{pmore2}
{cmd:. simulate2 , reps(100) seed(seedframe seedvar rngvar streamvar , frame start(10)) : ... }
{pmore2}
Uses seeds saved in frame {it:seedframe}.
The {help rngstate} is taken from variable {it:seedvar},
the {help rng:random number generator} from variable {it:rngvar}
and the number of the {help seedstream} from variable {it:streamvar}.
It starts with the 10th observation in frame {it:seedframe}
for the first draw of the program called by {cmd:simulate2}.
It then continues with observation 11 for draw number 2.
{pmore}
{cmd:seed(}{it:_current}{cmd:)} allows the usage of {cmd:psimulate2} in loops.
It uses the current seed options as a starting seed for {cmd:psimulate2}.
This allows {cmd:psimulate2} to be nested within loops.
See {help simulate2##psimLoop: psimulate2 in loops}.
{phang}
{cmd:seedstream(}{it:integer}{cmd:)} is a convenience option for {cmd:psimulate2}.
It sets the initial seedstream number for the first instance.
For example if 3 instances are set ({cmd:parallel(3)}) and
{cmd:seedstream(4)} is used, then instance 1 will use seed stream number 4,
instance 2 stream 5 and instance 3 stream 6.
This option allows the parallel use of {cmd:psimulate2} on multiple
computers with the same starting seed, but different seedstreams.
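{pmore}
As a sketch of the multi-computer case, assume two machines that each run three instances of the program {it:testsimul} from the {help simulate2##examples:examples};
the seed value and the file names {it:resA} and {it:resB} are illustrative only.
The first line is run on machine A, which would use streams 1 to 3; the second on machine B, which would use streams 4 to 6:
{pmore2}
{cmd:. psimulate2 , reps(500) seed(123) seedstream(1) parallel(3) saving(resA): testsimul 100}{break}
{cmd:. psimulate2 , reps(500) seed(123) seedstream(4) parallel(3) saving(resB): testsimul 100}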
{phang}{marker SeedSaving}
{cmd:seedsave(}{it:filename}|{it:frame} [{cmd:, frame append seednumber(#)}]{cmd:)} saves the seeds from the
beginning of each draw in a dataset defined by {it:filename}.
If option {cmd:frame} is used, it saves the seeds in a frame.
{cmd:append} appends to the frame or dataset.
{cmd:seednumber(#)} specifies the first value of variable {it:run}.
If not specified it is set to 1 and in the case of option {cmd:append} it is set
to {it:_N + 1}.
In all cases, the number of the draw, state of the random number generator, the type and
the stream are saved in the following variables:
{synoptset 15}{...}
{synopt:Variablename}Description{p_end}
{synoptline}
{synopt:{it:run}}Number of draw, from 1,2,...,reps(#){p_end}
{synopt:{it:seed}}State of random-number generator (seed){p_end}
{synopt:{it:seedstream}}Number of seedstream{p_end}
{synopt:{it:rng}}Type of random number generator{p_end}
{synoptline}
{p2colreset}{...}
{pmore}
The state of the random number generator is a string with approximately 5,000
characters. Saving 500 seeds requires about 2.4 MB, a restriction
the user has to bear in mind when saving seeds.{p_end}
{phang}
{cmd:parallel(#2)} sets the number of parallel Stata instances.
It is advisable not to use more instances than there are CPU cores available.
{phang}
{cmd:parallel(#2, exe(string))} sets the path to the Stata.exe when using {cmd:psimulate2}.
{cmd:psimulate2} will try to find the path, but might fail if Stata.exe
is in a non-conventional folder or has a non-conventional file name.
{phang}
{cmd:parallel(#2, temppath(string))} sets an alternative path to save temporary files.
{cmd:psimulate2} saves several do-files and .bat files in the temporary folder
({ccl tmpdir}).
In rare cases Stata might not have read/write rights or
it is not possible to start a .bat file from this folder.
In this case {cmd:temppath()} is required.
{cmd:psimulate2} cleans up the temp folder before using it.
All files starting with {it:psim2_} are removed.
If more than one instance of {cmd:psimulate2} is run in parallel and the same path to save temporary files
is used, then option {cmd:globalid()} is required
to avoid files being overwritten.
See {help psimulate2##ExampleUnix:Examples}.
{phang}
{cmd:parallel(#2, processors(integer))} sets the maximum number of processors
each Stata instance is allowed, see {help set processors}.
This is only relevant for Stata MP.
For example, if Stata MP with 4 cores is used with two parallel instances of {cmd:psimulate2},
then the two remaining cores can be put to use by allowing each instance more than one processor.
The default is 1, meaning that only one processor is available to
each Stata instance.
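{pmore}
For illustration only, assuming Stata MP with at least four cores and the program {it:testsimul} from the {help simulate2##examples:examples},
two instances with up to two processors each could be requested with:
{pmore2}
{cmd:. psimulate2 , reps(1000) parallel(2, processors(2)): testsimul 100}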
{phang}
{ul:Server specific options}.
{cmd:psimulate2} can be used on Unix servers but some further options might be required.
{phang}
{cmd:docmd(string)} specifies an alternative command to run do files.
For example, on an Ubuntu system, {cmd:docmd(stata)} is necessary to start a do file.
{phang}
{cmd:onlydots} displays dots instead of the progress window.
The option is intended to minimize the size of log files.
{phang}
Multiple instances of {cmd:psimulate2} can be run on the same machine.
If the same path to save temporary files is used, files may be overwritten.
{cmd:globalid(integer)} specifies the number of the parallel instance to avoid files being overwritten.
{marker SavedValuse}{title:Saved Values}
{pstd}
{cmd:psimulate2} saves the following in {cmd:r()}:
{col 4} Macros
{col 8}{cmd: r(rng_current)}{col 27} The random number generator type of the last run of the last instance.
{col 8}{cmd: r(rngseed_mt64s)}{col 27} The random number generator seed of the last run of the last instance.
{col 8}{cmd: r(rngstate)}{col 27} The random number generator state of the last run of the last instance.
{marker examples}{title:Examples}
{pstd}
Make a dataset containing the OLS coefficient, standard error, the current time
and save the seeds in a frame called {it:seed_frame}.
Perform the experiment 1000 times:
{cmd:program define testsimul, rclass}
{cmd:version {ccl stata_version}}
{cmd:syntax anything}
{cmd:clear}
{cmd:set obs `anything'}
{cmd:gen x = rnormal(1,4)}
{cmd:gen e = rnormal()}
{cmd:gen y = 2 + 3*x + e}
{cmd:reg y x}
{cmd:matrix b = e(b)}
{cmd:matrix se = e(V)}
{cmd:ereturn clear}
{cmd:return scalar b = b[1,1]}
{cmd:return scalar V = se[1,1]}
{cmd:return local time "`c(current_time)'"}
{cmd:end}
{phang}
{cmd:. simulate2 time = r(time) b = r(b) V = r(V), reps(1000) seedsave(seed_frame,frame): testsimul 100}
{pstd}
Now we can pick up the seeds and re-do the experiment for the first 500 repetitions:
{phang}
{cmd:. simulate2 time = r(time) b = r(b) V = r(V), reps(500) seed(seed_frame seeds, frame): testsimul 100}
{pstd}
and for the second 500 repetitions the starting seed is set to seed number 501:
{phang}
{cmd:. simulate2 time = r(time) b = r(b) V = r(V), reps(500) seed(seed_frame seeds, frame start(501)): testsimul 100}
{pstd}
Likewise, we can first do 500 draws, save the seeds, do another 500 draws, append the saved seeds and do the
experiment for all 1000 draws. For comparison results are saved in frames:{p_end}
{p 4 4}
{cmd:. simulate2 time = r(time) b = r(b) V = r(V), reps(500) seed(123) seedsave(seed_frame, frame) saving(first500, frame): testsimul 100 }{break}
{cmd:. simulate2 time = r(time) b = r(b) V = r(V), reps(500) seedsave(seed_frame, frame append) saving(second500, frame): testsimul 100 }{break}
{cmd:. simulate2 time = r(time) b = r(b) V = r(V), reps(1000) seed(seed_frame seeds, frame): testsimul 100 }{p_end}
{pstd}
Note that for the second run, no new seed is set and the option {cmd:append} is used. First we
compare the results for the first 500 draws, then for the second 500 draws:{p_end}
{p 4 4}
{cmd:. sum if _n <= 500}{break}
{cmd:. frame first500: sum}{break}
{cmd:. sum if _n > 500}{break}
{cmd:. frame second500: sum}{p_end}
{pstd}
The first two commands are expected to give identical results,
as are the last two.
{p_end}
{pstd}
Likewise, we can parallelise the simulation from above. For example we want to
run two instances at the same time, i.e. instance 1 runs the first 500, instance
2 the second 500 repetitions:
{p 4 4}
{cmd:. psimulate2 , reps(1000) seed(123) parallel(2, temppath("C:\psim2_temp")): testsimul 200}
{p_end}
{pstd}
{cmd:parallel(2, temppath("C:\psim2_temp"))} sets two parallel instances.
{cmd:temppath()} specifies the path temporary files are saved to.
This option is only necessary if Windows does not allow batch files to be run from
the temp folder. The first run of each instance will have the same random-number state, but
a different seed stream, and therefore the random draws will differ.{p_end}
{pstd}
If one or more {cmd:psimulate2} calls are nested in a loop or called sequentially,
they would use the same initial seed.
To avoid this, {cmd:psimulate2} returns the last seed state of the last instance
in {cmd:r()}.
To run {cmd:psimulate2} sequentially we code:
{p 4 4}
{cmd:. psimulate2 , reps(100) seed(_current) p(2) seedsave(seed, frame): testsimul 200}{break}
{cmd:. set rng `r(rng_current)'}{break}
{cmd:. set rngstate `r(rngstate)'}{break}
{cmd:. psimulate2 , reps(100) seed(_current) p(2) seedsave(seed, frame): testsimul 200}
{p_end}
{pstd}
When {it:_current} is used within {cmd:seed()}, {cmd:psimulate2} uses the
current seed of the parent instance as the initial seed for all child instances.
Each child instance will still have a different seed stream to ensure the random
number draws are different.{p_end}
{marker ExampleUnix}
{pstd}
{ul:Example for Unix Servers - efficient use of {cmd:psimulate2}}
{pstd}
Let's assume we want to run the example above, but with 20, 50, 100 and 1000 observations.
We also want to compare results when the errors are standard normal and when they are uniformly distributed.
Option {cmd:uniform} is added to the {cmd:testsimul} program:
{cmd:program define testsimul, rclass}
{cmd:version {ccl stata_version}}
{cmd:syntax anything, [uniform]}
{cmd:clear}
{cmd:set obs `anything'}
{cmd:gen x = rnormal(1,4)}
{cmd:if "`uniform'"== "" gen e = rnormal()}
{cmd:else gen e = runiform(-1,1)}
{cmd:gen y = 2 + 3*x + e}
{cmd:reg y x}
{cmd:matrix b = e(b)}
{cmd:matrix se = e(V)}
{cmd:ereturn clear}
{cmd:return scalar b = b[1,1]}
{cmd:return scalar V = se[1,1]}
{cmd:return local time "`c(current_time)'"}
{cmd:end}
{pstd}
We can use a - say Ubuntu - Unix server with a total of 20 cores and we want to use all of them.
Stata is installed on the machine and can be started from the command line with the command {it:stata}.
A folder to store temporary files is created in the home directory and called {it:tmp}.
{pstd}
To run the simulations, we write two do files, one for the DGP with standard normal errors, one for the DGP with uniform errors.
The do files are called {it:spec1.do} and {it:spec2.do}.
Both do files include a loop over the set of number of observations and save the simulated results.
The simulation is repeated 1000 times with 9 parallel instances:
{ul:spec1.do}
{cmd:clear}
{cmd:set seed 12345}
{cmd:foreach N in 20 50 100 1000 {c -(} }
{cmd: psimulate2 , reps(1000) parallel(9, temppath("/home/tmp")) seed(_current) onlydots globalid(1) docmd(stata): testsimul `N' }
{cmd: save res_`N'_spec1}
{cmd: {c )-} }
{pstd}
To run the second specification, we add the option {cmd:uniform} and alter the filename of the dataset with the Monte Carlo results:
{ul:spec2.do}
{cmd:clear}
{cmd:set seed 678910}
{cmd:foreach N in 20 50 100 1000 {c -(} }
{cmd: psimulate2 , reps(1000) parallel(9, temppath("/home/tmp")) seed(_current) onlydots globalid(2) docmd(stata): testsimul `N' , uniform}
{cmd: save res_`N'_spec2}
{cmd: {c )-} }
{pstd}
The {it:spec1.do} and {it:spec2.do} do files can run on the server at the same time and we make optimal use of the 20 cores.
Depending on the server, it might be necessary to write a batch file which allocates memory, number of cores and run time for each do file.
After the simulations are run, there should be a total of 8 files, 4 for each specification.
{pstd}
Next we discuss the options in detail:
{phang}
{cmd:parallel(9, temppath("/home/tmp"))} sets the number of parallel instances to 9.
Each do file then uses in total 10 cores.
The option {cmd:temppath()} specifies the path to save temporary files, which must be readable and writable.
{phang}
{cmd:seed(_current)} ensures that the seed is altered for the different values, see {help simulate2##psimLoop: psimulate2 in loops}.
Otherwise the first 20 observations if N=50 would be the same as the observations of the case N=20.
{phang}
{cmd:onlydots} helps to reduce the size of the log files. Instead of an overview of the progress, only dots are displayed.
{phang}
{cmd:globalid()} sets the id for each of the instances of {cmd:psimulate2}.
All temporary files are named {it:psim2#Instance_} and thus it is ensured that no temporary file is overwritten.
Alternatively we could specify a temporary path (eg: "/home/tmp/tmp_1") for each do file/specification.
{phang}
{cmd:docmd(stata)} specifies the command a do file is started with from the command line/within Stata.
On our Ubuntu server, new do files are started with {it:stata "home/dofiles/dofile.do"} and we need to specify the command to do so.
{marker knownproblems}{title:Known Problems and Issues}
{p 8 8} - On some Windows installations or servers the default temporary folder is locked or not accessible. In this
case Stata and psimulate2 cannot write any files in the temporary folder.
Option {cmd:temppath()} can be used to set an alternative temporary folder.{p_end}
{p 8 8} - psimulate2 can crash if the temporary path or any other path it writes to
is in a cloud storage folder from services such as Dropbox, OneDrive or Backup and Sync from
Google. A fix is to pause those services.{p_end}
{p 8 8} - psimulate2 has problems with long names, such as variable names or
command names. In such cases it tends to shorten the names which might cause interruptions in the code.
The best solution is to shorten the names.{p_end}
{p 8 8} - Unix servers require the {cmd:nocls} option to run {cmd:psimulate2}.{p_end}
{p 8 8} - The {cmd:onlydots} option should be used on servers to avoid large log files.{p_end}
{marker install}{title:How to install}
{p 4 4}{cmd:simulate2} and {cmd:psimulate2} can be directly installed from GitHub:{p_end}
{col 8}{stata "net install simulate2, from(https://janditzen.github.io/simulate2/)"}
{marker about}{title:Author}
{p 4}Jan Ditzen (Free University of Bozen-Bolzano){p_end}
{p 4}Email: {browse "mailto:jan.ditzen@unibz.it":jan.ditzen@unibz.it}{p_end}
{p 4}Web: {browse "www.jan.ditzen.net":www.jan.ditzen.net}{p_end}
{p 4 4}{opt simulate2} was inspired by comments from Alan Riley
and Tim Morris at the Stata User group meeting 2019 in London.
Parts of the program code and help file were taken from {help simulate}.
Kit Baum initiated the integration of MacOS and Unix and assisted in the implementation.
Michael Porst and
Gabriel Chodorow-Reich provided much valued feedback.
I am grateful for all of their help.
All remaining errors are my own.
I do not take responsibility for any computer crashes, lost work or financial losses following the use of {cmd:(p)simulate2}.{p_end}
{title:Change Log}
{p 4}Version 1.08 to 1.09{p_end}
{p 8 8}- support for Stata BE added{p_end}
{p 4}Version 1.07 to 1.08{p_end}
{p 8 8}- added options {cmd:onlydots}, {cmd:docmd()} and {cmd:globalid} to improve support for servers{p_end}
{p 4}Version 1.06 to 1.07{p_end}
{p 8 8}- fix in program lines with more than 250 characters (thanks to Gabriel Chodorow-Reich){p_end}
{p 4}Version 1.05 to 1.06{p_end}
{p 8 8}- various small bug fixes (thanks to Gabriel Chodorow-Reich){p_end}
{p 4}Version 1.04 to 1.05{p_end}
{p 8 8}- bug fix in exepath{p_end}
{p 4}Version 1.03 to 1.04{p_end}
{p 8 8}- bug fixed if data is appended to a frame but the frame does not exist{p_end}
{p 8 8}- fix in temppath local{p_end}
{p 4}Version 1.02 to 1.03{p_end}
{p 8 8}- improved behaviour for long lines in do files or programs{p_end}
{p 8 8}- warning message if no seed set{p_end}
{p 4}Version 1.01 to Version 1.02{p_end}
{p 8 8}- Mata matrices and scalars are moved from parent to child instance as well.{p_end}
{p 4}Version 1.0 to Version 1.01{p_end}
{p 8 8}- bug fixes in program to get exe name{p_end}
{p 8 8}- no batch file written anymore; support for MacOS{p_end}
{p 8 8}- added options nocls and processors to set max processors for Stata MP{p_end}
{p 8 8}- Mata-defined functions are now moved from parent to child instance.{p_end}
{title:Also see}
{p 4} See also: {help simulate}, {help multishell}{p_end}