diff --git a/README.md b/README.md index 4539b44..ac6efc6 100644 --- a/README.md +++ b/README.md @@ -39,7 +39,13 @@ aving(filename, ...) | save results to filename seed(options) | control of seed, see seed options seedsave(options) | saves the used seeds, see saving seeds seedstream(integer) | starting seedstream, only psimulate2 - +nocls | do not refresh window (only psimulate2) +onlydots | display dots rather than the output window. Recommended for server use. +docmd(string) | alternative command to call do files. +globalid(string) | sets an id for the simulation run. Necessary if multiple instances of psimulate2 are run on the same machine. + + + options3 | Description --- | --- exe(string) | sets the path to the Stata.exe @@ -107,7 +113,13 @@ seed(options) | controls the random-number seed. It is possible to set a seed, seed(#) | sets the random-number seed. If simulate is used in combination with psimulate2, then only seed(#) can be set. Seed streams are automatically assigned. seed(integer1 [string integer3]) sets the seed (integer1), the random-number generator (string) and the seedstream (integer3). The default for string is the default random-number generator and for integer3 seedstream number 1. seedstream(integer) | is a convience option for psimulate2. It sets the inital seedstream number for the first instance. For example if 3 instances are set (parallel(3)) and seedstream(4) is used, then instance 1 will use seed stream number 4, instance 2 stream 5 and instance 3 stream 6. This function allows the parallel use of psimulate2 on multiple computers with the same starting seed, but different seedstreams. seedsave(filename\frame), [frame append seednumber(#)] | Saves the seeds from the beginning of each draw in a dataset defined by filename. If option frame is used, it saves the seeds in a frame. append appends the frame or dataset. seednumber(#) specifies the first value of variable run.
If not specified it is set to 1 and in the case of option append it is set to _N + 1. In all cases, the number of the draw, state of the random number generator, the type and the stream are saved. -parallel(#2) | sets the number of parallel Stata instances. It is advisable not to use more instances than CPU cores are available. parallel(#2, exe(string)) sets the path to the Stata.exe when using psimulate2. psimulate2 will try to find the path, but might fail if Stata.exe is in a non-conventional folder or has a non-conventional file name. parallel(#2, temppath(string)) sets an alternative path to save temporary files. psimulate2 saves several do file and .bat files in the temporary folder (C:\Users\jan\AppData\Local\Temp/). In rare cases Stata might not have read/write rights or it is not possible to start a .bat file from this folder. In this case temppath() is required. psimulate2 cleans up the temp folder before using it. All files starting with psim2_ are removed. parallel(#2, processors(integer)) sets the maximum number of processors each Stata instance is allowed, see set processors. This is only relevant for Stata MP. For example if Stata MP with 4 cores is used and two parallel instance of psimulate2, then the remaining two cores can be used for each instance. The default is 1, meaning that psimulate only one processor is available to each Stata instance. +parallel(#2) | sets the number of parallel Stata instances. It is advisable not to use more instances than CPU cores are available. parallel(#2, exe(string)) sets the path to the Stata.exe when using psimulate2. psimulate2 will try to find the path, but might fail if Stata.exe is in a non-conventional folder or has a non-conventional file name. parallel(#2, temppath(string)) sets an alternative path to save temporary files. psimulate2 saves several do file and .bat files in the temporary folder. 
In rare cases Stata might not have read/write rights, or it might not be possible to start a .bat file from this folder. In this case temppath() is required. psimulate2 cleans up the temp folder before using it. All files starting with psim2_ are removed. parallel(#2, processors(integer)) sets the maximum number of processors each Stata instance is allowed, see set processors. This is only relevant for Stata MP. For example, if Stata MP with 4 cores is used with two parallel instances of psimulate2, processors(2) allows each instance to use two cores. The default is 1, meaning that only one processor is available to each Stata instance. +docmd(string) | specifies an alternative command to run do files. For example, on an Ubuntu system docmd(stata) is necessary to start a do file. +onlydots | displays dots instead of the progress window. The option is intended to minimize the size of log files. +globalid(string) | sets an id for the psimulate2 run. Multiple instances of psimulate2 can be run on the same machine; if they use the same path to save temporary files, files may be overwritten. Giving each run its own globalid() avoids this. + + + # 4. Examples @@ -182,12 +194,82 @@ psimulate2 , reps(100) seed(_current) p(2) seedsave(seed, frame): testsimul 200 Using `current` within seed() psimulate2 will use the `current` seed of the parent instance as an initial seed for all child instances. Each child instance will still have a different seed stream to ensure the random number draws are different. -# 5. Problems -- On some Windows installations the temporary folder is locked. In this case Stata and psimulate2 cannot write any files in the temporary folder. Option temppath() can be used to set an alternative temporary folder. +## 4.1 Example Unix Server + +Let's assume we want to run the example above, but with 20, 50, 100 and 1000 observations.
+We also want to compare results when the errors are standard normally distributed and when they are uniformly distributed. +Option `uniform` is added to the testsimul program: + + ``` +program define testsimul, rclass + version 18 + syntax anything [, uniform] + clear + set obs `anything' + gen x = rnormal(1,4) + if "`uniform'"== "" gen e = rnormal() + else gen e = runiform(-1,1) + gen y = 2 + 3*x + e + reg y x + matrix b = e(b) + matrix se = e(V) + ereturn clear + return scalar b = b[1,1] + return scalar V = se[1,1] + return local time "`c(current_time)'" +end +``` + +Assume we have access to a Unix server (say, running Ubuntu) with a total of 20 cores and want to use all of them. Stata is installed on the machine and can be started from the command line with the command **stata**. A folder to store temporary files is created in the home directory and called `tmp`. + +To run the simulations, we write two do files: `spec1.do` for the DGP with standard normal errors and `spec2.do` for the DGP with uniform errors. +Both do files loop over the numbers of observations and save the simulation results. +The simulation is repeated 1000 times with 9 parallel instances: + +**spec1.do**: +``` + clear + set seed 12345 + foreach N in 20 50 100 1000 { + psimulate2 , reps(1000) parallel(9, temppath("/home/tmp")) seed(_current) onlydots globalid(1) docmd(stata): testsimul `N' + save res_`N'_spec1, replace + } +``` -- psimulate2 can crash if the temporary path or any other path it writes on is in a cloud storage folder from services such as Dropbox, OneDrive or Backup and Sync from Google. A fix is to pause those services. +To run the second specification, we add the option `uniform` and change the filename of the dataset that stores the Monte Carlo results.
+ +**spec2.do**: +``` + clear + set seed 678910 + foreach N in 20 50 100 1000 { + psimulate2 , reps(1000) parallel(9, temppath("/home/tmp")) seed(_current) onlydots globalid(2) docmd(stata): testsimul `N' , uniform + save res_`N'_spec2, replace + } +``` + +The `spec1.do` and `spec2.do` do files can run on the server at the same time, which makes full use of the 20 cores. +Depending on the server, it might be necessary to write a batch file which allocates memory, the number of cores and the run time for each do file. After the simulations have finished, there should be a total of 8 files, 4 for each specification. + + +Next we discuss the options in detail: + + +Option | Description + --- | --- +parallel(9, temppath("/home/tmp")) | sets the number of parallel instances to 9. Together with the parent Stata session, each do file therefore uses 10 cores in total. The option temppath() specifies the path to save temporary files; this folder must be readable and writeable. +seed(_current) | ensures that the seed differs across the values of N. Otherwise the first 20 observations for N=50 would be the same as the observations for N=20. +onlydots | helps to reduce the size of the log files. Instead of an overview of the progress, only dots are displayed. +globalid() | sets an id for each psimulate2 run (here 1 in spec1.do and 2 in spec2.do). All temporary files are then named `psim2#_...`, where # is the id, so no temporary file is overwritten. Alternatively, we could specify a separate temporary path (e.g. "/home/tmp/tmp_1") for each do file/specification. +docmd(stata) | specifies the command with which a do file is started from the command line. On our Ubuntu server, a do file such as `"home/dofiles/dofile.do"` is started with the command stata, so docmd(stata) is needed. +# 5. Problems +- On some Windows installations or servers the default temporary folder is locked or not accessible. In this case Stata and psimulate2 cannot write any files in the temporary folder. Option temppath() can be used to set an alternative temporary folder.
+- psimulate2 can crash if the temporary path or any other path it writes to is in a cloud storage folder from services such as Dropbox, OneDrive or Backup and Sync from Google. A fix is to pause those services. +- psimulate2 has problems with long names, such as long variable or command names. In such cases it tends to shorten the names, which can break the code. The best solution is to use shorter names. +- On Unix servers, option nocls (or onlydots, which also suppresses the screen refresh) is required to run psimulate2. +- Option onlydots should be used on servers to avoid large log files. # 6. How to install @@ -207,9 +289,16 @@ Email: jan.ditzen@unibz.it Web: www.jan.ditzen.net -simulate2 was inspired by comments from Alan Riley and Tim Morris at the Stata User group meeting 2019 in London. Parts of the program code and help file were taken from simulate. Kit Baum initiated the integration of MacOS and Unix and assisted in the implementation. I am grateful for his help. All remaining errors are my own. +simulate2 was inspired by comments from Alan Riley and Tim Morris at the Stata User group meeting 2019 in London. Parts of the program code and help file were taken from simulate. Kit Baum initiated the integration of MacOS and Unix and assisted in the implementation. Michael Porst and Gabriel Chodorow-Reich provided much-valued feedback. I am grateful for all of their help. All remaining errors are my own. I do not take over responsibility for any computer crashes, lost work or financial losses following the use of *(p)simulate2*.
## Change Log + +Version 1.07 to 1.08 +- added options onlydots, docmd() and globalid to improve support for servers + +Version 1.06 to 1.07 +- fix in program lines with more than 250 characters (thanks to Gabriel Chodorow-Reich) + Version 1.05 to 1.06 - fix in program lines with more than 250 characters (thanks to Gabriel Chodorow-Reich) diff --git a/psimulate2.ado b/psimulate2.ado index 9ad3750..fe902d7 100644 --- a/psimulate2.ado +++ b/psimulate2.ado @@ -1,5 +1,5 @@ *! parallelise simulate2 -*! Version 1.07 - 24.01.2022 +*! Version 1.08 - 15.08.2023 *! by Jan Ditzen - www.jan.ditzen.net /* changelog To version 1.01 @@ -23,6 +23,9 @@ To version 1.06 - 21.01.2022 - bug fix in temppath To version 1.07 - 24.01.2022 - bug fix if program has more than 250 characters +To version 1.08 + - 18.07.2023 - added option onlydots: display only dots for parallel simulate. better for servers + - 31.07.2023 - added options docmd() and globalid() */ @@ -49,6 +52,9 @@ program psimulate2 , rclass NOCls /// do not refresh windows simulate /// use simulate rather than simulate2 seedstream(integer 0) /// first seedstream + ONLYDOTS /// show only dots + docmd(string) /// alternative command line to call do files + globalid(string) /// set id if psimulate run in parallel; default is empty ] local 0 `parallel' @@ -70,13 +76,14 @@ program psimulate2 , rclass local temppath `"`c(tmpdir)'"' } - if "`nocls'" == "" { + if "`nocls'`onlydots'" == "" { local cls cls } if "`c(mode)'" == "batch" { ** If run in batch mode, do not use cls local cls "" } + if "`docmd'" == "" local docmd "do" *** which simulate is used if "`simulate'" == "" { ** make sure even if simulate not used but version < 16, use simulate @@ -94,12 +101,15 @@ program psimulate2 , rclass local whichsim "simulate" } + global psim2_id `globalid' + local psim2_id $psim2_id + *** remove p2sim files from folder - local files: dir `"`temppath'"' files "psim2_*" + local files: dir `"`temppath'"' files "psim2`psim2_id'_*" foreach file in 
`files' { erase "`temppath'/`file'" } - cap erase "`temppath'/lpsim2_matafunc.mlib" + cap erase "`temppath'/lpsim2`psim2_id'_matafunc.mlib" *** Get exe path if "`exe'" == "" { psim2_getExePath @@ -111,7 +121,7 @@ program psimulate2 , rclass *** Copy mata matrices local matamatsave = 0 - cap erase "`temppath'/psim2_matamat.mmat" + cap erase "`temppath'/psim2`psim2_id'_matamat.mmat" *** Process seed options *** 1. neither frame nor dta used, then seed(seed rng seedstream) @@ -167,8 +177,8 @@ program psimulate2 , rclass else { tokenize `anything' if "`frame'" != "" { - frame `1': save "`temppath'/psim2_seed", replace - local seed `"`temppath'/psim2_seed `2' `3' `4' , dta"' + frame `1': save "`temppath'/psim2`psim2_id'_seed", replace + local seed `"`temppath'/psim2`psim2_id'_seed `2' `3' `4' , dta"' local seedstartop `"start(\`=\`repscum'+\`startseednum'-1')"' } else if "`dta'" != "" { @@ -187,7 +197,7 @@ program psimulate2 , rclass local sseedframe "`frame'" local sseedappend "`append'" - local seedsave `"`temppath'/psim2_seedsave_\`inst' , seednumber(\`repscum')"' + local seedsave `"`temppath'/psim2`psim2_id'_seedsave_\`inst' , seednumber(\`repscum')"' } } else { @@ -200,7 +210,7 @@ program psimulate2 , rclass } *** save dta - save "`temppath'/psim2_start", replace emptyok + save "`temppath'/psim2`psim2_id'_start", replace emptyok *** Number of replications for each local repsavg = floor(`reps'/`instance') @@ -221,22 +231,22 @@ program psimulate2 , rclass if "`whichsim'" == "simulate2" { psim2_WriteDofile `exp_list' , /// /// sim2 options - saving("`temppath'/psim2_results_`inst'", replace) reps(`repsi') /// + saving("`temppath'/psim2`psim2_id'_results_`inst'", replace) reps(`repsi') /// perindicator(100, perindicpath(`"`temppath'"') performid(`inst')) /// seed("`seed'" "`seedstartop'") seedsave("`seedsave'") seedstream(`seedstream') /// /// writeBatch options id(`inst') processors(`processors') /// - startdta(psim2_start) temppath(`temppath') /// + 
startdta(psim2`psim2_id'_start) temppath(`temppath') /// : `after' } else { psim2_WriteDofile `exp_list' , /// /// sim options - saving("`temppath'/psim2_results_`inst'", replace) reps(`repsi') /// + saving("`temppath'/psim2`psim2_id'_results_`inst'", replace) reps(`repsi') /// `seed' /// /// writeBatch options id(`inst') processors(`processors') simulate /// - startdta(psim2_start) temppath(`temppath') /// + startdta(psim2`psim2_id'_start) temppath(`temppath') /// : `after' } local repscum = `repscum'+`repsi' @@ -259,12 +269,12 @@ program psimulate2 , rclass forvalues inst = 1(1)`instance' { * noi disp `"command line to execute: winexec `exepath' `winexec_e' do "`temppath'psim2_DoFile_`inst'.do" "' * local lastcmd `"winexec `exepath' `winexec_e' do "`temppath'psim2_DoFile_`inst'.do" "' - winexec `exepath' `winexec_e' do "`temppath'/psim2_DoFile_`inst'.do" + winexec `exepath' `winexec_e' `docmd' "`temppath'/psim2`psim2_id'_DoFile_`inst'.do" } } else { forvalues inst = 1(1)`instance' { - local line "`line' (`exepath' do "`temppath'/psim2_DoFile_`inst'.do" &)" + local line "`line' (`exepath' `docmd' "`temppath'/psim2`psim2_id'_DoFile_`inst'.do" &)" if `inst' != `instance' { local line "`line' ; " } @@ -286,13 +296,13 @@ program psimulate2 , rclass sleep `sleeptime' forvalues inst = 1(1)`instance' { - if fileexists("`temppath'/psim2_performance_`inst'.mmat") == 1 { - cap qui mata mata matuse "`temppath'/psim2_performance_`inst'", replace + if fileexists("`temppath'/psim2`psim2_id'_performance_`inst'.mmat") == 1 { + cap qui mata mata matuse "`temppath'/psim2`psim2_id'_performance_`inst'", replace if _rc != 0 { ** build in artifical sleep - noi disp "error in saving psim2_performance_`inst'" + noi disp "error in saving psim2`psim2_id'_performance_`inst'" sleep 1000 - cap qui mata mata matuse "`temppath'/psim2_performance_`inst'", replace + cap qui mata mata matuse "`temppath'/psim2`psim2_id'_performance_`inst'", replace } qui mata 
st_local("done_`inst'",strofreal(p2sim_performance[1,1])) qui mata st_local("reps_`inst'",strofreal(p2sim_performance[1,2])) @@ -327,29 +337,33 @@ program psimulate2 , rclass local exp_finish_time = `nowtime'+`exp_time_left' `cls' - noi disp as text "" - noi disp "psimulate2 - parallelise `whichsim'" - noi disp as text "" - local aftertt = strtrim(`"`after'"'') - noi disp as text `"command: `aftertt' "' - noi disp as text "" - noi disp as text "Timings (hour, minute, sec):" _col(40) "Estimated:" - noi disp as text " Average Run: " _col(24) %tcHH:MM:SS.sss `avg_run' _col(40) " Time left (min):" _col(60) %tcHH:MM:SS `exp_time_left' - noi disp as text " Time Elapsed:" _col(24) %tcHH:MM:SS `time_elapsed' _col(40) " finishing time:" _col(60) %tcHH:MM:SS `exp_finish_time' - noi disp "" - if "`whichsim'" == "simulate2" { - forvalues inst = 1(1)`instance' { - noi disp as text "Instance `inst':" - noi disp as text " Done " %9.2f `=`done_`inst''/`reps_`inst''*100' "%" _col(20) "(`done_`inst''/`reps_`inst'')" + if "`onlydots'" == "" { + noi disp as text "" + noi disp "psimulate2 - parallelise `whichsim'" + noi disp as text "" + local aftertt = strtrim(`"`after'"'') + noi disp as text `"command: `aftertt' "' + noi disp as text "" + noi disp as text "Timings (hour, minute, sec):" _col(40) "Estimated:" + noi disp as text " Average Run: " _col(24) %tcHH:MM:SS.sss `avg_run' _col(40) " Time left (min):" _col(60) %tcHH:MM:SS `exp_time_left' + noi disp as text " Time Elapsed:" _col(24) %tcHH:MM:SS `time_elapsed' _col(40) " finishing time:" _col(60) %tcHH:MM:SS `exp_finish_time' + noi disp "" + if "`whichsim'" == "simulate2" { + forvalues inst = 1(1)`instance' { + noi disp as text "Instance `inst':" + noi disp as text " Done " %9.2f `=`done_`inst''/`reps_`inst''*100' "%" _col(20) "(`done_`inst''/`reps_`inst'')" + } + noi disp as text "Total" + noi disp as text " Done " %9.2f `=`reps_done'/`reps'*100' "%" _col(20) "(`reps_done'/`reps')" } - noi disp as text "Total" - noi disp as text " 
Done " %9.2f `=`reps_done'/`reps'*100' "%" _col(20) "(`reps_done'/`reps')" + else { + noi disp as text " simulate does not allow for process indication. Please wait." + } + local sleeptime = `avg_run' * `reps' / 100 } else { - noi disp as text " simulate does not allow for process indication. Please wait." + noi disp "." , _c } - local sleeptime = `avg_run' * `reps' / 100 - *** wait at least 0.25 sec if `sleeptime' < 250 { local sleeptime = 250 @@ -357,11 +371,13 @@ program psimulate2 , rclass else if `sleeptime' > 60000 { local sleeptime = 59999 } - noi disp "" - noi disp as text "Current Time: `c(current_time)' - next refresh in " %tcSS.ss `sleeptime' " sec." - if `seednote' == 1 { - noi disp as text "No seed set. If psimulate is used in a loop, " - noi disp as text "all iterations of the loop will have the Stata default seed." + if "`onlydots'" == "" { + noi disp "" + noi disp as text "Current Time: `c(current_time)' - next refresh in " %tcSS.ss `sleeptime' " sec." + if `seednote' == 1 { + noi disp as text "No seed set. If psimulate is used in a loop, " + noi disp as text "all iterations of the loop will have the Stata default seed." 
+ } } sleep `sleeptime' } @@ -369,7 +385,7 @@ program psimulate2 , rclass noi disp "Click on link to open log file: " di as text "Log files " forvalues inst = 1(1)`instance' { - if fileexists("psim2_DoFile_`inst'.log") == 1 { + if fileexists("psim2`psim2_id'_DoFile_`inst'.log") == 1 { disp as smcl _col(5) "Instance `inst':" _col(20) `"{view psim2_DoFile_`inst'.log: Log File}"' } } @@ -379,9 +395,9 @@ program psimulate2 , rclass if "`sseeddest'" != "" { clear - use "`temppath'/psim2_seedsave_1" + use "`temppath'/psim2`psim2_id'_seedsave_1" forvalues inst = 2(1)`instance' { - append using "`temppath'/psim2_seedsave_`inst'", force + append using "`temppath'/psim2`psim2_id'_seedsave_`inst'", force } if "`sseedframe'" == "" { @@ -392,8 +408,8 @@ program psimulate2 , rclass } else { if "`sseedappend'" != "" { - frame `sseeddest': save "`temppath'/psim2_oldseed", replace - append using "`temppath'/psim2_oldseed" + frame `sseeddest': save "`temppath'/psim2`psim2_id'_oldseed", replace + append using "`temppath'/psim2`psim2_id'_oldseed" } cap frame drop `sseeddest' frame copy `c(frame)' `sseeddest' @@ -402,9 +418,9 @@ program psimulate2 , rclass } *** Collect data clear - use "`temppath'/psim2_results_1" + use "`temppath'/psim2`psim2_id'_results_1" forvalues inst = 2(1)`instance' { - qui append using "`temppath'/psim2_results_`inst'", force + qui append using "`temppath'/psim2`psim2_id'_results_`inst'", force } if "`saving'" != "" { @@ -418,8 +434,8 @@ program psimulate2 , rclass if regexm("`r(frames)'","`anything'") == 0 { frame create `anything' } - frame `anything': save "`temppath'/psim2_oldframe", replace emptyok - append using "`temppath'/psim2_oldframe" + frame `anything': save "`temppath'/psim2`psim2_id'_oldframe", replace emptyok + append using "`temppath'/psim2`psim2_id'_oldframe" } cap frame drop `anything' frame copy `c(frame)' `anything' @@ -453,12 +469,14 @@ program define psim2_WriteDofile local after `"`s(after)'"' local 0 `"`s(before)'"' + local psim2_id 
$psim2_id + syntax [anything(name=explist equalok)], id(string) startdta(string) temppath(string) processors(integer) seedstream(integer) [simulate] * local sim2_options `options' local path `"`temppath'"' - local doFileName "psim2_DoFile_`id'.do" + local doFileName "psim2`psim2_id'_DoFile_`id'.do" tempname dofile @@ -475,7 +493,7 @@ program define psim2_WriteDofile file write `dofile' `"disp "File was written `c(current_time)'""' _n if "`pnames'" != "" { - file write `dofile' `"include `"`temppath'/psim2_programs.do"'"' _n + file write `dofile' `"include `"`temppath'/psim2`psim2_id'_programs.do"'"' _n } **** If Stata MP, use only one core: @@ -509,19 +527,20 @@ program define psim2_WriteDofile **** set new ado path to library in do file mata mata memory if `r(Nf_def)' > 0 { - cap lmbuild lpsim2_matafunc , dir(`temppath') replace + cap lmbuild lpsim2`psim2_id'_matafunc , dir(`temppath') replace if _rc != 0 { sleep 200 - cap lmbuild lpsim2_matafunc , dir(`temppath') replace + cap lmbuild lpsim2`psim2_id'_matafunc , dir(`temppath') replace } file write `dofile' `"adopath + "`temppath'""' _n + file write `dofile' `"mata mata mlib index"' _n } **** Mata programs mata mata memory if `r(Nm)' > 0 { - mata mata matsave "`temppath'/psim2_matamat.mmat" *, replace - file write `dofile' `"mata mata matuse "`temppath'/psim2_matamat.mmat", replace"' _n + mata mata matsave "`temppath'/psim2`psim2_id'_matamat.mmat" *, replace + file write `dofile' `"mata mata matuse "`temppath'/psim2`psim2_id'_matamat.mmat", replace"' _n } /* @@ -541,12 +560,13 @@ program define psim2_WriteDofile macro shift } */ + **** Open Dataset file write `dofile' `"use "`temppath'/`startdta'""' _n **** Do cmd if "`simulate'" == "" { - file write `dofile' `"simulate2 `explist' , inpsim2 `options' : `after'"' _n + file write `dofile' `"simulate2 `explist' , inpsim2 psim2_id(`psim2_id') `options' : `after'"' _n } else { file write `dofile' `"set rng mt64s"' _n @@ -555,7 +575,7 @@ program define 
psim2_WriteDofile file write `dofile' `"mata p2sim_performance = -999, -999 "' _n file write `dofile' `"mata p2sim_lastrng = "\`c(rng_current)'""' _n file write `dofile' `"mata p2sim_lastseed = "\`c(rngstate)'" , "\`c(rngseed_mt64s)'""' _n - file write `dofile' `"mata mata matsave "`temppath'/psim2_performance_`id'" p2sim_performance p2sim_lastseed p2sim_lastrng , replace "' _n + file write `dofile' `"cap mata mata matsave "`temppath'/psim2`psim2_id'_performance_`id'" p2sim_performance p2sim_lastseed p2sim_lastrng , replace "' _n } **** Close do file file close `dofile' @@ -573,7 +593,7 @@ program define psim2_getExePath, rclass else { if "`c(os)'" == "Windows" { if `c(SE)' == 1 & `c(MP)' == 0 { local type SE } else if `c(MP)' == 1 { local type MP @@ -622,18 +642,20 @@ end program define psim2_programlist, rclass syntax [anything] , temppath(string) - + + local psim2_id $psim2_id + log local logname "`r(filename)'" cap log close local linesize `c(linesize)' set linesize 250 - log using "`temppath'/psim2_plog", replace text nomsg + log using "`temppath'/psim2`psim2_id'_plog", replace text nomsg program dir log close set linesize `linesize' tempname file nextline - file open `file' using `"`temppath'/psim2_plog.log"' , read + file open `file' using `"`temppath'/psim2`psim2_id'_plog.log"' , read file read `file' line while r(eof)==0 { local line `" `macval(line)'"' @@ -654,22 +676,22 @@ program define psim2_programlist, rclass if "`pnames'" != "" { *local appendreplace "replace" tempname dofilenew - file open `dofilenew' using "`temppath'/psim2_programs.do" , write text replace + file open `dofilenew' using "`temppath'/psim2`psim2_id'_programs.do" , write text replace foreach prog in `pnames' { local linesize `c(linesize)' set linesize 250 - log using "`temppath'/psim2_tmp_program.log" , text nomsg replace + log using "`temppath'/psim2`psim2_id'_tmp_program.log" , text nomsg replace cap noi program list `prog' log close set linesize `linesize' if
_rc == 0 { - file open `file' using `"`temppath'/psim2_tmp_program.log"' , read + file open `file' using `"`temppath'/psim2`psim2_id'_tmp_program.log"' , read file read `file' line /// Open it the second time and shift one line down to check if line was cut off - file open `nextline' using `"`temppath'/psim2_tmp_program.log"' , read + file open `nextline' using `"`temppath'/psim2`psim2_id'_tmp_program.log"' , read file read `nextline' next file read `nextline' next diff --git a/simulate2.ado b/simulate2.ado index d21b6b1..8158cd0 100644 --- a/simulate2.ado +++ b/simulate2.ado @@ -45,6 +45,7 @@ program simulate2 SEEDSave(string) /// seed save option perindicator(string) /// save every x-th file as indicator for progress inpsim2 /// indicator for psim2 + psim2_id(string) /// id when using psimulate2 /// rest * /// ] @@ -475,7 +476,7 @@ program simulate2 *** run first performance check if "1" == "`perfomnumsi'" { qui mata p2sim_performance = 1 , `reps' - qui mata mata matsave "`perindicpath'/psim2_performance_`performid'" p2sim_performance , replace + qui mata mata matsave "`perindicpath'/psim2`psim2_id'_performance_`performid'" p2sim_performance , replace local perfomnumsi = `perfomnumsi' + `perfomnums' } @@ -566,10 +567,10 @@ program simulate2 if `"`perindicator'"' != "" { if `i' == `perfomnumsi' { qui mata p2sim_performance = `i', `reps' - cap qui mata mata matsave `"`perindicpath'/psim2_performance_`performid'"' p2sim_performance , replace + cap qui mata mata matsave `"`perindicpath'/psim2`psim2_id'_performance_`performid'"' p2sim_performance , replace if _rc != 0 { sleep 50 - cap qui mata mata matsave `"`perindicpath'/psim2_performance_`performid'"' p2sim_performance , replace + cap qui mata mata matsave `"`perindicpath'/psim2`psim2_id'_performance_`performid'"' p2sim_performance , replace } local perfomnumsi = `perfomnumsi' + `perfomnums' } @@ -650,11 +651,11 @@ program simulate2 qui mata p2sim_lastseed = "`c(rngstate)'" , "`c(rngseed_mt64s)'" qui mata 
p2sim_lastrng = "`c(rng_current)'" - cap qui mata mata matsave "`perindicpath'/psim2_performance_`performid'" p2sim_performance p2sim_lastseed p2sim_lastrng, replace + cap qui mata mata matsave "`perindicpath'/psim2`psim2_id'_performance_`performid'" p2sim_performance p2sim_lastseed p2sim_lastrng, replace if _rc != 0 { local try = 0 while _rc != 0 & `try' < 100 { - cap qui mata mata matsave "`perindicpath'/psim2_performance_`performid'" p2sim_performance p2sim_lastseed p2sim_lastrng, replace + cap qui mata mata matsave "`perindicpath'/psim2`psim2_id'_performance_`performid'" p2sim_performance p2sim_lastseed p2sim_lastrng, replace sleep 100 } } diff --git a/simulate2.ihlp b/simulate2.ihlp index 9b2a6ca..2cf4c75 100644 --- a/simulate2.ihlp +++ b/simulate2.ihlp @@ -1,7 +1,7 @@ {smcl} {hline} {hi:help simulate2}{right: v. 1.03 - 21. January 2022} -{hi:help psimulate2}{right: v. 1.04 - 24. January 2022} +{hi:help psimulate2}{right: v. 1.08 - 15. August 2023} {hline} {title:Title} @@ -44,6 +44,9 @@ {synopt:{opt seeds:ave(options)}}saves the used seeds, see {help simulate2##SeedSaving: saving seeds}{p_end} {synopt:{opt seedstream(integer)}}starting seedstream, only {cmd:psimulate2}{p_end} {synopt:{opt nocl:s}}do not refresh window (only {cmd:psimulate2}){p_end} +{synopt:{opt onlydots}}display dots rather than the output window. Recommended for server use.{p_end} +{synopt:{opt docmd(string)}}alternative command to call do files.{p_end} +{synopt:{opt globalid(string)}}sets an id for the simulation run. Necessary if multiple instances of {cmd:psimulate2} are run on the same machine.{p_end} {synoptline} {p2colreset}{...} @@ -292,14 +295,12 @@ It starts with the 10th observation in frame {it:seedframe} for the first draw of the program called by {cmd:simulate2}. It then continues with observations 11 for draw number 2. - {pmore} {cmd:seed(}{it:_current}{cmd:)} allows the usage of {cmd:psimulate2} in loops. It uses the current seed options as a starting seed for {cmd:psimulate2}.
This allows {cmd:psimulate2} to be nested within loops. See {help simulate2##psimLoop: psimulate2 in loops}. - {phang} {cmd:seedstream(}{it:integer}{cmd:)} is a convience option for {cmd:psimulate2}. It sets the inital seedstream number for the first instance. @@ -353,6 +354,10 @@ it is not possible to start a .bat file from this folder. In this case {cmd:temppath()} is required. {cmd:psimulate2} cleans up the temp folder before using it. All files starting with {it:psim2_} are removed. +If more than one instance of {cmd:psimulate2} is run in parallel and the same path to save temporary files +is used, then option {cmd:globalid()} is required +to avoid files being overwritten. +See {help psimulate2##ExampleUnix:Examples}. {phang} {cmd:parallel(#2, processors(integer))} sets the maximum number of processors @@ -363,6 +368,23 @@ then the remaining two cores can be used for each instance. The default is 1, meaning that {cmd:psimulate} only one processor is available to each Stata instance. +{phang} +{ul:Server-specific options}. +{cmd:psimulate2} can be used on Unix servers, but some further options might be required. + +{phang} +{cmd:docmd(string)} specifies an alternative command to run do files. +For example, on an Ubuntu system {cmd:docmd(stata)} is necessary to start a do file. + +{phang} +{cmd:onlydots} displays dots instead of the progress window. +The option is intended to minimize the size of log files. + +{phang} +{cmd:globalid(string)} sets an id for the simulation run. Multiple instances of {cmd:psimulate2} can be run on the same machine. +If they use the same path to save temporary files, files may be overwritten. +Giving each run its own {cmd:globalid()} avoids this. + {marker SavedValuse}{title:Saved Values} {pstd} @@ -377,25 +399,27 @@ each Stata instance. {marker examples}{title:Examples} {pstd} Make a dataset containing the OLS coefficient, standard error, the current time -and save the seeds in a frame called {it:seed_frame}.
Perform the experiment 1000 times: +and save the seeds in a frame called {it:seed_frame}. +Perform the experiment 1000 times: {cmd:program define testsimul, rclass} {cmd:version {ccl stata_version}} {cmd:syntax anything} {cmd:clear} {cmd:set obs `anything'} - {cmd:gen x = rnormal(1,4)} - {cmd:gen y = 2 + 3*x + rnormal()} + {cmd:gen x = rnormal(1,4)}} + {cmd:gen e = normal()} + {cmd:gen y = 2 + 3*x + e} {cmd:reg y x} {cmd:matrix b = e(b)} {cmd:matrix se = e(V)} {cmd:ereturn clear} {cmd:return scalar b = b[1,1]} {cmd:return scalar V = se[1,1]} - {cmd:return local time "`r(current_time)'"} + {cmd:return local time "`c(current_time)'"} {cmd:end} {phang} - {cmd:. simulate2 time = r(time) b = r(b) V = r(V), reps(1000) saveseed(seed_frame,frame): testsimul 100} + {cmd:. simulate2 time = r(time) b = r(b) V = r(V), reps(1000) seedsave(seed_frame,frame): testsimul 100} {pstd} Now we can pick up the seeds and re-do the experiment for the first 500 repetitions: @@ -469,8 +493,95 @@ current seed of the parent instance as an initial seed for all child instances. Each child instance will still have a different seed stream to ensure the random number draws are different.{p_end} +{marker ExampleUnix} +{pstd} +{ul:Example for Unix Servers - efficient use of {cmd:psimulate2}} + +{pstd} +Let's assume we want to run the example above, but we want to run the simulation with 20, 50 100 and 1000 observations. +We also want to compare results if errors are standard normal and uniform distributed. 
+Option {cmd:uniform} is added to the {cmd:testsimul} program:
+
+        {cmd:program define testsimul, rclass}
+        {cmd:version {ccl stata_version}}
+        {cmd:syntax anything, [uniform]}
+        {cmd:clear}
+        {cmd:set obs `anything'}
+        {cmd:gen x = rnormal(1,4)}
+        {cmd:if "`uniform'"== "" gen e = rnormal()}
+        {cmd:else gen e = runiform(-1,1)}
+        {cmd:gen y = 2 + 3*x + e}
+        {cmd:reg y x}
+        {cmd:matrix b = e(b)}
+        {cmd:matrix se = e(V)}
+        {cmd:ereturn clear}
+        {cmd:return scalar b = b[1,1]}
+        {cmd:return scalar V = se[1,1]}
+        {cmd:return local time "`c(current_time)'"}
+        {cmd:end}
+
+{pstd}
+Suppose we have access to a Unix server (say, Ubuntu) with a total of 20 cores and want to use all of them.
+Stata is installed on the machine and can be started from the command line with the command {it:stata}.
+A folder to store temporary files, {it:/home/tmp}, has been created.
+
+{pstd}
+To run the simulations, we write two do files: one for the DGP with standard normal errors and one for the DGP with uniform errors.
+The do files are called {it:spec1.do} and {it:spec2.do}.
+Both do files loop over the numbers of observations and save the simulated results.
+The simulation is repeated 1000 times with 9 parallel instances:
+
+        {ul:spec1.do}
+        {cmd:clear}
+        {cmd:set seed 12345}
+        {cmd:foreach N in 20 50 100 1000 {c -(} }
+        {cmd: psimulate2 , reps(1000) parallel(9, temppath("/home/tmp")) seed(_current) onlydots globalid(1) docmd(stata): testsimul `N' }
+        {cmd: save res_`N'_spec1}
+        {cmd: {c )-} }
+
+{pstd}
+To run the second specification, we add the option {cmd:uniform} and change the file name of the dataset with the Monte Carlo results:
+
+        {ul:spec2.do}
+        {cmd:clear}
+        {cmd:set seed 678910}
+        {cmd:foreach N in 20 50 100 1000 {c -(} }
+        {cmd: psimulate2 , reps(1000) parallel(9, temppath("/home/tmp")) seed(_current) onlydots globalid(2) docmd(stata): testsimul `N' , uniform}
+        {cmd: save res_`N'_spec2}
+        {cmd: {c )-} }
+
+{pstd}
+The do files {it:spec1.do} and {it:spec2.do} can run on the server at the same time, making full use of the 20 cores.
+Depending on the server, it might be necessary to write a batch file which allocates memory, number of cores and run time for each do file.
+After the simulations have run, there should be a total of 8 files, 4 for each specification.
+
+{pstd}
+Next we discuss the options in detail:
+
+{phang}
+{cmd:parallel(9, temppath("/home/tmp"))} sets the number of parallel instances to 9.
+Each do file therefore uses 10 cores in total (9 child instances plus the parent instance).
+The option {cmd:temppath()} specifies the path for temporary files; this folder must be readable and writeable.
+
+{phang}
+{cmd:seed(_current)} ensures that the seed changes across the different values of N, see {help simulate2##psimLoop: psimulate2 in loops}.
+Otherwise, the first 20 observations for N=50 would be the same as the observations for N=20.
+
+{phang}
+{cmd:onlydots} helps to reduce the size of the log files. Instead of an overview of the progress, only dots are displayed.
+
+{phang}
+{cmd:globalid()} sets the id for each {cmd:psimulate2} run.
+All temporary files are then named {it:psim2#Instance_}, which ensures that no temporary file is overwritten.
+Alternatively, we could specify a separate temporary path (e.g. "/home/tmp/tmp_1") for each do file.
+
+{phang}
+{cmd:docmd(stata)} specifies the command used to start a do file from the command line or from within Stata.
+On our Ubuntu server, do files are started with {it:stata "home/dofiles/dofile.do"}, so this command needs to be specified.
+
+
 {marker knownproblems}{title:Known Problems and Issues}
-{p 8 8} - On some Windows installations the temporary folder is locked. In this
+{p 8 8} - On some Windows installations or servers the default temporary folder is locked or not accessible. In this
 case Stata and psimulate2 cannot write any files in the temporary folder.
 Option {cmd:temppath()} can be used to set an alternative temporary folder.{p_end}
 {p 8 8} - psimulate2 can crash if the temporary path or any other path it writes on
@@ -479,7 +590,8 @@ Google. A fix is to pause those services.{p_end}
 {p 8 8} - psimulate2 has problems with long names, such as variable names or command names.
 In such cases it tends to shorten the names which might cause interruptions in the code.
 The best solution is to shorten the names.{p_end}
-{p 8 8} - Some servers require the {cmd:nocls} function to run {cmd:psimulate2}.{p_end}
+{p 8 8} - Unix servers require the {cmd:nocls} option to run {cmd:psimulate2}.{p_end}
+{p 8 8} - The {cmd:onlydots} option should be used on servers to avoid large log files.{p_end}

 {marker install}{title:How to install}

@@ -501,12 +613,16 @@ Kit Baum initiated the integration of MacOS and Unix and assisted in the impleme
 Michael Porst and Gabriel Chodorow-Reich provided much valued feedback.
 I am grateful for all of their help.
-All remaining errors are my own.{p_end}
+All remaining errors are my own.
+I do not accept responsibility for any computer crashes, lost work or financial losses resulting from the use of {cmd:(p)simulate2}.{p_end}
+

 {title:Change Log}

+{p 4}Version 1.07 to 1.08{p_end}
+{p 8 8}- added options {cmd:onlydots}, {cmd:docmd()} and {cmd:globalid()} to improve support for servers{p_end}
 {p 4}Version 1.06 to 1.07{p_end}
-{p 8 8}- fix in program lines with more than 250 characters (thanks to Gabriel Chodorow-Reich)
+{p 8 8}- fix in program lines with more than 250 characters (thanks to Gabriel Chodorow-Reich){p_end}
 {p 4}Version 1.05 to 1.06{p_end}
 {p 8 8}- various small bug fixes (thanks to Gabriel Chodorow-Reich){p_end}
 {p 4}Version 1.04 to 1.05{p_end}
diff --git a/simulate2.pkg b/simulate2.pkg
index c9b1577..eb65e20 100644
--- a/simulate2.pkg
+++ b/simulate2.pkg
@@ -1,7 +1,7 @@
 v 3
-d simulate2 module enhancing and parallelising Stata's simulate functions. Version 1.03 and 1.06
+d simulate2 module enhancing and parallelising Stata's simulate functions. Version 1.03 and 1.08
 d Jan Ditzen, Free University of Bozen-Bolzano
-d Distribution-Date: 20220121
+d Distribution-Date: 20230815
 f simulate2.ado
 f psimulate2.ado
 f simulate2.sthlp
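The core of this commit is the renaming of the performance file from `psim2_performance_`performid'` to `psim2`psim2_id'_performance_`performid'`, together with the claim in the Unix example that two do files with `parallel(9)` fill a 20-core server. Both points can be checked with a small model. This is a Python sketch, not psimulate2 code. It assumes that the local `psim2_id` holds the value passed to `globalid()` (empty when the option is not set) and that `performid` runs from 1 to the number of child instances; neither is shown in the diff. The helper names `perf_file`, `files_for_run` and `cores_used` are hypothetical, and the core count simply follows the help text's figure of 10 cores per do file (9 children plus the parent).

```python
# Hypothetical model of psimulate2's temporary file naming (not the ado code itself).
# Assumptions: psim2_id holds the globalid() value ("" if unset) and
# performid takes the values 1..n_parallel, one per child instance.


def perf_file(performid: int, globalid: str = "") -> str:
    """Name of one performance file; globalid "" reproduces the old psim2_ scheme."""
    return f"psim2{globalid}_performance_{performid}"


def files_for_run(n_parallel: int, globalid: str = "") -> set[str]:
    """All performance file names one psimulate2 run writes into its temppath."""
    return {perf_file(k, globalid) for k in range(1, n_parallel + 1)}


def cores_used(n_parallel: int, n_runs: int) -> int:
    """Cores if every run uses n_parallel child instances plus one parent instance."""
    return n_runs * (n_parallel + 1)


# spec1.do and spec2.do both use parallel(9) and the same temppath("/home/tmp").
old_overlap = files_for_run(9) & files_for_run(9)
new_overlap = files_for_run(9, "1") & files_for_run(9, "2")

print(len(old_overlap))  # 9: without globalid() every file name collides
print(len(new_overlap))  # 0: globalid(1) and globalid(2) keep the names apart
print(cores_used(9, 2))  # 20
```

Under the stated assumption that `psim2_id` is empty without `globalid()`, the new scheme produces exactly the old file names for a single run, so existing single-run setups are unaffected; the separate-folder alternative mentioned in the help text (e.g. "/home/tmp/tmp_1") avoids collisions in the same way, by making the paths differ instead of the names.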