Threads in Stata-MP

The good news is that a computer running multiple Stata programs shows total throughput proportional to the number of cores, including hypercores (the extra logical cores provided by hyperthreading). The Stata documentation doesn't give hypercores much credit, and even suggests turning them off. While that might be right for a processor running a single Stata job, multiple competing jobs will use hypercores at essentially the same speed as full cores.

The -parallel- .ado file from SSC has recently (July 2017) added a "sethosts" feature that allows jobs to be spread over multiple separate computers. See this thread for details.

Our largest systems have 12 full cores and 12 additional hypercores and can run 3 Stata-MP/8 regressions simultaneously, each at about 80% of the speed of a single such job, or about 20 times the throughput of a single-threaded Stata run. Of course, you need to be very careful not to run out of real memory: a Stata job that spills into paged memory will not finish in finite time, and the computer itself is likely to appear down to users.
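The arithmetic behind the "about 20 times" figure can be sketched as follows. This is only an illustration: the function name is made up here, and the factor of 8 for a single Stata-MP/8 regression relative to a single-threaded run is an assumption (near-linear scaling), not a number stated above.

```python
def aggregate_throughput(jobs: int, relative_speed: float, per_job_speedup: float) -> float:
    """Total throughput in units of one single-threaded run.

    jobs            -- number of Stata-MP jobs running at once
    relative_speed  -- speed of each job relative to running alone (0..1)
    per_job_speedup -- ASSUMED speedup of one MP job over a single-threaded run
    """
    return jobs * relative_speed * per_job_speedup


# 3 concurrent MP/8 regressions, each at about 80% of its solo speed,
# assuming a solo MP/8 regression is roughly 8x a single-threaded one.
estimate = aggregate_throughput(jobs=3, relative_speed=0.8, per_job_speedup=8.0)
print(f"roughly {estimate:.1f}x a single-threaded run")
```

Under that assumption the estimate lands a little under 20, consistent with the rough figure quoted.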

The bad news is that Amdahl's law is very much in evidence. The -generate- and -replace- commands and all I/O have significant single-threaded portions, and these tend to dominate the execution time as the number of cores increases. Sometimes the results of those operations can be saved in a reusable "work dataset", which is helpful. On our systems there are usually more Stata-MP jobs using 100% of a single CPU than are using two or more CPUs.
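For readers who want to see why the single-threaded portions matter so much, here is a minimal sketch of Amdahl's law. The parallel fraction of 0.5 is purely illustrative and is not a measurement of any Stata command.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's law.

    parallel_fraction -- share of the run time that can use all cores (0..1)
    cores             -- number of cores available to the job
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Illustrative only: a job whose run time is half single-threaded work.
for cores in (1, 2, 4, 8):
    print(f"{cores} cores: {amdahl_speedup(0.5, cores):.2f}x")
```

With half the work single-threaded, the speedup can never reach 2x no matter how many cores are added, which is why a job dominated by data preparation and I/O gains little from Stata-MP.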