Commit 29685d2e authored by Jason Rhinelander

Get jobno and open log after sleeping

The previous code was obtaining the jobno and opening the log before
any necessary sleeping, which had two undesirable effects:

1. The "Starting job n" messages looked fairly random when jobs were
delayed.
2. The initial batch of jobs was allocated up front, before any delay,
rather than as hosts actually became available.  For a fast job with
some hosts having a large number of CPUs, this means the run could
take unnecessarily long: e.g. imagine 'hostA=1 hostB=99' with an
instantaneous job (and connection) executing 100 jobs: all jobnos would
be grabbed instantly, 99 of them by hostB's threads, so the last job on
hostB wouldn't run for 99 DELAY periods, even though the optimal solution
is 50 jobs each (and a total time of 50 DELAYs instead of 99 DELAYs).

This fixes the problem by checking and obtaining a jobno after
finishing the delay.
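
The 'hostA=1 hostB=99' example can be checked with a small model. This is only a sketch under stated assumptions, not code from the patched program: time is counted in DELAY slots, each host may start one job per slot, jobs finish instantly, and the initial rush for job numbers is served in host list order. The names `Host`, `slots_claim_before_delay` and `slots_claim_after_delay` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: one job start per host per DELAY slot, instantaneous jobs.
struct Host { int cpus; };

// Old behaviour: each thread claims a job number first, then waits for a slot.
// Returns the number of DELAY slots used by the busiest host.
int slots_claim_before_delay(const std::vector<Host>& hosts, int jobs) {
    assert(jobs >= 0);
    std::vector<int> pending(hosts.size(), 0);  // claimed but not yet started
    int remaining = jobs;
    // Initial rush: every thread grabs a job number immediately
    // (served in host list order, which is an assumption of this model).
    for (std::size_t h = 0; h < hosts.size(); ++h) {
        pending[h] = std::min(std::max(hosts[h].cpus, 0), remaining);
        remaining -= pending[h];
    }
    int slots = 0;
    for (bool busy = true; busy; ) {
        busy = false;
        for (std::size_t h = 0; h < hosts.size(); ++h) {
            if (pending[h] == 0) continue;
            busy = true;
            --pending[h];                                      // start one job in this slot
            if (remaining > 0) { ++pending[h]; --remaining; }  // thread re-claims at once
        }
        if (busy) ++slots;
    }
    return slots;
}

// New behaviour: a job number is claimed only when the host's slot arrives.
int slots_claim_after_delay(const std::vector<Host>& hosts, int jobs) {
    assert(jobs >= 0);
    const bool usable = std::any_of(hosts.begin(), hosts.end(),
                                    [](const Host& h) { return h.cpus > 0; });
    if (!usable) return 0;  // no host can run anything
    int remaining = jobs;
    int slots = 0;
    while (remaining > 0) {
        for (const Host& host : hosts) {
            if (host.cpus > 0 && remaining > 0) --remaining;  // claim and start one job
        }
        ++slots;
    }
    return slots;
}
```

With hosts `{{1}, {99}}` and 100 jobs, the first function returns 99 and the second returns 50, which matches the 99 versus 50 DELAYs described above.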
parent f75fb2ea
@@ -333,15 +333,7 @@ void thread_runner(std::promise<void> started, const std::pair<std::string, std:
 FILE *LOGFILE = nullptr;
 sigset_t sigint; sigaddset(&sigint, SIGINT);
-while (not abort_jobs and (myjobno = jobno++) <= last_jobno) {
-if (!LOGFILE) {
-std::string log_file = log_dir + "/" + hostname + "-" + std::to_string(threadnum) + ".log";
-LOGFILE = fopen(log_file.c_str(), "a");
-if (!LOGFILE) {
-throw std::system_error(errno, std::system_category(), "Unable to open log file '" + log_file + "'");
-}
-}
+while (not abort_jobs and jobno <= last_jobno) {
 if (first) { first = false; started.set_value(); }
 std::chrono::high_resolution_clock::time_point started;
 pid_t child;
@@ -356,7 +348,7 @@ void thread_runner(std::promise<void> started, const std::pair<std::string, std:
 // release the lock and sleep until the appropriate time. Do this repeatedly,
 // because when we wake up, some other thread might have already started and pushed
 // back the earliest start time.
-while (!abort_jobs && started < until) {
+while (!abort_jobs && jobno <= last_jobno && started < until) {
 lock.unlock();
 std::this_thread::sleep_until(until);
 lock.lock();
@@ -364,7 +356,17 @@ void thread_runner(std::promise<void> started, const std::pair<std::string, std:
 until = host_next_conn[hostname];
 }
-if (abort_jobs) break;
+if (abort_jobs || jobno > last_jobno) break;
+myjobno = jobno++;
 }
+if (!LOGFILE) {
+std::string log_file = log_dir + "/" + hostname + "-" + std::to_string(threadnum) + ".log";
+LOGFILE = fopen(log_file.c_str(), "a");
+if (!LOGFILE) {
+throw std::system_error(errno, std::system_category(), "Unable to open log file '" + log_file + "'");
+}
+}
 check_date_change();