creativity-data: multi-threaded data collection
creativity-data is fairly slow right now: generating the .csv for a random sample of 27 .crstate files took about 1m35s, or roughly 3.5 seconds per file (extrapolating, 800 files would take somewhere around 47 minutes).
I think, however, that this is CPU-bound, not storage-bound, so there ought to be substantial benefits from making the data collection multithreaded, in two ways:
Running data collection in a thread. Right now a file is opened, then all needed states (typically the 25 before piracy, the 25 before public sharing, and the 25 final states) are read in, and only then are values calculated and output. Obviously a whole block of 25 has to be read before that block can be converted to statistics, but there's no reason we have to read all three blocks before starting on the first: a thread can read the second and third blocks while the first block's values are being calculated.
Processing files in parallel. Everything is sequential right now; it might be worth exploring the benefits of having multiple threads each work on a different file at the same time.
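To make the two ideas concrete, here is a rough C++ sketch of how they could fit together. It is not the existing creativity-data code: `Block`, `read_block`, `summarize`, `process_file`, and `process_files` are all hypothetical names, the reader just fabricates 25 numbers instead of parsing a .crstate file, and the "statistic" is a plain mean. The only point is the threading structure: the next block is read with `std::async` while the current one is summarized, and a small pool of `std::thread` workers pulls files off a shared counter.

```cpp
#include <atomic>
#include <cstddef>
#include <future>
#include <numeric>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for one block of 25 states (e.g. pre-piracy).
using Block = std::vector<double>;

// Hypothetical reader: fabricates 25 values instead of parsing a real
// .crstate file, so the sketch is self-contained.
Block read_block(const std::string &file, int block_index) {
    Block b(25);
    std::iota(b.begin(), b.end(),
              static_cast<double>(block_index * 25 + file.size()));
    return b;
}

// Hypothetical statistic: the mean of the block.
double summarize(const Block &b) {
    return std::accumulate(b.begin(), b.end(), 0.0) / b.size();
}

// Idea 1: while block i is being summarized on this thread, block i+1 is
// already being read on another thread.
std::vector<double> process_file(const std::string &file, int num_blocks = 3) {
    std::vector<double> stats;
    if (num_blocks <= 0) return stats;
    auto next = std::async(std::launch::async, read_block, file, 0);
    for (int i = 0; i < num_blocks; ++i) {
        Block current = next.get();  // wait for the pending read
        if (i + 1 < num_blocks)      // start the next read before computing
            next = std::async(std::launch::async, read_block, file, i + 1);
        stats.push_back(summarize(current));
    }
    return stats;
}

// Idea 2: several worker threads each take the next unprocessed file.
// Results are stored by input index, so output order stays deterministic.
std::vector<std::vector<double>> process_files(
        const std::vector<std::string> &files, unsigned num_threads) {
    std::vector<std::vector<double>> results(files.size());
    std::atomic<std::size_t> next_index{0};
    auto worker = [&] {
        for (std::size_t i; (i = next_index.fetch_add(1)) < files.size(); )
            results[i] = process_file(files[i]);
    };
    if (num_threads == 0) num_threads = 1;
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < num_threads; ++t) pool.emplace_back(worker);
    for (auto &th : pool) th.join();
    return results;
}
```

A caller would do something like `process_files(files, std::thread::hardware_concurrency())` and then write the .csv rows from the returned vector in order. Each worker writes only to its own slot of `results`, so no lock is needed around the output. Whether the per-file prefetch is still worth having once files are processed in parallel is exactly the kind of thing that would need measuring.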