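0. Motivation
I have a pile of images, say 20000 or more, that need to be batch processed. The function that handles a single image, processImg(image_path), is already written. How do I get the job done efficiently?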
1. First attempt
If the goal is efficiency, the obvious answer is multithreading, here with a thread pool built on the boost library:
// this piece of code is from the Internet
#include <cstddef>   // size_t
#include <memory>    // std::unique_ptr
#include <boost/asio/io_service.hpp>
#include <boost/bind.hpp>
#include <boost/thread/thread.hpp>
namespace multithread
{
    typedef std::unique_ptr<boost::asio::io_service::work> asio_worker;

    struct ThreadPool {
        ThreadPool(size_t threads) : service(), working(new asio_worker::element_type(service)) {
            while (threads--)
            {
                auto worker = boost::bind(&boost::asio::io_service::run, &(this->service));
                g.add_thread(new boost::thread(worker));
            }
        }

        template<class F>
        void enqueue(F f) {
            service.post(f);
        }

        ~ThreadPool() {
            working.reset(); // allow run() to exit
            g.join_all();
            service.stop();
        }

    private:
        boost::asio::io_service service; //< the io_service we are wrapping
        asio_worker working;
        boost::thread_group g; //< need to keep track of threads so we can join them
    };
}
Then iterate over all the image paths and push one image-processing task per image into the thread pool:
// toy code
size_t n_threads = 24;
multithread::ThreadPool tp(n_threads);
for (const auto& image_path : image_paths)
{
    tp.enqueue(boost::bind(processImg, image_path));
}
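It turned out not to be that simple. The actual throughput was poor, even worse than processing everything on a single thread, and the CPU was barely being used.
Why?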
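2. Speeding up
Then it occurred to me: processing a single image takes very little time, so each task finishes almost immediately, but dispatching a task to a thread also costs time. With the approach above, most of the time was being spent on dispatching (20000 dispatches) rather than on the work itself, hence the poor efficiency.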
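One way to check this reasoning is to time each variant end to end. The helper below is a minimal sketch that uses only the standard library; the name elapsed_ms is hypothetical and not part of the original code.

```cpp
#include <chrono>
#include <functional>

// Hypothetical helper: run `work` once and return the elapsed
// wall-clock time in milliseconds.
double elapsed_ms(const std::function<void()>& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

To get a meaningful number, the callable should create the ThreadPool, enqueue all the tasks, and let the pool go out of scope, since the destructor is what waits for the worker threads to finish.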
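A more efficient approach is to split the 20000 images into n_threads groups and let each thread process one group. The program then only dispatches n_threads tasks, and the CPU can be kept fully busy.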
// handle a group of images
void processImgs(const std::vector<std::string>& image_paths)
{
    for (const auto& image_path : image_paths)
        processImg(image_path);
}

size_t n_threads = 24;
multithread::ThreadPool tp(n_threads);
// group image paths: one group per thread (group_paths is not shown here)
std::vector<std::vector<std::string>> image_groups = group_paths(image_paths);
for (const auto& paths : image_groups)
    tp.enqueue(boost::bind(processImgs, paths));
With this change, the program really did take off~~
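The post does not show group_paths itself. A minimal sketch of what such a helper could look like follows, under the assumption that the paths are split into contiguous groups of nearly equal size. Unlike the call in the toy code, this version takes the group count as an explicit second parameter, which is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split `paths` into at most `n_groups` contiguous
// groups whose sizes differ by at most one. Empty groups are never
// produced, so fewer than `n_groups` groups come back when there are
// fewer paths than groups.
std::vector<std::vector<std::string>> group_paths(
    const std::vector<std::string>& paths, std::size_t n_groups)
{
    assert(n_groups > 0);
    std::vector<std::vector<std::string>> groups;
    const std::size_t base = paths.size() / n_groups;   // minimum group size
    const std::size_t extra = paths.size() % n_groups;  // first `extra` groups get one more
    std::size_t begin = 0;
    for (std::size_t g = 0; g < n_groups && begin < paths.size(); ++g) {
        const std::size_t len = base + (g < extra ? 1 : 0);
        groups.emplace_back(paths.begin() + begin, paths.begin() + begin + len);
        begin += len;
    }
    return groups;
}
```

With this signature the call would read group_paths(image_paths, n_threads). Equal-sized groups balance well only if every image costs roughly the same to process. If the costs vary a lot, using several smaller groups per thread may keep the threads more evenly loaded, at the price of a few more dispatches.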