A multithreading question in Perl or Python: looking for an implementation!

2025-05-20 07:15:40

Recommended answers (2)

Answer 1:

import string

def read_in_chunks(file_ob, chunk_size=1024*1024):
    # Generator: hand back the file one chunk (1 MB by default) at a time.
    while True:
        data = file_ob.read(chunk_size)
        if not data:
            break
        yield data  # yield, not return, so every chunk is produced in turn

# Python 2: a file opened with 'rb' yields str chunks, so each item is a one-character string.
f = open('large.dat', 'rb')
count = dict.fromkeys(string.ascii_lowercase, 0)  # one counter per letter a-z
for chunk in read_in_chunks(f):
    for ch in chunk:
        if 'a' <= ch <= 'z':
            count[ch] += 1
f.close()
print count

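Your bottleneck is not computation, so multithreading won't help you. On my machine the program above finished counting a 1 GB file in under 1 second; large.dat is a randomly generated 1 GB binary file (generating a 20 GB random file was too slow). Memory use is also small, since only 1 MB of data is read at a time. I expect a 20 GB file could also be counted reasonably quickly.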
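For readers on Python 3, here is a minimal sketch of the same chunked counting. It is not the answerer's code: `count_lowercase` is a hypothetical helper name, and it swaps the per-byte Python loop for one `bytes.count` call per letter, on the assumption that only the bytes a-z need counting.

```python
import string

def read_in_chunks(file_ob, chunk_size=1024 * 1024):
    """Yield successive chunks until the file is exhausted."""
    while True:
        data = file_ob.read(chunk_size)
        if not data:
            break
        yield data

def count_lowercase(path, chunk_size=1024 * 1024):
    """Count occurrences of the bytes a-z in the file at `path` (hypothetical helper)."""
    counts = dict.fromkeys(string.ascii_lowercase, 0)
    with open(path, "rb") as f:
        for chunk in read_in_chunks(f, chunk_size):
            for letter in string.ascii_lowercase:
                # bytes.count scans the chunk in C, avoiding a Python-level loop per byte
                counts[letter] += chunk.count(letter.encode("ascii"))
    return counts
```

Calling `count_lowercase('large.dat')` would return a dict keyed by letter. Memory use stays bounded by `chunk_size` no matter how large the file is.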

Answer 2:

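Well, multithreading can improve efficiency a little. It mainly depends on where the system's bottleneck is. For example, if computing the hash is the slow part, multithreading helps. If the disk is the slow part, you could switch to a PCIe SSD.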

20 GB is not really that large. Either way, here is one approach.

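Suppose I use 10 threads (processes would be even better) to handle this file. Take the file size
filesize=os.path.getsize("filename_20gb")
and divide it by 10
step=filesize/10

This gives the sequence of offsets xrange(0,filesize,step). If filesize is not an exact multiple of 10, the sequence has an eleventh entry for the short remainder.
Then pass the file's starting positions to the threads or processes, one each.
Inside each thread or process, use them like this: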
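startpos=xxxx
endpos=startpos+step
fp=open("filename_20gb","rb")
if startpos>0:
    # skip the line that straddles startpos; the previous worker reads it in full
    fp.seek(startpos-1,0)
    fp.readline()

while True:
    if fp.tell()>=endpos:break
    line=fp.readline()
    if not line:break
    # your processing goes here; add the hash algorithm at this point
fp.close()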

That is all there is to it.
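The scheme in this answer could be sketched in Python 3 roughly as follows. This is an illustration under assumptions, not the answerer's implementation: `split_ranges`, `process_range`, and `process_file` are hypothetical names, the per-line work is a placeholder that only counts lines, and `ThreadPoolExecutor` stands in for whichever worker pool is used (`ProcessPoolExecutor` offers the same `submit` interface for the process-based variant). Each worker handles only the lines that begin inside its byte range, so a line straddling a boundary is processed exactly once.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def split_ranges(filesize, parts):
    """Return (startpos, endpos) pairs that cover [0, filesize) without gaps."""
    step = max(1, -(-filesize // parts))  # ceiling division, so at most `parts` ranges
    return [(start, min(start + step, filesize))
            for start in range(0, filesize, step)]

def process_range(path, startpos, endpos):
    """Handle every line that begins inside [startpos, endpos)."""
    handled = 0
    with open(path, "rb") as fp:
        if startpos > 0:
            # Skip the line straddling startpos; the previous worker owns it.
            fp.seek(startpos - 1)
            fp.readline()
        while fp.tell() < endpos:
            line = fp.readline()
            if not line:
                break
            handled += 1  # placeholder for the real per-line work (e.g. hashing)
    return handled

def process_file(path, parts=10):
    """Run process_range over every range and sum the per-range results."""
    ranges = split_ranges(os.path.getsize(path), parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        futures = [pool.submit(process_range, path, s, e) for s, e in ranges]
        return sum(f.result() for f in futures)
```

With the placeholder in place, `process_file("filename_20gb")` returns the total number of lines, which makes it easy to check that no line is skipped or counted twice before plugging in the real processing.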