not in 逻辑上很容易理解,但是碰上大数据表时,不管有无可被利用的索引运行起来都会发生效率悲剧!应尽量避免使用。如果记录行达到数万、数十万行以上时,语句运行花上几个小时甚至数天才有结果都是很正常的。
如果Id字段是自增ID(建有索引),下列写法的效率将会很高:
delete from zhifu a where not exists(select 1 from (select min(id) as idmin from zhifu group by bianhao)b where b.idmin=a.id)
如果Id字段没有索引,但是没有重复值,前面的写法效率也不行。推荐下列写法:
delete from zhifu c where exists(select 1 from (select a.id
FROM zhifu AS a LEFT JOIN (select min(id) as idmin from zhifu group by bianhao)b ON b.idmin=a.id
WHERE b.idmin is null)d where d.id=c.id)
如果Id字段不是自增ID,又没有索引,而且有重复值。这样就比较麻烦了,建议分多步走:
先创建一个临时表,将有重复的记录只取一条存入临时表,然后删除表中有重复的记录,再将临时表中的数据插入会原始表,最后在删除临时表。
假设用TEMP做临时表名
1)取有重复的唯一记录存入临时表(注意各个字段的顺序不要搞乱,而且要写齐)
select first(id),first(bianhao),....into temp from zhifu group by bianhao having count(*)>1
2)删除原始表中的有重复记录
delete from zhifu a where exists(select 1 from (select bianhao from zhifu group by bianhao having count(*)>1)b where b.bianhao=a.bianhao)
3)将临时表中的记录插入回原始表
insert into zhifu select * from temp
4)删除临时表
drop table temp
当然对付小数据表用NOT IN就行了,不用搞得那么麻烦。
delete from zhifu t1
inner join (select bianhao, min(id) mid from zhifu group by bianhao) t2
on t1.bianhao=t2.bianhao
where t1.id>t2.mid
没测试过,自己试下