Help With WinVis: Finding and eliminating duplicate files
My hard drive is clogged, and a big part of the problem is redundant files (documents, pictures, music): multiple copies spread across different folders, or copies of folders nested Russian-doll style inside the same folder. Is there any utility that will let me run one big search to identify and corral every file that has an identical twin elsewhere on the drive?
I figure this is a generally useful thing, so I went ahead and wrote a primitive implementation in Python. If you have a Windows machine, you'll probably need to install Python 2.6 from http://www.python.org. If you're on Linux or a Mac, you likely already have it.
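from __future__ import print_function  # makes print() below behave the same on Python 2.6+ and 3
import os
import hashlib

try:
    input = raw_input  # Python 2: raw_input returns what you typed as a plain string
except NameError:
    pass  # Python 3: the built-in input() already does that

hashdict = {}  # will be a dictionary in form {hexdigest : filepath}
duplicates = open("results.txt", "w")
mode = input("log findings to a text file, or ask about removing duplicates as I go? type either 'log' or 'ask':\n> ")

def md5sum(path, blocksize=65536):
    # "rb" (binary mode) matters: text mode can alter or cut short binary data
    # such as pictures and music, which would produce the wrong hash.
    # Reading in blocks keeps big files from being loaded into memory all at once.
    md5 = hashlib.md5()
    with open(path, "rb") as currentfile:
        for block in iter(lambda: currentfile.read(blocksize), b""):
            md5.update(block)
    return md5.hexdigest()

def dirsearch(thisdir):
    for something in os.listdir(thisdir):
        something = os.path.join(thisdir, something)
        print("reading", something)
        if os.path.isfile(something):
            currentmd5sum = md5sum(something)
            if currentmd5sum not in hashdict:
                hashdict[currentmd5sum] = something  # first copy seen with this content
            else:
                somethingelse = hashdict[currentmd5sum]
                if mode == "log":
                    duplicates.write("match found: " + something + " hashes to same digest as " + somethingelse + "\n")
                elif mode == "ask":
                    whichone = input("\n\n\nidentical hash found:\n\nfile A: " + something + "\n\nfile B: " + somethingelse + "\n\nType 'a' or 'A' to remove A, 'b' or 'B' to remove B, or just hit enter to leave both alone.\n> ")
                    # lower() has to be called; comparing the bare method to "a" is always False
                    if whichone.lower() == "a":
                        os.remove(something)
                    elif whichone.lower() == "b":
                        os.remove(somethingelse)
                        hashdict[currentmd5sum] = something  # A is now the surviving copy later matches are compared to
        elif os.path.isdir(something):
            dirsearch(something)

if __name__ == "__main__":
    dirsearch(os.getcwd())
    duplicates.close()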
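Once you've got Python, which is a pretty handy thing to have overall, just copy and paste this into a text file, name it "whateveryouwant.py", and double-click it to run it. It treats the folder it's run from as the top level of the search (double-clicking runs it from the folder it's saved in) and works through every subfolder beneath it, so running it off your desktop, for example, won't do much. It also creates results.txt in that folder; in 'log' mode, that's where the matches get listed.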
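Side note on how the matching works: the script above compares each file only against the first file it saw with the same hash, so three copies of one photo show up as two separate pairs. If you'd rather get one list per set of identical files, here's a minimal sketch (Python 3; `file_md5` and `find_duplicates` are names I made up, not part of the script above). It walks the whole tree with `os.walk`, hashes only files whose size matches at least one other file's (files of different sizes can't be identical), and groups the paths by MD5 digest. It only reports; it never deletes anything.

```python
import hashlib
import os
from collections import defaultdict


def file_md5(path, blocksize=65536):
    """Return the MD5 hex digest of a file, reading it in binary chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(blocksize), b""):
            md5.update(block)
    return md5.hexdigest()


def find_duplicates(root):
    """Return {digest: [paths]} for every set of two or more identical files under root."""
    # Pass 1: bucket regular files by size (os.walk descends into every subfolder).
    by_size = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only files that share their size with at least one other file.
    by_digest = defaultdict(list)
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a file with a unique size can't have an identical twin
        for path in paths:
            by_digest[file_md5(path)].append(path)

    # Keep only digests that more than one file produced.
    return {digest: sorted(paths)
            for digest, paths in by_digest.items() if len(paths) > 1}
```

For example, `for paths in find_duplicates(r"C:\Users\Me").values(): print(paths)` prints each group of identical files. The size pass is just a shortcut to avoid reading files that can't have a twin; the MD5 grouping is the same idea the script uses.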
edit: a word of caution.
Be careful with this. If you tell it to delete a file, it will do exactly that: permanently delete it and free up the disk space. It won't move it to the Recycle Bin or anything like that, so you don't really have a recovery option. I don't suggest running it from the root directory, as there are probably identical but important system files there. Keep it within your user folders, e.g. C:\Documents and Settings\Me or C:\Users\Me.
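If the no-recovery part worries you, one safer variation (my own idea, not what the script above does) is to move the extra copy into a holding folder instead of calling `os.remove`, and to compare the two files byte for byte with `filecmp.cmp(..., shallow=False)` first, so a hash collision can't cost you a file. A minimal sketch; `quarantine_duplicate` and the holding folder are hypothetical. Put the holding folder outside the folder you're searching, or a later run will find the moved copies again.

```python
import filecmp
import os
import shutil


def quarantine_duplicate(keep, extra, holding_dir):
    """Move `extra` into `holding_dir` if it is byte-for-byte identical to `keep`.

    Returns the file's new path, or None if nothing was moved.
    """
    if os.path.abspath(keep) == os.path.abspath(extra):
        return None  # same file passed twice; never move the only copy
    # shallow=False compares actual contents instead of trusting os.stat() info
    if not filecmp.cmp(keep, extra, shallow=False):
        return None
    if not os.path.isdir(holding_dir):
        os.makedirs(holding_dir)
    target = os.path.join(holding_dir, os.path.basename(extra))
    base, ext = os.path.splitext(target)
    counter = 1
    while os.path.exists(target):  # don't overwrite something quarantined earlier
        target = "%s (%d)%s" % (base, counter, ext)
        counter += 1
    shutil.move(extra, target)
    return target
```

In the 'ask' branch you'd call something like `quarantine_duplicate(somethingelse, something, r"C:\dupes")` where the script calls `os.remove(something)`, then empty C:\dupes yourself once you've checked that nothing important ended up there. A return value of None means the files weren't actually identical and nothing was moved.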