
Sort 100,000 pictures

Aridhol Daddliest Catch Registered User regular
edited January 2011 in Help / Advice Forum
I have a large task ahead of me and I'd appreciate any and all suggestions.

I have 12 years' worth of digital photos that passed through approximately 15 different machines over time, and I need to make a decent, good-quality backup. There are a HUGE number of duplicates (named differently!) that I need to sort out and collate. Eventually I also need to tag/name these correctly.


I have collected all the pictures in a single folder (with a million subfolders) which amounts to about 130GB.


tldr:
How do I quickly sort thousands of pictures for duplicates, going beyond file names?


Posts

  • Zeon Registered User regular
    edited January 2011
    There is enterprise software available that will do this (it's called data deduplication), but for personal use it's probably way too pricey.

    The easiest way I can think of off the top of my head is to sort by file size, manually check any files that are exactly the same size as another file, and delete the duplicates by hand. It will take a while.
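
    A rough sketch of the size-sorting step, assuming Python is available. The function name `group_by_size` is made up for illustration, and it only lists candidates for a manual check; it deletes nothing. Equal size alone does not prove two photos are identical.

```python
import os
from collections import defaultdict


def group_by_size(root):
    """Return {size_in_bytes: [paths]} for sizes shared by 2+ files under root."""
    sizes = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes[os.path.getsize(path)].append(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
    # Keep only sizes that occur more than once; those are the candidates.
    return {size: paths for size, paths in sizes.items() if len(paths) > 1}
```

    Calling `group_by_size` on the top-level photo folder gives a short list of same-size groups to eyeball, instead of the whole collection.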

  • evilmrhenry Registered User regular
    edited January 2011
    In Picasa, choose Tools->Experimental->Show Duplicate Files?

  • falsedef Registered User regular
    edited January 2011
    Try asking or hiring a script programmer to delete files by hash (if they're exact duplicates). There are Python examples all over the web. An experienced programmer could also easily handle the other tasks.
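
    For what it's worth, a minimal sketch of the hash approach might look like this. The function names are hypothetical, and SHA-256 is just one reasonable choice of hash. It only reports groups of byte-identical files, so you can review them before deleting anything.

```python
import hashlib
import os
from collections import defaultdict


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large photos are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Return a list of groups; each group holds paths with identical content."""
    by_hash = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_hash[file_sha256(path)].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

    Because the hash depends only on file content, differently named copies of the same photo land in the same group.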

    If they're only near duplicates (i.e. different compression, etc.), then you'll need a heavier-weight app to do it. I have no suggestions for free ones; they do exist, but they usually target Linux and lossless formats.
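
    To give a feel for how near-duplicate matching can work, here is a toy sketch of one common idea, an "average hash". This is an illustration I'm choosing, not what any particular app does. It assumes each image has already been decoded, converted to grayscale, and shrunk to a small grid (8x8 is typical) by an image library, which is not shown here.

```python
def average_hash(gray):
    """gray: 2D list of 0-255 brightness values from a downscaled image.

    Returns one bit per pixel: 1 if brighter than the mean, else 0.
    """
    pixels = [value for row in gray for value in row]
    mean = sum(pixels) / len(pixels)
    return [1 if value > mean else 0 for value in pixels]


def hamming_distance(bits_a, bits_b):
    """Count positions where two equal-length hashes differ."""
    if len(bits_a) != len(bits_b):
        raise ValueError("hashes must have the same length")
    return sum(a != b for a, b in zip(bits_a, bits_b))
```

    Two copies of a photo saved at different compression levels should give hashes that differ in few or no bits, while unrelated photos differ in many. The cutoff for "close enough" is a judgment call you would have to tune.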

  • khain Registered User regular
    edited January 2011
    Googling "deleting identical pictures" seems to indicate that there are a ton of programs that can delete based on file size, or even find near-identical pictures based on visual similarity. DupliFinder is just one option that came back in the results, via a link from LifeHacker.

  • splash Registered User regular
    edited January 2011
    The duplicates don't have similar names, but smart photo software will be able to see the real original date taken even if the file has been moved many times. The file has far more information embedded in it than Explorer or simple software will show. I use Adobe Bridge (comes with Photoshop) for batch renaming photos, and it can also order files by date taken. If you somehow had access to this program, it would work well for sorting manually. But if you want something automatic to get rid of duplicates, I don't think that's in its realm.

    Be careful if any images were taken with cell phone cameras, though. In Bridge, the date created and date modified are switched for those kinds of pics for me, for some reason.

  • Horus Los Angeles Registered User regular
    edited January 2011
    Cataloging and organizing them by whatever criteria you like can be done in iPhoto and/or Aperture on the Mac, and in Lightroom on Windows.

    From experience, do not touch the files (well, deal with the dups first) but create a visual map of how you're breaking them up.
    Example:
    Photos > The Smith Family
    >Holidays
    >Events
    >>Sports
    >>School Functions
    >>Vacations
    >>etc
    >Creative
    >>Nature
    >>Portraits

    Basically, use pencil and paper to map out how you're gonna sort the images before touching the programs, 'cause it's going to overwhelm you.
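
    Once the paper plan is settled, the empty folders could be created in one go. This is only a sketch: the `PLAN` dictionary mirrors the example map above, and `create_tree` is a made-up helper name. It makes folders only and does not move any photos.

```python
from pathlib import Path

# Nested dict mirroring the example map; an empty dict means "no subfolders".
PLAN = {
    "The Smith Family": {
        "Holidays": {},
        "Events": {"Sports": {}, "School Functions": {}, "Vacations": {}},
        "Creative": {"Nature": {}, "Portraits": {}},
    }
}


def create_tree(base, plan):
    """Create every folder in plan under base; return the list of folders."""
    created = []
    for name, children in plan.items():
        folder = Path(base) / name
        folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
        created.append(folder)
        created.extend(create_tree(folder, children))
    return created
```

    For example, `create_tree("Photos", PLAN)` would build the whole tree under a `Photos` folder in the current directory.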

    tldr; create a battle plan before you start the fighting
