
Photo Organizing / removing duplicates

Gilbert0 · North of Seattle · Registered User regular
Finally getting around to backing up and consolidating a bunch of photos from a variety of sources (phones, old digital cameras, 2 laptops, desktop, etc.). All the new photos being taken (on the phones) are synced to our Google Photos accounts, but I have a large library I've consolidated onto my desktop computer. I'm backing that up "to the cloud" as well, but I KNOW I'm saving some files two or three times.

What's the best app to manage those photos? Looking for something that will take sources from several different folders and be able to remove duplicates. Tagging/moving things around would be helpful as well. What do you use?

Posts

  • Orthanc · Death Lite, Only 1 Calorie · Off the end of the internet, just turn left. · Registered User, ClubPA regular
    How comfortable are you with command-line scripting? I had a similar problem with duplicates and just wrote a script to find identical files and move them to a folder for review. It isn't exactly user-friendly since it was just for my own use, but it might help if you're comfortable with that sort of thing. It's a Linux script, but it should also work on macOS or under WSL.

    I'm sure there are applications and services out there that do this better in a user-friendly way, but I thought I'd post it just in case, since you have no responses yet. The script just builds a list of SHA-256 hashes of all the files and uses that list to find duplicates. It's pretty rough, so this is really only an option if you're comfortable changing and fixing it for your particular circumstances.
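
    If you just want a quick, read-only look first, the same idea fits in a one-liner (GNU uniq assumed; it only prints groups of files whose hashes match, nothing gets moved):
    $ find . -type f -exec sha256sum {} + | sort | uniq -w64 --all-repeated=separate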

    Basic usage is
    $ ./script.sh [mode] [base_dir] [search_dir]
    

    This will create a to_remove folder in your current directory that contains the SHA list and any duplicate files it finds.

    The modes are:
    • index - create a SHA for each of the files, but don't actually move any of them (useful if you have a dir where you know you want to keep all the files but want to remove duplicates from another dir).
    • filter - remove duplicate files from a directory, but don't add non-duplicates to the index.
    • search - find and report duplicates; don't index any files, don't remove any files.
    • both - index all files in a directory and remove all duplicates.

    The search dir is the directory you want to index / filter.

    The base dir is the directory you want paths reported relative to.
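
    For example, with hypothetical paths and the script saved as ./dedupe.sh, a keep-one-copy pass could look like this (run both commands from the same directory so they share the same to_remove/sha-library.txt index):
    $ ./dedupe.sh index ~/Photos ~/Photos/master        # hash the keepers, move nothing
    $ ./dedupe.sh filter ~/Photos ~/Photos/old-laptop   # anything already indexed gets moved into ./to_remove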

    Script spoilered for length:
    #! /bin/bash
    
    set -e
    
    case $1 in
    index)
      index=true
      filter=false
      ;;
    filter)
      index=false
      filter=true
      ;;
    search)
      index=false
      filter=false
      ;;
    both)
      index=true
      filter=true
      ;;
    *)
      echo "Command must be index, filter, search or both"
      exit 1
      ;;
    esac
    
    BASE_DIR=$2
    SEARCH_DIR=$3
    FIND_BASE=$(realpath --relative-to "$BASE_DIR" "$SEARCH_DIR")
    
    # Everything lands under ./to_remove in the directory the script is run from
    DEST_BASE_DIR="$(pwd)/to_remove"
    DUPLICATES_FILE="$DEST_BASE_DIR/duplicate_files"
    UNIQUES_FILE="$DEST_BASE_DIR/uniques_files"
    SHA_LIBRARY_FILE="$DEST_BASE_DIR/sha-library.txt"
    
    # Append a "sha  path" line to the index
    function write_sha {
      local sha_line=$1
      mkdir -p "$(dirname "$SHA_LIBRARY_FILE")"
      echo "$sha_line" >> "$SHA_LIBRARY_FILE"
    }
    
    # Print the index line matching a given sha, or nothing if it isn't indexed yet
    function lookup_sha {
      local sha=$1
      if [ -e "$SHA_LIBRARY_FILE" ]; then
        grep "^$sha " "$SHA_LIBRARY_FILE" || true
      else
        echo ''
      fi
    }
    
    if [ ! -d "$DEST_BASE_DIR" ]; then
      mkdir "$DEST_BASE_DIR"
    fi
    
    cd "$BASE_DIR"
    
    duplicate_count=0
    unique_count=0
    
    # Walk every file under FIND_BASE (NUL-delimited, so odd filenames are safe),
    # hash it, and look the hash up in the index
    while IFS= read -r -d '' file; do
      sha_line=$(sha256sum "$file")
      sha=$(echo "$sha_line" | sed -e 's/ .*$//')
      duplicate=$(lookup_sha "$sha")
    
      if [[ -n "$duplicate" ]]; then
        duplicate_file=$(echo "$duplicate" | sed -e 's/^[^ ]* *//')
        if [ "$file" != "$duplicate_file" ]; then
          echo "$file is duplicate of $duplicate_file"
          duplicate_count=$((duplicate_count+1))
    
          if [ $filter = true ]; then
            echo "$file -> $duplicate_file" >> $DUPLICATES_FILE
            parent_path=$(dirname "$file")
            dest_path="$DEST_BASE_DIR/$parent_path"
            mkdir -p "$dest_path"
            mv "$file" "$dest_path"
      
            while [ "$parent_path" != '.' -a ! "$(ls -A "$parent_path")" ]; do
              echo "Removing empty dir $parent_path"
              echo "$parent_path --" >> $DUPLICATES_FILE
              rmdir "$parent_path"
              parent_path=$(dirname "$parent_path")
            done
          fi
        else
          unique_count=$((unique_count+1))
          if [ $index = false ]; then
            echo "Unique File $file"
            echo "$file" >> $UNIQUES_FILE
          fi
        fi
      else
        unique_count=$((unique_count+1))
        if [ $index = true ]; then
          write_sha "$sha_line"
        else
          echo "Unique File $file"
          echo "$file" >> $UNIQUES_FILE
        fi
      fi
    done < <(find "$FIND_BASE" -type f -print0 | sort -z)
    
    echo "Unique: $unique_count"
    echo "Duplicate: $duplicate_count"
    

    orthanc
  • Justice · Registered User regular
    @Gilbert0 please update if you find something. I've looked (admittedly, not exhaustively) into the same problem and found nothing beyond simple file-to-file matchers, and nothing that combines organization with duplicate elimination.

  • firewaterword · Satchitananda · Pais Vasco to San Francisco · Registered User regular
    Assuming you have access to it, I'm pretty sure Lightroom has a function to do this, and if not, there's a plug-in that does. If you load all your stuff into a new catalogue, it should actually recognize the dupes automatically, now that I think about it.

    I think you can get a month of access to Lightroom for like $10.

    Lokah Samastah Sukhino Bhavantu
  • WiseManTobes · Registered User regular
    This post makes me feel so old, the last time I re-organized photos, it was just hours with boxes and albums and physical pictures rofl.

    Steam! Battlenet:Wisemantobes#1508
  • Gilbert0 · North of Seattle · Registered User regular
    I'm actually a little shocked that no one has recommended......anything?! Does no one tag / use a program for their photos?

    I haven't done more looking but if I ever get around to it, I'll update the thread.

  • firewaterword · Satchitananda · Pais Vasco to San Francisco · Registered User regular
    Gilbert0 wrote: »
    I'm actually a little shocked that no one has recommended......anything?! Does no one tag / use a program for their photos?

    I haven't done more looking but if I ever get around to it, I'll update the thread.

    Again, I'd recommend Lightroom, but you might want to cross-post this in the AC photo thread - there are some very talented pros in there who might be able to help.

    Lokah Samastah Sukhino Bhavantu
  • BlindZenDriver · Registered User regular
    Gilbert0 wrote: »
    I'm actually a little shocked that no one has recommended......anything?! Does no one tag / use a program for their photos?

    I haven't done more looking but if I ever get around to it, I'll update the thread.

    I can't really give special advice on a tool to organize photos, as I just do that in folders I create in my computer's filesystem and name the files for their content. However, as for finding duplicates, there are many nice tools available that handle files in general. For instance, I am using Duplicate Cleaner Free, which is handy and, as the name suggests, also free. There is a Pro version, but I would think that is for rather advanced needs, as I seem to do fine with the free version.
    Here is the URL: https://www.duplicatecleaner.com/

    It lets you compare files from multiple locations, there are good filter functions, and the results are organized in a way that makes it easy to choose what to get rid of. And it works for all types of files.


    Bones heal, glory is forever.
  • Aridhol · Daddliest Catch · Registered User regular
    edited May 2018
    I purchased a photo de-duper program as I had something like 100K photos shoved into various directories and at most 20-25K were unique ones we'd keep.
    I think I even made or posted in a thread here about it so I'll try to find it and the program name a bit later.

    For organizing I used Picasa, but I think that's not an option anymore. My recommendation is to grab 500 photos at a time and organize that block each night. Start with how you might want to review them. Would you look them up by event? By year? People? Etc.

    Ultimately I went with a combo of years and events (e.g. 2012, vacations).
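
    If you want to automate the year part, something like this rough sketch (untested, GNU date assumed, file modification time standing in for the real shoot date, placeholder paths) will shove a flat folder into per-year subfolders:
    #! /bin/bash
    set -e
    SRC_DIR=$1   # e.g. ./unsorted
    DEST_DIR=$2  # e.g. ./by-year
    
    while IFS= read -r -d '' photo; do
      year=$(date -r "$photo" +%Y)       # year of the file's last modification (GNU date)
      mkdir -p "$DEST_DIR/$year"
      mv -n "$photo" "$DEST_DIR/$year/"  # -n: never overwrite an existing file
    done < <(find "$SRC_DIR" -type f -print0)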

    I suggest using whatever built-in preview and file management software your OS offers instead of a program.

    Edit: I used Duplicate File Finder by Ashisoft.
