The new forums will be named Coin Return (based on the most recent vote)! You can check on the status and timeline of the transition to the new forums here.
The Guiding Principles and New Rules document is now in effect.
Basically, I have redundant contact information. The contacts and their info are listed horizontally, with different columns for addresses, phone numbers, names, etc.
What I want to do is eliminate a row when a contact has the same first and last name (these are in different columns). Is there an easy way of doing this?
Bayesian on
0
Posts
L Ron HowardThe duckMinnesotaRegistered Userregular
edited June 2011
I just ran through this like a month back.
The answer is No.
There is no 'easy' way to do it. After spending too much time with VB and whatnot, I found it was easiest to just do it by hand.
What I ended up doing was sorting it Field A and B, in whichever order, then turning on context coloring, so that non-dupes are in green, and duplicates are in red. Then I went through it pretty much line-by-line selecting the duplicates and deleting them. In the end, it took less time to do it that way than it was to fuck around with VB or Python to create an automated script to do it.
And the data I was using was in the hundreds of thousands, but overall it took like a half hour of my time to do it that way than the two days of research and failed coding experiments to get it to happen manually.
2. Add a column, with something like =IF(A2=A1, IF(B2=B1, "Dupe!", ""), ""). Copy the formula down the entire column. (EDIT: Actually, instead of "" it would be better to use something like "DELETEME", so that you can search and replace it out, making the highlight and delete a little easier in Step 5 if you're using the keyboard to move around)
3. Copy the column contents and Paste Special (value) it back in right on top of the original.
4. Sort by the new column.
5. Highlight and delete everything that has Dupe! in the new column.
Optional first step, if it's important to keep the data in order and there's no serial identifier: Add a new column and just number it 1, 2, 3... down the column. After doing everything above, use this column to sort it back into the original order.
Well, this isn't automated, but it's a fairly simple way to get all the duplicates in one place so you can mass delete rows.
Sort by Last Name, then First name.
In a new column, use a formula like this (if Last Name is in A and First Name is in B, this formula is used for all rows after the 1st row):
if (and(compare(lower(a2,a1)),compare(lower(b2,b1))),1,0)
This will place a 1 in every row that has the same first and last name as the row above it. Copy this column and Paste as Value into the next column. Then you can sort based on this last column so that all the duplicate rows (marked as 1) will be together and you can delete them all at once.
edit: damn, beat. That's what I get for not refreshing before responding.
Data -> Filter -> Autofilter
Select the range you want to look at.
Select unique records only.
Copy 'em someplace, whatever you like.
That's in OS X Excel 2007.
That only selects duplicates,and won't work as you expect it will.
It does one of two things:
It will give you a list of all the duplicates, without extras, but won't select the the entire thing,
OR it will select everything BUT the duplicates.
Version depending.
This was something I found out when I was messing with it. It doesn't work sufficiently to keep everything and just remove the extra duplicates for the records.
I think it also only selects the columns you selected, not all entries in each column. So if you tell it to figure out duplicates based on Column A and B, it will only give you the list of A and B, and ignore the rest. So if you have anything in C, D, E, etc, they won't show up.
It's a nice feature, but highly defective.
or create a new master column with the last name and first name, then on a new sheet copy just the master column info, remove duplicates, and vlookup the rest of the info onto the new sheet.
Posts
The answer is No.
There is no 'easy' way to do it. After spending too much time with VB and whatnot, I found it was easiest to just do it by hand.
What I ended up doing was sorting it Field A and B, in whichever order, then turning on context coloring, so that non-dupes are in green, and duplicates are in red. Then I went through it pretty much line-by-line selecting the duplicates and deleting them. In the end, it took less time to do it that way than it was to fuck around with VB or Python to create an automated script to do it.
And the data I was using was in the hundreds of thousands, but overall it took like a half hour of my time to do it that way than the two days of research and failed coding experiments to get it to happen manually.
2. Add a column, with something like =IF(A2=A1, IF(B2=B1, "Dupe!", ""), ""). Copy the formula down the entire column. (EDIT: Actually, instead of "" it would be better to use something like "DELETEME", so that you can search and replace it out, making the highlight and delete a little easier in Step 5 if you're using the keyboard to move around)
3. Copy the column contents and Paste Special (value) it back in right on top of the original.
4. Sort by the new column.
5. Highlight and delete everything that has Dupe! in the new column.
Optional first step, if it's important to keep the data in order and there's no serial identifier: Add a new column and just number it 1, 2, 3... down the column. After doing everything above, use this column to sort it back into the original order.
Sort by Last Name, then First name.
In a new column, use a formula like this (if Last Name is in A and First Name is in B, this formula is used for all rows after the 1st row):
if (and(compare(lower(a2,a1)),compare(lower(b2,b1))),1,0)
This will place a 1 in every row that has the same first and last name as the row above it. Copy this column and Paste as Value into the next column. Then you can sort based on this last column so that all the duplicate rows (marked as 1) will be together and you can delete them all at once.
edit: damn, beat. That's what I get for not refreshing before responding.
Select the range you want to look at.
Select unique records only.
Copy 'em someplace, whatever you like.
That's in OS X Excel 2007.
That only selects duplicates,and won't work as you expect it will.
It does one of two things:
It will give you a list of all the duplicates, without extras, but won't select the the entire thing,
OR it will select everything BUT the duplicates.
Version depending.
This was something I found out when I was messing with it. It doesn't work sufficiently to keep everything and just remove the extra duplicates for the records.
I think it also only selects the columns you selected, not all entries in each column. So if you tell it to figure out duplicates based on Column A and B, it will only give you the list of A and B, and ignore the rest. So if you have anything in C, D, E, etc, they won't show up.
It's a nice feature, but highly defective.
Blizzard: Pailryder#1101
GoG: https://www.gog.com/u/pailryder