2

Currently I have a macro that runs through a list and deletes duplicate values (in one column), but it's proving to be very inefficient. For every entry that it checks for duplicates, it has to run through the whole column; my file currently has 50,000 entries and that is no small task.

I think an easier way for the macro to work is for the macro to check if this value is in an array. If it is, then remove the row that the entry is in. If it isn't, add the value to the array.

Can someone provide some help with the basic outline of the macro? Thanks

5
  • Copy the values from the array to a new sheet. Then insert a blank column next to the 50k entries and do a vlookup. Once done, do an Autofilter and then delete the duplicate entries in 1 go. Commented Jul 12, 2012 at 17:11
  • One more option : Loop through the array and do an autofilter on the 50k and simply delete it one by one. A slower process than the above... Commented Jul 12, 2012 at 17:13
  • By "Delete" do you mean erasing the cell contents, leaving a blank cell, or you mean removing the value and moving all of the other values up one cell? It makes a big difference in both the complexity and speed of the answer. (Erasing is much simpler/faster). Commented Jul 12, 2012 at 18:18
  • I should have made that more clear sorry. Yes, the macro is supposed delete the whole row. Commented Jul 12, 2012 at 18:34
  • @user1521458: Did you even see my reply? Commented Jul 12, 2012 at 18:35

4 Answers 4

3

The Below code will loop through your source data and store it in an array, while simultaneously checking for duplicates. After the collection is complete it uses the array as a key to know which columns to delete.

Due to the high number of potentiol screen updates with the deletion be sure to turn screenupdating off. (included)

Sub Example()
    Application.ScreenUpdating = false
    Dim i As Long
    Dim k As Long
    Dim StorageArray() As String
    Dim iLastRow As Long
    iLastRow = ActiveSheet.Cells(ActiveSheet.Rows.Count, "A").End(xlUp).Row

    ReDim StorageArray(1 To iLastRow, 0 To 1)

    'loop through column from row 1 to the last row
    For i = 1 To iLastRow
        'add each sheet value to the first column of the array
        StorageArray(i, 0) = ActiveSheet.Range("A" & i).Value
        '- keep the second column as 0 by default
        StorageArray(i, 1) = 0
        '- as each item is added, loop through previously added items to see if its a duplicate
        For k = 1 To i-1
            If StorageArray(k, 0) = StorageArray(i, 0) Then
                'if it is a duplicate set the second column of the srray to 1
                StorageArray(i, 1) = 1
                Exit For
            End If
        Next k
    Next i

    'loop through sheet backwords and delete rows that were maked for deletion
    For i = iLastRow To 1 Step -1
        If StorageArray(i, 1) = 1 Then
            ActiveSheet.Range("A" & i).EntireRow.Delete
        End If
    Next i

    Application.ScreenUpdating = true
End Sub

As requested, here is a similar way to do it, using Collections instead of an Array for key indexing: (RBarryYoung)

Public Sub RemovecolumnDuplicates()
    Dim prev as Boolean
    prev = Application.ScreenUpdating
    Application.ScreenUpdating = false
    Dim i As Long, k As Long

    Dim v as Variant, sv as String
    Dim cl as Range, ws As Worksheet
    Set ws = ActiveWorksheet    'NOTE: This really should be a parameter ...

    Dim StorageArray As New Collection
    Dim iLastRow As Long
    iLastRow = ws.Cells(ActiveSheet.Rows.Count, "A").End(xlUp).Row

    'loop through column from row 1 to the last row
    i = 1
    For k = 1 To iLastRow
        'add each sheet value to the collection
        Set cl = ws.Cells(i, 1)
        v = cl.Value
        sv = Cstr(v)

        On Error Resume Next
            StorageArray.Add v, sv
        If Err.Number <> 0 Then
            'must be a duplicate, remove it
            cl.EntireRow.Delete
            'Note: our index doesn't change here, since all of the rows moved
        Else
            'not a duplicate, so go to the next row
            i = i + 1
        End If
    Next k

    Application.ScreenUpdating = prev
End Sub

Note that this method does not need to assume any datatype or integer limits for the values of the cells in the column.

(Mea Culpa: I had to hand-enter this in Notepad, because my Excel is busy running project tests right now. So there may be some spelling/syntax errors...)

Sign up to request clarification or add additional context in comments.

17 Comments

This is pretty close to the correct/best answer. You shouldn't be assuming that the range is also the active sheet, you should save then restore the current screen updating setting, and most importantly, you should be using either hashing or indexing (collections) to check for pre-existence, not re-scanning the whole StorageArray. I could make these changes for you if you wish.
Also, I just noticed, you're deleting the whole row, but the OP specified that only one column was being modified.
Thanks for the help; I think this one is the closest to what'll work but instead of running through every value in the sheet, how can I get it to only run through one column?
Oops, the OP updated to say that the whole row should be Deleted.
@Gimp: Looping + Deleting 5000 rows(again in a loop) in 1 second seems too unlikely to me. But I will leave it at that :)
|
1

This is a followup to my comment. Looping 50k records + Looping the Array will be an over kill for such a simple operation.

Like I mentioned in my comment, copy the values from the array to a new sheet. Then insert a blank column next to the 50k entries and do a Vlookup or CountIf. Once done, do an Autofilter and then delete the duplicate entries in 1 go. Let's take an example and see how this will work.

Let's say we have have an array with 1000 items? and in 1 sheet we have 50k data. The below code will be tested with 1000 items in Array and 50k Data See Snapshot

enter image description here

Paste this code in a module (The code took less then 5 secs to finish)

enter image description here

Sub Sample()
    Dim ws As Worksheet, wstemp As Worksheet
    Dim LRow As Long
    Dim Ar(1 To 1000) As Long
    Dim startTime As String, EndTime As String

    startTime = Format(Now, "hh:mm:ss")

    Set ws = Sheets("Sheet1")
    Set wstemp = Sheets.Add

    '~~> Creating a dummy array
    For i = 1 To 1000
        Ar(i) = i
    Next i

    '~~> Copy it to the new sheet
    wstemp.Range("A1:A1000").Value = Application.Transpose(Ar)

    With ws
        LRow = .Range("A" & .Rows.Count).End(xlUp).Row

        .Columns(2).Insert Shift:=xlToRight
        .Range("B1").Value = "For Deletion"
        .Range("B2:B" & LRow).FormulaR1C1 = "=COUNTIF(" & wstemp.Name & "!C[-1],RC[-1])"
        .Columns(2).Value = .Columns(2).Value

        '~~> Remove any filters
        .AutoFilterMode = False

        '~~> Filter, offset(to exclude headers) and delete visible rows
        With .Range("B1:B" & LRow)
            .AutoFilter Field:=1, Criteria1:="<>0"
            .Offset(1, 0).SpecialCells(xlCellTypeVisible).EntireRow.Delete
        End With

        '~~> Remove any filters
        .AutoFilterMode = False

        .Columns(2).Delete
    End With

    EndTime = Format(Now, "hh:mm:ss")

    MsgBox "The process started at " & startTime & " and finished at" & EndTime
End Sub

9 Comments

This seems both slow and overly complex.
5 secs is slow? And which part did you find complex? :)
Yes sorry I'm trying it out now. Thanks for your help!
i.imgur.com/jmB49.png that comes up when i try to run this. any idea why?
In which columns are you inserting a blank row?
|
1

For Excel 2007 and later: Copy the array to a sheet and use the removeduplicates method:

set ws = worksheets.add
ws.[A1].resize(ubound(yourarray,1),ubound(yourarray,2)).value = yourarray
ws.usedrange.removeduplicates columns:=1, header:=no

This assumes the lower bound of your array is 1, that the column you want to de-duplicate is column 1 and that your list has no headers. You can then find the borders of the new range and read it back into your array (erase the current array first).

Comments

0

I would suggest filltering your column and then use a formula to find the duplicates and delete them. I don't have the actually code for you (you didn't give us any code)

dim a as range
dim b as range
set a = Range ("A1")

Do while Not isEmpty(A)
Set b = a.offset(1,0)

If b = a then
b= ""
else a.offset (1,0)

Loop

I am sure you could put the filter in the code or just rember to fillter before you run the macro.

2 Comments

This seems to be only checking the ranges values against the prior value in the list, whereas duplicates can occur between any two entries in a list. Also the statement "else a.offset(1,0)" looks like a bug to me.
I was trying to stay simple and assume that the duplicates would fall in order when filtered. I am now pretty sure that my IF statment will not funtion correctly.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.