
I can't seem to figure out what magic is happening behind the scenes in PHP, or why array_unique cannot detect my duplicates.

In my specific situation, I have 2 collections of users, which I am merging into one and then keeping only unique entries. For that I convert both collections into arrays, array_merge() them and then, depending on a parameter, apply array_unique(..., SORT_REGULAR) so that they are compared as objects without any conversions. I realise that comparing objects is a slippery slope, but in this case it's weirder than I thought.

After the merge but before the uniqueness check I have this state: [screenshot: merged array before array_unique()]

As you can see, items 4 and 11 are the same User entity (both non-strict and strict comparison agree on that). Yet after array_unique() they both remain in the list for some reason: [screenshot: array after array_unique()]

As you can see, items 7-10 were detected and removed, but 11 wasn't.

How is that possible? What am I not seeing here?

Currently running PHP 7.4.5

The code is from a project using Symfony 4.4.7 and Doctrine ORM 2.7.2 (although I think this should be irrelevant if the objects are equal by both == and === comparisons).

Fun fact for bonus points: applying array_unique twice in a row actually gives unique results: [screenshot: array after applying array_unique() twice]

Mind = blown

UPDATE: I have added throw new \RuntimeException() to my User::__toString() method, to be extra sure no one is converting the objects to strings.

Please do not suggest converting to string - that is neither a solution to my problem, nor what this question is about.

  • @u_mulder - can you elaborate on "still array_unique is not supposed for this"? Why not? Commented May 20, 2020 at 10:51
  • array_unique compares strings. So make a check comparing the objects' string representations: (string) $a === (string) $b Commented May 20, 2020 at 10:51
  • Looks like the code creates the return value array (converting to strings where needed - see the upper conversion) and removes elements by index, using a second array for comparison purposes (arTmp in the code). This second array uses pointers to the variables (see cmpdata->b.val, where b is a pointer, so b.val is not the string representation) to find what to remove. This works because everything is removed by index. As for the second time you call the function, it works because this time you ARE passing in strings, as this is what the first call returned. Commented May 27, 2020 at 7:31
  • There's actually a pretty explicit warning in the documentation: "Be careful when sorting arrays with mixed types values because sort() can produce unexpected results, if sort_flags is SORT_REGULAR" Commented Jun 1, 2020 at 23:19
  • @Marvin thanks, I had not seen that warning myself (mostly because I was digging into array_unique and did not realize until yesterday that sort plays a major role there). I'd say this comment of yours answers about 50% of the whole question at hand. Commented Jun 2, 2020 at 9:39

3 Answers


For your issue at hand, I really suspect it comes from the way array_unique removes elements from the array when the SORT_REGULAR flag is used, namely by:

  1. sorting it
  2. removing adjacent items if they are equal
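The two steps above can be modelled in a few lines of plain PHP. This is a simplified sketch under stated assumptions, not PHP's actual C implementation of array_unique(), and the function names adjacentDuplicateKeys() and sortThenDedupe() are hypothetical. It shows why the second step only catches duplicates that the sort happened to place next to each other.

```php
<?php
// Simplified model (an assumption, not PHP's C source) of
// array_unique($items, SORT_REGULAR): sort, then drop equal neighbours.

/**
 * Step 2 only: walk $keys in the given order and return the keys whose
 * value compares equal to the last kept value. Only neighbours are compared.
 */
function adjacentDuplicateKeys(array $items, array $keys, callable $cmp): array
{
    $drop = [];
    $lastKept = null;
    foreach ($keys as $key) {
        if ($lastKept !== null && $cmp($items[$lastKept], $items[$key]) === 0) {
            $drop[] = $key;      // equal to its neighbour: remove it
        } else {
            $lastKept = $key;    // a new value becomes the reference
        }
    }
    return $drop;
}

/** Steps 1 and 2: sort the keys by value (ties by key), then drop neighbours. */
function sortThenDedupe(array $items, callable $cmp): array
{
    $keys = array_keys($items);
    usort($keys, function ($x, $y) use ($items, $cmp) {
        $c = $cmp($items[$x], $items[$y]);
        return $c !== 0 ? $c : $x <=> $y;   // tie-break keeps the first occurrence
    });
    foreach (adjacentDuplicateKeys($items, $keys, $cmp) as $key) {
        unset($items[$key]);     // original keys and order are preserved
    }
    return $items;
}

$spaceship = function ($a, $b) { return $a <=> $b; };

// With a consistent ordering, equal values end up adjacent and are removed.
var_dump(sortThenDedupe([3, 1, 3, 2, 1], $spaceship) === array_unique([3, 1, 3, 2, 1], SORT_REGULAR)); // bool(true)

// If the order leaves another element between two equal values, both survive.
var_dump(adjacentDuplicateKeys(['u', 'p', 'u'], [0, 1, 2], 'strcmp')); // empty array
```

With a well-behaved ordering such as integers, the model agrees with array_unique(..., SORT_REGULAR). If the comparison cannot order some elements consistently, the sort may leave two references to the same object separated by another element, and the adjacent-only pass then keeps both.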

And because you have a Doctrine Proxy object in the middle of your User collection, this might be what causes the issue you are facing.

This seems to be backed up by the warning on the sort page of the PHP documentation, as pointed out by Marvin's comment.

Warning: Be careful when sorting arrays with mixed types values because sort() can produce unexpected results, if sort_flags is SORT_REGULAR.

Source: https://www.php.net/manual/en/function.sort.php#refsect1-function.sort-notes
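To see what that warning can mean for a mix of User entities and Doctrine proxies, here is a minimal sketch. UserProxy is a hypothetical stand-in (Doctrine's proxy classes extend the entity class). For PHP's standard comparison, which SORT_REGULAR uses, two objects of different classes are not comparable, and the comparison reports 1 in both directions. The commented outputs assume PHP 7.4 or 8.x.

```php
<?php
// Sketch: standard comparison between objects of two different classes.
// UserProxy is a hypothetical stand-in for a Doctrine proxy class.
class User
{
    public $id;
    public function __construct($id) { $this->id = $id; }
}
class UserProxy extends User {}

$user  = new User(1);
$proxy = new UserProxy(2);

// Different classes are "uncomparable": each side reports being the greater one.
var_dump($user <=> $proxy); // int(1)
var_dump($proxy <=> $user); // int(1)
var_dump($user == $proxy, $user < $proxy, $user > $proxy); // bool(false) three times

// The very same instance still compares equal to itself.
var_dump($proxy <=> $proxy); // int(0)
```

A sort fed such a comparison has no consistent order to work with, so nothing guarantees that the two references to the same proxy (items 4 and 11) end up next to each other, which is exactly what the removal step relies on.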


Now for a possible solution: this might get you something more Symfony-flavoured.

It uses the ArrayCollection filter() and contains() methods to filter the second collection, keeping only the elements not already present in the first one.
To be complete, this solution also uses the use language construct to pass the first collection ($a) to the closure that filter() needs.

This results in a new ArrayCollection with no duplicated users, provided neither input collection contains duplicates on its own.

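// At the top of the file:
use Doctrine\Common\Collections\ArrayCollection;
use Doctrine\Common\Collections\Collection;

// Static method of your own helper class:
public static function merge(Collection $a, Collection $b, bool $unique = false): Collection {
  if($unique){
    // Keep all of $a, plus only the elements of $b that $a does not contain yet.
    // For an ArrayCollection, contains() is a strict (===) in_array() check,
    // so the objects are never sorted or cast.
    return new ArrayCollection(
      array_merge(
        $a->toArray(),
        $b->filter(function($item) use ($a){
          return !$a->contains($item);
        })->toArray()
      )
    );
  }

  // No uniqueness requested: plain concatenation
  return new ArrayCollection(array_merge($a->toArray(), $b->toArray()));
}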
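The same identity-based idea also works without Doctrine, on plain arrays. The sketch below swaps in spl_object_id() (available since PHP 7.2) as the uniqueness key, so there is no sorting and no casting at all. The function name uniqueByIdentity() is hypothetical, and the sketch assumes every element is an object.

```php
<?php
// Sketch: identity-based de-duplication without any sorting.
// Assumes every element is an object. spl_object_id() returns the same
// integer for the same live instance; ids can only be reused after an
// object is destroyed, which cannot happen while it sits in $result.
function uniqueByIdentity(array ...$lists): array
{
    $seen = [];
    $result = [];
    foreach ($lists as $list) {
        foreach ($list as $object) {
            $id = spl_object_id($object);
            if (!isset($seen[$id])) {   // first time we meet this instance
                $seen[$id] = true;
                $result[] = $object;
            }
        }
    }
    return $result;
}

$a = new stdClass();
$b = new stdClass();   // equal to $a by ==, but a different instance
$merged = uniqueByIdentity([$a, $b], [$b, $a, $a]);
var_dump(count($merged)); // int(2)
```

With the collections from the answer, something like new ArrayCollection(uniqueByIdentity($a->toArray(), $b->toArray())) should work. Unlike the filter() version, it also removes duplicates inside each collection.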

2 Comments

Thanks, I'll take this as the accepted answer, as it is the closest to what I'm after: this, plus @Marvin's comment about the warning on SORT_REGULAR sorting of arrays containing objects of different types. When digging through the PHP source code and internal docs, I got as far as the object zval _zend_object_value and how it consists of a handle and a handler table, how each class can have a different handler table yet some classes can share some handlers, and I suppose that for sort/array_unique to work, they need to share the 'compare' handler. I just couldn't find more info about where/what/which those handlers are.
I have followed this question since the beginning, but I also only realized the sorting issue after @β.εηοιτ.βε's comments. So... well-deserved bounty.

I know you said you don't want to convert to strings, but since you don't seem to have a way out yet, I propose applying serialize() to each object in your array. I haven't found a way to compare objects that doesn't involve converting them to an array or a string (you could try converting to binary or hex if you are not comfortable with strings or arrays, but I don't know whether that is possible without converting to a string first).

serialize() turns the object into PHP's own storable representation, which you can then compare with other serialized objects. The method is safe because it is reversible: unserialize() gives you back an equivalent copy of the original object.

So you can serialize all elements of the array and then use array_unique(), like this:

<?php

header("Content-Type: text/plain"); // var_dump() output is plain text, not JSON

class MyClass
{
    public $var1;
    public $var2;
    function __construct($var1, $var2)
    {
        $this->var1 = $var1;
        $this->var2 = $var2;
    }

}

$arr = [
    "a",
    "a",
    [1,2,3],
    "b",
    [1,2,3],
    new MyClass(1,1),
    new MyClass(1,new MyClass(1,1)),
    new MyClass(1,new MyClass(1,1)),
];

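// Turn every element (scalar, array or object) into a comparable string
$arrSerialized = array_map("serialize", $arr);

// Deduplicate the strings, then restore the surviving elements
var_dump(
    array_map(
        "unserialize",
        array_unique(
            $arrSerialized,
            SORT_STRING
        )
    )
);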

/* output:
array(5) {
    [0]=>
    string(1) "a"
    [2]=>
    array(3) {
        [0]=>
        int(1)
        [1]=>
        int(2)
        [2]=>
        int(3)
    }
    [3]=>
    string(1) "b"
    [5]=>
    object(MyClass)#6 (2) {
        ["var1"]=>
        int(1)
        ["var2"]=>
        int(1)
    }
    [6]=>
    object(MyClass)#7 (2) {
        ["var1"]=>
        int(1)
        ["var2"]=>
        object(MyClass)#8 (2) {
            ["var1"]=>
            int(1)
            ["var2"]=>
            int(1)
        }
    }
}
*/
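One detail matters in a Doctrine context: unserialize() builds a new instance rather than handing back the original one, so the elements in the result are copies that Doctrine's EntityManager does not track. A minimal sketch, with Point as a hypothetical class:

```php
<?php
// Sketch: unserialize() creates a new object equal to the original, not the same one.
class Point
{
    public $x;
    public function __construct($x) { $this->x = $x; }
}

$original = new Point(1);
$copy = unserialize(serialize($original));

var_dump($copy == $original);  // bool(true): same class, same property values
var_dump($copy === $original); // bool(false): a different instance
```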

Hope this helps you, man, have a good day!

P.S.: serialize() also keeps equal-looking values of different types apart: 1 and "1" are serialized to different strings.
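A quick check of that P.S.: serialize() encodes the type in its output, while plain array_unique() (SORT_STRING by default) casts both values to "1" before comparing.

```php
<?php
// serialize() records the type, so the int 1 and the string "1" stay distinct.
var_dump(serialize(1));   // string(4) "i:1;"
var_dump(serialize("1")); // string(8) "s:1:"1";"

// Plain array_unique() casts both to the string "1" first...
var_dump(count(array_unique([1, "1"])));                          // int(1)
// ...whereas the serialized forms are different strings.
var_dump(count(array_unique(array_map('serialize', [1, "1"])))); // int(2)
```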

1 Comment

I don't think this will work in the OP's context. A Symfony User is something specific that already has a serialize function, used to store the logged-in User in the session. If the OP had to include all fields of the User in the serialization, and therefore in the session, that wouldn't really be ideal.

Without knowing your entity class it's hard to guess why this is happening, but I guess your main issue here is the __toString() method. If you have not defined it, you should add one that returns a unique/distinct string for each entity object. If it's already defined, make sure it returns a distinct string.

class User{ 
   private $name;

   function __construct($name){ 
      $this->name=$name;
   }

   function __toString(){ 
     return $this->name; 
   }
}

$users = [];
$users[] = new User("User1");
$users[] = new User("User2");
$users[] = new User("User3");

$user1= $users[0];
$users[]=$user1; //duplicate

// The default flag is SORT_STRING, so each object is compared via __toString()
echo(count(array_unique($users))); // output should be 3

Given the limited information about the entity class, this is as far as I can guess.

Edit:

After reading your edits, I guess you are locking yourself in here. array_unique() converts each entity object to a string (SORT_STRING, the default) or to a number (SORT_NUMERIC), depending on the sort_flag you pass; only with SORT_REGULAR are the values compared as they are, using ==. More on array_unique in the PHP manual. So you either need to implement __toString(), or add some public properties to your entity that define the uniqueness of your object, e.g.:

class User{ 
       public $id;
       private $name;

       function __construct($id,$name){
          $this->id=$id;
          $this->name=$name;
       }
}

$users = [];
$users[] = new User(1,"User1");
$users[] = new User(2,"User2");
$users[] = new User(3,"User3");

$user1= $users[0];
$users[]=$user1; //duplicate
// SORT_REGULAR compares the objects with == instead of casting them
echo(count(array_unique($users, SORT_REGULAR))); // output should be 3

Please note the public property $id and the SORT_REGULAR flag.
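As a side check on the example above, here is a minimal sketch of how SORT_REGULAR treats objects of the same class: they are compared with ==, which looks at every property, private ones included. Account is a hypothetical class with no public properties at all.

```php
<?php
// Sketch: with SORT_REGULAR, same-class objects are compared with ==,
// which compares all properties regardless of visibility.
class Account
{
    private $id;   // deliberately private
    public function __construct($id) { $this->id = $id; }
}

$first = new Account(1);
$items = [$first, new Account(2), new Account(1), $first];

var_dump(new Account(1) == new Account(1)); // bool(true)
var_dump(new Account(1) == new Account(2)); // bool(false)
var_dump(count(array_unique($items, SORT_REGULAR))); // int(2)
```

So, at least for objects of a single class, the visibility of $id does not seem to be what makes the example work.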

3 Comments

Your edit actually proves the OP's point: when two items are the same, array_unique() with SORT_REGULAR should work, but for some reason it does not.
@β.εηοιτ.βε maybe we need more info about User and its public properties.
@sakhunzai public properties do not seem to play a major role here, as the two entries in the array are THE SAME object (unless you know something I don't about how sort($items, SORT_REGULAR) works internally, especially with an array of objects from 2 different classes)
