What is wrong with `std::set`?

In another topic I was trying to solve this problem: removing duplicate characters from a std::string.

std::string s= "saaangeetha";

Since the order was not important, I sorted s first, then used std::unique, and finally resized it to get the desired result:

aeghnst

That is correct!


Now I want to do the same, but this time I want the order of the characters kept intact. That is, I want this output:

sangeth

So I wrote this:

#include <algorithm>
#include <iostream>
#include <set>
#include <string>

template<typename T>
struct is_repeated
{
    std::set<T>  unique;
    bool operator()(T c) { return !unique.insert(c).second; }
}; 
int main() {
    std::string s= "saaangeetha";
    s.erase(std::remove_if(s.begin(), s.end(), is_repeated<char>()), s.end()); 
    std::cout << s ;
}

Which gives this output:

saangeth

That is, 'a' is repeated, though the other repetitions are gone. What is wrong with the code?

Anyway, I changed my code a bit (see the comments):

#include <algorithm>
#include <iostream>
#include <set>
#include <string>

template<typename T>
struct is_repeated
{
    std::set<T> & unique;  //made reference!
    is_repeated(std::set<T> &s) : unique(s) {} //added line!
    bool operator()(T c) { return !unique.insert(c).second; }
}; 
int main() {
    std::string s= "saaangeetha";
    std::set<char> set; //added line!
    s.erase(std::remove_if(s.begin(),s.end(),is_repeated<char>(set)),s.end()); 
    std::cout << s ;
}

Output:

sangeth

Problem gone!

So what is wrong with the first solution?

Also, if I don't make the member variable unique a reference, the problem doesn't go away.

What is wrong with std::set or is_repeated functor? Where exactly is the problem?

I also note that if the is_repeated functor is copied somewhere, then every member of it is also copied. I don't see the problem here!

Answers


In GCC (libstdc++), remove_if is implemented essentially as

    template<typename It, typename Pred>
    It remove_if(It first, It last, Pred predicate) {
      first = std::find_if(first, last, predicate);
    //                                  ^^^^^^^^^
      if (first == last)
         return first;
      else {
         It result = first;
         ++ first;   // skip the element find_if already flagged
         for (; first != last; ++ first) {
           if (!predicate(*first)) {
    //          ^^^^^^^^^
              *result = std::move(*first);
              ++ result;
           }
         }
         return result;
      }
    }

Note that your predicate is passed by value to find_if, so the struct (and therefore the set) that gets modified inside find_if is a copy, and those modifications are not propagated back to the caller.

Since the first duplicate appears at:

  saaangeetha
//  ^

The initial "sa" will be kept after the find_if call. Meanwhile, the predicate's own set is still empty (the insertions made within find_if went into the copy). Therefore the loop afterwards will keep the 3rd a.

   sa | angeth
// ^^   ^^^^^^
// ||   kept by the loop in remove_if
// ||
// kept by find_if

Functors are supposed to be designed in a way where a copy of a functor is identical to the original functor. That is, if you make a copy of one functor and then perform a sequence of operations, the result should be the same no matter which functor you use, or even if you interleave the two functors. This gives the STL implementation the flexibility to copy functors and pass them around as it sees fit.

With your first functor, this claim does not hold: if I copy your functor and then call the copy, the changes it makes to its stored set are not reflected in the original functor, so the copy and the original will behave differently. Similarly, if you take your second functor and make it store its set by value instead of by reference, the two copies of the functor will not behave identically.

Your final version of the functor works because the set is stored by reference, so any number of copies of the functor share the same set and behave identically to one another.

Hope this helps!


Not really an answer, but as another interesting tidbit to consider, this does work, even though it uses the original functor:

#include <set>
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>

template<typename T>
struct is_repeated {
    std::set<T>  unique;
    bool operator()(T c) { return !unique.insert(c).second; }
}; 
int main() {
    std::string s= "saaangeetha";
    std::remove_copy_if(s.begin(), s.end(), 
                        std::ostream_iterator<char>(std::cout), 
                        is_repeated<char>());
    return 0;
}

Edit: I don't think it affects this behavior, but I've also corrected a minor slip in your functor (operator() should apparently take a parameter of type T, not char).


I suppose the problem could be that the is_repeated functor is copied somewhere inside the implementation of std::remove_if. If so, the default copy constructor is used, which in turn calls the std::set copy constructor. You end up with two is_repeated functors that may be used independently, and because the sets in them are distinct objects, neither sees the other's changes. If you turn the field is_repeated::unique into a reference, the copied functor still uses the original set, which is what you want in this case.


Functor classes used as predicates should be pure functions and have no state of their own. See Item 39 in Scott Meyers's Effective STL for a good explanation of this. The gist of it is that your functor may be copied one or more times inside the algorithm.


The other answers are correct in that the issue is that the functor you are using is not safe to copy. In particular, the STL that comes with gcc (4.2) implements std::remove_if as a combination of std::find_if to locate the first element to delete, followed by a std::remove_copy_if to complete the operation.

template <typename ForwardIterator, typename Predicate>
ForwardIterator remove_if( ForwardIterator first, ForwardIterator end, Predicate pred ) {
   first = std::find_if( first, end, pred ); // [1]
   ForwardIterator i = first;
   return first == end ? first
          : std::remove_copy_if( ++i, end, first, pred ); // [2]
}

Because of the copy in [1], the characters examined by find_if (including the first 'a') are inserted only into the copy's set, which is then discarded, so the first 'a' is lost in oblivion. The functor is copied again in [2]; that would be fine, except that this copy is made from the original functor, whose set is still empty.


Depending on the implementation, remove_if can make copies of your predicate. Either refactor your functor to make it stateless, or use Boost.Ref, which exists "for passing references to function templates (algorithms) that would usually take copies of their arguments", like so:

#include <set>
#include <iostream>
#include <string>
#include <algorithm>
#include <iterator>

#include <boost/ref.hpp>
#include <boost/bind.hpp>

template<typename T>
struct is_repeated {
    std::set<T>  unique;
    bool operator()(T c) { return !unique.insert(c).second; }
}; 

int main() {
    std::string s= "saaangeetha";
    is_repeated<char> pred;  // named object: boost::ref cannot bind to a temporary
    s.erase(std::remove_if(s.begin(), s.end(), boost::bind<bool>(boost::ref(pred), _1)), s.end());
    std::cout << s;

    return 0;
}
