MySQL GROUP_CONCAT duplicate entries

I have a problem with my mysql query, since it returns duplicate values for grouped attributes. For example, for the attribute q_id I receive 1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2 instead of 1_-_-_-_-_2 as expected. Two things are confusing.

  1. The 1's and 2's get repeated each 8 times. This is probably because there are 8 different keywords associated with the publication.
  2. The repetition of 1's and 2's occurs 2 times. This is due to 2 different authors associated with the publication.

My query.

SELECT 
    sm_publications.id AS p_id,
    GROUP_CONCAT(sm_authors.last_name SEPARATOR '_-_-_-_-_') AS a_name,
    GROUP_CONCAT(sm_affiliations.display_name SEPARATOR '_-_-_-_-_') AS af_display_name,
    GROUP_CONCAT(sm_keywords.name SEPARATOR '_-_-_-_-_') AS k_name,
    GROUP_CONCAT(sm_query_publications.query_id SEPARATOR '_-_-_-_-_') AS q_id
FROM sm_publications 
INNER JOIN sm_publication_authors ON sm_publication_authors.publication_id = sm_publications.id
INNER JOIN sm_authors ON sm_authors.id = sm_publication_authors.author_id
LEFT JOIN sm_affiliations ON sm_affiliations.id = sm_authors.affiliation_id
LEFT JOIN sm_publication_keywords ON sm_publication_keywords.publication_id = sm_publications.id
LEFT JOIN sm_keywords ON sm_keywords.id = sm_publication_keywords.keyword_id
INNER JOIN sm_query_publications ON sm_query_publications.publication_id = sm_publications.id 
WHERE sm_publications.id IN (1,2)  /* Just as example */
GROUP BY sm_publications.id

The relationships can be seen in the following ERM.

There are several characteristics.

  1. One publication has to have authors, but keywords are not mandatory.
  2. One author can have an affiliation, but it is not mandatory.
  3. One publication has to refer to one or more queries.

QUESTION: How can I join all the entities without receiving duplicate attributes? I know, that there is something like GROUP_CONCAT(DISTINCT [...]), but it leads to a problem that if there are two authors from the same affiliation, I only get one affiliation back. But in this case, I would like to receive both affiliations, although they are the same.


An example output looks like.

[
{
    "p_id": 1,
    "a_name": 
    "Wang_-_-_-_-_Wang_-_-_-_-_Wang_-_-_-_-_Wang_-_-_-_-_Wang_-_-_-_-_Wang_-_-_-_-_Wang_-_-_-_-_Wang_-_-_-_-_Wang_-_-_-_-_Wang_-_-_-_-_Wang_-_-_-_-_Wang_-_-_-_-_Wang_-_-_-_-_Wang_-_-_-_-_Wang_-_-_-_-_Wang_-_-_-_-_Chen_-_-_-_-_Chen_-_-_-_-_Chen_-_-_-_-_Chen_-_-_-_-_Chen_-_-_-_-_Chen_-_-_-_-_Chen_-_-_-_-_Chen_-_-_-_-_Chen_-_-_-_-_Chen_-_-_-_-_Chen_-_-_-_-_Chen_-_-_-_-_Chen_-_-_-_-_Chen_-_-_-_-_Chen_-_-_-_-_Chen",
    "af_display_name": "North China Electric Power University_-_-_-_-_North China Electric Power University_-_-_-_-_North China Electric Power University_-_-_-_-_North China Electric Power University_-_-_-_-_North China Electric Power University_-_-_-_-_North China Electric Power University_-_-_-_-_North China Electric Power University_-_-_-_-_North China Electric Power University_-_-_-_-_North China Electric Power University_-_-_-_-_North China Electric Power University_-_-_-_-_North China Electric Power University_-_-_-_-_North China Electric Power University_-_-_-_-_North China Electric Power University_-_-_-_-_North China Electric Power University_-_-_-_-_North China Electric Power University_-_-_-_-_North China Electric Power University_-_-_-_-_Huazhong University of Science & Technology_-_-_-_-_Huazhong University of Science & Technology_-_-_-_-_Huazhong University of Science & Technology_-_-_-_-_Huazhong University of Science & Technology_-_-_-_-_Huazhong University of Science & Technology_-_-_-_-_Huazhong University of Scien",
    "k_name": "Genetic Algorithm_-_-_-_-_High Efficiency_-_-_-_-_Improved Genetic Algorithm_-_-_-_-_Rule Extraction_-_-_-_-_Data Mining_-_-_-_-_Explicit Knowledge_-_-_-_-_Artificial Neural Network_-_-_-_-_Neural Network_-_-_-_-_Genetic Algorithm_-_-_-_-_High Efficiency_-_-_-_-_Improved Genetic Algorithm_-_-_-_-_Rule Extraction_-_-_-_-_Data Mining_-_-_-_-_Explicit Knowledge_-_-_-_-_Artificial Neural Network_-_-_-_-_Neural Network_-_-_-_-_Genetic Algorithm_-_-_-_-_High Efficiency_-_-_-_-_Improved Genetic Algorithm_-_-_-_-_Rule Extraction_-_-_-_-_Data Mining_-_-_-_-_Explicit Knowledge_-_-_-_-_Artificial Neural Network_-_-_-_-_Neural Network_-_-_-_-_Genetic Algorithm_-_-_-_-_High Efficiency_-_-_-_-_Improved Genetic Algorithm_-_-_-_-_Rule Extraction_-_-_-_-_Data Mining_-_-_-_-_Explicit Knowledge_-_-_-_-_Artificial Neural Network_-_-_-_-_Neural Network"
    "q_id": "1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2"
}, 
{
    "p_id": 2,
    "a_name": "Mihai_-_-_-_-_Mihai_-_-_-_-_Mihai_-_-_-_-_Mihai_-_-_-_-_Mihai_-_-_-_-_Mihai_-_-_-_-_Mihai_-_-_-_-_Mihai_-_-_-_-_Mihai_-_-_-_-_Mihai_-_-_-_-_Mihai_-_-_-_-_Mihai_-_-_-_-_Mihai_-_-_-_-_Mihai_-_-_-_-_Mihai_-_-_-_-_Mihai",
    "af_display_name": "University of Pitesti_-_-_-_-_University of Pitesti_-_-_-_-_University of Pitesti_-_-_-_-_University of Pitesti_-_-_-_-_University of Pitesti_-_-_-_-_University of Pitesti_-_-_-_-_University of Pitesti_-_-_-_-_University of Pitesti_-_-_-_-_University of Pitesti_-_-_-_-_University of Pitesti_-_-_-_-_University of Pitesti_-_-_-_-_University of Pitesti_-_-_-_-_University of Pitesti_-_-_-_-_University of Pitesti_-_-_-_-_University of Pitesti_-_-_-_-_University of Pitesti",
    "k_name": "Web Content Mining_-_-_-_-_Web Structure Mining_-_-_-_-_Web Mining_-_-_-_-_E Commerce_-_-_-_-_Data Preprocessing_-_-_-_-_Cause Related Marketing_-_-_-_-_Web Usage Mining_-_-_-_-_Data Mining_-_-_-_-_Web Content Mining_-_-_-_-_Web Structure Mining_-_-_-_-_Web Mining_-_-_-_-_E Commerce_-_-_-_-_Data Preprocessing_-_-_-_-_Cause Related Marketing_-_-_-_-_Web Usage Mining_-_-_-_-_Data Mining",
    "q_id": "1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_1_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2_-_-_-_-_2"
}

]

Answers


If you could supply a bit of sample data and the table declares it would be possible to test this.

However I think you problem is due to multiple records on each of the joined tables, with the joins then giving you every possible combination which results in duplicates in the GROUP_CONCAT data.

Joining sub queries might work:-

SELECT 
    sm_publications.id AS p_id,
    author_inf.a_name,
    affiliations_inf.af_display_name,
    keyword_inf.k_name,
    publications_inf.q_id
FROM sm_publications 
INNER JOIN
(
    SELECT sm_publication_authors.publication_id, GROUP_CONCAT(sm_authors.last_name SEPARATOR '_-_-_-_-_') AS a_name
    FROM sm_publication_authors 
    INNER JOIN sm_authors ON sm_authors.id = sm_publication_authors.author_id
    GROUP BY sm_publication_authors.publication_id
) author_inf
ON author_inf.publication_id = sm_publications.id
LEFT OUTER JOIN
(
    SELECT sm_publication_authors.publication_id, GROUP_CONCAT(sm_affiliations.display_name SEPARATOR '_-_-_-_-_') AS af_display_name
    FROM sm_publication_authors 
    INNER JOIN sm_authors ON sm_authors.id = sm_publication_authors.author_id
    LEFT JOIN sm_affiliations ON sm_affiliations.id = sm_authors.affiliation_id
    GROUP BY sm_affiliations.publication_id
) affiliations_inf
ON affiliations_inf.publication_id = sm_publications.id
LEFT OUTER JOIN
(
    SELECT sm_publication_keywords.publication_id, GROUP_CONCAT(sm_keywords.name SEPARATOR '_-_-_-_-_') AS k_name
    FROM sm_publication_keywords 
    LEFT JOIN sm_keywords ON sm_keywords.id = sm_publication_keywords.keyword_id
    GROUP BY sm_publication_keywords.publication_id
) keyword_inf
ON keyword_inf.publication_id = sm_publications.id
INNER JOIN
(
    SELECT sm_query_publications.publication_id, GROUP_CONCAT(sm_query_publications.query_id SEPARATOR '_-_-_-_-_') AS q_id
    FROM sm_query_publications
    GROUP BY sm_query_publications.publication_id
) publications_inf
ON publications_inf.publication_id = sm_publications.id
WHERE sm_publications.id IN (1,2)  /* Just as example */

Need Your Help

How can I automatically hide an NSScroller in Lion?

objective-c macos cocoa osx-lion nsscroller

I have an NSScroller that is used to scroll some complete custom view of mine. With [scroller setScrollerStyle:[NSScroller preferredScrollerStyle]] I get the correct style of the scroller. Now when...

Xamarin Debugger won't attach to iPad

ios ipad debugging xamarin xamarin.ios

When trying to debug on device using profile "Debug|iPhone > Casey's iPad" xamarin first shows this at the top: