Return all records where a field has the same value plus additional text

A table contains unique values in one field (FILENAME). Although the values are unique, many of them are really the same base name with some text appended. How can I return and group these similar records and update their empty fields?

The table below is typical of the records. Every record has a file name, but it is not a key field. For each document there is one record that carries the metadata, and I would like to copy that metadata to the related records, which can only be identified by the first n characters of the file name.

The length of the original file name varies from document to document. The constant is that related records always share the same prefix.

FILENAME         /  DWGNO  /  PROJECT
52349            /  52349  /  Ford
52349-1.dwg      /         /
52349-2.DWG      /         /
52349-3.dwg      /         /
52351            /  52351  /  Toyota
52351_C01_REV-   /         /
52351_C01_REV2-  /         /
123              /  123    /  Nissan
123_rev1         /         /
123_rev2         /         /
123_rev3         /         /

The table should look like this.

FILENAME         /  DWGNO  /  PROJECT
52349            /  52349  /  Ford
52349-1.dwg      /  52349  /  Ford
52349-2.DWG      /  52349  /  Ford
52349-3.dwg      /  52349  /  Ford
52351            /  52351  /  Toyota
52351_C01_REV-   /  52351  /  Toyota
52351_C01_REV2-  /  52351  /  Toyota
123              /  123    /  Nissan
123_rev1         /  123    /  Nissan
123_rev2         /  123    /  Nissan
123_rev3         /  123    /  Nissan

I first tried joining the table to itself and comparing prefixes, but "LEFT(FILENAME, 10)" does not return all the results.

USE MyDatabase
SELECT      x.DWGNO AS X_DWGNO,
    y.DWGNO AS Y_DWGNO,
    x.FILENAME AS X_FILENAME,
    y.FILENAME AS Y_FILENAME,
    x.DWGTITLE,
    x.REV,
    x.PROJECT
FROM        dbo.DocShare x
-- want all the files from the left table... I think
LEFT JOIN   dbo.DocShare y
 ON     LEFT(FILENAME, LEN(CHARINDEX('.', FILENAME, 1))) = LEFT(FILENAME, 10)

I also tried something else based on a similar post, but it doesn't really work either.

USE MyDatabase
SELECT      X.E_DWGNO,
    y.DWGNO AS Y_DWGNO,
    x.FILENAME AS X_FILENAME,
    y.FILENAME AS Y_FILENAME,
    x.DWGTITLE,
    x.REV,
    x.PROJECT
FROM        dbo.DocShare x
WHERE EXISTS(SELECT x.FILENAME 
        FROM dbo.DocShare
        WHERE x.FILENAME = LEFT(y.FILENAME, LEN(CHARINDEX('.', y.FILENAME, 0))))
ORDER BY y.FILENAME 

Answers


Try this

select f2.Filename, f1.DWGNO, f1.Project
from File1 f2
left join File1 f1
    on f2.Filename like f1.Filename + '%'
where f1.DWGNO != ''

First, you want to get the base files or those rows where DWGNO IS NOT NULL. Then, get the revisions (DWGNO IS NULL) and do a JOIN on the base files:

WITH CteBase AS (
    SELECT * FROM Tbl WHERE DWGNO IS NOT NULL
),
CteRev AS(
    SELECT
        t.FileName,
        DWGNO = cb.DWGNO,
        Project = cb.Project
    FROM Tbl t
    INNER JOIN CteBase cb
        ON t.FileName LIKE cb.FileName + '%'
    WHERE t.DWGNO IS NULL
)
SELECT * FROM CteBase
UNION ALL
SELECT * FROM CteRev
ORDER BY FileName
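
The query above only returns the filled-in rows. Since the question also asks to update the empty fields, the same join could be turned into an UPDATE. This is a minimal sketch, not tested against the real table: it reuses the Tbl, FileName, DWGNO and Project names from the query above, and it assumes the empty fields are NULL (not empty strings) and that each revision matches exactly one base row.

```sql
-- Sketch: copy the base row's metadata onto its revision rows.
-- Assumes empty fields are NULL and each revision has a single matching base row.
UPDATE rev
SET    rev.DWGNO   = base.DWGNO,
       rev.Project = base.Project
FROM   Tbl AS rev
INNER JOIN Tbl AS base
        ON rev.FileName LIKE base.FileName + '%'
WHERE  rev.DWGNO IS NULL           -- only rows still missing metadata
  AND  base.DWGNO IS NOT NULL;     -- only base rows act as the source
```

If one revision could match several base rows (for example bases 5234 and 52349), SQL Server would pick one of them arbitrarily, so the join condition would need tightening first.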

Using LIKE with a trailing % may return incorrect matches: a file named 523510 also matches 52351%. Try the query below, which compares only the leading digits of the file name.

USE MyDatabase
SELECT      x.DWGNO AS X_DWGNO,
    y.DWGNO AS Y_DWGNO,
    x.FILENAME AS X_FILENAME,
    y.FILENAME AS Y_FILENAME,
    x.DWGTITLE,
    x.REV,
    x.PROJECT
FROM        dbo.DocShare x
-- pair each row x with the rows y whose leading digits equal x.FILENAME
LEFT JOIN   dbo.DocShare y
 -- the appended '_' guarantees PATINDEX finds a non-digit, so LEFT never receives a length of -1
 ON     LEFT(y.[FILENAME], PATINDEX('%[^0-9]%', y.[FILENAME] + '_') - 1) = x.[FILENAME]
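
The 523510 problem can also be avoided while keeping LIKE, by requiring the character right after the prefix to be a non-digit. The sketch below illustrates this on inline sample data. The Files CTE and the 523510 row are made up for the demonstration, and it assumes base file names consist of digits only, as in the question.

```sql
-- Sketch: LIKE with a character class so that 523510 does not match base 52351.
-- Files is a hypothetical inline sample, not the real table.
WITH Files AS (
    SELECT *
    FROM (VALUES
        ('52351',          '52351', 'Toyota'),
        ('52351_C01_REV-', NULL,    NULL),
        ('523510',         NULL,    NULL)    -- unrelated file that shares the prefix
    ) AS v (FileName, DWGNO, Project)
)
SELECT rev.FileName, base.DWGNO, base.Project
FROM   Files AS rev
INNER JOIN Files AS base
        -- [^0-9] demands one non-digit character directly after the base name
        ON rev.FileName LIKE base.FileName + '[^0-9]%'
WHERE  rev.DWGNO IS NULL
  AND  base.DWGNO IS NOT NULL;
-- returns only: 52351_C01_REV-  52351  Toyota
```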

I'm assuming the inner queries are scalar, that is, each returns at most one row.

Basically, it uses patindex() to find the first non-digit character. (I think I've got the function calls right.) Rows that don't contain one don't need updating. For those that do, we need to look up the row whose complete filename equals the matching prefix. That prefix is all the characters before the position returned by patindex().

update dbo.DocShare
set DWGNO = (
        select ds.DWGNO
        from dbo.DocShare as ds
        where ds.FILENAME =
            left(
                dbo.DocShare.FILENAME,
                patindex('%[^0-9]%', dbo.DocShare.FILENAME + '_') - 1
            )
    ),
    PROJECT = (
        select ds.PROJECT
        from dbo.DocShare as ds
        where ds.FILENAME =
            left(
                dbo.DocShare.FILENAME,
                patindex('%[^0-9]%', dbo.DocShare.FILENAME + '_') - 1
            )
    )
-- only touch rows whose FILENAME contains a non-digit character
where patindex('%[^0-9]%', FILENAME) > 0
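
To check what the patindex() expression does, it can be run on its own against a few of the file names from the question. This is only an illustrative sketch. The inline VALUES list stands in for the real table.

```sql
-- Sketch: position of the first non-digit and the resulting prefix.
-- The '_' appended to FILENAME guarantees a non-digit exists, even for '52349'.
SELECT v.FILENAME,
       PATINDEX('%[^0-9]%', v.FILENAME + '_')                       AS FirstNonDigit,
       LEFT(v.FILENAME, PATINDEX('%[^0-9]%', v.FILENAME + '_') - 1) AS Prefix
FROM (VALUES ('52349'), ('52349-1.dwg'), ('123_rev1')) AS v (FILENAME);
-- 52349        6  52349
-- 52349-1.dwg  6  52349
-- 123_rev1     4  123
```

Without the appended '_', patindex() would return 0 for '52349' and left() would be called with a length of -1, which SQL Server rejects.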
