distinct on a single column, yet returning a row from the same table based on an unselected condition
I am new to Sql server, and my MySql background is only decent, nothing exceptional.
I have a table with 5 columns:
Neighbrhood City State zip_code zip_percent
The goal of the query: To provide a list of neighborhoods (unique) within a city and the zip_code that corresponds with the highest zip_percent. Thus returning 2 columns:
Where the zip_code is the "most representative" of the neighborhood, by virtue of being of the highest percent coverage or "zip_percent"
Issues: Since a neighborhood can overlap multiple zip_codes I am having an issue getting them distinctly. The distinct clause prevents me from ordering by an unselected column.
Somehow I think I need to take this subquery and join it back onto the original table using a MAX command on the zip_percent column. I just can't quite put all the pieces together. I would really appreciate any help.
You can see an example of the issue here with Neighborhoods like 'Arbor Heights' and 'Rainer Beach'
Try using ROW_NUMBER:
SELECT Neighborhood, zip_code FROM ( SELECT Neighborhood, zip_code, ROW_NUMBER() OVER (PARTITION BY Neighborhood ORDER BY zippercent DESC) AS rn FROM neighbourhoods_zip ) AS T1 WHERE rn = 1
Neighborhood zip_code Adams 98107 Alki 98116 ArborHeights 98146 Atlantic 98144 Belltown 98121 BitterLake 98133 Briarcliff 98199 etc...
SELECT Q1.Neighborhood, Q1.Zip FROM TableName AS Q1 LEFT JOIN (SELECT Neighborhood, MAX(zip_percent) as Zip_Percent FROM TableName GROUP BY Neighborhood) AS Q2 ON Q1.Zip_Percent = Q2.Zip_Percent AND Q1.Neighborhood = Q2.Neighborhood
Keep in mind though that this could have unexpected results in Neighborhoods where the max zip percentage is shared by many zips. e.g.
+------------------------------------+ | NeighborHood | Zip | Zip Percent | +------------------------------------+ | A | 91345 | 50 | +--------------+-------+-------------+ | A | 91346 | 50 | +--------------+-------+-------------+
You need to do a group by clause in order for distinct to work.
DISTINCT does not work on a column basis it works on a row.
So what you want to do is a sub query to get the max zippercent.
SELECT MAX(ZipperCent) AS TheMax FROM YourTable GROUP BY ZipCode
This will give you the max zippercent based on a zip code, remember that a zip code can span multiple postal_names. For instance 98107 is seattle but so is 98116. So you do not want to group by the city name, group by the zip code.
Now that you have the max zippercents per zip code you simply use that as a sub query to grab the other information you need:
SELECT YourOtherField1, YourOtherField2, t.TheMax FROM YourTable INNER JOIN (SELECT MAX(ZipperCent) AS TheMax FROM YourTable GROUP BY ZipCode) t ON t.YourPrimaryKey = YourTable.PrimaryKey
It would be best if you had an autoincrement ID field on the table as well to easily handle cases where there's more than one row with the same max(zip_percent) per nighborhood. Minus that point, this should be approximately correct:
with max_zp as select Neighborhood, max(zip_percent) zp from tbl group by Neighborhood select t.NeighborHood, t.zipcode from tbl t join max_zp mz on t.Neighborhood = mz.Neighborhood and t.zip_percent = mz.zp
Select Neighbrhood, zipcode FROM geoTable g INNER JOIN ( SELECT zip_code, MAX(zip_percent) zip_percent FROM geoTable GROUP BY zipcode ) maxG ON g.zip_code = maxG.zip_code and g.zip_percent = maxG.zip_percent