remove column if duplicate data

tyrantdeath · August 9, 2011

i would like to ask how to remove duplicate data on ireport

SAMPLE DATA

Driver    Destination    Transport

carl        new york          car
carl        new jersey      motorcycle
carl      new york          motorcycle

SAMPLE OUTPUT

Driver Destination Transport

carl new york car

lukmanars · August 9, 2011

i think your data isn't duplicate,
it look like duplicate, but in column transport
there's data that make your data different from before

Driver    Destination    Transport
carl      new york          motorcycle
carl        new york          car

i think the problem is in your queries,

regards

Lukman

tyrantdeath · August 9, 2011

lukmanars
Wrote:

i think your data isn't duplicate,
it look like duplicate, but in column transport
there's data that make your data different from before

Driver    Destination    Transport
carl      new york          motorcycle
carl        new york          car

i think the problem is in your queries,

regards

Lukman

but is there a way to have the output that way?

cbarlow3 · August 11, 2011

If you're going to define "duplicate" data as meaning that you want to suppress any rows that duplicate a value for a single column, then yes there's a way to do it...at least there is if your version of SQL supports "window functions" (DB2 and SQL Server do). Using your example, the goal is to only return one row maximum per driver...a strange sort of request, since you'll obviously be missing information about the other rows where that driver appears. Let me give you two solutions, actually. Both will assume that the field "driver" really is a unique key so two drivers named "carl" wont' be a problem (I happen to be sensitive to this, since MY name is Carl, and I happen to be the "third" (the son of Carl Jr....no, not THAT Carl Jr.!)).

So my first solution supposes that there is another column called "destination_date" and that the reason you only want to return one row per driver is because you are looking for each driver's NEXT (earliest) destination. We'll also assume that the table only has destinations in the future so I don't have to introduce logic for skipping destinations that are in the past. This solution is simple:

SELECT driver, destination, transport, destination_date FROM (

SELECT driver, destination, transport, destination_date, min(destination_date) over (partition by driver) as min_date

FROM MyTable

) where destination_date=min_date

My second solution supposes that you don't conveniently have a logical method of choosing which "duplicate" record to display and that you arbitrarily going to pick one based on whatever order the rows are returned. In this case, you make up TWO fields rather than one: one is a simple row_number() function and the other is again the MIN() of that row_function over the partition of driver:

SELECT driver, destination, transport FROM (

SELECT driver, destination, transport, row_number() over (order by driver) as rownum,

min(row_number() over (order by driver)) over (partition by driver) as minrow FROM MyTable

) where rownum=minrow

As you see, either solution depends on the ability to use the "window" function OVER. Reworking the first example to use a subquery instead of windowing is a lot messier:

SELECT
driver, destination, transport, destination_date
FROM
MyTable INNER JOIN
(
SELECT
driver,
MIN(destination_date) AS min_date
FROM
MyTable
GROUP BY
driver
) x ON
MyTable.dirver=x.driver
WHERE
destination_date=x.min_date

I hope that helps!

Carl (III)

Sign In

remove column if duplicate data

Recommended Posts

tyrantdeath

Link to comment

Share on other sites

Top Posters In This Topic

Popular Days

Top Posters In This Topic

Popular Days

lukmanars

Link to comment

Share on other sites

tyrantdeath

Link to comment

Share on other sites

cbarlow3

Link to comment

Share on other sites

Create an account or sign in to comment

Create an account

Sign in

Activity

Products

Explore