Sunday, April 9, 2017

MySQL Order inside Groups




The issue at hand is: I have a table codeName, where code is constant, but name could be changed. So there are occasions where a code can have multiple names in this table, thus the need to delete duplicated names. It is easy to get the code having multiple names:

select code, count(*)
from codeName
group by code
having count(*)>1

The difficult part is to leave the newest name in the table while removing all other names. There is a column “date” which specifies when a name is entered into the table, so the requirement is to leave the name with the newest “date” in the table. To find out the code with the latest naming date: 

select code, date, count(*)
from(
    select * from codeName
    order by code, date desc
  )a
group by code
having count(*)>1

(It is interesting that MYSQL allows to have fields in the select clause that are not in the grouping functions, in the above sql, date is not in the group by clause,  and yet it is allowed. I recall other relational database doesn’t support this?)

The nature of the problem distil to: how to number the records in each groups. If this can be done, than the solution is as easy as to delete all the records except the first one.

Other relational databases (Oracle e.g.) have analytical functions that can do this, MYSQL doesn’t have it. The trick is to use session variables:

delete b.* from codeName b
inner join (
    select * from (
      SELECT code, name, date,
           @rank := IF(@code = code, @rank + 1, 1) AS rank,
           @code := code
      FROM codeName
      ORDER BY code, date DESC
    )a
      where a.rank!=1
)c
where b.code=c.code and b.date=c.date

@rank, @code are session variables, you do not need to declare them beforehand. @rank := IF(@code = code, @rank + 1, 1) AS rank functions just as a simple if statement,  the first time it was invoked, @code is NULL, so @rank is assigned 1; Since the query is order by code, date desc, if there are multiple names for the same code, @rank will be incremented.

Another interesting MYSQL syntax is on delete b.* from, it is sytax-wrong to say delete from.

In the above example, session variables function just as a simple if procedure, but apparently, it is more complicated than that, e.g.

select @rank:=IF(@rank is not NULL, @rank+1, 1 ) as rank, name from codeName
order by name

select @rank

This shows @rank=1, which is incorrect.

Open another session:
set @rank=null

select @rank:=IF(@rank is not NULL, @rank+1, 1 ) as rank, name from codeName
order by name

select @rank

This shows @rank=3172,which is correct.

The only difference is that the second one declare @rank first, but it declares it to be null, so it shouldn’t matter?

Another tricky thing is that when using sqlalchemy, apparently, you have to declare session variables first, otherwise, you will get the result as in the first example (e.g. @rank will always be 1). So in the original problem, there are two session variables which should be declared separately:

connection = engine.connect()
 
sql0="set @rank=0"

sql1="set @code=''"
 
connection.execute(sql0)

connection.execute(sql1)
 
connection.close()
 
 


No comments:

Post a Comment