2015년 1월 6일 화요일

[impala-user] support of “QUALIFY ROW_NUMBER() OVER(...)”

Is there any approach to support the QUALIFY SQL key word, or any workaround?

Like the SQL statement below,
SELECT id, col1, col2, ROW_NUMBER() OVER(PARTITION BY col3 ORDER BY col1 ASC) AS Rwn FROM table_name QUALIFY Rwn=1;

Currently impala supports the row_number() over(...) analytic function,

However, if I want to filter the result of the row_number() over(...), is there any approach or workaround?

I tried to replace the QUALIFY with WHERE
SELECT id, col1, col2, ROW_NUMBER() OVER(PARTITION BY col3 ORDER BY col1 ASC) AS Rwn FROM table_name WHERE Rwn=1;

I got exception "AnalysisException: couldn't resolve column reference: Rwn" 

It seems that the code  public SlotDescriptor registerColumnRef(TableName tblName, String colName) is checking the specific column_name from table, obviouslly can not find the Rwn, and then throw the exception when parse the SQL.

Anyway, is there any approach / workaround to support QUALIFY key word?



The where clause function is working fine for me as a substitution for "QUALIFY".Could you check if the RowNum function alone is working for you.



Thanks Pranay, the RowNum function alone is working well.

it seems that the analytic function can only work with select and order by now
"Analytic function calls are only allowed in the SELECT list and in the outermost ORDER BY clause of the query."

Also by checking the code, when analyze the SQL statement, the where clause can not contain analytic expressions.
public void analyze(Analyzer analyzer) throws AnalysisException  
229       Expr e = whereClause_.findFirstOf(AnalyticExpr.class);
230       if (e != null) {
231         throw new AnalysisException(
232             "WHERE clause must not contain analytic expressions: " + e.toSql());
233       }

However, I am thinking whether can work if utilize the alias, and put the analytic function alias in where clause. But it failed in my test...

And I am not sure whether there are any other workaround way if want to achieve similar goal.

btw, I am using the version 2.0. Which version you used?



I tried the following  with the rank as an alias in the inner query :

SELECT 
            Co1, 
   Col2,
              FROM  
  (
           SELECT
                    Col1,
                    Col2,
                    CASE WHEN MAX(TNS_8Wk_Amt) OVER (PARTITION BY Consumer_Id) <> 0 
                    THEN RANK() OVER (PARTITION BY Col1 ORDER BY Col2 DESC)
     
                    ELSE RANK() OVER (PARTITION BY Col1 ORDER BY Col2 DESC) END AS RANK
                  FROM
                  Table1;
                ) r where RANK=1

I am using Impala version 2.0



Thanks Pranay, yes it works if utilize the sub-query! This could be a workaround way now :)


댓글 없음:

댓글 쓰기