Hive进阶函数:inline() 和 struct() ,一列转多行

一、使用场景

如果存在一张表,记录的是每位学生的各科成绩,现在想把表转换为纵向存储

比如:

name|english|math|history

tom |80 |90 |100

转换为:

name|subject|score

tom |english|80

tom |math |90

tom |history |100

二、问题分析

可以把这个问题分为两部分

第一部分:将一行转为三行

第二部分:将每行数据的分数和科目填上

三、问题解决

方法一:space()

所以可以使用space把每行数据都炸裂为三行,然后根据name开窗排序,依次写入成绩和科目即可

with base as (
  select 'tom' as name, 80 as english, 90 as math, 100 as history
  union all 
  select 'jery' as name, 30 as english, 60 as math, 70 as history
)
select 
  name 
  ,case when rn = 1 then 'english'
    when rn = 2 then 'math'
    when rn = 3 then 'history'
  end as subject 
  ,case when rn = 1 then english
    when rn = 2 then math 
    when rn = 3 then history
  end as score
from (
  select 
    name
    ,english
    ,math 
    ,history
    ,row_number() over(partition by name) rn 
  from base 
  lateral view explode(split(space(2), ' ')) tmp as s 
) t 

这样写纵然可以,但是很麻烦,消耗资源,并且不健壮,如果再来几个字段,是不是得一直添加下去,所以需要一个函数的出现

方法二:stack()

引入函数

stack函数:可以将多个列的值转为多行,并且搭配lateral view使用时,还可以充当虚拟表

SELECT stack(n, val1, val2, ..., valn) AS (col1, col2, ..., coln) FROM tbl

和lateral view搭配使用时

SELECT stacked_col
FROM tbl
LATERAL VIEW stack(2, name, score) stacked_table AS stacked_col;

stack函数传入的参数可以分为两部分

第一部分:需要列转行的个数,比如2,就是需要把两个列的值转为行显示

第二部分:需要转的列

最开始提出的问题,将分数按照科目和分数的形式展示,会存在一个问题,列转为行后,对应的科目应该怎么办?怎么知道每个分数的科目是什么呢?所以在这里需要做一点小小的转换,在传入列值的时候,把科目和分数组合传入

with base as (
  select 'tom' as name, 80 as english, 90 as math, 100 as history
  union all 
  select 'jery' as name, 30 as english, 60 as math, 70 as history
)

select 
  name
  ,split(score, ',')[0] as subject
  ,split(score, ',')[1] as score
from base 
lateral view stack(3, concat('english,',english), concat('math,',math), concat('history,',history)) tmp as score

这样即可,是不是比之前的方法简单了很多,而且更加的简洁

方法三:inline()

上面的方法虽然简单,但是需要把科目和分数拼接起来,然后在外面切割开使用,所以有没有另一种方法,不需要这样处理,直接使用原字段

引入函数

inline():inline函数用于将数组类型的列转换为多行。它会将数组中的每个元素与原始行的其他列一起展开为多行数据。

举例

with base as (
  select 'tom' as name, array(struct('history',10),struct('english',20),struct('history',30)) as scores
)

select name, subject, score from base 
lateral view inline(scores) as subject,score

这个例子就可以把列炸成三行,包括name、subject、score三个字段

注意,inline函数的输入参数要求为array(struct<>)类型,所以要保证类型匹配,否则会报错

cannot resolve 'inline(base.`scores`)' due to data type mismatch: input to function inline should be array of struct type, not array<int>

所以最初的问题又可以使用inline解决

with base as (
  select 'tom' as name, 80 as english, 90 as math, 100 as history
  union all 
  select 'jery' as name, 30 as english, 60 as math, 70 as history
)

select 
  name 
  ,subject
  ,score
from base 
lateral view inline(array(
  struct('english', english),
  struct('math', math),
  struct('history', history)
)) tmp as subject, score

结果

name    subject    score
tom     english    80
tom     math       90
tom     history    100
jery    english    30
jery    math       60
jery    history    70