1 year ago
#321803
Sonu
Presto datatype Mismatch Issue In Hive ORC Table
I am trying to query my hive orc table by presto ,In Hive its working Fine.In prestro I am able to access all the column except lowrange It's showing Below Erroe
error : Query 20220322_135856_00076_a33ec failed: Error opening Hive split hdfs://.....filename.orc
(offset=0, length=24216): Malformed ORC file. Cannot read SQL type varchar from ORC stream .lowrange
of type LONG [hdfs://.....filename.orc.orc]
I have set below property in presto before starting the query:
set hive1.orc.use-column-names=true
where hive1 is my catalog name.
I have also tried to change Hive tables datatype for this column as Double/BigInt,Int But Nothing Worked.
Can someone help me to resolve the error.
Table Description:
+-------------------------------+---------------------------------------------------------+-----------------------+--+
| col_name | data_type | comment |
+-------------------------------+---------------------------------------------------------+-----------------------+--+
| # col_name | data_type | comment |
| | NULL | NULL |
| lowrange | string | |
| type | string | |
| processed_date | string | |
| | NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| | NULL | NULL |
| type | string | |
| | NULL | NULL |
| # Detailed Table Information | NULL | NULL |
| Database: | test | NULL |
| Owner: | hdfs | NULL |
| CreateTime: | Tue Mar 22 08:28:49 UTC 2022 | NULL |
| LastAccessTime: | UNKNOWN | NULL |
| Protect Mode: | None | NULL |
| Retention: | 0 | NULL |
| Location: | hdfs://......../user/hdfs/test/ | NULL |
| Table Type: | EXTERNAL_TABLE | NULL |
| Table Parameters: | NULL | NULL |
| | EXTERNAL | TRUE |
| | skip.header.line.count | 1 |
| | transient_lastDdlTime | 1647937729 |
| | NULL | NULL |
| # Storage Information | NULL | NULL |
| SerDe Library: | org.apache.hadoop.hive.ql.io.orc.OrcSerde | NULL |
| InputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat | NULL |
| OutputFormat: | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat | NULL |
| Compressed: | No | NULL |
| Num Buckets: | -1 | NULL |
| Bucket Columns: | [] | NULL |
| Sort Columns: | [] | NULL |
| Storage Desc Params: | NULL | NULL |
| | field.delim | , |
| | serialization.format | , |
+-------------------------------+---------------------------------------------------------+-----------------------+--+
Sample Data:
lowrange type processed_date
1234567890001212 01 20220323
1234567890001213 01 20220323
Table Create Statement:
CREATE EXTERNAL TABLE `table1`(
`lowrange` string,
`processed_date` string)
PARTITIONED BY (
`type` string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://......./user/hdfs/test'
TBLPROPERTIES (
'skip.header.line.count'='1')
Update: I dropped the existing Table and created new table with datatype from String to BigINt in hive table and able to select data from Table but When I am trying to perform lpad operation its again showing same issue.
Logic I want to apply on Field : lpad(lowrange ,13,'9')
Error: Unexpected parameters (bigint, integer, varchar(1)) for
function lpad. Expected: lpad(varchar(x), bigint, varchar(y))
then I tried to cast bigint to varchar using Below query:
Updated Logic : lpad(cast(lowrange as varchar),13,'9'))
Error:
Malformed ORC file. Cannot read SQL type bigint
from ORC stream .lowrange of type STRING
hive
hiveql
presto
presto-jdbc
0 Answers
Your Answer