Parquet
- apache/parquet-format
- Apache-2.0, Thrift
- Uses Thrift to define and serialize its file metadata
- dictionary encoding
- bloom filter
- delta binary packed encoding (DELTA_BINARY_PACKED)
- References
- Small Materialized Aggregates: A Light Weight Index Structure for Data Warehousing
- Super-Scalar RAM-CPU Cache Compression
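
To make the delta binary packed idea in the list above concrete, here is a minimal Python sketch. It is a simplification and not the on-disk format: the real DELTA_BINARY_PACKED layout also splits values into blocks and miniblocks and stores header fields as ULEB128 / zigzag varints, none of which is reproduced here. The function names `delta_encode` and `delta_decode` are hypothetical.

```python
from typing import List, Tuple


def delta_encode(values: List[int]) -> Tuple[int, int, int, List[int]]:
    """Return (first_value, min_delta, bit_width, adjusted_deltas).

    Hypothetical helper: stores the first value, then each delta minus
    the smallest delta, so every stored number is non-negative and small.
    """
    if not values:
        raise ValueError("need at least one value")
    deltas = [b - a for a, b in zip(values, values[1:])]
    if not deltas:
        return values[0], 0, 0, []
    min_delta = min(deltas)
    adjusted = [d - min_delta for d in deltas]
    # Bits needed to bit-pack the largest adjusted delta.
    bit_width = max(adjusted).bit_length()
    return values[0], min_delta, bit_width, adjusted


def delta_decode(first: int, min_delta: int, adjusted: List[int]) -> List[int]:
    """Rebuild the original values from the output of delta_encode."""
    out = [first]
    for d in adjusted:
        out.append(out[-1] + d + min_delta)
    return out


timestamps = [1000, 1002, 1005, 1007, 1010]
first, min_delta, bit_width, adjusted = delta_encode(timestamps)
restored = delta_decode(first, min_delta, adjusted)
```

For slowly growing sequences such as timestamps or row ids, the adjusted deltas fit in a few bits each, which is where the space saving comes from.
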
- NodeJS
- hyparam/hyparquet
- MIT, JS
- npm:hyparquet
- LibertyDSNP/parquetjs
- MIT, TS, JS
- npm:@dsnp/parquetjs
- ironSource/parquetjs
- npm:parquetjs
Type | Size | Description |
---|---|---|
BOOLEAN | 1 bit | Boolean value |
INT32 | 32 bit | Signed integer |
INT64 | 64 bit | Signed integer |
INT96 | 96 bit | Signed integer (deprecated) |
FLOAT | 32 bit | IEEE floating point |
DOUBLE | 64 bit | IEEE floating point |
BYTE_ARRAY | varying | Variable length byte array |
FIXED_LEN_BYTE_ARRAY | fixed | Fixed length byte array |
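
As a rough illustration of the sizes in the table, the sketch below serializes a few values the way Parquet's PLAIN encoding lays them out: fixed-width numbers as little-endian, BYTE_ARRAY with a 4-byte little-endian length prefix, and BOOLEAN bit-packed starting from the least significant bit. The helper names are hypothetical, and the sketch ignores pages, repetition/definition levels, and compression.

```python
import struct
from typing import Iterable


def plain_int32(v: int) -> bytes:
    return struct.pack("<i", v)  # 4 bytes, signed, little-endian


def plain_int64(v: int) -> bytes:
    return struct.pack("<q", v)  # 8 bytes, signed, little-endian


def plain_float(v: float) -> bytes:
    return struct.pack("<f", v)  # 4 bytes, IEEE 754 single precision


def plain_double(v: float) -> bytes:
    return struct.pack("<d", v)  # 8 bytes, IEEE 754 double precision


def plain_byte_array(b: bytes) -> bytes:
    # Variable length: 4-byte little-endian length, then the raw bytes.
    return struct.pack("<I", len(b)) + b


def plain_booleans(bits: Iterable[bool]) -> bytes:
    # One bit per value, packed into bytes starting at the lowest bit.
    out = bytearray()
    for i, bit in enumerate(bits):
        if i % 8 == 0:
            out.append(0)
        if bit:
            out[-1] |= 1 << (i % 8)
    return bytes(out)
```

FIXED_LEN_BYTE_ARRAY needs no helper in this sketch: its length comes from the schema, so the bytes are written as-is without a prefix.
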
Type | Physical Type | Description |
---|---|---|
String | | |
STRING | BYTE_ARRAY | UTF-8 encoded character string |
ENUM | BYTE_ARRAY | Enumerated values |
UUID | FIXED[16] | 16-byte UUID |
Numeric | | |
INT(bits, signed) | INT32/INT64 | Integer of 8, 16, 32, or 64 bits, signed or unsigned |
DECIMAL | INT32/INT64/FIXED | Decimal numbers |
FLOAT16 | FIXED[2] | IEEE 754-2008 16-bit floating point number |
Temporal | | |
DATE | INT32 | Days since the Unix epoch |
TIME(utc, unit) | INT32/INT64 | Time of day |
TIMESTAMP | INT64 | Timestamp, optionally normalized to UTC |
INTERVAL | FIXED[12] | Time interval |
LIST | - | List of values |
MAP | - | Key-value pairs |
JSON | BYTE_ARRAY | JSON encoded data |
BSON | BYTE_ARRAY | BSON encoded data |
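
A logical type only tells the reader how to interpret the physical value. The Python sketch below shows that interpretation step for a few rows of the table, assuming the raw physical value has already been read from the file. The function names are hypothetical. The byte orders used (big-endian two's complement for binary DECIMAL, big-endian UUID bytes, three little-endian unsigned 32-bit fields for INTERVAL) follow my reading of the Parquet logical types spec.

```python
import datetime
import struct
import uuid
from decimal import Decimal
from typing import Tuple


def decode_date(days: int) -> datetime.date:
    # DATE: INT32 number of days since 1970-01-01.
    return datetime.date(1970, 1, 1) + datetime.timedelta(days=days)


def decode_decimal(raw: bytes, scale: int) -> Decimal:
    # DECIMAL stored as FIXED bytes: big-endian two's complement unscaled
    # integer. The scale comes from the schema, not from the value.
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)


def decode_uuid(raw: bytes) -> uuid.UUID:
    # UUID: FIXED[16], bytes in big-endian order.
    if len(raw) != 16:
        raise ValueError("UUID must be exactly 16 bytes")
    return uuid.UUID(bytes=raw)


def decode_interval(raw: bytes) -> Tuple[int, int, int]:
    # INTERVAL: FIXED[12] holding (months, days, milliseconds).
    if len(raw) != 12:
        raise ValueError("INTERVAL must be exactly 12 bytes")
    months, days, millis = struct.unpack("<III", raw)
    return months, days, millis
```

For example, `decode_decimal(b"\x30\x39", 2)` interprets the unscaled integer 12345 with scale 2 as 123.45.
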
- Deprecated (legacy ConvertedType names)
- INT_8, INT_16, INT_32, INT_64
- UINT_8, UINT_16, UINT_32, UINT_64
- TIME_MILLIS
- TIME_MICROS
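
The deprecated names above correspond to parameterized logical types. The lookup table below is a small Python sketch of that correspondence, based on my reading of the Parquet logical types spec. The names `LEGACY_TO_LOGICAL` and `modern_name` are hypothetical, and the right-hand strings are just readable labels, not an API.

```python
# Hypothetical lookup from deprecated annotation names to the
# parameterized logical types that supersede them.
LEGACY_TO_LOGICAL = {}

for bits in (8, 16, 32, 64):
    LEGACY_TO_LOGICAL[f"INT_{bits}"] = f"INT({bits}, true)"    # signed
    LEGACY_TO_LOGICAL[f"UINT_{bits}"] = f"INT({bits}, false)"  # unsigned

# The legacy time annotations are treated as UTC-normalized.
LEGACY_TO_LOGICAL["TIME_MILLIS"] = "TIME(isAdjustedToUTC=true, unit=MILLIS)"
LEGACY_TO_LOGICAL["TIME_MICROS"] = "TIME(isAdjustedToUTC=true, unit=MICROS)"


def modern_name(legacy: str) -> str:
    """Return the logical-type label for a legacy name, or the input unchanged."""
    return LEGACY_TO_LOGICAL.get(legacy, legacy)
```
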