Data Types

Blendata supports the following data types (based on Spark 3.5.3):
Numeric type
- ByteType: Represents 1-byte signed integer numbers. The range of numbers is from -128 to 127.
- ShortType: Represents 2-byte signed integer numbers. The range of numbers is from -32768 to 32767.
- IntegerType: Represents 4-byte signed integer numbers. The range of numbers is from -2147483648 to 2147483647.
- LongType: Represents 8-byte signed integer numbers. The range of numbers is from -9223372036854775808 to 9223372036854775807.
- FloatType: Represents 4-byte single-precision floating point numbers.
- DoubleType: Represents 8-byte double-precision floating point numbers.
- DecimalType: Represents arbitrary-precision signed decimal numbers. Backed internally by java.math.BigDecimal. A BigDecimal consists of an arbitrary precision integer unscaled value and a 32-bit integer scale.
String type
- StringType: Represents character string values.
Boolean type
- BooleanType: Represents boolean values.
Datetime type
- DateType: Represents values comprising values of fields year, month and day, without a time-zone.
- TimestampType: Timestamp with local time zone (TIMESTAMP_LTZ). It represents values comprising values of fields year, month, day, hour, minute, and second, with the session local time zone. The timestamp value represents an absolute point in time.
Complex types
- ArrayType(elementType, containsNull): Represents values comprising a sequence of elements with the type of elementType. containsNull is used to indicate if elements in an ArrayType value can have null values.
- StructType(fields): Represents values with the structure described by a sequence of StructFields (fields).
SQL Syntax

This SQL syntax guide is based on Spark 3.5.3 and covers the SQL commands supported in certain menus, such as the SQL Editor.
DDL Statements
Data Definition Statements are used to create or modify the structure of database objects in a database. Spark SQL supports the following Data Definition Statements:
CREATE TABLE
Description
The CREATE TABLE statement defines a new table using a Data Source. Currently, this syntax only supports internal, delta, and view tables. State and time-series tables are not supported.
Syntax
CREATE TABLE table_identifier
[ ( col_name1 col_type1, ... ) ]
USING data_source
[ OPTIONS ( key1=val1, key2=val2, ... ) ]
[ PARTITIONED BY ( col_name1, col_name2, ... ) ]
[ LOCATION path ]
[ COMMENT table_comment ]
[ TBLPROPERTIES ( key1=val1, key2=val2, ... ) ]
Note that the clauses between the USING clause and the AS SELECT clause can appear in any order. For example, you can write COMMENT table_comment after TBLPROPERTIES.
Parameters
- table_identifier: Specifies a table name, which may be optionally qualified with a database name.
Syntax: [ database_name. ] table_name
- USING data_source: Data Source is the input format used to create the table. Data source can be CSV, TXT, ORC, JDBC, PARQUET, etc.
- OPTIONS: Options of data source which will be injected to storage properties.
- PARTITIONED BY: Partitions are created on the table, based on the columns specified.
- LOCATION: Path to the directory where table data is stored, which could be a path on distributed storage like HDFS, etc.
- COMMENT: A string literal to describe the table.
- TBLPROPERTIES: A list of key-value pairs that is used to tag the table definition.
Limitation: For Delta tables, this syntax does not support Primary Key and Foreign Key constraints. These keys are only applicable to Databricks Runtime 15.2 and Databricks SQL 2024.30, as the open-source version of Spark does not support them.
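As an illustration, the clauses above can be combined as follows. The database, table, and column names here are hypothetical; any Delta table of your own will do:

```sql
-- Hypothetical example: a Delta table partitioned by country
CREATE TABLE sales.orders (
  order_id BIGINT,
  amount   DOUBLE,
  country  STRING
)
USING delta
PARTITIONED BY (country)
COMMENT 'Order records per country'
TBLPROPERTIES ('owner' = 'data-team');
```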
CREATE VIEW
Views are based on the result set of a SQL query. CREATE VIEW constructs a virtual table that has no physical data; therefore, other operations like ALTER VIEW and DROP VIEW only change metadata.
CREATE VIEW view_identifier create_view_clauses AS query
- view_identifier: Specifies a view name, which may be optionally qualified with a database name. Syntax: [ database_name. ] view_name
- create_view_clauses: These clauses are optional and order insensitive. They can take the following forms:
- [ ( column_name [ COMMENT column_comment ], … ) ] to specify column-level comments.
- [ COMMENT view_comment ] to specify view-level comments.
- [ TBLPROPERTIES ( property_name = property_value [ , … ] ) ] to add metadata key-value pairs.
- query: A SELECT statement that constructs the view from base tables or other views.
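A minimal sketch of these clauses together, using hypothetical names:

```sql
-- Hypothetical example: a view with column- and view-level comments
CREATE VIEW sales.big_orders (
  order_id COMMENT 'Unique order key',
  amount
)
COMMENT 'Orders above 1000'
AS SELECT order_id, amount FROM sales.orders WHERE amount > 1000;
```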
DROP TABLE
Description
DROP TABLE deletes the table and, if the table is not an EXTERNAL table, removes the directory associated with it from the file system. If the table does not exist, an exception is thrown. Currently, this syntax only supports internal, delta, and view tables. State and time-series tables are not supported.
Syntax
DROP TABLE table_identifier
Parameter
- table_identifier: Specifies the table name to be dropped. The table name may be optionally qualified with a database name.
Syntax: [ database_name. ] table_name
DROP VIEW
Description
DROP VIEW removes the metadata associated with a specified view from the catalog.
Syntax
DROP VIEW view_identifier
Parameter
- view_identifier: Specifies the view name to be dropped. The view name may be optionally qualified with a database name.
Syntax: [ database_name. ] view_name
TRUNCATE TABLE
The TRUNCATE TABLE statement removes all the rows from a table or partition(s). Currently, the SQL executor does not support this SQL command. However, Blendata Enterprise offers a truncate function within the Data Catalog UI. Users who possess delete and manage permissions can truncate data assets through the Data Catalog page.
USE DATABASE
The USE statement is used to set the current schema. After the current schema is set, unqualified database artifacts such as tables, functions, and views referenced by SQL statements are resolved from the current schema. The default schema name is ‘default’. Currently, users can use this command through a JDBC connection for data ingestion only.
DML Statements
Data Manipulation Statements are used to add, change, or delete data for Delta table assets only. Users with edit and manage permissions on a data asset can use these commands. Spark SQL supports the following Data Manipulation Statements:
INSERT TABLE (for Delta table)
Description
The INSERT statement inserts new rows into a delta table or overwrites the existing data in the delta table. The inserted rows can be specified by value expressions or result from a query.
Syntax
INSERT INTO [ TABLE ] table_identifier [ ( column_list ) ] VALUES ( value1, value2, value3, ... ), ( value1, value2, value3, ... ), ( value1, value2, value3, ... );
Parameters
- table: Specifies a table name, which may be optionally qualified with a database name.
- column_list: An optional parameter that specifies a comma-separated list of columns belonging to the table_identifier table. Spark will reorder the columns of the input query to match the table schema according to the specified column list.
Note: The current behaviour has some limitations: All specified columns must exist in the table and must not be duplicated. The list includes all columns except the static partition columns. The size of the column list must exactly match the size of the data from the VALUES clause or query.
- values: An explicitly specified value can be inserted. A comma must be used to separate each value in the clause. More than one set of values can be specified to insert multiple rows. Null values cannot be inserted in this command.
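For example, inserting two rows with an explicit column list (the table and column names are hypothetical):

```sql
-- Hypothetical example: insert two rows, listing columns explicitly
INSERT INTO sales.orders (order_id, amount, country)
VALUES (1, 250.0, 'TH'),
       (2, 980.5, 'SG');
```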
INSERT OVERWRITE (for Delta table)
Description
The INSERT OVERWRITE statement overwrites the existing data in a Delta table with new values specified using a VALUES clause.
Syntax
INSERT OVERWRITE [ TABLE ] table_identifier VALUES ( value1, value2, ... )
Parameters
- values: Specifies the values to be inserted. An explicitly specified value can be inserted. A comma must be used to separate each value in the clause. More than one set of values can be specified to insert multiple rows. Null values cannot be inserted with this command.
- table: Specifies a table name, which may be optionally qualified with a database name.
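A minimal sketch, reusing the hypothetical table from earlier examples:

```sql
-- Hypothetical example: replace all existing rows with one new row
INSERT OVERWRITE TABLE sales.orders
VALUES (1, 250.0, 'TH');
```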
Data Retrieval Statements
Data Retrieval Statements refer to the SELECT statement in SQL.
Common Table Expression
Description
A common table expression (CTE) defines a temporary result set that a user can reference, possibly multiple times, within the scope of a SQL statement, using the WITH … AS syntax. A CTE is used mainly in a SELECT statement.
Syntax
WITH expression_name [ ( column_name [ , … ] ) ] [ AS ] ( query )
Parameters
- query: Spark supports a SELECT statement and conforms to the ANSI SQL standard. Queries are used to retrieve result sets from one or more tables.
- expression_name: Specifies a name for the common table expression.
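A short sketch with a hypothetical table: the CTE names a subquery once and the outer SELECT reuses it.

```sql
-- Hypothetical example: name a subquery, then reference it in the outer query
WITH big_orders AS (
  SELECT country, amount FROM sales.orders WHERE amount > 1000
)
SELECT country, COUNT(*) FROM big_orders GROUP BY country;
```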
GROUP BY Clause
Description
The GROUP BY clause is used to group the rows based on a set of specified grouping expressions and compute aggregations on the group of rows based on one or more specified aggregate functions. When a FILTER clause is attached to an aggregate function, only the matching rows are passed to that function.
Syntax
GROUP BY group_expression
Parameters
- group_expression: Specifies the criteria based on which the rows are grouped together. The grouping of rows is performed based on result values of the grouping expressions. A grouping expression may be a column name like GROUP BY a, a column position like GROUP BY 1, or an expression like GROUP BY a + b.
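The three forms of grouping expression can be sketched as follows (table and column names are hypothetical):

```sql
-- Hypothetical examples: grouping by column name, position, and expression
SELECT country, SUM(amount) FROM sales.orders GROUP BY country;         -- column name
SELECT country, SUM(amount) FROM sales.orders GROUP BY 1;               -- column position
SELECT amount > 100, COUNT(*) FROM sales.orders GROUP BY amount > 100;  -- expression
```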
HAVING Clause
Description
The HAVING clause is used to filter the results produced by GROUP BY based on the specified condition. It is often used in conjunction with a GROUP BY clause.
Syntax
HAVING boolean_expression
Parameters
- boolean_expression: Specifies any expression that evaluates to a result type boolean. Two or more expressions may be combined together using the logical operators ( AND, OR ).
Note: The expressions specified in the HAVING clause can only refer to: 1) Aggregate functions 2) Constants 3) Expressions that appear in GROUP BY.
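A minimal sketch, assuming a hypothetical orders table:

```sql
-- Hypothetical example: keep only groups whose total exceeds 5000
SELECT country, SUM(amount) AS total
FROM sales.orders
GROUP BY country
HAVING SUM(amount) > 5000;
```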
HINTS
Description
Hints give users a way to suggest how Spark SQL should use specific approaches to generate its execution plan.
Syntax
/*+ hint [ , … ] */
Partitioning Hints
Partitioning hints allow users to suggest a partitioning strategy that Spark should follow. COALESCE, REPARTITION, and REPARTITION_BY_RANGE hints are supported and are equivalent to the coalesce, repartition, and repartitionByRange Dataset APIs, respectively. REBALANCE can only be used as a hint. These hints give users a way to tune performance and control the number of output files in Spark SQL. When multiple partitioning hints are specified, multiple nodes are inserted into the logical plan, but the leftmost hint is picked by the optimizer.
Partitioning Hints Types
- REBALANCE: The REBALANCE hint can be used to rebalance the query result output partitions, so that every partition is of a reasonable size (not too small and not too big). It can take column names as parameters, and try its best to partition the query result by these columns. This is a best-effort: if there are skews, Spark will split the skewed partitions, to make these partitions not too big. This hint is useful when you need to write the result of this query to a table, to avoid too small/big files. This hint is ignored if AQE is not enabled.
- COALESCE: The COALESCE hint can be used to reduce the number of partitions to the specified number of partitions. It takes a partition number as a parameter.
- REPARTITION: The REPARTITION hint can be used to repartition to the specified number of partitions using the specified partitioning expressions. It takes a partition number, column names, or both as parameters.
- REPARTITION_BY_RANGE: The REPARTITION_BY_RANGE hint can be used to repartition to the specified number of partitions using the specified partitioning expressions. It takes column names and an optional partition number as parameters.
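Two sketches of the hint comment syntax, using a hypothetical table:

```sql
-- Hypothetical example: ask Spark to produce 3 output partitions
SELECT /*+ REPARTITION(3) */ * FROM sales.orders;

-- Hypothetical example: rebalance output partitions by a column
-- (this hint is ignored unless AQE is enabled)
SELECT /*+ REBALANCE(country) */ * FROM sales.orders;
```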
JOIN
Description
A SQL join is used to combine rows from two relations based on join criteria. The following section describes the overall join syntax and the sub-sections cover different types of joins along with examples.
Syntax
relation { [ join_type ] JOIN [ LATERAL ] relation [ join_criteria ] | NATURAL join_type JOIN [ LATERAL ] relation }
Parameters
- relation: Specifies the relation to be joined.
- join_type: Specifies the join type.
Syntax: [ INNER ] | CROSS | LEFT [ OUTER ] | [ LEFT ] SEMI | RIGHT [ OUTER ] | FULL [ OUTER ] | [ LEFT ] ANTI
- join_criteria: Specifies how the rows from one relation will be combined with the rows of another relation.
Syntax: ON boolean_expression | USING ( column_name [ , … ] )
- boolean_expression: Specifies an expression with a return type of boolean.
Join Types
- Inner Join: The inner join is the default join in Spark SQL. It selects rows that have matching values in both relations.
Syntax: relation [ INNER ] JOIN relation [ join_criteria ]
- Left Join: A left join returns all values from the left relation and the matched values from the right relation, or appends NULL if there is no match. It is also referred to as a left outer join.
Syntax: relation LEFT [ OUTER ] JOIN relation [ join_criteria ]
- Right Join: A right join returns all values from the right relation and the matched values from the left relation, or appends NULL if there is no match. It is also referred to as a right outer join.
Syntax: relation RIGHT [ OUTER ] JOIN relation [ join_criteria ]
- Full Join: A full join returns all values from both relations, appending NULL values on the side that does not have a match. It is also referred to as a full outer join.
Syntax: relation FULL [ OUTER ] JOIN relation [ join_criteria ]
- Cross Join: A cross join returns the Cartesian product of two relations.
Syntax: relation CROSS JOIN relation [ join_criteria ]
- Semi Join: A semi join returns values from the left side of the relation that has a match with the right. It is also referred to as a left semi join.
Syntax: relation [ LEFT ] SEMI JOIN relation [ join_criteria ]
- Anti Join: An anti join returns values from the left relation that has no match with the right. It is also referred to as a left anti join.
Syntax: relation [ LEFT ] ANTI JOIN relation [ join_criteria ]
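For instance, a left outer join between two hypothetical relations sharing a key column:

```sql
-- Hypothetical example: keep every order, attach customer names where they match
SELECT o.order_id, c.name
FROM sales.orders o
LEFT OUTER JOIN sales.customers c
  ON o.customer_id = c.customer_id;
```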
LIKE Predicate
Description
A LIKE predicate is used to search for a specific pattern. This predicate also supports multiple patterns with quantifiers, including ANY, SOME, and ALL.
Syntax
[ NOT ] { LIKE search_pattern [ ESCAPE esc_char ] | [ RLIKE | REGEXP ] regex_pattern }
[ NOT ] { LIKE quantifiers ( search_pattern [ , … ]) }
Parameters
- search_pattern: Specifies a string pattern to be searched by the LIKE clause. It can contain special pattern-matching characters:
- % matches zero or more characters.
- _ matches exactly one character.
- esc_char: Specifies the escape character. The default escape character is \.
- regex_pattern: Specifies a regular expression search pattern to be searched by the RLIKE or REGEXP clause.
- quantifiers: Specifies the predicate quantifiers, which include ANY, SOME, and ALL. ANY or SOME means that if one of the patterns matches the input, then return true; ALL means that if all of the patterns match the input, then return true.
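A few sketches against a hypothetical customers table:

```sql
-- Hypothetical examples of pattern matching
SELECT * FROM sales.customers WHERE name LIKE 'A%';                 -- starts with 'A'
SELECT * FROM sales.customers WHERE name RLIKE '^[A-Z][a-z]+$';     -- regex match
SELECT * FROM sales.customers WHERE name LIKE ALL ('%a%', '%n%');   -- contains both 'a' and 'n'
```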
LIMIT Clause
Description
The LIMIT clause is used to constrain the number of rows returned by the SELECT statement. In general, this clause is used in conjunction with ORDER BY to ensure that the results are deterministic.
Syntax
LIMIT { ALL | integer_expression }
Parameters
- integer_expression: Specifies a foldable expression that returns an integer.
- ALL: If specified, the query returns all the rows. In other words, no limit is applied if this option is specified.
OFFSET Clause
Description
The OFFSET clause is used to specify the number of rows to skip before beginning to return rows returned by the SELECT statement. In general, this clause is used in conjunction with ORDER BY to ensure that the results are deterministic.
Syntax
OFFSET integer_expression
Parameters
- integer_expression: Specifies a foldable expression that returns an integer.
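LIMIT and OFFSET are typically paired with ORDER BY so the skipped and returned rows are deterministic; for example, with a hypothetical table:

```sql
-- Hypothetical example: first 10 rows in a deterministic order
SELECT order_id, amount FROM sales.orders ORDER BY order_id LIMIT 10;

-- Hypothetical example: skip the first 10 rows in the same order
SELECT order_id, amount FROM sales.orders ORDER BY order_id OFFSET 10;
```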
ORDER BY Clause
Description
The ORDER BY clause is used to return the result rows in a sorted manner in the user specified order. Unlike the SORT BY clause, this clause guarantees a total order in the output.
Syntax
ORDER BY { expression [ sort_direction ] } [ , … ]
Parameters
- ORDER BY: Specifies a comma-separated list of expressions along with optional parameters sort_direction, which are used to sort the rows.
- sort_direction: Optionally specifies whether to sort the rows in ascending or descending order. The valid values for the sort direction are ASC for ascending and DESC for descending. If sort direction is not explicitly specified, then by default rows are sorted ascending.
Syntax: [ ASC | DESC ]
Set Operators
Description
Set operators are used to combine two input relations into a single one. Spark SQL supports three types of set operators:
- EXCEPT or MINUS
- INTERSECT
- UNION
Note that input relations must have the same number of columns and compatible data types for the respective columns.
EXCEPT
EXCEPT returns the rows that are found in one relation but not the other. EXCEPT (alternatively, EXCEPT DISTINCT) takes only distinct rows. Note that MINUS is an alias for EXCEPT.
Syntax: [ ( ] relation [ ) ] EXCEPT | MINUS [ DISTINCT ] [ ( ] relation [ ) ]
INTERSECT
INTERSECT returns the rows that are found in both relations.
Syntax: [ ( ] relation [ ) ] INTERSECT [ ( ] relation [ ) ]
UNION
UNION and UNION ALL return the rows that are found in either relation. UNION (alternatively, UNION DISTINCT) takes only distinct rows while UNION ALL does not remove duplicates from the result rows.
Syntax: [ ( ] relation [ ) ] UNION [ ALL | DISTINCT ] [ ( ] relation [ ) ]
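Two sketches over a pair of hypothetical, column-compatible relations:

```sql
-- Hypothetical example: all countries from both relations, duplicates kept
SELECT country FROM sales.orders
UNION ALL
SELECT country FROM sales.returns;

-- Hypothetical example: distinct countries present in orders but not in returns
SELECT country FROM sales.orders
EXCEPT
SELECT country FROM sales.returns;
```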
SORT BY Clause
Description
The SORT BY clause is used to return the result rows sorted within each partition in the user specified order. When there is more than one partition, SORT BY may return a result that is partially ordered. This is different from the ORDER BY clause, which guarantees a total order of the output.
Syntax
SORT BY { expression [ sort_direction | nulls_sort_order ] [ , … ] }
Parameters
- SORT BY: Specifies a comma-separated list of expressions along with optional parameters sort_direction and nulls_sort_order which are used to sort the rows within each partition.
- sort_direction: Optionally specifies whether to sort the rows in ascending or descending order. The valid values for the sort direction are ASC for ascending and DESC for descending. If sort direction is not explicitly specified, then by default rows are sorted ascending.
Syntax: [ ASC | DESC ]
Table-valued Functions
A table-valued function (TVF) is a function that returns a relation or a set of rows. There are two types of TVFs in Spark SQL:
- a TVF that can be specified in a FROM clause, e.g. range;
- a TVF that can be specified in SELECT/LATERAL VIEW clauses, e.g. explode.
Supported Table-valued Functions
TVFs that can be specified in a FROM clause:
| Function | Argument Type(s) | Description |
|---|---|---|
| range ( end ) | Long | Creates a table with a single LongType column named id, containing rows in a range from 0 to end (exclusive) with step value 1. |
| range ( start, end ) | Long, Long | Creates a table with a single LongType column named id, containing rows in a range from start to end (exclusive) with step value 1. |
| range ( start, end, step ) | Long, Long, Long | Creates a table with a single LongType column named id, containing rows in a range from start to end (exclusive) with step value. |
| range ( start, end, step, numPartitions ) | Long, Long, Long, Int | Creates a table with a single LongType column named id, containing rows in a range from start to end (exclusive) with step value, with partition number numPartitions specified. |
TVFs that can be specified in SELECT/LATERAL VIEW clauses:
| Function | Argument Type(s) | Description |
|---|---|---|
| explode ( expr ) | Array/Map | Separates the elements of array expr into multiple rows, or the elements of map expr into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map. |
| explode_outer ( expr ) | Array/Map | Separates the elements of array expr into multiple rows, or the elements of map expr into multiple rows and columns. Unless specified otherwise, uses the default column name col for elements of the array or key and value for the elements of the map. |
| inline ( expr ) | Expression | Explodes an array of structs into a table. Uses column names col1, col2, etc. by default unless specified otherwise. |
| inline_outer ( expr ) | Expression | Explodes an array of structs into a table. Uses column names col1, col2, etc. by default unless specified otherwise. |
| posexplode ( expr ) | Array/Map | Separates the elements of array expr into multiple rows with positions, or the elements of map expr into multiple rows and columns with positions. Unless specified otherwise, uses the column name pos for position, col for elements of the array or key and value for elements of the map. |
| posexplode_outer ( expr ) | Array/Map | Separates the elements of array expr into multiple rows with positions, or the elements of map expr into multiple rows and columns with positions. Unless specified otherwise, uses the column name pos for position, col for elements of the array or key and value for elements of the map. |
| stack ( n, expr1, …, exprk ) | Seq[Expression] | Separates expr1, …, exprk into n rows. Uses column names col0, col1, etc. by default unless specified otherwise. |
| json_tuple ( jsonStr, p1, p2, …, pn ) | Seq[Expression] | Returns a tuple like the function get_json_object, but it takes multiple names. All the input parameters and output column types are string. |
| parse_url ( url, partToExtract[, key] ) | Seq[Expression] | Extracts a part from a URL. |
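A few sketches of the TVFs above, using inline literals rather than any real table:

```sql
-- Generates one row per id value: 1, 2, 3, 4 (end is exclusive)
SELECT * FROM range(1, 5);

-- Turns an array literal into two rows in the default column col
SELECT explode(array(10, 20));

-- Same, but with a pos column holding each element's position
SELECT posexplode(array('a', 'b'));
```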
WHERE Clause
Description
The WHERE clause is used to limit the results of the FROM clause of a query or a subquery based on the specified condition.
Syntax
WHERE boolean_expression
Parameters
- boolean_expression: Specifies any expression that evaluates to a result type boolean. Two or more expressions may be combined together using the logical operators ( AND, OR ).
Aggregate Function
Aggregate functions operate on values across rows to perform mathematical calculations such as sum, average, counting, minimum/maximum values, standard deviation, and estimation, as well as some non-mathematical operations.
Syntax
aggregate_function(input1 [, input2, …]) [ FILTER (WHERE boolean_expression) ]
- aggregate_function: Please refer to the supported aggregation functions list below.
- boolean_expression: Specifies any expression that evaluates to a result type boolean. Two or more expressions may be combined together using the logical operators ( AND, OR ).
Aggregation Functions List
- any(boolean_field)
- any_value(string_field)
- bool_and(boolean_field)
- bool_or(boolean_field)
- count_if(boolean_field)
- count(*)
- count(integer_field)
- count(DISTINCT integer_field)
- sum(integer_field)
- sum(DISTINCT integer_field)
- avg(integer_field)
- mean(float_field)
- min(integer_field)
- max(integer_field)
- stddev(float_field)
- stddev_samp(float_field)
- stddev_pop(float_field)
- variance(float_field)
- var_samp(float_field)
- var_pop(float_field)
- skewness(float_field)
- bit_and(integer_field)
- bit_or(integer_field)
- bit_xor(integer_field)
- corr(integer_field, float_field)
- covar_samp(integer_field, float_field)
- covar_pop(integer_field, float_field)
- first(string_field)
- first_value(string_field) OVER (…)
- last(string_field)
- last_value(string_field) OVER (…)
- collect_list(string_field)
- collect_set(string_field)
- array_agg(string_field)
- approx_count_distinct(string_field)
- approx_percentile(float_field, 0.5)
- mode(string_field)
- kurtosis(double_field)
- percentile(double_field, 0.5)
- percentile(double_field, array(0.25, 0.5, 0.75))
- count_min_sketch(boolean_field, …)
- histogram_numeric(double_field, 4)
- hll_sketch_agg(string_field, 12)
- hll_union_agg(boolean_field)
- every(boolean_field)
- max_by(string_field, double_field)
- min_by(string_field, double_field)
- median(double_field)
- try_avg(double_field)
- try_sum(double_field)
- regr_avgx(double_field, double_field)
- regr_avgy(double_field, double_field)
- regr_count(double_field, double_field)
- regr_intercept(double_field, double_field)
- regr_r2(double_field, double_field)
- regr_slope(double_field, double_field)
- regr_sxx(double_field, double_field)
- regr_sxy(double_field, double_field)
- regr_syy(double_field, double_field)
- bitmap_construct_agg(bitmap_bit_position(integer_field))
- grouping(integer_field)
- grouping_id(group_id, integer_field, float_field)
- percentile_cont(0.25) WITHIN GROUP (ORDER BY double_field)
- percentile_disc(0.5) WITHIN GROUP (ORDER BY double_field)
- some(boolean_field)
- std(integer_field)
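A sketch combining an aggregate with the optional FILTER clause, over a hypothetical table:

```sql
-- Hypothetical example: count all orders, but sum only the large ones
SELECT country,
       COUNT(*)                                 AS all_orders,
       SUM(amount) FILTER (WHERE amount > 1000) AS big_order_total
FROM sales.orders
GROUP BY country;
```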
CASE Clause
Description
The CASE clause uses a rule to return a specific result based on the specified condition, similar to if/else statements in other programming languages.
Syntax
CASE [ expression ] { WHEN boolean_expression THEN then_expression } [ ... ] [ ELSE else_expression ] END
Parameters
- boolean_expression: Specifies any expression that evaluates to a result type boolean. Two or more expressions may be combined together using the logical operators ( AND, OR ).
- then_expression: Specifies the then expression based on the boolean_expression condition; then_expression and else_expression should all be the same type or coercible to a common type.
- else_expression: Specifies the default expression; then_expression and else_expression should all be the same type or coercible to a common type.
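A minimal sketch over a hypothetical table, where all branches return the same type (STRING):

```sql
-- Hypothetical example: bucket amounts into labels
SELECT order_id,
       CASE
         WHEN amount >= 1000 THEN 'large'
         WHEN amount >= 100  THEN 'medium'
         ELSE 'small'
       END AS size_label
FROM sales.orders;
```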
Auxiliary Statements
Most of these commands are available only to users who have the Metadata setting enabled in the User Management menu.
DESCRIBE DATABASE
Description
The DESCRIBE DATABASE statement returns the metadata of an existing database. The metadata information includes the database name, database comment, and database location on the filesystem. If the optional EXTENDED option is specified, it returns the basic metadata information along with the database properties. The keywords DATABASE and SCHEMA are interchangeable.
Syntax
{ DESC | DESCRIBE } DATABASE [ EXTENDED ] db_name
Parameters
- db_name: Specifies a name of an existing database or an existing schema in the system. If the name does not exist, an exception is thrown.
DESCRIBE FUNCTION
Description
DESCRIBE FUNCTION statement returns the basic metadata information of an existing function. The metadata information includes the function name, implementing class and the usage details. If the optional EXTENDED option is specified, the basic metadata information is returned along with the extended usage information.
Syntax
{ DESC | DESCRIBE } FUNCTION [ EXTENDED ] function_name
Parameters
- function_name: Specifies a name of an existing function in the system. The function name may be optionally qualified with a database name. If function_name is qualified with a database then the function is resolved from the user specified database, otherwise it is resolved from the current database.
Syntax: [ database_name. ] function_name
DESCRIBE QUERY
Description
The DESCRIBE QUERY statement is used to return the metadata of output of a query. A shorthand DESC may be used instead of DESCRIBE to describe the query output.
Syntax
{ DESC | DESCRIBE } [ QUERY ] input_statement
Parameters
- QUERY: This clause is optional and may be omitted.
- input_statement: Specifies a result set producing statement and may be one of the following:
- a SELECT statement
- a CTE (common table expression) statement
- an INLINE TABLE statement
- a TABLE statement
- a FROM statement
Please refer to select-statement for a detailed syntax of the query parameter.
DESCRIBE TABLE
Description
DESCRIBE TABLE statement returns the basic metadata information of a table. The metadata information includes column name, column type and column comment. Optionally a partition spec or column name may be specified to return the metadata pertaining to a partition or column respectively.
Syntax
{ DESC | DESCRIBE } [ TABLE ] [ format ] table_identifier [ partition_spec ] [ col_name ]
Parameters
- format: Specifies the optional format of describe output. If EXTENDED is specified then additional metadata information (such as parent database, owner, and access time) is returned.
- table_identifier: Specifies a table name, which may be optionally qualified with a database name.
Syntax: [ database_name. ] table_name
- partition_spec: An optional parameter that specifies a comma separated list of key and value pairs for partitions. When specified, additional partition metadata is returned.
Syntax: PARTITION ( partition_col_name = partition_col_val [ , … ] )
- col_name: An optional parameter that specifies the column name that needs to be described. The supplied column name may be optionally qualified. The parameters partition_spec and col_name are mutually exclusive and cannot be specified together. Currently, nested columns are not allowed to be specified.
Syntax: [ database_name. ] [ table_name. ] column_name
REFRESH TABLE
The REFRESH TABLE statement invalidates the cached entries, which include the data and metadata of the given table or view. The invalidated cache is populated in a lazy manner when the cached table or the query associated with it is executed again. Currently, users cannot run this command through SQL execution, but the option is provided in the Data Catalog UI. This action requires at least the View permission level.
SHOW COLUMNS
Description
Returns the list of columns in a table. If the table does not exist, an exception is thrown.
Syntax
SHOW COLUMNS table [ database ]
Parameters
- table: Specifies the table name of an existing table. The table may be optionally qualified with a database name.
Syntax: { IN | FROM } [ database_name . ] table_name
Note: Keywords IN and FROM are interchangeable.
- database: Specifies an optional database name. The table is resolved from this database when it is specified. When this parameter is specified then table name should not be qualified with a different database name.
Syntax: { IN | FROM } database_name
Note: Keywords IN and FROM are interchangeable.
SHOW CREATE TABLE
Description
SHOW CREATE TABLE returns the CREATE TABLE statement or CREATE VIEW statement that was used to create a given table or view. SHOW CREATE TABLE on a non-existent table or a temporary view throws an exception.
Syntax
SHOW CREATE TABLE table
Parameters
- table: Specifies a table or view name, which may be optionally qualified with a database name.
Syntax: [ database_name. ] table_name
SHOW DATABASES
Description
Lists the databases that match an optionally supplied regular expression pattern. If no pattern is supplied, the command lists all the databases in the system. Please note that the keywords SCHEMAS and DATABASES are interchangeable and mean the same thing.
Syntax
SHOW { DATABASES | SCHEMAS } [ LIKE regex_pattern ]
Parameters
- regex_pattern: Specifies a regular expression pattern that is used to filter the results of the statement.
- Except for the * and | characters, the pattern works like a regular expression.
- * alone matches 0 or more characters and | is used to separate multiple different regular expressions, any of which can match.
- The leading and trailing blanks are trimmed in the input pattern before processing. The pattern match is case-insensitive.
SHOW FUNCTIONS
Description
Returns the list of functions after applying an optional regex pattern. Given that the number of functions supported by Spark is quite large, this statement may be used in conjunction with DESCRIBE FUNCTION to quickly find a function and understand its usage. The LIKE clause is optional and supported only for compatibility with other systems.
Syntax
SHOW [ function_kind ] FUNCTIONS [ { FROM | IN } database_name ] [ LIKE regex_pattern ]
Parameters
- function_kind: Specifies the name space of the function to be searched upon. The valid name spaces are:
- USER – Looks up the function(s) among the user defined functions.
- SYSTEM – Looks up the function(s) among the system defined functions.
- ALL – Looks up the function(s) among both user and system defined functions.
- { FROM | IN } database_name: Specifies the database name from which functions are listed.
- regex_pattern: Specifies a regular expression pattern that is used to filter the results of the statement.
- Except for the * and | characters, the pattern works like a regular expression.
- * alone matches 0 or more characters and | is used to separate multiple different regular expressions, any of which can match.
- The leading and trailing blanks are trimmed in the input pattern before processing. The pattern match is case-insensitive.
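For example (the database salesdb is hypothetical; the function names shown are standard Spark built-ins):

```sql
-- List built-in functions whose names start with "to_", e.g. to_date, to_timestamp.
SHOW SYSTEM FUNCTIONS LIKE 'to_*';

-- List only user-defined functions registered in a given database.
SHOW USER FUNCTIONS FROM salesdb;

-- | lets any of several patterns match.
SHOW FUNCTIONS LIKE 'trim|upper|lower';
```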
SHOW PARTITIONS
Description
The SHOW PARTITIONS statement is used to list the partitions of a table. An optional partition spec may be specified to return only the partitions matching the supplied spec. This statement requires the “View” permission level on the selected table.
Syntax
SHOW PARTITIONS table [ partition_spec ]
Parameters
- table: Specifies a table name, which may be optionally qualified with a database name.
Syntax: [ database_name. ] table_name
- partition_spec: An optional parameter that specifies a comma-separated list of key-value pairs for partitions. When specified, only the partitions that match the partition specification are returned.
Syntax: PARTITION ( partition_col_name = partition_col_val [ , … ] )
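As a sketch, assuming a hypothetical table salesdb.orders partitioned by year and month:

```sql
-- List every partition of the table.
SHOW PARTITIONS salesdb.orders;

-- Restrict to one partition column value; matching sub-partitions are returned.
SHOW PARTITIONS salesdb.orders PARTITION (year = 2024);

-- A full spec narrows the result to a single partition.
SHOW PARTITIONS salesdb.orders PARTITION (year = 2024, month = 1);
```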
SHOW TABLE EXTENDED
Description
SHOW TABLE EXTENDED will show information for all tables matching the given regular expression. Output includes basic table information and file system information like Last Access, Created By, Type, Provider, Table Properties, Location, Serde Library, InputFormat, OutputFormat, Storage Properties, Partition Provider, Partition Columns and Schema.
If a partition specification is present, it outputs the given partition’s file-system-specific information such as Partition Parameters and Partition Statistics. Note that a table regex cannot be used with a partition specification.
Syntax
SHOW TABLE EXTENDED [ { IN | FROM } database_name ] LIKE regex_pattern [ partition_spec ]
Parameters
- { IN | FROM } database_name: Specifies the database name. If not provided, the current database is used.
- regex_pattern: Specifies the regular expression pattern that is used to filter out unwanted tables.
- Except for the * and | characters, the pattern works like a regular expression.
- * alone matches 0 or more characters and | is used to separate multiple different regular expressions, any of which can match.
- The leading and trailing blanks are trimmed in the input pattern before processing. The pattern match is case-insensitive.
- partition_spec: An optional parameter that specifies a comma-separated list of key-value pairs for partitions. Note that a table regex cannot be used with a partition specification.
Syntax: PARTITION ( partition_col_name = partition_col_val [ , … ] )
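For example (the database and table names are hypothetical). Note that the first form uses a regex, while the second must name the table exactly because a table regex cannot be combined with a partition spec:

```sql
-- Extended metadata for every table in the current database matching the pattern.
SHOW TABLE EXTENDED LIKE 'order*';

-- Partition-level details for one partition of a specific table.
SHOW TABLE EXTENDED IN salesdb LIKE 'orders' PARTITION (year = 2024);
```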
SHOW TABLES
Description
The SHOW TABLES statement returns all the tables for an optionally specified database. Additionally, the output of this statement may be filtered by an optional matching pattern. If no database is specified then the tables are returned from the current database.
Syntax
SHOW TABLES [ { FROM | IN } database_name ] [ LIKE regex_pattern ]
Parameters
- { FROM | IN } database_name: Specifies the database name from which tables are listed.
- regex_pattern: Specifies the regular expression pattern that is used to filter out unwanted tables.
- Except for the * and | characters, the pattern works like a regular expression.
- * alone matches 0 or more characters and | is used to separate multiple different regular expressions, any of which can match.
- The leading and trailing blanks are trimmed in the input pattern before processing. The pattern match is case-insensitive.
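Typical invocations look like the following (salesdb is a hypothetical database):

```sql
-- Tables in the current database.
SHOW TABLES;

-- Tables in a specific database; FROM and IN are interchangeable.
SHOW TABLES FROM salesdb;

-- Filter by pattern; | separates alternative patterns.
SHOW TABLES IN salesdb LIKE 'order*|customer*';
```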
SHOW TBLPROPERTIES
Description
This statement returns the value of a table property, given an optional property key. If no key is specified, all the properties are returned.
Syntax
SHOW TBLPROPERTIES table [ ( unquoted_property_key | property_key_as_string_literal ) ]
Parameters
- table: Specifies the table name of an existing table. The table may be optionally qualified with a database name.
Syntax: [ database_name. ] table_name
- unquoted_property_key: Specifies the property key in unquoted form. The key may consist of multiple parts separated by dots.
Syntax: [ key_part1 ] [ .key_part2 ] [ … ]
- property_key_as_string_literal: Specifies a property key value as a string literal.
Note: The property values returned by this statement exclude some properties that are internal to Spark and Hive: 1) all properties that start with the prefix spark.sql, 2) property keys such as EXTERNAL and comment, and 3) all properties generated internally by Hive to store statistics, such as numFiles, numPartitions, and numRows.
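As a sketch (the table name and property keys below are hypothetical examples, not predefined Spark properties):

```sql
-- All externally visible properties of the table.
SHOW TBLPROPERTIES salesdb.customer;

-- A single property via an unquoted, dotted key.
SHOW TBLPROPERTIES salesdb.customer (created.by.user);

-- The same lookup with the key written as a string literal.
SHOW TBLPROPERTIES salesdb.customer ('created.by.user');
```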
SHOW VIEWS
Description
The SHOW VIEWS statement returns all the views for an optionally specified database. Additionally, the output of this statement may be filtered by an optional matching pattern. If no database is specified, the views are returned from the current database. If the specified database is the global temporary view database, global temporary views are listed. Note that the command also lists local temporary views regardless of the given database.
Syntax
SHOW VIEWS [ { FROM | IN } database_name ] [ LIKE regex_pattern ]
Parameters
- { FROM | IN } database_name: Specifies the database name from which views are listed.
- regex_pattern: Specifies the regular expression pattern that is used to filter out unwanted views.
- Except for the * and | characters, the pattern works like a regular expression.
- * alone matches 0 or more characters and | is used to separate multiple different regular expressions, any of which can match.
- The leading and trailing blanks are trimmed in the input pattern before processing. The pattern match is case-insensitive.
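For example (salesdb is hypothetical; global_temp is Spark's default database for global temporary views):

```sql
-- Views in the current database, plus any local temporary views.
SHOW VIEWS;

-- Views in a specific database matching a pattern.
SHOW VIEWS FROM salesdb LIKE 'sales*';

-- Global temporary views live in the global_temp database.
SHOW VIEWS IN global_temp;
```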