Export and load statements in GoogleSQL
EXPORT DATA statement
The EXPORT DATA
statement exports the results of a query to an external
storage location. You can export to the following services:
- Cloud Storage
- Amazon Simple Storage Service (Amazon S3)
- Spanner
- Bigtable
- Pub/Sub
Syntax
EXPORT DATA
  [WITH CONNECTION connection_name]
  OPTIONS (export_option_list)
AS query_statement
Arguments
connection_name
: Specifies a connection that has credentials for accessing the Amazon S3 data. Specify the connection name in the form PROJECT_ID.LOCATION.CONNECTION_ID. If the project ID or location contains a dash, enclose the connection name in backticks (`). Connections aren't required to export to Google Cloud services.

export_option_list
: Specifies a list of options for the export operation, including the URI of the destination. For more information, see the following sections.

query_statement
: A SQL query. The query result is exported to the external destination. The query can't reference metatables, including INFORMATION_SCHEMA views, system tables, or wildcard tables.
Export to Cloud Storage or Amazon S3
You can export BigQuery data to Cloud Storage or Amazon S3 in Avro, CSV, JSON, and Parquet formats. For more information about exporting to Cloud Storage, see Export table data to Cloud Storage.
Use the format
option to specify the format of the exported data. The
following limitations apply:
- You cannot export nested and repeated data in CSV format.
- If you export data in JSON format,
INT64
data types are encoded as JSON strings to preserve 64-bit precision.
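For instance (the bucket and column names here are hypothetical), an INT64 value above 2^53 can't be represented exactly as a JSON number, so BigQuery writes it as a quoted string:

```sql
-- Hypothetical export; the bucket name is illustrative.
EXPORT DATA OPTIONS(
  uri='gs://my-bucket/json-export/*.json',
  format='JSON')
AS SELECT 9007199254740993 AS big_value;

-- Each exported line quotes the INT64 value (illustrative):
-- {"big_value":"9007199254740993"}
```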
You are not billed for the export operation, but you are billed for running the query and for storing data in Cloud Storage or Amazon S3. For more information, see Cloud Storage pricing or Amazon S3 pricing.
Cloud Storage and Amazon S3 export option list
The option list specifies options for exporting to Cloud Storage or
Amazon S3. Specify the option list in the following
format: NAME=VALUE, ...
Option | Description
---|---
compression | Specifies a compression format. If not specified, the exported files are uncompressed. Supported values include: GZIP, DEFLATE, SNAPPY.
field_delimiter | The delimiter used to separate fields. Default: ',' (comma). Applies to: CSV.
format | Required. The format of the exported data. Supported values include: AVRO, CSV, JSON, PARQUET.
header | If true, the exported data includes a header row with the column names. Default: false. Applies to: CSV.
overwrite | If true, files at the destination location are overwritten. Default: false. Note: when overwrite is true, existing files that match the destination URI are deleted before the new files are written.
uri | Required. The destination URI for the export. The URI must include a single wildcard character (*), which lets BigQuery shard the output into multiple files. Examples: 'gs://bucket/folder/*.csv', 's3://bucket/folder/*'.
use_avro_logical_types | Whether to use appropriate AVRO logical types when exporting TIMESTAMP, DATE, TIME, and DATETIME values. Applies to: AVRO. For more information, see Avro export details.
Examples
The following examples show common use cases for exporting to Cloud Storage or Amazon S3.
Export data to Cloud Storage in CSV format
The following example exports data to a CSV file. It includes options to
overwrite the destination location, write header rows, and use ';'
as a
delimiter.
EXPORT DATA
  OPTIONS(
    uri='gs://bucket/folder/*.csv',
    format='CSV',
    overwrite=true,
    header=true,
    field_delimiter=';')
AS SELECT field1, field2
FROM mydataset.table1
ORDER BY field1
LIMIT 10
Export data to Cloud Storage in Avro format
The following example exports data to Avro format using Snappy compression.
EXPORT DATA
  OPTIONS(
    uri='gs://bucket/folder/*',
    format='AVRO',
    compression='SNAPPY')
AS SELECT field1, field2
FROM mydataset.table1
ORDER BY field1
LIMIT 10
Export data to Cloud Storage in Parquet format
The following example exports data to Parquet format. It includes the option to overwrite the destination location.
EXPORT DATA
  OPTIONS(
    uri='gs://bucket/folder/*',
    format='PARQUET',
    overwrite=true)
AS SELECT field1, field2
FROM mydataset.table1
ORDER BY field1
LIMIT 10
Export data to Amazon S3 in JSON format
The following example exports query results that run against a BigLake table based on Amazon S3 to your Amazon S3 bucket:
EXPORT DATA
  WITH CONNECTION myproject.us.myconnection
  OPTIONS(
    uri='s3://bucket/folder/*',
    format='JSON',
    overwrite=true)
AS SELECT field1, field2
FROM mydataset.table1
ORDER BY field1
LIMIT 10
Export to Bigtable
You can export BigQuery data to a Bigtable table
by using the EXPORT DATA
statement. For Bigtable export
examples and configuration options, see
Export data to Bigtable.
You are not billed for the export operation, but you are billed for running the query and for storing data in Bigtable. For more information, see Bigtable pricing.
Bigtable export option list
The option list specifies options for exporting to Bigtable. Specify the option list in the following format: NAME=VALUE, ...

Option | Description
---|---
format | Required. When exporting to Bigtable, the value must always be 'CLOUD_BIGTABLE'.
bigtable_options | JSON string containing configurations related to mapping exported fields to Bigtable column families and columns. For more information, see Configure exports with bigtable_options.
overwrite | If true, values that already exist in the destination table are overwritten with the exported data.
truncate | If true, the destination table is truncated before the exported data is written.
uri | Required. The destination URI for the export. We recommend specifying an app profile for traffic routing and visibility in the monitoring dashboards provided by Bigtable. The URI takes the form https://bigtable.googleapis.com/projects/PROJECT_ID/instances/INSTANCE_ID[/appProfiles/APP_PROFILE_ID]/tables/TABLE_NAME.
auto_create_column_families | If true, column families that are referenced in bigtable_options but don't exist in the destination table are created automatically.
Example
The following example exports data to a Bigtable table. Data in field1 becomes the row key in the Bigtable destination table. The fields field2, field3, and field4 are written as columns cbtField2, cbtField3, and cbtField4 into the column family column_family.
EXPORT DATA OPTIONS (
  uri="https://bigtable.googleapis.com/projects/my-project/instances/my-instance/tables/my-table",
  format="CLOUD_BIGTABLE",
  bigtable_options="""{
    "columnFamilies": [
      {
        "familyId": "column_family",
        "columns": [
          {"qualifierString": "cbtField2", "fieldName": "field2"},
          {"qualifierString": "cbtField3", "fieldName": "field3"},
          {"qualifierString": "cbtField4", "fieldName": "field4"}
        ]
      }
    ]
  }"""
) AS
SELECT
  CAST(field1 AS STRING) AS rowkey,
  STRUCT(field2, field3, field4) AS column_family
FROM `bigquery_table`
Export to Pub/Sub
You can export BigQuery data to a Pub/Sub topic
by using the EXPORT DATA
statement in a
continuous query. For
more information about Pub/Sub configuration options, see
Export data to Pub/Sub.
For information about the costs involved with exporting to Pub/Sub by using a continuous query, see Costs.
Pub/Sub export option list
The option list specifies options for exporting to Pub/Sub. Specify the option list in the following format: NAME=VALUE, ...

Option | Description
---|---
format | Required. When exporting to Pub/Sub, the value must always be 'CLOUD_PUBSUB'.
uri | Required. The destination URI for the export. The URI takes the form https://pubsub.googleapis.com/projects/PROJECT_ID/topics/TOPIC_ID.
Example
The following example shows a continuous query that filters data from a BigQuery table that is receiving streaming taxi ride information, and publishes the data to a Pub/Sub topic in real time:
EXPORT DATA
  OPTIONS (
    format = 'CLOUD_PUBSUB',
    uri = 'https://pubsub.googleapis.com/projects/myproject/topics/taxi-real-time-rides')
AS (
  SELECT
    TO_JSON_STRING(
      STRUCT(
        ride_id,
        timestamp,
        latitude,
        longitude)) AS message
  FROM `myproject.real_time_taxi_streaming.taxi_rides`
  WHERE ride_status = 'enroute'
);
Export to Spanner
To provide feedback or request support for this feature, send email to bq-cloud-spanner-federation-preview@google.com.
You can export data from a BigQuery table to a
Spanner table by using the EXPORT DATA
statement.
Spanner export option list
The option list specifies options for the export operation. Specify the option
list in the following format: NAME=VALUE, ...
Option | Description
---|---
format | Required. To export data from BigQuery to Spanner, the value must always be 'CLOUD_SPANNER'.
uri | Required. The destination URI for the export. For Spanner, the URI must be provided in the following format: https://spanner.googleapis.com/projects/PROJECT_ID/instances/INSTANCE_ID/databases/DATABASE_ID.
spanner_options | Required. A JSON string containing configurations related to mapping exported fields to the Spanner table and columns. For more information, see Configure exports with spanner_options.
Examples
Export data to Spanner
The following example exports data to a Spanner table:
EXPORT DATA OPTIONS (
  uri="https://spanner.googleapis.com/projects/my-project/instances/my-instance/databases/my-database",
  format="CLOUD_SPANNER",
  spanner_options="""{ "table": "my_table" }"""
) AS
SELECT * FROM `bigquery_table`
For more Spanner export examples and configuration options, see Export data to Spanner.
LOAD DATA statement
Loads data from one or more files into a table. The statement can create a new
table, append data into an existing table or partition, or overwrite an existing
table or partition. If the LOAD DATA
statement fails, the table into which you
are loading data remains unchanged.
Syntax
LOAD DATA {OVERWRITE|INTO} [{TEMP|TEMPORARY} TABLE]
[[project_name.]dataset_name.]table_name
[( column_list )]
[[OVERWRITE] PARTITIONS (partition_column_name=partition_value)]
[PARTITION BY partition_expression]
[CLUSTER BY clustering_column_list]
[OPTIONS (table_option_list)]
FROM FILES(load_option_list)
[WITH PARTITION COLUMNS
  [(partition_column_list)]
]
[WITH CONNECTION connection_name]

column_list: column[, ...]

partition_column_list: partition_column_name, partition_column_type[, ...]
Arguments
INTO
: If a table with this name already exists, the statement appends data to the table. You must use INTO instead of OVERWRITE if your statement includes the PARTITIONS clause.

OVERWRITE
: If a table with this name already exists, the statement overwrites the table.

{TEMP|TEMPORARY} TABLE
: Use this clause to create or write to a temporary table.

project_name
: The name of the project for the table. The value defaults to the project that runs this DDL query.

dataset_name
: The name of the dataset for the table.

table_name
: The name of the table.

column_list
: Contains the table's schema information as a list of table columns. For more information about table schemas, see Specifying a schema. If you don't specify a schema, BigQuery uses schema auto-detection to infer the schema.

  When you load hive-partitioned data into a new table or overwrite an existing table, the table schema contains the hive-partitioned columns and the columns in the column_list.

  If you append hive-partitioned data to an existing table, then the hive-partitioned columns and column_list can be a subset of the existing columns. If the combined list of columns is not a subset of the existing columns, then the following rules apply:

  - If your data is self-describing, such as ORC, Parquet, or Avro, then columns in the source file that are omitted from the column_list are ignored. Columns in the column_list that don't exist in the source file are written with NULL values. If a column is in both the column_list and the source file, then their types must match.
  - If your data is not self-describing, such as CSV or JSON, then columns in the source file that are omitted from the column_list are only ignored if you set ignore_unknown_values to TRUE. Otherwise this statement returns an error. You can't list columns in the column_list that don't exist in the source file.

[OVERWRITE] PARTITIONS
: Use this clause to write to or overwrite exactly one partition. When you use this clause, the statement must begin with LOAD DATA INTO.

  partition_column_name
  : The name of the partitioned column to write to. If you use both the PARTITIONS and the PARTITION BY clauses, then the column names must match.

  partition_value
  : The partition_id of the partition to append or overwrite. To find the partition_id values of a table, query the INFORMATION_SCHEMA.PARTITIONS view. You can't set the partition_value to __NULL__ or __UNPARTITIONED__. You can only append to or overwrite one partition. If your data contains values that belong to multiple partitions, then the statement fails with an error. The partition_value must be a literal value.

partition_expression
: Specifies the table partitioning when creating a new table.

clustering_column_list
: Specifies table clustering when creating a new table. The value is a comma-separated list of column names, with up to four columns.

table_option_list
: Specifies options for creating the table. If you include this clause and the table already exists, then the options must match the existing table specification.

partition_column_list
: A list of external partitioning columns.

connection_name
: The connection name that is used to read the source files from an external data source.

load_option_list
: Specifies options for loading the data.
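As a sketch of the CSV rules above (the dataset, table, and file names are hypothetical), loading a file whose extra columns should be ignored requires ignore_unknown_values:

```sql
-- Hypothetical: file.csv contains columns x, y, and z, but only x and y
-- are listed. Without ignore_unknown_values=TRUE, this statement would
-- return an error because CSV data is not self-describing.
LOAD DATA INTO mydataset.table1(x INT64, y STRING)
FROM FILES(
  format='CSV',
  ignore_unknown_values=TRUE,
  uris = ['gs://bucket/path/file.csv']
)
```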
If no table exists with the specified name, then the statement creates a new
table. If a table already exists with the specified name, then the behavior
depends on the INTO
or OVERWRITE
keyword. The INTO
keyword appends the
data to the table, and the OVERWRITE
keyword overwrites the table.
If your external data uses a
hive-partitioned layout,
then include the WITH PARTITION COLUMNS
clause. If you include the WITH
PARTITION COLUMNS
clause without partition_column_list
, then
BigQuery infers the partitioning from the data layout. If you
include both column_list
and WITH PARTITION COLUMNS
, then
partition_column_list
is required.
To load data into a temporary table, include the {TEMP|TEMPORARY} TABLE clause in the LOAD DATA statement.
column

(column_name column_schema[, ...]) contains the table's schema information in a comma-separated list.

column := column_name column_schema

column_schema :=
   {
     simple_type
     | STRUCT<field_list>
     | ARRAY<array_element_schema>
   }
   [PRIMARY KEY NOT ENFORCED | REFERENCES table_name(column_name) NOT ENFORCED]
   [DEFAULT default_expression]
   [NOT NULL]
   [OPTIONS(column_option_list)]

simple_type := { data_type | STRING COLLATE collate_specification }

field_list := field_name column_schema [, ...]

array_element_schema := { simple_type | STRUCT<field_list> } [NOT NULL]
column_name
: The name of the column. A column name:

  - Must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_)
  - Must start with a letter or underscore
  - Can be up to 300 characters

column_schema
: Similar to a data type, but supports an optional NOT NULL constraint for types other than ARRAY. column_schema also supports options on top-level columns and STRUCT fields. column_schema can be used only in the column definition list of CREATE TABLE statements. It cannot be used as a type in expressions.

simple_type
: Any supported data type aside from STRUCT and ARRAY.

  If simple_type is a STRING, it supports an additional clause for collation, which defines how a resulting STRING can be compared and sorted. The syntax looks like this:

  STRING COLLATE collate_specification

  If you have DEFAULT COLLATE collate_specification assigned to the table, the collation specification for a column overrides the specification for the table.

default_expression
: The default value assigned to the column.

field_list
: Represents the fields in a struct.

field_name
: The name of the struct field. Struct field names have the same restrictions as column names.

NOT NULL
: When the NOT NULL constraint is present for a column or field, the column or field is created with REQUIRED mode. Conversely, when the NOT NULL constraint is absent, the column or field is created with NULLABLE mode.

  Columns and fields of ARRAY type do not support the NOT NULL modifier. For example, a column_schema of ARRAY<INT64> NOT NULL is invalid, since ARRAY columns have REPEATED mode and can be empty but cannot be NULL. An array element in a table can never be NULL, regardless of whether the NOT NULL constraint is specified. For example, ARRAY<INT64> is equivalent to ARRAY<INT64 NOT NULL>.

  The NOT NULL attribute of a table's column_schema does not propagate through queries over the table. If table T contains a column declared as x INT64 NOT NULL, for example, CREATE TABLE dataset.newtable AS SELECT x FROM T creates a table named dataset.newtable in which x is NULLABLE.
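The column_schema elements described above can be combined in one column list. A minimal sketch, assuming hypothetical table and file names:

```sql
LOAD DATA INTO mydataset.table1(
  id INT64 NOT NULL,             -- created with REQUIRED mode
  name STRING COLLATE 'und:ci',  -- case-insensitive collation
  qty INT64 DEFAULT 0,           -- default value for missing input
  tags ARRAY<STRING>             -- REPEATED mode; NOT NULL is not allowed here
)
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/file.avro']
)
```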
column_option_list
Specify a column option list in the following format:
NAME=VALUE, ...
NAME
and VALUE
must be one of the following combinations:
NAME | VALUE | Details
---|---|---
description | STRING | Example: description="a unique id". This property is equivalent to the schema.fields[].description table resource property.
rounding_mode | STRING | Example: rounding_mode='ROUND_HALF_EVEN'. This specifies the rounding mode that's used for values written to a NUMERIC or BIGNUMERIC typed column or struct field. This property is equivalent to the schema.fields[].roundingMode table resource property.
VALUE
is a constant expression containing only literals, query parameters,
and scalar functions.
The constant expression cannot contain:
- A reference to a table
- Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
- User-defined functions, aggregate functions, or analytic functions
- The following scalar functions:
  - ARRAY_TO_STRING
  - REPLACE
  - REGEXP_REPLACE
  - RAND
  - FORMAT
  - LPAD
  - RPAD
  - REPEAT
  - SESSION_USER
  - GENERATE_ARRAY
  - GENERATE_DATE_ARRAY
Setting the VALUE
replaces the existing value of that option for the column, if
there was one. Setting the VALUE
to NULL
clears the column's value for that
option.
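A minimal sketch of setting column options in a LOAD DATA column list (the table, file, and column names are hypothetical):

```sql
LOAD DATA INTO mydataset.table1(
  id INT64 OPTIONS(description="unique row identifier"),
  price NUMERIC OPTIONS(rounding_mode='ROUND_HALF_EVEN')
)
FROM FILES(
  format='CSV',
  uris = ['gs://bucket/path/file.csv']
)
```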
partition_expression
PARTITION BY is an optional clause that controls table partitioning. partition_expression is an expression that determines how to partition the table. The partition expression can contain the following values:

- _PARTITIONDATE. Partition by ingestion time with daily partitions. This syntax cannot be used with the AS query_statement clause.
- DATE(_PARTITIONTIME). Equivalent to _PARTITIONDATE. This syntax cannot be used with the AS query_statement clause.
- <date_column>. Partition by a DATE column with daily partitions.
- DATE({ <timestamp_column> | <datetime_column> }). Partition by a TIMESTAMP or DATETIME column with daily partitions.
- DATETIME_TRUNC(<datetime_column>, { DAY | HOUR | MONTH | YEAR }). Partition by a DATETIME column with the specified partitioning type.
- TIMESTAMP_TRUNC(<timestamp_column>, { DAY | HOUR | MONTH | YEAR }). Partition by a TIMESTAMP column with the specified partitioning type.
- TIMESTAMP_TRUNC(_PARTITIONTIME, { DAY | HOUR | MONTH | YEAR }). Partition by ingestion time with the specified partitioning type. This syntax cannot be used with the AS query_statement clause.
- DATE_TRUNC(<date_column>, { MONTH | YEAR }). Partition by a DATE column with the specified partitioning type.
- RANGE_BUCKET(<int64_column>, GENERATE_ARRAY(<start>, <end>[, <interval>])). Partition by an integer column with the specified range, where:
  - start is the start of range partitioning, inclusive.
  - end is the end of range partitioning, exclusive.
  - interval is the width of each range within the partition. Defaults to 1.
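For example, integer-range partitioning with RANGE_BUCKET might look like the following sketch (the table, file, and column names are hypothetical):

```sql
-- Partitions customer_id into ranges [0,10), [10,20), ..., [90,100).
LOAD DATA INTO mydataset.table1
PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 100, 10))
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/file.avro']
)
```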
table_option_list
The option list allows you to set table options such as a label and an expiration time. You can include multiple options using a comma-separated list.
Specify a table option list in the following format:
NAME=VALUE, ...
NAME
and VALUE
must be one of the following combinations:
NAME | VALUE | Details
---|---|---
expiration_timestamp | TIMESTAMP | Example: expiration_timestamp="2025-01-01 00:00:00 UTC". This property is equivalent to the expirationTime table resource property.
partition_expiration_days | FLOAT64 | Example: partition_expiration_days=7. Sets the partition expiration in days. For more information, see Set the partition expiration. By default, partitions do not expire. This property is equivalent to the timePartitioning.expirationMs table resource property but uses days instead of milliseconds. One day is equivalent to 86400000 milliseconds, or 24 hours. This property can only be set if the table is partitioned.
require_partition_filter | BOOL | Example: require_partition_filter=true. Specifies whether queries on this table must include a predicate filter that filters on the partitioning column. For more information, see Set partition filter requirements. The default value is false. This property is equivalent to the timePartitioning.requirePartitionFilter table resource property. This property can only be set if the table is partitioned.
friendly_name | STRING | Example: friendly_name="my_table". This property is equivalent to the friendlyName table resource property.
description | STRING | Example: description="a table that expires in 2025". This property is equivalent to the description table resource property.
labels | ARRAY<STRUCT<STRING, STRING>> | Example: labels=[("org_unit", "development")]. This property is equivalent to the labels table resource property.
default_rounding_mode | STRING | Example: default_rounding_mode='ROUND_HALF_EVEN'. This specifies the default rounding mode that's used for values written to any new NUMERIC or BIGNUMERIC typed columns or struct fields in the table. This property is equivalent to the defaultRoundingMode table resource property.
enable_change_history | BOOL | In preview. Example: enable_change_history=TRUE. Set this property to TRUE to capture change history on the table, which you can then view by using the CHANGES function.
max_staleness | INTERVAL | Example: max_staleness=INTERVAL "4:0:0" HOUR TO SECOND. The maximum interval behind the current time where it's acceptable to read stale data. For example, with change data capture, when this option is set, the table copy operation is denied if data is more stale than the max_staleness value.
VALUE
is a constant expression containing only literals, query parameters,
and scalar functions.
The constant expression cannot contain:
- A reference to a table
- Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
- User-defined functions, aggregate functions, or analytic functions
- The following scalar functions:
  - ARRAY_TO_STRING
  - REPLACE
  - REGEXP_REPLACE
  - RAND
  - FORMAT
  - LPAD
  - RPAD
  - REPEAT
  - SESSION_USER
  - GENERATE_ARRAY
  - GENERATE_DATE_ARRAY
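A sketch combining several of the table options above when creating a partitioned table (all names and values are illustrative):

```sql
LOAD DATA INTO mydataset.table1
PARTITION BY transaction_date
OPTIONS(
  friendly_name="my_table",
  labels=[("org_unit", "development")],
  require_partition_filter=true,
  partition_expiration_days=7
)
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/file.avro']
)
```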
load_option_list
Specifies options for loading data from external files. The format
and uris
options are required. Specify the option list in the following format:
NAME=VALUE, ...
Option | Description
---|---
allow_jagged_rows | If true, allow rows that are missing trailing optional columns. The missing values are treated as nulls. The default value is false. Applies to CSV data.
allow_quoted_newlines | If true, allow quoted data sections that contain newline characters in the file. The default value is false. Applies to CSV data.
bigtable_options | Only required when creating a Bigtable external table. Specifies the schema of the Bigtable external table in JSON format. For a list of Bigtable table definition options, see BigtableOptions in the REST API reference.
column_name_character_map | Defines the scope of supported column name characters and the handling behavior of unsupported characters. Supported values include: STRICT, V1, and V2. Applies to CSV and Parquet data.
compression | The compression type of the data source. Supported values include: GZIP. If not specified, the data source is uncompressed. Applies to CSV and JSON data.
decimal_target_types | Determines how to convert a Decimal type. Equivalent to ExternalDataConfiguration.decimal_target_types. Example: ["NUMERIC", "BIGNUMERIC"].
enable_list_inference | If true, use schema inference specifically for the Parquet LIST logical type. The default value is false. Applies to Parquet data.
enable_logical_types | If true, convert Avro logical types into their corresponding SQL types. The default value is false. Applies to Avro data.
encoding | The character encoding of the data. Supported values include: UTF8 (or UTF-8), ISO_8859_1 (or ISO-8859-1). Applies to CSV data.
enum_as_string | If true, infer the Parquet ENUM logical type as STRING instead of BYTES by default. The default value is false. Applies to Parquet data.
field_delimiter | The separator for fields in a CSV file. The default value is a comma (','). Applies to CSV data.
format | The format of the external data. Supported values for LOAD DATA include: AVRO, CSV, JSON, ORC, PARQUET. The value JSON indicates newline-delimited JSON.
hive_partition_uri_prefix | A common prefix for all source URIs before the partition key encoding begins. Applies only to hive-partitioned external tables. Applies to Avro, CSV, JSON, Parquet, and ORC data. Example: "gs://bucket/path".
file_set_spec_type | Specifies how to interpret source URIs for load jobs and external tables. Supported values include: FILE_SYSTEM_MATCH (default), which expands source URIs by listing files from the object store, and NEW_LINE_DELIMITED_MANIFEST, which indicates that the provided URIs are newline-delimited manifest files with one URI per line. For example, if you have a source URI of "gs://bucket/path/file" and the file_set_spec_type is FILE_SYSTEM_MATCH, then the file is used directly as a data file. If the file_set_spec_type is NEW_LINE_DELIMITED_MANIFEST, then each line in the file is interpreted as a URI that points to a data file.
ignore_unknown_values | If true, ignore extra values that are not represented in the table schema, without returning an error. The default value is false. Applies to CSV and JSON data.
json_extension | For JSON data, indicates a particular JSON interchange format. If not specified, BigQuery reads the data as generic JSON records. Supported values include: GEOJSON (newline-delimited GeoJSON data).
max_bad_records | The maximum number of bad records to ignore when reading the data. The default value is 0. Applies to: CSV, JSON, and Google Sheets data.
max_staleness | Applicable for BigLake tables and object tables. Specifies whether cached metadata is used by operations against the table, and how fresh the cached metadata must be in order for the operation to use it. To disable metadata caching, specify 0. This is the default. To enable metadata caching, specify an interval literal value between 30 minutes and 7 days. For example, specify INTERVAL 4 HOUR for a 4 hour staleness interval.
null_marker | The string that represents NULL values in a CSV file. Applies to CSV data.
object_metadata | Only required when creating an object table. Set the value of this option to SIMPLE when creating an object table.
preserve_ascii_control_characters | If true, embedded ASCII control characters, which are the first 32 characters in the ASCII table (ranging from '\x00' to '\x1F'), are preserved. The default value is false. Applies to CSV data.
quote | The string used to quote data sections in a CSV file. If your data contains quoted newline characters, also set the allow_quoted_newlines property to true. The default value is a double quote ('"'). Applies to CSV data.
skip_leading_rows | The number of rows at the top of a file to skip when reading the data. The default value is 0. Applies to CSV and Google Sheets data.
uris | For external tables, including object tables, that aren't Bigtable tables: an array of fully qualified URIs for the external data locations. Each URI can contain one asterisk (*) wildcard character. Examples: ['gs://bucket/path/*'], ['gs://bucket/path1/*', 'gs://bucket/path2/file.csv']. For Bigtable tables: the URI identifying the Bigtable table to use as a data source. You can only specify one Bigtable URI. For more information on constructing a Bigtable URI, see Retrieving the Bigtable URI.
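Several of the CSV options above can be combined; the following is a sketch with hypothetical bucket, table, and value choices:

```sql
LOAD DATA INTO mydataset.table1
FROM FILES(
  format='CSV',
  uris = ['gs://bucket/path/*.csv'],
  skip_leading_rows=1,          -- skip the header row
  field_delimiter=';',
  quote='"',
  allow_quoted_newlines=true,   -- quoted fields may contain newlines
  null_marker='\\N',
  max_bad_records=10
)
```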
Examples
The following examples show common use cases for the LOAD DATA
statement.
Load data into a table
The following example loads an Avro file into a table. Avro is a self-describing format, so BigQuery infers the schema.
LOAD DATA INTO mydataset.table1
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/file.avro']
)
The following example loads two CSV files into a table, using schema autodetection.
LOAD DATA INTO mydataset.table1
FROM FILES(
  format='CSV',
  uris = ['gs://bucket/path/file1.csv', 'gs://bucket/path/file2.csv']
)
Load data using a schema
The following example loads a CSV file into a table, using a specified table schema.
LOAD DATA INTO mydataset.table1(x INT64, y STRING)
FROM FILES(
  skip_leading_rows=1,
  format='CSV',
  uris = ['gs://bucket/path/file.csv']
)
Set options when creating a new table
The following example creates a new table with a description and an expiration time.
LOAD DATA INTO mydataset.table1
OPTIONS(
  description="my table",
  expiration_timestamp="2025-01-01 00:00:00 UTC"
)
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/file.avro']
)
Overwrite an existing table
The following example overwrites an existing table.
LOAD DATA OVERWRITE mydataset.table1
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/file.avro']
)
Load data into a temporary table
The following example loads an Avro file into a temporary table.
LOAD DATA INTO TEMP TABLE mydataset.table1
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/file.avro']
)
Specify table partitioning and clustering
The following example creates a table that is partitioned by the
transaction_date
field and clustered by the customer_id
field. It also
configures the partitions to expire after three days.
LOAD DATA INTO mydataset.table1
PARTITION BY transaction_date
CLUSTER BY customer_id
OPTIONS(
  partition_expiration_days=3
)
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/file.avro']
)
Load data into a partition
The following example loads data into a selected partition of an ingestion-time partitioned table:
LOAD DATA INTO mydataset.table1
PARTITIONS(_PARTITIONTIME = TIMESTAMP '2016-01-01')
PARTITION BY _PARTITIONTIME
FROM FILES(
  format = 'AVRO',
  uris = ['gs://bucket/path/file.avro']
)
Load a file that is externally partitioned
The following example loads a set of external files that use a hive partitioning layout.
LOAD DATA INTO mydataset.table1
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/*'],
  hive_partition_uri_prefix='gs://bucket/path'
)
WITH PARTITION COLUMNS(
  field_1 STRING, -- column order must match the external path
  field_2 INT64
)
The following example infers the partitioning layout:
LOAD DATA INTO mydataset.table1
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/*'],
  hive_partition_uri_prefix='gs://bucket/path'
)
WITH PARTITION COLUMNS
If you include both column_list
and WITH PARTITION COLUMNS
, then you must
explicitly list the partitioning columns. For example, the following query
returns an error:
-- This query returns an error.
LOAD DATA INTO mydataset.table1
(
  x INT64, -- column_list is given but the partition column list is missing
  y STRING
)
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/*'],
  hive_partition_uri_prefix='gs://bucket/path'
)
WITH PARTITION COLUMNS
Load data with cross-cloud transfer
Example 1
The following example loads a Parquet file named sample.parquet from an Amazon S3 bucket into the test_parquet table, using an auto-detected schema:
LOAD DATA INTO mydataset.testparquet
FROM FILES (
  uris = ['s3://test-bucket/sample.parquet'],
  format = 'PARQUET'
)
WITH CONNECTION `aws-us-east-1.test-connection`
Example 2
The following example loads CSV files that match the prefix sampled* from your Azure Blob Storage container into the test_csv table, with a predefined schema and time-based column partitioning:
LOAD DATA INTO mydataset.test_csv (Number INT64, Name STRING, Time DATE)
PARTITION BY Time
FROM FILES (
  format = 'CSV',
  uris = ['azure://test.blob.core.windows.net/container/sampled*'],
  skip_leading_rows=1
)
WITH CONNECTION `azure-eastus2.test-connection`
Example 3
The following example overwrites the existing test_parquet table with data from a file named sample.parquet, using an auto-detected schema:
LOAD DATA OVERWRITE mydataset.testparquet
FROM FILES (
  uris = ['s3://test-bucket/sample.parquet'],
  format = 'PARQUET'
)
WITH CONNECTION `aws-us-east-1.test-connection`