Export and load statements in GoogleSQL
EXPORT DATA statement
The EXPORT DATA
statement exports the results of a query to an external
storage location. You can export to the following services:
- Cloud Storage
- Amazon Simple Storage Service (Amazon S3)
- Spanner
- Bigtable
- Pub/Sub
Syntax
EXPORT DATA
  [WITH CONNECTION connection_name]
  OPTIONS (export_option_list)
AS query_statement
Arguments
connection_name
: Specifies a connection that has credentials for accessing the Amazon S3 data. Specify the connection name in the form PROJECT_ID.LOCATION.CONNECTION_ID. If the project ID or location contains a dash, enclose the connection name in backticks (`). Connections aren't required to export to Google Cloud services.

export_option_list
: Specifies a list of options for the export operation, including the URI of the destination. For more information, see the following sections.

query_statement
: A SQL query. The query result is exported to the external destination. The query can't reference metatables, including INFORMATION_SCHEMA views, system tables, or wildcard tables.
Export to Cloud Storage or Amazon S3
You can export BigQuery data to Cloud Storage or Amazon S3 in Avro, CSV, JSON, and Parquet formats. For more information about exporting to Cloud Storage, see Export table data to Cloud Storage.
Use the format
option to specify the format of the exported data. The
following limitations apply:
- You cannot export nested and repeated data in CSV format.
- If you export data in JSON format,
INT64
data types are encoded as JSON strings to preserve 64-bit precision.
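For instance (the bucket and column names here are hypothetical), an INT64 value above 2^53 can't be represented exactly as a JSON number, so BigQuery writes it as a quoted string:

```sql
-- Hypothetical export; the bucket name is illustrative.
EXPORT DATA OPTIONS(
  uri='gs://my-bucket/json-export/*.json',
  format='JSON')
AS SELECT 9007199254740993 AS big_value;

-- Each exported line quotes the INT64 value (illustrative):
-- {"big_value":"9007199254740993"}
```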
You are not billed for the export operation, but you are billed for running the query and for storing data in Cloud Storage or Amazon S3. For more information, see Cloud Storage pricing or Amazon S3 pricing.
Cloud Storage and Amazon S3 export option list
The option list specifies options for exporting to Cloud Storage or
Amazon S3. Specify the option list in the following
format: NAME=VALUE, ...
Option | Description
---|---
compression | Specifies a compression format. If not specified, the exported files are uncompressed. Supported values include: GZIP, DEFLATE, SNAPPY.
field_delimiter | The delimiter used to separate fields. Default: ',' (comma). Applies to: CSV.
format | Required. The format of the exported data. Supported values include: AVRO, CSV, JSON, PARQUET.
header | If true, the exported data includes a header row with the column names. Default: false. Applies to: CSV.
overwrite | If true, files at the destination location are overwritten. Default: false. Note: when overwrite is true, existing files that match the destination URI are deleted before the new files are written.
uri | Required. The destination URI for the export. The URI must include a single wildcard character (*), which lets BigQuery shard the output into multiple files. Examples: 'gs://bucket/folder/*.csv', 's3://bucket/folder/*'.
use_avro_logical_types | Whether to use appropriate AVRO logical types when exporting TIMESTAMP, DATE, TIME, and DATETIME values. Applies to: AVRO. For more information, see Avro export details.
Examples
The following examples show common use cases for exporting to Cloud Storage or Amazon S3.
Export data to Cloud Storage in CSV format
The following example exports data to a CSV file. It includes options to
overwrite the destination location, write header rows, and use ';'
as a
delimiter.
EXPORT DATA
  OPTIONS(
    uri='gs://bucket/folder/*.csv',
    format='CSV',
    overwrite=true,
    header=true,
    field_delimiter=';')
AS SELECT field1, field2
FROM mydataset.table1
ORDER BY field1
LIMIT 10
Export data to Cloud Storage in Avro format
The following example exports data to Avro format using Snappy compression.
EXPORT DATA
  OPTIONS(
    uri='gs://bucket/folder/*',
    format='AVRO',
    compression='SNAPPY')
AS SELECT field1, field2
FROM mydataset.table1
ORDER BY field1
LIMIT 10
Export data to Cloud Storage in Parquet format
The following example exports data to Parquet format. It includes the option to overwrite the destination location.
EXPORT DATA
  OPTIONS(
    uri='gs://bucket/folder/*',
    format='PARQUET',
    overwrite=true)
AS SELECT field1, field2
FROM mydataset.table1
ORDER BY field1
LIMIT 10
Export data to Amazon S3 in JSON format
The following example exports query results that run against a BigLake table based on Amazon S3 to your Amazon S3 bucket:
EXPORT DATA
  WITH CONNECTION myproject.us.myconnection
  OPTIONS(
    uri='s3://bucket/folder/*',
    format='JSON',
    overwrite=true)
AS SELECT field1, field2
FROM mydataset.table1
ORDER BY field1
LIMIT 10
Export to Bigtable
You can export BigQuery data to a Bigtable table
by using the EXPORT DATA
statement. For Bigtable export
examples and configuration options, see
Export data to Bigtable.
You are not billed for the export operation, but you are billed for running the query and for storing data in Bigtable. For more information, see Bigtable pricing.
Bigtable export option list
The option list specifies options for exporting to Bigtable. Specify the option list in the following format: NAME=VALUE, ...

Option | Description
---|---
format | Required. When exporting to Bigtable, the value must always be 'CLOUD_BIGTABLE'.
bigtable_options | JSON string containing configurations related to mapping exported fields to Bigtable column families and columns. For more information, see Configure exports with bigtable_options.
overwrite | If true, values that already exist in the destination table are overwritten with the exported data.
truncate | If true, the destination table is truncated before the exported data is written.
uri | Required. The destination URI for the export. We recommend specifying an app profile for traffic routing and visibility in the monitoring dashboards provided by Bigtable. The URI takes the form https://bigtable.googleapis.com/projects/PROJECT_ID/instances/INSTANCE_ID[/appProfiles/APP_PROFILE_ID]/tables/TABLE_NAME.
auto_create_column_families | If true, column families that are referenced in bigtable_options but don't exist in the destination table are created automatically.
Example
The following example exports data to a Bigtable table. Data in field1 becomes the row key in the Bigtable destination table. The fields field2, field3, and field4 are written as columns cbtField2, cbtField3, and cbtField4 into the column family column_family.
EXPORT DATA OPTIONS (
  uri="https://bigtable.googleapis.com/projects/my-project/instances/my-instance/tables/my-table",
  format="CLOUD_BIGTABLE",
  bigtable_options="""{
    "columnFamilies": [
      {
        "familyId": "column_family",
        "columns": [
          {"qualifierString": "cbtField2", "fieldName": "field2"},
          {"qualifierString": "cbtField3", "fieldName": "field3"},
          {"qualifierString": "cbtField4", "fieldName": "field4"}
        ]
      }
    ]
  }"""
) AS
SELECT
  CAST(field1 AS STRING) AS rowkey,
  STRUCT(field2, field3, field4) AS column_family
FROM `bigquery_table`
Export to Pub/Sub
You can export BigQuery data to a Pub/Sub topic
by using the EXPORT DATA
statement in a
continuous query. For
more information about Pub/Sub configuration options, see
Export data to Pub/Sub.
For information about the costs involved with exporting to Pub/Sub by using a continuous query, see Costs.
Pub/Sub export option list
The option list specifies options for exporting to Pub/Sub. Specify the option list in the following format: NAME=VALUE, ...

Option | Description
---|---
format | Required. When exporting to Pub/Sub, the value must always be 'CLOUD_PUBSUB'.
uri | Required. The destination URI for the export. The URI takes the form https://pubsub.googleapis.com/projects/PROJECT_ID/topics/TOPIC_ID.
Example
The following example shows a continuous query that filters data from a BigQuery table that is receiving streaming taxi ride information, and publishes the data to a Pub/Sub topic in real time:
EXPORT DATA
  OPTIONS (
    format = 'CLOUD_PUBSUB',
    uri = 'https://pubsub.googleapis.com/projects/myproject/topics/taxi-real-time-rides')
AS (
  SELECT
    TO_JSON_STRING(
      STRUCT(
        ride_id,
        timestamp,
        latitude,
        longitude)) AS message
  FROM `myproject.real_time_taxi_streaming.taxi_rides`
  WHERE ride_status = 'enroute'
);
Export to Spanner
To provide feedback or request support for this feature, send email to bq-cloud-spanner-federation-preview@google.com.
You can export data from a BigQuery table to a
Spanner table by using the EXPORT DATA
statement.
Spanner export option list
The option list specifies options for the export operation. Specify the option
list in the following format: NAME=VALUE, ...
Option | Description
---|---
format | Required. To export data from BigQuery to Spanner, the value must always be 'CLOUD_SPANNER'.
uri | Required. The destination URI for the export. For Spanner, the URI must be provided in the following format: https://spanner.googleapis.com/projects/PROJECT_ID/instances/INSTANCE_ID/databases/DATABASE_ID.
spanner_options | Required. A JSON string containing configurations related to mapping exported fields to the Spanner table and columns. For more information, see Configure exports with spanner_options.
Examples
Export data to Spanner
The following example exports data to a Spanner table:
EXPORT DATA OPTIONS (
  uri="https://spanner.googleapis.com/projects/my-project/instances/my-instance/databases/my-database",
  format="CLOUD_SPANNER",
  spanner_options="""{ "table": "my_table" }"""
) AS
SELECT * FROM `bigquery_table`
For more Spanner export examples and configuration options, see Export data to Spanner.
LOAD DATA statement
Loads data from one or more files into a table. The statement can create a new
table, append data into an existing table or partition, or overwrite an existing
table or partition. If the LOAD DATA
statement fails, the table into which you
are loading data remains unchanged.
Syntax
LOAD DATA {OVERWRITE|INTO} [{TEMP|TEMPORARY} TABLE]
[[project_name.]dataset_name.]table_name
[( column_list )]
[[OVERWRITE] PARTITIONS (partition_column_name=partition_value)]
[PARTITION BY partition_expression]
[CLUSTER BY clustering_column_list]
[OPTIONS (table_option_list)]
FROM FILES(load_option_list)
[WITH PARTITION COLUMNS
  [(partition_column_list)]
]
[WITH CONNECTION connection_name]

column_list: column[, ...]

partition_column_list: partition_column_name, partition_column_type[, ...]
Arguments
INTO
: If a table with this name already exists, the statement appends data to the table. You must use INTO instead of OVERWRITE if your statement includes the PARTITIONS clause.

OVERWRITE
: If a table with this name already exists, the statement overwrites the table.

{TEMP|TEMPORARY} TABLE
: Use this clause to create or write to a temporary table.

project_name
: The name of the project for the table. The value defaults to the project that runs this DDL query.

dataset_name
: The name of the dataset for the table.

table_name
: The name of the table.

column_list
: Contains the table's schema information as a list of table columns. For more information about table schemas, see Specifying a schema. If you don't specify a schema, BigQuery uses schema auto-detection to infer the schema.

  When you load hive-partitioned data into a new table or overwrite an existing table, the table schema contains the hive-partitioned columns and the columns in the column_list.

  If you append hive-partitioned data to an existing table, then the hive-partitioned columns and column_list can be a subset of the existing columns. If the combined list of columns is not a subset of the existing columns, then the following rules apply:

  - If your data is self-describing, such as ORC, Parquet, or Avro, then columns in the source file that are omitted from the column_list are ignored. Columns in the column_list that don't exist in the source file are written with NULL values. If a column is in both the column_list and the source file, then their types must match.
  - If your data is not self-describing, such as CSV or JSON, then columns in the source file that are omitted from the column_list are only ignored if you set ignore_unknown_values to TRUE. Otherwise this statement returns an error. You can't list columns in the column_list that don't exist in the source file.

[OVERWRITE] PARTITIONS
: Use this clause to write to or overwrite exactly one partition. When you use this clause, the statement must begin with LOAD DATA INTO.

  partition_column_name
  : The name of the partitioned column to write to. If you use both the PARTITIONS and the PARTITION BY clauses, then the column names must match.

  partition_value
  : The partition_id of the partition to append or overwrite. To find the partition_id values of a table, query the INFORMATION_SCHEMA.PARTITIONS view. You can't set the partition_value to __NULL__ or __UNPARTITIONED__. You can only append to or overwrite one partition. If your data contains values that belong to multiple partitions, then the statement fails with an error. The partition_value must be a literal value.

partition_expression
: Specifies the table partitioning when creating a new table.

clustering_column_list
: Specifies table clustering when creating a new table. The value is a comma-separated list of column names, with up to four columns.

table_option_list
: Specifies options for creating the table. If you include this clause and the table already exists, then the options must match the existing table specification.

partition_column_list
: A list of external partitioning columns.

connection_name
: The connection name that is used to read the source files from an external data source.

load_option_list
: Specifies options for loading the data.
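As a sketch of the CSV rules above (the dataset, table, and file names are hypothetical), loading a file whose extra columns should be ignored requires ignore_unknown_values:

```sql
-- Hypothetical: file.csv contains columns x, y, and z, but only x and y
-- are listed. Without ignore_unknown_values=TRUE, this statement would
-- return an error because CSV data is not self-describing.
LOAD DATA INTO mydataset.table1(x INT64, y STRING)
FROM FILES(
  format='CSV',
  ignore_unknown_values=TRUE,
  uris = ['gs://bucket/path/file.csv']
)
```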
If no table exists with the specified name, then the statement creates a new
table. If a table already exists with the specified name, then the behavior
depends on the INTO
or OVERWRITE
keyword. The INTO
keyword appends the
data to the table, and the OVERWRITE
keyword overwrites the table.
If your external data uses a
hive-partitioned layout,
then include the WITH PARTITION COLUMNS
clause. If you include the WITH
PARTITION COLUMNS
clause without partition_column_list
, then
BigQuery infers the partitioning from the data layout. If you
include both column_list
and WITH PARTITION COLUMNS
, then
partition_column_list
is required.
To load data into a temporary table, include the {TEMP|TEMPORARY} TABLE clause in the LOAD DATA statement.
column

(column_name column_schema[, ...]) contains the table's schema information in a comma-separated list.

column := column_name column_schema

column_schema :=
   {
     simple_type
     | STRUCT<field_list>
     | ARRAY<array_element_schema>
   }
   [PRIMARY KEY NOT ENFORCED | REFERENCES table_name(column_name) NOT ENFORCED]
   [DEFAULT default_expression]
   [NOT NULL]
   [OPTIONS(column_option_list)]

simple_type := { data_type | STRING COLLATE collate_specification }

field_list := field_name column_schema [, ...]

array_element_schema := { simple_type | STRUCT<field_list> } [NOT NULL]
column_name
: The name of the column. A column name:

  - Must contain only letters (a-z, A-Z), numbers (0-9), or underscores (_)
  - Must start with a letter or underscore
  - Can be up to 300 characters

column_schema
: Similar to a data type, but supports an optional NOT NULL constraint for types other than ARRAY. column_schema also supports options on top-level columns and STRUCT fields. column_schema can be used only in the column definition list of CREATE TABLE statements. It cannot be used as a type in expressions.

simple_type
: Any supported data type aside from STRUCT and ARRAY.

  If simple_type is a STRING, it supports an additional clause for collation, which defines how a resulting STRING can be compared and sorted. The syntax looks like this:

  STRING COLLATE collate_specification

  If you have DEFAULT COLLATE collate_specification assigned to the table, the collation specification for a column overrides the specification for the table.

default_expression
: The default value assigned to the column.

field_list
: Represents the fields in a struct.

field_name
: The name of the struct field. Struct field names have the same restrictions as column names.

NOT NULL
: When the NOT NULL constraint is present for a column or field, the column or field is created with REQUIRED mode. Conversely, when the NOT NULL constraint is absent, the column or field is created with NULLABLE mode.

  Columns and fields of ARRAY type do not support the NOT NULL modifier. For example, a column_schema of ARRAY<INT64> NOT NULL is invalid, since ARRAY columns have REPEATED mode and can be empty but cannot be NULL. An array element in a table can never be NULL, regardless of whether the NOT NULL constraint is specified. For example, ARRAY<INT64> is equivalent to ARRAY<INT64 NOT NULL>.

  The NOT NULL attribute of a table's column_schema does not propagate through queries over the table. If table T contains a column declared as x INT64 NOT NULL, for example, CREATE TABLE dataset.newtable AS SELECT x FROM T creates a table named dataset.newtable in which x is NULLABLE.
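The column_schema elements described above can be combined in one column list. A minimal sketch, assuming hypothetical table and file names:

```sql
LOAD DATA INTO mydataset.table1(
  id INT64 NOT NULL,             -- created with REQUIRED mode
  name STRING COLLATE 'und:ci',  -- case-insensitive collation
  qty INT64 DEFAULT 0,           -- default value for missing input
  tags ARRAY<STRING>             -- REPEATED mode; NOT NULL is not allowed here
)
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/file.avro']
)
```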
column_option_list
Specify a column option list in the following format:
NAME=VALUE, ...
NAME
and VALUE
must be one of the following combinations:
NAME | VALUE | Details
---|---|---
description | STRING | Example: description="a unique id". This property is equivalent to the schema.fields[].description table resource property.
rounding_mode | STRING | Example: rounding_mode='ROUND_HALF_EVEN'. This specifies the rounding mode that's used for values written to a NUMERIC or BIGNUMERIC typed column or struct field. This property is equivalent to the schema.fields[].roundingMode table resource property.
VALUE
is a constant expression containing only literals, query parameters,
and scalar functions.
The constant expression cannot contain:
- A reference to a table
- Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
- User-defined functions, aggregate functions, or analytic functions
- The following scalar functions:
  - ARRAY_TO_STRING
  - REPLACE
  - REGEXP_REPLACE
  - RAND
  - FORMAT
  - LPAD
  - RPAD
  - REPEAT
  - SESSION_USER
  - GENERATE_ARRAY
  - GENERATE_DATE_ARRAY
Setting the VALUE
replaces the existing value of that option for the column, if
there was one. Setting the VALUE
to NULL
clears the column's value for that
option.
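A minimal sketch of setting column options in a LOAD DATA column list (the table, file, and column names are hypothetical):

```sql
LOAD DATA INTO mydataset.table1(
  id INT64 OPTIONS(description="unique row identifier"),
  price NUMERIC OPTIONS(rounding_mode='ROUND_HALF_EVEN')
)
FROM FILES(
  format='CSV',
  uris = ['gs://bucket/path/file.csv']
)
```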
partition_expression
PARTITION BY is an optional clause that controls table partitioning. partition_expression is an expression that determines how to partition the table. The partition expression can contain the following values:

- _PARTITIONDATE. Partition by ingestion time with daily partitions. This syntax cannot be used with the AS query_statement clause.
- DATE(_PARTITIONTIME). Equivalent to _PARTITIONDATE. This syntax cannot be used with the AS query_statement clause.
- <date_column>. Partition by a DATE column with daily partitions.
- DATE({ <timestamp_column> | <datetime_column> }). Partition by a TIMESTAMP or DATETIME column with daily partitions.
- DATETIME_TRUNC(<datetime_column>, { DAY | HOUR | MONTH | YEAR }). Partition by a DATETIME column with the specified partitioning type.
- TIMESTAMP_TRUNC(<timestamp_column>, { DAY | HOUR | MONTH | YEAR }). Partition by a TIMESTAMP column with the specified partitioning type.
- TIMESTAMP_TRUNC(_PARTITIONTIME, { DAY | HOUR | MONTH | YEAR }). Partition by ingestion time with the specified partitioning type. This syntax cannot be used with the AS query_statement clause.
- DATE_TRUNC(<date_column>, { MONTH | YEAR }). Partition by a DATE column with the specified partitioning type.
- RANGE_BUCKET(<int64_column>, GENERATE_ARRAY(<start>, <end>[, <interval>])). Partition by an integer column with the specified range, where:
  - start is the start of range partitioning, inclusive.
  - end is the end of range partitioning, exclusive.
  - interval is the width of each range within the partition. Defaults to 1.
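For example, integer-range partitioning with RANGE_BUCKET might look like the following sketch (the table, file, and column names are hypothetical):

```sql
-- Partitions customer_id into ranges [0,10), [10,20), ..., [90,100).
LOAD DATA INTO mydataset.table1
PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(0, 100, 10))
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/file.avro']
)
```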
table_option_list
The option list allows you to set table options such as a label and an expiration time. You can include multiple options using a comma-separated list.
Specify a table option list in the following format:
NAME=VALUE, ...
NAME
and VALUE
must be one of the following combinations:
NAME | VALUE | Details
---|---|---
expiration_timestamp | TIMESTAMP | Example: expiration_timestamp="2025-01-01 00:00:00 UTC". This property is equivalent to the expirationTime table resource property.
partition_expiration_days | FLOAT64 | Example: partition_expiration_days=7. Sets the partition expiration in days. For more information, see Set the partition expiration. By default, partitions do not expire. This property is equivalent to the timePartitioning.expirationMs table resource property but uses days instead of milliseconds. One day is equivalent to 86400000 milliseconds, or 24 hours. This property can only be set if the table is partitioned.
require_partition_filter | BOOL | Example: require_partition_filter=true. Specifies whether queries on this table must include a predicate filter that filters on the partitioning column. For more information, see Set partition filter requirements. The default value is false. This property is equivalent to the timePartitioning.requirePartitionFilter table resource property. This property can only be set if the table is partitioned.
friendly_name | STRING | Example: friendly_name="my_table". This property is equivalent to the friendlyName table resource property.
description | STRING | Example: description="a table that expires in 2025". This property is equivalent to the description table resource property.
labels | ARRAY<STRUCT<STRING, STRING>> | Example: labels=[("org_unit", "development")]. This property is equivalent to the labels table resource property.
default_rounding_mode | STRING | Example: default_rounding_mode='ROUND_HALF_EVEN'. This specifies the default rounding mode that's used for values written to any new NUMERIC or BIGNUMERIC typed columns or struct fields in the table. This property is equivalent to the defaultRoundingMode table resource property.
enable_change_history | BOOL | In preview. Example: enable_change_history=TRUE. Set this property to TRUE to capture change history on the table, which you can then view by using the CHANGES function.
max_staleness | INTERVAL | Example: max_staleness=INTERVAL "4:0:0" HOUR TO SECOND. The maximum interval behind the current time where it's acceptable to read stale data. For example, with change data capture, when this option is set, the table copy operation is denied if data is more stale than the max_staleness value.
VALUE
is a constant expression containing only literals, query parameters,
and scalar functions.
The constant expression cannot contain:
- A reference to a table
- Subqueries or SQL statements such as SELECT, CREATE, or UPDATE
- User-defined functions, aggregate functions, or analytic functions
- The following scalar functions:
  - ARRAY_TO_STRING
  - REPLACE
  - REGEXP_REPLACE
  - RAND
  - FORMAT
  - LPAD
  - RPAD
  - REPEAT
  - SESSION_USER
  - GENERATE_ARRAY
  - GENERATE_DATE_ARRAY
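A sketch combining several of the table options above when creating a partitioned table (all names and values are illustrative):

```sql
LOAD DATA INTO mydataset.table1
PARTITION BY transaction_date
OPTIONS(
  friendly_name="my_table",
  labels=[("org_unit", "development")],
  require_partition_filter=true,
  partition_expiration_days=7
)
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/file.avro']
)
```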
load_option_list
Specifies options for loading data from external files. The format
and uris
options are required. Specify the option list in the following format:
NAME=VALUE, ...
Option | Description
---|---
allow_jagged_rows | If true, allow rows that are missing trailing optional columns. The missing values are treated as nulls. The default value is false. Applies to CSV data.
allow_quoted_newlines | If true, allow quoted data sections that contain newline characters in the file. The default value is false. Applies to CSV data.
bigtable_options | Only required when creating a Bigtable external table. Specifies the schema of the Bigtable external table in JSON format. For a list of Bigtable table definition options, see BigtableOptions in the REST API reference.
column_name_character_map | Defines the scope of supported column name characters and the handling behavior of unsupported characters. Supported values include: STRICT, V1, and V2. Applies to CSV and Parquet data.
compression | The compression type of the data source. Supported values include: GZIP. If not specified, the data source is uncompressed. Applies to CSV and JSON data.
decimal_target_types | Determines how to convert a Decimal type. Equivalent to ExternalDataConfiguration.decimal_target_types. Example: ["NUMERIC", "BIGNUMERIC"].
enable_list_inference | If true, use schema inference specifically for the Parquet LIST logical type. The default value is false. Applies to Parquet data.
enable_logical_types | If true, convert Avro logical types into their corresponding SQL types. The default value is false. Applies to Avro data.
encoding | The character encoding of the data. Supported values include: UTF8 (or UTF-8), ISO_8859_1 (or ISO-8859-1). Applies to CSV data.
enum_as_string | If true, infer the Parquet ENUM logical type as STRING instead of BYTES by default. The default value is false. Applies to Parquet data.
field_delimiter | The separator for fields in a CSV file. The default value is a comma (','). Applies to CSV data.
format | The format of the external data. Supported values for LOAD DATA include: AVRO, CSV, JSON, ORC, PARQUET. The value JSON indicates newline-delimited JSON.
hive_partition_uri_prefix | A common prefix for all source URIs before the partition key encoding begins. Applies only to hive-partitioned external tables. Applies to Avro, CSV, JSON, Parquet, and ORC data. Example: "gs://bucket/path".
file_set_spec_type | Specifies how to interpret source URIs for load jobs and external tables. Supported values include: FILE_SYSTEM_MATCH (default), which expands source URIs by listing files from the object store, and NEW_LINE_DELIMITED_MANIFEST, which indicates that the provided URIs are newline-delimited manifest files with one URI per line. For example, if you have a source URI of "gs://bucket/path/file" and the file_set_spec_type is FILE_SYSTEM_MATCH, then the file is used directly as a data file. If the file_set_spec_type is NEW_LINE_DELIMITED_MANIFEST, then each line in the file is interpreted as a URI that points to a data file.
ignore_unknown_values | If true, ignore extra values that are not represented in the table schema, without returning an error. The default value is false. Applies to CSV and JSON data.
json_extension | For JSON data, indicates a particular JSON interchange format. If not specified, BigQuery reads the data as generic JSON records. Supported values include: GEOJSON (newline-delimited GeoJSON data).
max_bad_records | The maximum number of bad records to ignore when reading the data. The default value is 0. Applies to: CSV, JSON, and Google Sheets data.
max_staleness | Applicable for BigLake tables and object tables. Specifies whether cached metadata is used by operations against the table, and how fresh the cached metadata must be in order for the operation to use it. To disable metadata caching, specify 0. This is the default. To enable metadata caching, specify an interval literal value between 30 minutes and 7 days. For example, specify INTERVAL 4 HOUR for a 4 hour staleness interval.
null_marker | The string that represents NULL values in a CSV file. Applies to CSV data.
object_metadata | Only required when creating an object table. Set the value of this option to SIMPLE when creating an object table.
preserve_ascii_control_characters | If true, embedded ASCII control characters, which are the first 32 characters in the ASCII table (ranging from '\x00' to '\x1F'), are preserved. The default value is false. Applies to CSV data.
quote | The string used to quote data sections in a CSV file. If your data contains quoted newline characters, also set the allow_quoted_newlines property to true. The default value is a double quote ('"'). Applies to CSV data.
skip_leading_rows | The number of rows at the top of a file to skip when reading the data. The default value is 0. Applies to CSV and Google Sheets data.
uris | For external tables, including object tables, that aren't Bigtable tables: an array of fully qualified URIs for the external data locations. Each URI can contain one asterisk (*) wildcard character. Examples: ['gs://bucket/path/*'], ['gs://bucket/path1/*', 'gs://bucket/path2/file.csv']. For Bigtable tables: the URI identifying the Bigtable table to use as a data source. You can only specify one Bigtable URI. For more information on constructing a Bigtable URI, see Retrieving the Bigtable URI.
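Several of the CSV options above can be combined; the following is a sketch with hypothetical bucket, table, and value choices:

```sql
LOAD DATA INTO mydataset.table1
FROM FILES(
  format='CSV',
  uris = ['gs://bucket/path/*.csv'],
  skip_leading_rows=1,          -- skip the header row
  field_delimiter=';',
  quote='"',
  allow_quoted_newlines=true,   -- quoted fields may contain newlines
  null_marker='\\N',
  max_bad_records=10
)
```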
Examples
The following examples show common use cases for the LOAD DATA
statement.
Load data into a table
The following example loads an Avro file into a table. Avro is a self-describing format, so BigQuery infers the schema.
LOAD DATA INTO mydataset.table1
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/file.avro']
)
The following example loads two CSV files into a table, using schema autodetection.
LOAD DATA INTO mydataset.table1
FROM FILES(
  format='CSV',
  uris = ['gs://bucket/path/file1.csv', 'gs://bucket/path/file2.csv']
)
Load data using a schema
The following example loads a CSV file into a table, using a specified table schema.
LOAD DATA INTO mydataset.table1(x INT64, y STRING)
FROM FILES(
  skip_leading_rows=1,
  format='CSV',
  uris = ['gs://bucket/path/file.csv']
)
Set options when creating a new table
The following example creates a new table with a description and an expiration time.
LOAD DATA INTO mydataset.table1
OPTIONS(
  description="my table",
  expiration_timestamp="2025-01-01 00:00:00 UTC"
)
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/file.avro']
)
Overwrite an existing table
The following example overwrites an existing table.
LOAD DATA OVERWRITE mydataset.table1
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/file.avro']
)
Load data into a temporary table
The following example loads an Avro file into a temporary table.
LOAD DATA INTO TEMP TABLE mydataset.table1
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/file.avro']
)
Specify table partitioning and clustering
The following example creates a table that is partitioned by the
transaction_date
field and clustered by the customer_id
field. It also
configures the partitions to expire after three days.
LOAD DATA INTO mydataset.table1
PARTITION BY transaction_date
CLUSTER BY customer_id
OPTIONS(
  partition_expiration_days=3
)
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/file.avro']
)
Load data into a partition
The following example loads data into a selected partition of an ingestion-time partitioned table:
LOAD DATA INTO mydataset.table1
PARTITIONS(_PARTITIONTIME = TIMESTAMP '2016-01-01')
PARTITION BY _PARTITIONTIME
FROM FILES(
  format = 'AVRO',
  uris = ['gs://bucket/path/file.avro']
)
Load a file that is externally partitioned
The following example loads a set of external files that use a hive partitioning layout.
LOAD DATA INTO mydataset.table1
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/*'],
  hive_partition_uri_prefix='gs://bucket/path'
)
WITH PARTITION COLUMNS(
  field_1 STRING, -- column order must match the external path
  field_2 INT64
)
The following example infers the partitioning layout:
LOAD DATA INTO mydataset.table1
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/*'],
  hive_partition_uri_prefix='gs://bucket/path'
)
WITH PARTITION COLUMNS
If you include both column_list
and WITH PARTITION COLUMNS
, then you must
explicitly list the partitioning columns. For example, the following query
returns an error:
-- This query returns an error.
LOAD DATA INTO mydataset.table1
(
  x INT64, -- column_list is given but the partition column list is missing
  y STRING
)
FROM FILES(
  format='AVRO',
  uris = ['gs://bucket/path/*'],
  hive_partition_uri_prefix='gs://bucket/path'
)
WITH PARTITION COLUMNS
Load data with cross-cloud transfer
Example 1
The following example loads a Parquet file named sample.parquet from an Amazon S3 bucket into the test_parquet table, using an auto-detected schema:
LOAD DATA INTO mydataset.testparquet
FROM FILES (
  uris = ['s3://test-bucket/sample.parquet'],
  format = 'PARQUET'
)
WITH CONNECTION `aws-us-east-1.test-connection`
Example 2
The following example loads CSV files that match the prefix sampled* from your Azure Blob Storage container into the test_csv table, with a predefined schema and time-based column partitioning:
LOAD DATA INTO mydataset.test_csv (Number INT64, Name STRING, Time DATE)
PARTITION BY Time
FROM FILES (
  format = 'CSV',
  uris = ['azure://test.blob.core.windows.net/container/sampled*'],
  skip_leading_rows=1
)
WITH CONNECTION `azure-eastus2.test-connection`
Example 3
The following example overwrites the existing test_parquet table with data from a file named sample.parquet, using an auto-detected schema:
LOAD DATA OVERWRITE mydataset.testparquet
FROM FILES (
  uris = ['s3://test-bucket/sample.parquet'],
  format = 'PARQUET'
)
WITH CONNECTION `aws-us-east-1.test-connection`