BigQuery Batch Import

I'm trying to get better performance out of data loading into BigQuery by batching rows together when I call insertAll (https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll). I use the URL formatted as:

 /bigquery/v2/projects/{projectId}/datasets/{datasetId}/tables/{tableId}/insertAll

and I also set the batch size to 100, modifying my Handlebars template to iterate through the rows. For example:

{
  "skipInvalidRows": true,
  "ignoreUnknownValues": true,
  "rows": [
    {{#each batch_of_records}}
    {
      "json": {
        "internal_id": "{{record.internal_id}}",
        "display_name": "{{record.display_name}}",
        "description_featured": "{{record.description_featured}}",
        "modified": "{{record.modified}}",
        "location": "{{record.location}}"
      }
    }{{#if @last}}{{else}},{{/if}}
    {{/each}}
  ]
}
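
As an aside, the trailing-comma logic can also be written with {{#unless}} instead of the empty {{#if}} branch; a minimal, equivalent sketch of the loop (same context assumed, fields trimmed for brevity):

{{#each batch_of_records}}
{
  "json": { "internal_id": "{{record.internal_id}}" }
}{{#unless @last}},{{/unless}}
{{/each}}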

The problem arises, I think, because of the way BigQuery reports success: it doesn't confirm individual row IDs and only returns the errors, using the following response schema:

{
  "kind": string,
  "insertErrors": [
    {
      "index": integer,
      "errors": [
        {
          object (ErrorProto)
        }
      ]
    }
  ]
}
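
For illustration (values are hypothetical), a batch where the third row fails might come back like this, identifying the failed row only by its index:

{
  "kind": "bigquery#tableDataInsertAllResponse",
  "insertErrors": [
    {
      "index": 2,
      "errors": [
        {
          "reason": "invalid",
          "location": "modified",
          "message": "Invalid value for field modified"
        }
      ]
    }
  ]
}

A fully successful batch returns only the "kind" field, with no per-row acknowledgements.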

This, I think, is what leads to the error: "Processing of submitResponse did not return same record count as the current batch size."

Has anyone had any luck implementing batching for BigQuery, or does anyone know of a way to process or ignore this behavior?

Thanks for posting this to the community, @glennprince. I'm checking with some of our internal experts and will get back to you with any recommendations they have.

Thanks Kate :-)

Hi Glenn,
We are trying to reproduce this on our end.

Based on the data you've provided above, I can suggest some changes to your request body.

If you have a "batch_of_records" array in your export data, set the Resource path to "batch_of_records" in your export. Once the resource path is set, the request body in the BigQuery import should look like this:

{
  "skipInvalidRows": true,
  "ignoreUnknownValues": true,
  "rows": [
    {
      "json": {
        "internal_id": "{{record.internal_id}}",
        "display_name": "{{record.display_name}}",
        "description_featured": "{{record.description_featured}}",
        "modified": "{{record.modified}}",
        "location": "{{record.location}}"
      }
    }
  ]
}
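
For example, if your export output looked something like this (a hypothetical payload; your wrapper key may differ), the resource path "batch_of_records" would make each array element the context for the template above:

{
  "batch_of_records": [
    {
      "record": {
        "internal_id": "8",
        "display_name": "",
        "description_featured": "",
        "modified": "12/10/2020 4:30 pm",
        "location": ""
      }
    }
  ]
}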

Also, can you please turn on the debugger while running the flow and send us the debug data so that we can inspect further?
It would also help if you could provide some sample request and response data.

Thank you!

Hi Srilekha,

I've tried making these updates and get similar results. I've turned on debugging and collected some logs; is there a preferred way to send them to you?

Glenn

Hi Glenn,

You can send them here if that's fine with you; you don't have to send the whole debug data. Also, what does your export-side data look like?
A sample request body, the response data you're seeing in the debug, and the error message would help us analyze and debug the issue.


Thanks!

The debug log pretty much just repeats the following:

2021-05-03T16:21:06.085Z 7f8ff1ee76974cc985c4156f7147f063 import 5fdbbc24d611fd2d757e593a
{"url":"https://bigquery.googleapis.com/bigquery/v2/projects/adswerve-datatest-dev/datasets/netsuitetest/tables/Item/insertAll","method":"POST","body":"{\"skipInvalidRows\":true,\"ignoreUnknownValues\":true,\"rows\":[{\"json\":{\"internal_id\":\"8\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"12/10/2020 4:30 pm\",\"location\":\"\"}},{\"json\":{\"internal_id\":\"8\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"12/10/2020 4:30 pm\",\"location\":\"\"}},{\"json\":{\"internal_id\":\"8\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"12/10/2020 4:30 pm\",\"location\":\"\"}},{\"json\":{\"internal_id\":\"9\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"12/10/2020 4:26 pm\",\"location\":\"\"}},{\"json\":{\"internal_id\":\"9\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"12/10/2020 4:26 pm\",\"location\":\"\"}},{\"json\":{\"internal_id\":\"9\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"12/10/2020 4:26 pm\",\"location\":\"\"}},{\"json\":{\"internal_id\":\"11\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"12/10/2020 4:30 pm\",\"location\":\"\"}},{\"json\":{\"internal_id\":\"11\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"12/10/2020 4:30 pm\",\"location\":\"\"}},{\"json\":{\"internal_id\":\"11\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"12/10/2020 4:30 pm\",\"location\":\"\"}},{\"json\":{\"internal_id\":\"14\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"12/10/2020 4:26 pm\",\"location\":\"\"}},{\"json\":{\"internal_id\":\"14\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"12/10/2020 4:26 pm\",\"location\":\"\"}},{\"json\":{\"internal_id\":\"14\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"12/10/2020 4:26 pm\",\"location\":\"\"}},{\"json\":{\"internal_id\":\"16\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"12/10/2020 4:30 pm\",\"location\":\"\"}},{\"json\":{\"internal_id\":\"16\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"12/10/2020 4:30 pm\",\"location\":\"\"}},{\"json\":{\"internal_id\":\"16\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"12/10/2020 4:30 pm\",\"location\":\"\"}},{\"json\":{\"internal_id\":\"17\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"12/10/2020 4:30 pm\",\"location\":\"\"}},{\"json\":{\"internal_id\":\"17\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"12/10/2020 4:30 pm\",\"location\":\"\"}},{\"json\":{\"internal_id\":\"17\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"12/10/2020 4:30 pm\",\"location\":\"\"}},{\"json\":{\"internal_id\":\"19\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"02/02/2021 5:42 pm\",\"location\":\"\"}},{\"json\":{\"internal_id\":\"19\",\"display_name\":\"\",\"description_featured\":\"\",\"modified\":\"02/02/2021 5:42 pm\",\"location\":\"\"}}]}","headers":{"Authorization":"Bearer ********","content-type":"application/json","accept":"application/json"},"requestIndex":0}


2021-05-03T16:21:06.234Z 7f8ff1ee76974cc985c4156f7147f063 import 5fdbbc24d611fd2d757e593a
{"headers":{"content-type":"application/json; charset=UTF-8","vary":"X-Origin, Referer, Origin,Accept-Encoding","date":"Mon, 03 May 2021 16:21:06 GMT","server":"ESF","cache-control":"private","x-xss-protection":"0","x-frame-options":"SAMEORIGIN","alt-svc":"h3-29=\":443\"; ma=2592000,h3-T051=\":443\"; ma=2592000,h3-Q050=\":443\"; ma=2592000,h3-Q046=\":443\"; ma=2592000,h3-Q043=\":443\"; ma=2592000,quic=\":443\"; ma=2592000; v=\"46,43\"","accept-ranges":"none","connection":"close","transfer-encoding":"chunked"},"body":"{\n \"kind\": \"bigquery#tableDataInsertAllResponse\"\n}\n"}

Preview Input:

{
  "page_of_records": [
    {
      "record": {
        "internal_id": "8",
        "display_name": "",
        "description_featured": "",
        "modified": "12/10/2020 4:30 pm",
        "location": ""
      }
    }
  ]
}

Preview Output (your Handlebars code):

{
  "skipInvalidRows": true,
  "ignoreUnknownValues": true,
  "rows": [
    {
      "json": {
        "internal_id": "",
        "display_name": "",
        "description_featured": "",
        "modified": "",
        "location": ""
      }
    }
  ]
}

Preview Output (using the if/else to structure the JSON):

{
  "skipInvalidRows": true,
  "ignoreUnknownValues": true,
  "rows": [
    {
      "json": {
        "internal_id": "8",
        "display_name": "",
        "description_featured": "",
        "modified": "12/10/2020 4:30 pm",
        "location": ""
      }
    },
    {
      "json": {
        "internal_id": "8",
        "display_name": "",
        "description_featured": "",
        "modified": "12/10/2020 4:30 pm",
        "location": ""
      }
    },
    {
      "json": {
        "internal_id": "8",
        "display_name": "",
        "description_featured": "",
        "modified": "12/10/2020 4:30 pm",
        "location": ""
      }
    },
    {
      "json": {
        "internal_id": "8",
        "display_name": "",
        "description_featured": "",
        "modified": "12/10/2020 4:30 pm",
        "location": ""
      }
    },
    {
      "json": {
        "internal_id": "8",
        "display_name": "",
        "description_featured": "",
        "modified": "12/10/2020 4:30 pm",
        "location": ""
      }
    }
  ]
}

Hi Glenn,

Thanks for the details!
We will look into it.

Are you getting any specific error message in the dashboard or debug logs, or is it a generic one?
Please send the error details as well.

Thank you!

Hi Srilekha,

No, I just get the generic error: "Processing of submitResponse did not return same record count as the current batch size."

Glenn

Hi Glenn,

Can you please try the following:

1. In your import, set the Batch size to 1.
2. Change your body template to the following format and, if applicable, provide the resource path in the export:

{
  "skipInvalidRows": true,
  "ignoreUnknownValues": true,
  "rows": [
    {
      "json": {
        "internal_id": "{{record.internal_id}}",
        "display_name": "{{record.display_name}}",
        "description_featured": "{{record.description_featured}}",
        "modified": "{{record.modified}}",
        "location": "{{record.location}}"
      }
    }
  ]
}

Once you try this, let us know whether the same issue persists or you run into any other problems.

Thank you!

Hi Srilekha,

When the batch size is set to 1 this works perfectly fine, but I'm sometimes trying to import hundreds of thousands of rows, so being able to batch them would give me much better performance. My assumption is that BigQuery probably doesn't support batching at this point; I just wanted to make sure.

Glenn

Hi Glenn,

In that case, can you send us the response data when:

  1. the records are successfully imported into BigQuery
  2. the records are failing

This would help us understand your issue better.
Also, are you providing a resource ID path or success path while saving the batch import?

Also, since you said "BigQuery probably doesn't support batching at this point", did you have any luck trying other apps like Postman?

Hi Srilekha,

When I do one record at a time I get a success and the records are inserted. When I try to do a batch, I get the error in the UI: "Processing of submitResponse did not return same record count as the current batch size".

When I use Postman with the output above I get the following 200 OK response and the five records are inserted:

{
  "kind": "bigquery#tableDataInsertAllResponse"
}

This matches the documentation (https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll) response structure:

{
  "kind": string,
  "insertErrors": [
    {
      "index": integer,
      "errors": [
        {
          object (ErrorProto)
        }
      ]
    }
  ]
}
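
where, per the docs, each ErrorProto entry carries fields such as reason, location, and message; a hypothetical error entry might look like:

{
  "reason": "invalid",
  "location": "internal_id",
  "message": "Cannot convert value to integer"
}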

Glenn

Hi Glenn,

Sorry for the delayed response.

Upon checking internally, we found that supporting this will require an enhancement to our product; the limitation is due to the response format we get back from BigQuery. Thank you for bringing this to our notice and helping us improve the product.

As of now, to insert data into a BigQuery table, the batch size needs to be set to 1.

Please let us know if you have any other concerns.

Thank you!


Hi, is there an update on this enhancement?

Hi @jonadams, we recently updated the BigQuery connector, which now supports bulk inserts. Here's a help article for your reference:

https://docs.celigo.com/hc/en-us/articles/6923178410523-Import-data-into-Google-BigQuery

Hey Jon, to add a little more color to Bhavik's post, we actually launched a brand new BigQuery connector under our database connector family. Our original REST API BigQuery connector is still there, but we renamed it to "Google BigQuery (REST API)". The new connector should make it much easier to handle traditional database/data warehouse use cases like bulk inserts. One key difference worth highlighting is that you will now use "field mappings" to control how your source records map to your destination table schemas. In case it helps, we use Snowflake internally at Celigo, which follows many of the same patterns, and below are some pics from one of my Snowflake bulk inserts. The last pic is where I map to specific column names in Snowflake. Hope this helps!

Hi Scott,

Thanks so much for this. I got the connector set up. Many thanks!

One last question: when using a field that's a record type, do you have an example of how that mapping would work?

I don't understand your question. Can you provide more context? Please also provide screenshots if possible.