
I'm trying to upload a local CSV file to Google BigQuery using Python:

def uploadCsvToGbq(self, table_name):
    load_config = {
        'destinationTable': {
            'projectId': self.project_id,
            'datasetId': self.dataset_id,
            'tableId': table_name
        }
    }

    load_config['schema'] = {
        'fields': [
            {'name': 'full_name', 'type': 'STRING'},
            {'name': 'age', 'type': 'INTEGER'},
        ]
    }
    load_config['sourceFormat'] = 'CSV'

    upload = MediaFileUpload('sample.csv',
                             mimetype='application/octet-stream',
                             # This enables resumable uploads.
                             resumable=True)
    start = time.time()
    job_id = 'job_%d' % start
    # Create the job.
    result = bigquery.jobs.insert(
        projectId=self.project_id,
        body={
            'jobReference': {
                'jobId': job_id
            },
            'configuration': {
                'load': load_config
            }
        },
        media_body=upload).execute()

    return result

When I run this, it throws an error like:

"NameError: global name 'MediaFileUpload' is not defined"

Do I need to import some module? Please help.


3 Answers


One of the easiest ways to upload a CSV file to GBQ is through pandas. Read the CSV file into a DataFrame with pd.read_csv(), then write the DataFrame to GBQ with df.to_gbq(full_table_id, project_id=project_id).

import pandas as pd

# full_table_id has the form 'dataset.table'; project_id is your GCP project ID.
df = pd.read_csv('/..localpath/filename.csv')
df.to_gbq(full_table_id, project_id=project_id)

Or you can use the client API:

from google.cloud import bigquery
import pandas as pd

df = pd.read_csv('/..localpath/filename.csv')
client = bigquery.Client()
dataset_ref = client.dataset('my_dataset')
table_ref = dataset_ref.table('new_table')
# load_table_from_dataframe needs pyarrow installed; result() waits for the job to finish.
client.load_table_from_dataframe(df, table_ref).result()
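
If you would rather not go through pandas, the same client library can load the CSV file directly. The following is only a sketch, assuming the google-cloud-bigquery package is installed and default credentials are configured; the function name load_csv_file and the table ID in the usage comment are placeholders, and the schema is left to autodetection rather than declared explicitly.

```python
def load_csv_file(client, csv_path, table_id, job_config=None):
    """Load a local CSV file into a BigQuery table and wait for the job.

    client    -- a google.cloud.bigquery.Client (created by the caller)
    table_id  -- destination such as 'project.dataset.table' (placeholder)
    """
    if job_config is None:
        # Imported lazily so the function can be defined without the library.
        from google.cloud import bigquery

        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,  # assumes the first row is a header
            autodetect=True,      # let BigQuery infer the schema
        )

    # The file must be opened in binary mode for the upload.
    with open(csv_path, "rb") as source_file:
        job = client.load_table_from_file(
            source_file, table_id, job_config=job_config
        )

    # Block until the load job completes (raises on failure).
    return job.result()


# Hypothetical usage:
#   from google.cloud import bigquery
#   load_csv_file(bigquery.Client(), 'sample.csv', 'my_project.my_dataset.new_table')
```

Passing the client in as an argument keeps credentials handling outside the function and makes it easy to reuse one client for several loads.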

1 Comment

I had to install pandas_gbq to be able to use the to_gbq method.

MediaFileUpload comes from the google-api-python-client package. Install or upgrade it with:

pip install --upgrade google-api-python-client

Then, at the top of your Python file, add:

from googleapiclient.http import MediaFileUpload

Also be careful: you are missing the parentheses after jobs, which is a method and must be called as jobs(). Better write:

result = bigquery.jobs().insert(
    projectId=PROJECT_ID,
    body={
        'jobReference': {'jobId': job_id},
        'configuration': {'load': load_config}
    },
    media_body=upload).execute(num_retries=5)

And by the way, as written this will upload every row of your CSV, including the header row at the top that defines the columns.
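
To keep the header row out of the table, the load configuration accepts a skipLeadingRows field. Here is a minimal sketch of the question's load_config with that field added; build_load_config and the argument values are illustrative names, not part of the original code.

```python
def build_load_config(project_id, dataset_id, table_id, skip_header=True):
    """Build the 'load' section of a BigQuery job configuration for a CSV file."""
    load_config = {
        'destinationTable': {
            'projectId': project_id,
            'datasetId': dataset_id,
            'tableId': table_id,
        },
        'schema': {
            'fields': [
                {'name': 'full_name', 'type': 'STRING'},
                {'name': 'age', 'type': 'INTEGER'},
            ]
        },
        'sourceFormat': 'CSV',
    }
    if skip_header:
        # Tell BigQuery to ignore the first row (the column names).
        load_config['skipLeadingRows'] = 1
    return load_config


# Placeholder identifiers, for illustration only.
config = build_load_config('my-project', 'my_dataset', 'my_table')
print(config['skipLeadingRows'])  # prints 1
```

The resulting dictionary can be passed as the 'load' value in the job body, exactly where load_config is used above.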


The MediaFileUpload class is defined in the library's http.py module. See https://google-api-python-client.googlecode.com/hg/docs/epy/apiclient.http.MediaFileUpload-class.html

1 Comment

This link is now broken.
