Python pandas dataframe to sql

pandas.DataFrame.to_sql

Write records stored in a DataFrame to a SQL database.

Databases supported by SQLAlchemy [1] are supported. Tables can be newly created, appended to, or overwritten.

Parameters

name str

Name of SQL table.

con sqlalchemy.engine.(Engine or Connection) or sqlite3.Connection

Using SQLAlchemy makes it possible to use any DB supported by that library. Legacy support is provided for sqlite3.Connection objects. The user is responsible for engine disposal and connection closure for the SQLAlchemy connectable. If passing a sqlalchemy.engine.Connection which is already in a transaction, the transaction will not be committed. If passing a sqlite3.Connection, it will not be possible to roll back the record insertion.

schema str, optional

Specify the schema (if database flavor supports this). If None, use default schema.

if_exists {‘fail’, ‘replace’, ‘append’}, default ‘fail’

How to behave if the table already exists.

  • fail: Raise a ValueError.
  • replace: Drop the table before inserting new values.
  • append: Insert new values to the existing table.
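
The three if_exists modes can be compared side by side. The following is a minimal sketch, assuming an in-memory SQLite database opened through a sqlite3.Connection (the legacy mode mentioned under con) so that it needs only pandas and the standard library. The table name users and the variable names are illustrative.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"name": ["User 1", "User 2"]})

# First write: the table does not exist yet, so it is created.
df.to_sql("users", con=conn, index=False)

# Second write with the default if_exists='fail' raises ValueError.
try:
    df.to_sql("users", con=conn, index=False)
    raised = False
except ValueError:
    raised = True

# 'append' keeps the existing rows and adds the new ones.
df.to_sql("users", con=conn, index=False, if_exists="append")
count_after_append = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

# 'replace' drops the table and writes only the new rows.
df.to_sql("users", con=conn, index=False, if_exists="replace")
count_after_replace = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

print(raised, count_after_append, count_after_replace)
```

Note that replace drops the table, so any constraints or indexes defined on the old table are not carried over.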

index bool, default True

Write DataFrame index as a column. Uses index_label as the column name in the table.

index_label str or sequence, default None

Column label for index column(s). If None is given (default) and index is True, then the index names are used. A sequence should be given if the DataFrame uses MultiIndex.
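
As a rough illustration of how index and index_label interact, the sketch below writes the same kind of data three ways and reads the resulting column names back with SQLite's PRAGMA table_info. It assumes an in-memory sqlite3 connection; the table names t_single, t_multi and t_noindex and the labels id, grp and num are made up for the example.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")


def column_names(table):
    # PRAGMA table_info returns one row per column; field 1 is the name.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


# Single-level index written under an explicit label.
df = pd.DataFrame({"name": ["a", "b"]}, index=pd.Index([10, 20]))
df.to_sql("t_single", conn, index_label="id")
single_cols = column_names("t_single")

# MultiIndex: pass one label per level as a sequence.
mi = pd.MultiIndex.from_tuples([("x", 1), ("y", 2)])
df_multi = pd.DataFrame({"v": [1.0, 2.0]}, index=mi)
df_multi.to_sql("t_multi", conn, index_label=["grp", "num"])
multi_cols = column_names("t_multi")

# index=False omits the index entirely.
df.to_sql("t_noindex", conn, index=False)
noindex_cols = column_names("t_noindex")

print(single_cols, multi_cols, noindex_cols)
```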

chunksize int, optional

Specify the number of rows in each batch to be written at a time. By default, all rows will be written at once.
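
A small sketch of batched writing, assuming an in-memory sqlite3 connection. The table name numbers and the batch size of 250 are arbitrary choices for the example; a suitable batch size depends on the database and driver.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"n": range(1000)})

# Rows are sent to the database in batches of 250 instead of all at once.
df.to_sql("numbers", con=conn, index=False, chunksize=250)

total = conn.execute("SELECT COUNT(*) FROM numbers").fetchone()[0]
print(total)
```

All rows are written regardless of the batch size; chunksize only changes how many rows are handed to the driver per insert call.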

dtype dict or scalar, optional

Specifying the datatype for columns. If a dictionary is used, the keys should be the column names and the values should be the SQLAlchemy types or strings for the sqlite3 legacy mode. If a scalar is provided, it will be applied to all columns.
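
For the sqlite3 legacy mode, the dtype values are plain strings rather than SQLAlchemy types. The sketch below assumes an in-memory sqlite3 connection and an illustrative table name integers; it declares column A as INTEGER even though pandas holds the values as floats because of the missing entry.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")

# The None forces pandas to store the column as float64 (1.0, NaN, 2.0).
df = pd.DataFrame({"A": [1, None, 2]})

# In sqlite3 legacy mode the column type is given as a string.
df.to_sql("integers", con=conn, index=False, dtype={"A": "INTEGER"})

declared_type = [row[2] for row in conn.execute("PRAGMA table_info(integers)")]
rows = conn.execute("SELECT A FROM integers").fetchall()
print(declared_type, rows)
```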

method {None, ‘multi’, callable}, optional

Controls the SQL insertion clause used:

  • None : Uses standard SQL INSERT clause (one per row).
  • ‘multi’: Pass multiple values in a single INSERT clause.
  • callable with signature (pd_table, conn, keys, data_iter) .

Details and a sample callable implementation can be found in the insert method section of the pandas documentation.
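
To make the callable signature concrete, here is a minimal sketch for the sqlite3 legacy mode that skips rows whose primary key already exists. It rests on two assumptions that are not stated above: that in sqlite3 mode the conn argument is a cursor-like object exposing executemany, and that pd_table.name holds the target table name. The function name insert_or_ignore and the table items are hypothetical.

```python
import sqlite3

import pandas as pd


def insert_or_ignore(pd_table, conn, keys, data_iter):
    # Assumption: `conn` is cursor-like and `pd_table.name` is the table name.
    columns = ", ".join(f'"{k}"' for k in keys)
    placeholders = ", ".join("?" for _ in keys)
    sql = (
        f'INSERT OR IGNORE INTO "{pd_table.name}" ({columns}) '
        f"VALUES ({placeholders})"
    )
    cursor = conn.executemany(sql, list(data_iter))
    # Returning an integer lets to_sql report a row count.
    return cursor.rowcount


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("INSERT INTO items VALUES (1, 'old')")

df = pd.DataFrame({"id": [1, 2], "name": ["dup", "new"]})
df.to_sql("items", con=con, index=False, if_exists="append",
          method=insert_or_ignore)

rows = con.execute("SELECT id, name FROM items ORDER BY id").fetchall()
print(rows)
```

The row with id 1 keeps its original value because the conflicting insert is ignored. With a SQLAlchemy connectable, the callable receives a SQLAlchemy connection instead and would build its statement from pd_table.table.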

Returns None or int

Number of rows affected by to_sql. None is returned if the callable passed into method does not return an integer number of rows.

The number of rows affected is the sum of the rowcount attribute of the sqlite3.Cursor or SQLAlchemy connectable. It may not reflect the exact number of written rows, as stipulated in the sqlite3 or SQLAlchemy documentation.

Raises ValueError

When the table already exists and if_exists is ‘fail’ (the default).

See also read_sql

Read a DataFrame from a table.

Notes

Timezone-aware datetime columns are written as the Timestamp with timezone type with SQLAlchemy if the database supports it. Otherwise, the datetimes are stored as timezone-unaware timestamps local to the original timezone.

Examples

Create an in-memory SQLite database.

>>> import pandas as pd
>>> from sqlalchemy import create_engine
>>> engine = create_engine('sqlite://', echo=False)

Create a table from scratch with 3 rows.

>>> df = pd.DataFrame({'name': ['User 1', 'User 2', 'User 3']})
>>> df
     name
0  User 1
1  User 2
2  User 3

>>> df.to_sql('users', con=engine)
3
>>> from sqlalchemy import text
>>> with engine.connect() as conn:
...     conn.execute(text("SELECT * FROM users")).fetchall()
[(0, 'User 1'), (1, 'User 2'), (2, 'User 3')]

An sqlalchemy.engine.Connection can also be passed to con:

>>> with engine.begin() as connection:
...     df1 = pd.DataFrame({'name': ['User 4', 'User 5']})
...     df1.to_sql('users', con=connection, if_exists='append')
2

This is allowed to support operations that require that the same DBAPI connection is used for the entire operation.

>>> df2 = pd.DataFrame({'name': ['User 6', 'User 7']})
>>> df2.to_sql('users', con=engine, if_exists='append')
2
>>> with engine.connect() as conn:
...     conn.execute(text("SELECT * FROM users")).fetchall()
[(0, 'User 1'), (1, 'User 2'), (2, 'User 3'),
 (0, 'User 4'), (1, 'User 5'), (0, 'User 6'),
 (1, 'User 7')]

Overwrite the table with just df2.

>>> df2.to_sql('users', con=engine, if_exists='replace',
...            index_label='id')
2
>>> with engine.connect() as conn:
...     conn.execute(text("SELECT * FROM users")).fetchall()
[(0, 'User 6'), (1, 'User 7')]

Specify the dtype (especially useful for integers with missing values). Notice that while pandas is forced to store the data as floating point, the database supports nullable integers. When fetching the data with Python, we get back integer scalars.

>>> df = pd.DataFrame({"A": [1, None, 2]})
>>> df
     A
0  1.0
1  NaN
2  2.0

>>> from sqlalchemy.types import Integer
>>> df.to_sql('integers', con=engine, index=False,
...           dtype={"A": Integer()})
3

>>> with engine.connect() as conn:
...     conn.execute(text("SELECT * FROM integers")).fetchall()
[(1,), (None,), (2,)]

Use method to define a callable insertion method to do nothing if there’s a primary key conflict on a table in a PostgreSQL database.

>>> from sqlalchemy.dialects.postgresql import insert
>>> def insert_on_conflict_nothing(table, conn, keys, data_iter):
...     # "a" is the primary key in "conflict_table"
...     data = [dict(zip(keys, row)) for row in data_iter]
...     stmt = insert(table.table).values(data).on_conflict_do_nothing(index_elements=["a"])
...     result = conn.execute(stmt)
...     return result.rowcount
>>> df_conflict.to_sql("conflict_table", conn, if_exists="append", method=insert_on_conflict_nothing)
0

For MySQL, a callable to update columns b and c if there’s a conflict on a primary key.

>>> from sqlalchemy.dialects.mysql import insert
>>> def insert_on_conflict_update(table, conn, keys, data_iter):
...     # update columns "b" and "c" on primary key conflict
...     data = [dict(zip(keys, row)) for row in data_iter]
...     stmt = (
...         insert(table.table)
...         .values(data)
...     )
...     stmt = stmt.on_duplicate_key_update(b=stmt.inserted.b, c=stmt.inserted.c)
...     result = conn.execute(stmt)
...     return result.rowcount
>>> df_conflict.to_sql("conflict_table", conn, if_exists="append", method=insert_on_conflict_update)
2
