Python CSV to Postgres

Python and psycopg2 for CSV bulk upload in PostgreSQL – with examples…

In a previous post, I explored using the COPY command and the CAST() function together to upload an entire CSV file’s data into a PostgreSQL database table. This post is a continuation, of sorts. However, I will use Python and the psycopg2 library in a simple script to handle these uploads instead of SQL. While I am still a novice with operations such as these, I feel that writing about it is a fantastic way to learn and obtain valuable feedback…

Note: All data, names, and naming found within the database presented in this post are strictly used for practice, learning, instruction, and testing purposes. They by no means depict actual data belonging to or in use by any party or organization.

Self-Promotion:

If you enjoy the content written here, by all means, share this blog and your favorite post(s) with others who may benefit from or like it as well. Since coffee is my favorite drink, you can even buy me one if you would like!

Studied Resources

I found valuable information and learning in the resources below – which I borrowed from heavily for this post – so be sure to visit them as well:

To recap, I have this staging table where I initially ingest walking stats from a CSV.

walking_stats=> \d stat_staging;
           Table "public.stat_staging"
    Column    | Type | Collation | Nullable | Default
--------------+------+-----------+----------+---------
 day_walked   | text |           |          |
 cal_burned   | text |           |          |
 miles_walked | text |           |          |
 duration     | text |           |          |
 mph          | text |           |          |
 shoe_id      | text |           |          |
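For reference, the staging table can be recreated from the \d output above with DDL along these lines (a reconstruction on my part, not necessarily the original CREATE TABLE statement):

CREATE TABLE stat_staging (
    day_walked   TEXT,
    cal_burned   TEXT,
    miles_walked TEXT,
    duration     TEXT,
    mph          TEXT,
    shoe_id      TEXT
);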

Presently, table ‘stat_staging’ is empty.

I also have a Python file named db.py with this code:

import psycopg2 as pg
import csv

file = r'/path/to/feb_2019_hiking_stats.csv'
sql_insert = """INSERT INTO stat_staging(day_walked, cal_burned, miles_walked,
                duration, mph, shoe_id)
                VALUES(%s, %s, %s, %s, %s, %s)"""
conn = None  # Defined up front so the finally block can test it even if connect() fails.
try:
    conn = pg.connect(user="my_user",
                      password="my_password",
                      host="127.0.0.1",
                      port="5432",
                      database="walking_stats")
    cursor = conn.cursor()
    with open(file, 'r') as f:
        reader = csv.reader(f)
        next(reader)  # This skips the 1st row, which is the header.
        for record in reader:
            cursor.execute(sql_insert, record)
        conn.commit()
except (Exception, pg.Error) as e:
    print(e)
finally:
    if conn:
        cursor.close()
        conn.close()
        print("Connection closed.")

Running the db.py script via the terminal from my virtual environment produces the output below:
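A successful run looks something like this (the exact invocation will vary with your environment):

(venv) $ python db.py
Connection closed.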

‘Connection closed.’ is output because execution reached the finally portion of the try/except/finally block in the script. This is likely not the most graceful or informative way to report status, so I am definitely seeking better practices as I move forward with my learning.


Basically, the execute() method accepts a query and some variables that are bound to the query via either the %s or %(name)s placeholders. As stated in the documentation – linked above – these parameters can be "a sequence or mapping".
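To make the two placeholder styles concrete, here is a minimal sketch against the same staging table (the values are just sample rows from the CSV):

# Positional style: parameters supplied as a sequence.
cursor.execute(
    "INSERT INTO stat_staging(day_walked, cal_burned) VALUES (%s, %s)",
    ('2019-02-01', '243.6')
)

# Named style: parameters supplied as a mapping.
cursor.execute(
    "INSERT INTO stat_staging(day_walked, cal_burned) VALUES (%(day)s, %(cal)s)",
    {'day': '2019-02-01', 'cal': '243.6'}
)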

Next, in a psql session, I’ll query table ‘stat_staging’, verifying the records were inserted:

walking_stats=> SELECT * FROM stat_staging;
 day_walked | cal_burned | miles_walked | duration | mph | shoe_id
------------+------------+--------------+----------+-----+---------
 2019-02-01 | 243.6      | 2.46         | 00:44:48 | 3.3 | 4
 2019-02-03 | 285.5      | 2.91         | 00:52:29 | 3.3 | 4
 2019-02-04 | 237.6      | 2.44         | 00:43:41 | 3.4 | 4
 2019-02-05 | 242.3      | 2.41         | 00:44:33 | 3.2 | 4
 2019-02-06 | 204.9      | 2.03         | 00:37:40 | 3.2 | 4
 2019-02-07 | 183.5      | 1.80         | 00:33:44 | 3.2 | 4
 2019-02-08 | 177.5      | 1.71         | 00:32:38 | 3.1 | 4
 2019-02-10 | 244.8      | 2.47         | 00:45:00 | 3.3 | 4
 2019-02-11 | 232.8      | 2.33         | 00:42:48 | 3.3 | 4
 2019-02-12 | 241.8      | 2.39         | 00:44:27 | 3.2 | 4
 2019-02-13 | 235.2      | 2.34         | 00:43:15 | 3.2 | 4
 2019-02-14 | 245.5      | 2.45         | 00:45:08 | 3.3 | 4
 2019-02-15 | 204.8      | 2.03         | 00:37:38 | 3.2 | 4
 2019-02-17 | 244.9      | 2.46         | 00:45:01 | 3.3 | 4
 2019-02-18 | 246.0      | 2.50         | 00:45:14 | 3.3 | 4
 2019-02-18 | 201.9      | 2.07         | 00:37:07 | 3.3 | 4
 2019-02-20 | 201.8      | 2.05         | 00:37:06 | 3.3 | 4
 2019-02-21 | 179.5      | 1.80         | 00:33:00 | 3.3 | 4
 2019-02-22 | 164.0      | 1.64         | 00:30:09 | 3.3 | 4
 2019-02-24 | 241.3      | 2.40         | 00:44:22 | 3.2 | 4
 2019-02-25 | 247.2      | 2.47         | 00:45:27 | 3.3 | 4
 2019-02-26 | 238.9      | 2.35         | 00:43:55 | 3.2 | 4
 2019-02-27 | 244.1      | 2.46         | 00:44:52 | 3.3 | 4
 2019-02-28 | 246.2      | 2.46         | 00:45:16 | 3.3 | 4
(24 rows)

Where table ‘stat_staging’ was previously empty, it now holds 24 rows of data from the successful INSERT operations in the db.py script. Although this CSV file is relatively small, is using the execute() method in a loop like this the most efficient way? Could performing an individual INSERT for each record have a performance impact with that many separate writes?

All that being said, psycopg2 does have an executemany() method. Is it better suited for tasks like this? That will be the focus of an upcoming blog post, where I will at least explore its use. Be sure and check back in to read it when it is published as well!
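As a preview, here is a minimal sketch of what that could look like, reusing the same variables from db.py (untested here; the follow-up post will explore whether it actually performs better):

with open(file, 'r') as f:
    reader = csv.reader(f)
    next(reader)            # skip the header row
    records = list(reader)  # materialize every row as a parameter tuple
cursor.executemany(sql_insert, records)  # one call submits all 24 parameter sets
conn.commit()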

Like what you have read? See anything incorrect? Please comment below and thanks for reading.

Explore the official PostgreSQL 11 On-line Documentation for more information.

A Call To Action!

Thank you for taking the time to read this post. I truly hope you discovered something interesting and enlightening. Please share your findings here, with someone else you know who would get the same value out of it as well.


Visit the Portfolio-Projects page to see blog post/technical writing I have completed for clients.

To receive email notifications (Never Spam) from this blog (“Digital Owl’s Prose”) for the latest blog posts as they are published, please subscribe (of your own volition) by clicking the ‘Click To Subscribe!’ button in the sidebar on the homepage! (Feel free at any time to review the Digital Owl’s Prose Privacy Policy Page for any questions you may have about: email updates, opt-in, opt-out, contact forms, etc…)

Be sure and visit the “Best Of” page for a collection of my best blog posts.

Josh Otwell has a passion to study and grow as a SQL Developer and blogger. Other favorite activities find him with his nose buried in a good book, article, or the Linux command line. Among those, he shares a love of tabletop RPG games, reading fantasy novels, and spending time with his wife and two daughters.

Disclaimer: The examples presented in this post are hypothetical ideas of how to achieve similar types of results. They are not the utmost best solution(s). The majority, if not all, of the examples provided are performed on a personal development/learning workstation environment and should not be considered production quality or ready. Your particular goals and needs may vary. Use those practices that best benefit your needs and goals. Opinions are my own.



audetteio/CSV-to-Postgre


Python Script for Adding Entries to an Already Existing PostgreSQL DB

pip3 install -r requirements.txt

Open config.yml in your preferred text editor and fill it in with your specific database information (a sketch of loading this config from Python follows the notes below).

database:
  host: localhost   # enter the IP of the machine running the db
  user: postgres    # enter a username that has the correct role for the db
  passwd: password  # enter the password for the db
  db: timescale     # enter the name of the db the CSV is being loaded to
  1. Columns in the CSV must match the db type for that column.
  2. When populating a db, the order of table population matters.
  3. If there are issues installing PyYAML, try downgrading pip:
sudo -H pip3 install pip==8.1.1 
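The repository's script presumably reads config.yml at startup; a minimal sketch of loading it with PyYAML and opening a connection might look like this (hypothetical, the actual script may differ):

# Load config.yml and open a psycopg2 connection from its values.
import yaml
import psycopg2

with open('config.yml') as f:
    cfg = yaml.safe_load(f)['database']

conn = psycopg2.connect(
    host=cfg['host'],
    user=cfg['user'],
    password=cfg['passwd'],
    dbname=cfg['db'],
)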


How do I export data from a CSV file into a PostgreSQL table using Python?

How can I export data from a CSV file into a PostgreSQL table so that the existing data is replaced rather than appended?
Right now I have to clear the table first and only then add the data, using the following code:

import psycopg2

conn = psycopg2.connect("host=localhost dbname=postgres user=postgres password=1111111")
cur = conn.cursor()
with open('file.csv', 'r', encoding="utf-8") as f:
    next(f)  # skip the header row
    cur.copy_from(f, 'table_name', sep=',')
conn.commit()

But the data gets appended, and I need it replaced instead.

3 comments:

What do you mean by "replaced"?

If the new rows coming from the CSV have some changed fields, there must be a way to match them up: this row already existed in the database and now its fields f1/f2/f3 have changed; while that row is entirely new and should be added. In the simplest case that means a key: a set of (unchanging) fields by which a row can be identified.

galaxy, I may have phrased it poorly, my apologies.

I need all of the data in the table to be cleared first and the new data inserted after that.

Right now I do this by hand:

first DELETE FROM table_name;

then:

import psycopg2

conn = psycopg2.connect("host=localhost dbname=postgres user=postgres password=1111111")
cur = conn.cursor()
with open('file.csv', 'r', encoding="utf-8") as f:
    next(f)  # skip the header row
    cur.copy_from(f, 'table_name', sep=',')
conn.commit()

And since these are two separate actions, it takes a huge amount of time. On top of that, the data is refreshed every 3 minutes.

evgeniy1215, oh come on, that is hardly a huge amount of time. If you need to completely rewrite a table, this is exactly the right approach (with one small correction: TRUNCATE table_name; will be faster than DELETE). The TRUNCATE query can also be issued from Python, as in the sketch below.
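For example, a minimal sketch built on the asker's own snippet (table and file names are theirs), with both steps inside a single transaction so a failed load rolls back the TRUNCATE as well:

import psycopg2

conn = psycopg2.connect("host=localhost dbname=postgres user=postgres password=1111111")
cur = conn.cursor()
cur.execute("TRUNCATE table_name;")          # clear out the old rows
with open('file.csv', 'r', encoding="utf-8") as f:
    next(f)                                  # skip the header row
    cur.copy_from(f, 'table_name', sep=',')  # bulk-load the new data
conn.commit()                                # TRUNCATE and COPY commit together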

Vindicar

Without knowing the structure of the data and of the table, it is hard to say much.
But in general, instead of blindly copying from the CSV via copy_from(), you should:
1. Read the file yourself. Use the csv module; it can conveniently parse this format, including skipping the header row.
2. Pick a primary key in your data so you can tell when a row already exists in the database and when it does not. When creating the table, mark the corresponding column as the primary key.
3. Insert the data with a query of the form INSERT ... ON CONFLICT ... DO UPDATE (see the sketch after this comment). Check the exact syntax in the Postgres documentation; this pattern is usually called UPSERT.

The problem with copy_from() is that your database table must have the same set of columns, in the same order, as the CSV, even though this restriction is completely unnecessary.
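A minimal sketch of that approach (the id/f1/f2 column names are hypothetical, echoing the key fields mentioned above):

import csv
import psycopg2

upsert_sql = """INSERT INTO table_name (id, f1, f2)
                VALUES (%s, %s, %s)
                ON CONFLICT (id) DO UPDATE
                SET f1 = EXCLUDED.f1,
                    f2 = EXCLUDED.f2"""

conn = psycopg2.connect("host=localhost dbname=postgres user=postgres password=1111111")
cur = conn.cursor()
with open('file.csv', 'r', encoding="utf-8") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        cur.execute(upsert_sql, row)  # insert new rows, update existing ones
conn.commit()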

My PostgreSQL table and the data in the file are identical. While parsing, I already bring the CSV file into the required shape.
Here is the table structure:

[screenshot of the table structure]

