Python: get the size of a file

Get size of a file before downloading in Python

I’m downloading an entire directory from a web server. It works OK, but I can’t figure out how to get the file size before downloading, so I can check whether the file was updated on the server. Can this be done as if I were downloading the file from an FTP server?

    import urllib
    import re

    url = "http://www.someurl.com"

    # Download the page locally
    f = urllib.urlopen(url)
    html = f.read()
    f.close()
    f = open("temp.htm", "w")
    f.write(html)
    f.close()

    # List only the .TXT / .ZIP files
    fnames = re.findall('^.*

@Jon: thanks for your quick answer. It works, but the file size on the web server is slightly less than the file size of the downloaded file. Examples:

    Local Size    Server Size
    2.223.533     2.115.516
    664.603       662.121

Possibly. Can you run diff on it and see a difference? Also, do you see the file size difference in the binary (.zip) files? Edit: This is where things like ETags come in handy. The server will tell you when something changes, so you don't have to download the complete file to figure it out.


I have reproduced what you are seeing:

    import urllib, os
    link = "http://python.org"
    print "opening url:", link
    site = urllib.urlopen(link)
    meta = site.info()
    print "Content-Length:", meta.getheaders("Content-Length")[0]

    f = open("out.txt", "r")
    print "File on disk:", len(f.read())
    f.close()

    f = open("out.txt", "w")
    f.write(site.read())
    site.close()
    f.close()

    f = open("out.txt", "r")
    print "File on disk after download:", len(f.read())
    f.close()

    print "os.stat().st_size returns:", os.stat("out.txt").st_size

    opening url: http://python.org
    Content-Length: 16535
    File on disk: 16535
    File on disk after download: 16535
    os.stat().st_size returns: 16861

What am I doing wrong here? Is os.stat().st_size not returning the correct size?

Edit: OK, I figured out what the problem was:

    import urllib, os
    link = "http://python.org"
    print "opening url:", link
    site = urllib.urlopen(link)
    meta = site.info()
    print "Content-Length:", meta.getheaders("Content-Length")[0]

    f = open("out.txt", "rb")
    print "File on disk:", len(f.read())
    f.close()

    f = open("out.txt", "wb")
    f.write(site.read())
    site.close()
    f.close()

    f = open("out.txt", "rb")
    print "File on disk after download:", len(f.read())
    f.close()

    print "os.stat().st_size returns:", os.stat("out.txt").st_size

    $ python test.py
    opening url: http://python.org
    Content-Length: 16535
    File on disk: 16535
    File on disk after download: 16535
    os.stat().st_size returns: 16535

Make sure you are opening both files for binary read/write.

    # open for binary write
    open(filename, "wb")

    # open for binary read
    open(filename, "rb")
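Why the mode matters can be shown with a small sketch. On Windows, text mode writes every "\n" as "\r\n", which makes the file on disk larger than the data received. The example below forces that translation with newline="\r\n" so that it behaves the same on any platform; the helper name written_size and the sample payload are made up for illustration.

```python
import os
import tempfile

def written_size(payload: bytes, binary: bool) -> int:
    """Write payload to a temporary file and return its size on disk in bytes."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        if binary:
            # Binary mode writes the bytes exactly as given.
            with open(path, "wb") as f:
                f.write(payload)
        else:
            # Text mode with newline="\r\n" mimics the Windows default:
            # every "\n" is written as "\r\n".
            with open(path, "w", newline="\r\n", encoding="ascii") as f:
                f.write(payload.decode("ascii"))
        return os.stat(path).st_size
    finally:
        os.remove(path)

payload = b"line one\nline two\n"
print(written_size(payload, binary=True))   # same as len(payload)
print(written_size(payload, binary=False))  # one extra byte per newline
```

The more newlines the payload contains, the bigger the gap, which fits text files showing a difference while the byte count reported by the server stays the same.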

When you call site = urllib.urlopen(link), you have already performed a file download, so this is not the size before downloading; the file has in fact been downloaded to a buffer, from which you are retrieving the Content-Length.

@Ciastopiekarz I think it's only when you call read() that the file actually gets downloaded into the buffer; check this answer.

The object returned by urllib.urlopen() has an info() method that gives you various information about the retrieved document. Example of grabbing the current Google logo:

    >>> import urllib
    >>> d = urllib.urlopen("http://www.google.co.uk/logos/olympics08_opening.gif")
    >>> print d.info()
    Content-Type: image/gif
    Last-Modified: Thu, 07 Aug 2008 16:20:19 GMT
    Expires: Sun, 17 Jan 2038 19:14:07 GMT
    Cache-Control: public
    Date: Fri, 08 Aug 2008 13:40:41 GMT
    Server: gws
    Content-Length: 20172
    Connection: Close

It behaves like a dict, so to get the size of the file, you do urllibobject.info()['Content-Length']. The value is a string, so convert it with int() before comparing.

And to get the size of the local file (for comparison), you can use the os.stat() function:

os.stat("/the/local/file.zip").st_size 
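Putting the two sizes together, a comparison could be sketched like this. The function name needs_download is hypothetical, and the sketch assumes that a size mismatch is a good enough signal that the file changed; a file edited without changing its length would slip through.

```python
import os

def needs_download(local_path, remote_size):
    """Return True if the local copy is missing or its size differs from remote_size.

    remote_size is the Content-Length value (str or int), or None if the
    server did not send one.
    """
    if remote_size is None:
        # No size reported by the server, so there is nothing to compare against.
        return True
    try:
        local_size = os.stat(local_path).st_size
    except FileNotFoundError:
        # Nothing on disk yet.
        return True
    return local_size != int(remote_size)
```

Passing the header value straight in works because int() accepts the string form that HTTP headers use.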

I have been using this solution, however I have hit an edge case where sometimes the content-length header is not defined. Can anyone explain why it wouldn't be consistently returned?

A requests-based solution using HEAD instead of GET (also prints HTTP headers):

    #!/usr/bin/python
    # display size of a remote file without downloading

    from __future__ import print_function
    import sys
    import requests

    # number of bytes in a megabyte
    MBFACTOR = float(1 << 20)

    # HEAD request: the server sends the headers but no body
    response = requests.head(sys.argv[1], allow_redirects=True)

    print("\n".join([('{:<40}: {}'.format(k, v)) for k, v in response.headers.items()]))
    size = response.headers.get('content-length', 0)
    print('{:<40}: {:.2f} MB'.format('FILE SIZE', int(size) / MBFACTOR))

Usage

    $ python filesize-remote-url.py https://httpbin.org/image/jpeg
    .
    Content-Length : 35588
    FILE SIZE (MB) : 0.03 MB

Not every response will include a Content-Length. Sometimes the response is generated using Transfer-Encoding: chunked, in which case there's no way to know the size without downloading.
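A defensive way to read the header, as a minimal sketch: the helper content_length is a made-up name, and it assumes headers is any mapping with a get() method, such as response.headers from requests. With a plain dict the key lookup is case-sensitive, so the header name must match exactly.

```python
def content_length(headers):
    """Return Content-Length as an int, or None if it is absent or malformed."""
    value = headers.get("Content-Length")
    if value is None:
        # Typical for chunked responses: the size is not known up front.
        return None
    try:
        return int(value)
    except ValueError:
        return None
```

Returning None rather than 0 keeps "size unknown" distinct from "empty file".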

The size of the file is sent as the Content-Length header. Here is how to get it with urllib:

    >>> site = urllib.urlopen("http://python.org")
    >>> meta = site.info()
    >>> print meta.getheaders("Content-Length")
    ['16535']
    >>>

Also, if the server you are connecting to supports it, look at ETags and the If-Modified-Since and If-None-Match headers.

Using these will take advantage of the webserver's caching rules and will return a 304 Not Modified status code if the content hasn't changed.
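A conditional request along those lines might look like the following Python 3 sketch. The function names are hypothetical, and the sketch assumes you stored the ETag and Last-Modified values from an earlier response. urllib.request reports a 304 reply as an HTTPError, so it is caught and treated as "unchanged".

```python
import urllib.error
import urllib.request

def build_conditional_request(url, etag=None, last_modified=None):
    """Build a GET request that lets the server answer 304 if nothing changed."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    return request

def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body, etag, last_modified), or None if the server answers 304."""
    request = build_conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(request) as response:
            return (response.read(),
                    response.headers.get("ETag"),
                    response.headers.get("Last-Modified"))
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None  # the cached copy is still current
        raise
```

On the first download, call fetch_if_changed(url) without validators and keep the returned ETag and Last-Modified for the next run.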

    >>> import urllib.request
    >>> site = urllib.request.urlopen("http://python.org")
    >>> print("FileSize: ", site.length)

For a python3 (tested on 3.5) approach I'd recommend:

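    from urllib.request import urlopen

    # file_url and local_file_address are placeholders for your own values
    with urlopen(file_url) as in_file, open(local_file_address, 'wb') as out_file:
        print(in_file.getheader('Content-Length'))
        out_file.write(in_file.read())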

For anyone using Python 3 and looking for a quick solution using the requests package:

    import requests

    response = requests.head(
        "https://website.com/yourfile.mp4",  # Example file
        allow_redirects=True
    )
    print(response.headers['Content-Length'])

Note: Not all responses will have a Content-Length so your application will want to check to see if it exists.

    if 'Content-Length' in response.headers:
        ...  # Do your stuff here

Here is a much safer way for Python 3:

    import urllib.request
    site = urllib.request.urlopen("http://python.org")
    meta = site.info()
    meta.get('Content-Length')

meta.get('Content-Length') will return the "Content-Length" header if it exists. Otherwise it will be blank.

Otherwise it will be None . Still +1 for your answer. Note: If you want it to return e.g. 0 if there's no Content-Length - do meta.get('Content-Length', 0) . Overall, my one-liner is urllib.request.urlopen(url).info().get('Content-Length', 0)

@PabloG Regarding the local/server filesize difference

The following is a high-level, illustrative explanation of why it may occur:

The size on disk is sometimes different from the actual size of the data. It depends on the underlying file system and how it operates on data. As you may have seen in Windows, when formatting a flash drive you are asked to choose a 'block/cluster size', which varies from 512 bytes to 8 KB. When a file is written to disk, it is stored in a sort of linked list of disk blocks. Once a block is used to store part of a file, no other file's contents will be stored in that block, so even if the chunk does not occupy the entire block, the rest of the block is unusable by other files.

Example: When the filesystem is divided into 512-byte blocks and we need to store a 600-byte file, two blocks will be occupied. The first block will be fully utilized, while the second block will have only 88 bytes utilized, and the remaining (512 - 88) = 424 bytes will be unusable, resulting in a 'file size on disk' of 1024 bytes. This is why Windows shows both 'Size' and 'Size on disk'.

NOTE: Smaller and larger filesystem block sizes each have their own pros and cons, so do more research before playing with your filesystem.
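The arithmetic in this example can be written out as a short sketch. The function name size_on_disk is made up, and the model is deliberately simple: it only rounds up to whole blocks and ignores filesystem features such as sparse files or compression.

```python
def size_on_disk(file_size, block_size):
    """Round file_size up to a whole number of blocks (all values in bytes)."""
    blocks = -(-file_size // block_size)  # ceiling division without floats
    return blocks * block_size

print(size_on_disk(600, 512))  # 1024
```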


3 ways to get a file's size in Python

stat() is a function of Python's built-in os module, which offers two ways to get a file's size. The os module provides functions for interacting with the operating system. It is part of Python's standard utility modules and offers a portable way of using operating-system-dependent functionality.

Getting a file's size in Python

To get a file's size in Python, we can use one of the following three ways: os.path.getsize(), os.stat(), and path.stat() from pathlib.


Python os.path.getsize()

The os.path.getsize() function returns the size in bytes. It raises OSError if the file does not exist or is inaccessible.

First we defined the file, then got its size with os.path.getsize(), which returns the file size in bytes, and in the last line we converted the size in bytes to a size in MB.
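Those steps can be sketched as follows. The helper name file_size_mb and the choice of 1 MB = 1024 * 1024 bytes are assumptions made for illustration.

```python
import os

def file_size_mb(path):
    """Return the size of the file at path in megabytes (1 MB = 1024 * 1024 bytes)."""
    size_bytes = os.path.getsize(path)  # raises OSError if the file is missing
    return size_bytes / (1024 * 1024)
```

For example, file_size_mb("app.py") would give the size of a file named app.py in the current directory, assuming such a file exists.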

Python os.stat()

The os.stat() function in Python performs a stat() system call on the given path and is used to get the status of that path. We can then read its st_size attribute to get the file size in bytes. os.stat() takes the file name as an argument and returns an os.stat_result object containing information about the file.

From the output you can see that we got an os.stat_result object full of information about the file. We then accessed one specific attribute, st_size, to get the file size, and converted the size to MB (megabytes).

If you look closely at the stat() function, you will see that it accepts two more parameters: dir_fd and follow_symlinks. They are not supported on every platform; you can check os.supports_dir_fd and os.supports_follow_symlinks to find out.
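A sketch of reading the size, along with another field, from a single os.stat() call. The function name describe_file and the keys of the returned dict are hypothetical.

```python
import os

def describe_file(path):
    """Collect a few fields from one os.stat() call."""
    info = os.stat(path)  # one system call, several attributes
    return {
        "size_bytes": info.st_size,
        "size_mb": info.st_size / (1024 * 1024),
        "modified": info.st_mtime,  # seconds since the epoch
    }
```

This is where os.stat() pays off over os.path.getsize(): when you need more than the size, one call covers everything.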

Python path.stat().st_size

The path.stat() method returns an os.stat_result object containing information about this path, like os.stat(). The result is looked up at each call to this method.
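That the result is looked up afresh on every call can be seen in a small sketch. The helper name size_before_and_after is made up; it appends bytes to an existing file and calls stat() on the same Path object before and after.

```python
from pathlib import Path

def size_before_and_after(path, extra: bytes):
    """Return the file size before and after appending extra to the file."""
    p = Path(path)
    before = p.stat().st_size
    with p.open("ab") as f:
        f.write(extra)
    after = p.stat().st_size  # a new stat() call sees the new size
    return before, after
```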

How do I check file size in Python?

So is there a minute performance loss from using os.path.getsize as opposed to os.stat(file).st_size?

@wordsforthewise this is more of an issue if you also want to get other things about the file (modification time, type of file, e.g.) -- then you might as well get it all from a single system call via os.stat . Then the difference could run into a substantial number of microseconds 🙂

You need the st_size property of the object returned by os.stat . You can get it by either using pathlib (Python 3.4+):

    >>> from pathlib import Path
    >>> Path('somefile.txt').stat()
    os.stat_result(st_mode=33188, st_ino=6419862, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=1564, st_atime=1584299303, st_mtime=1584299400, st_ctime=1584299400)
    >>> Path('somefile.txt').stat().st_size
    1564

or using os.stat:

    >>> import os
    >>> os.stat('somefile.txt')
    os.stat_result(st_mode=33188, st_ino=6419862, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=1564, st_atime=1584299303, st_mtime=1584299400, st_ctime=1584299400)
    >>> os.stat('somefile.txt').st_size
    1564

@josch - yes, this is nice, for the "size on disk" you can multiply stat_result.st_blocks by the block size, but I'm still searching how to get it programmatically and cross-platform (not via tune2fs etc.)

@TomaszGandor now st_blocks is defined as "Number of 512-byte blocks allocated for file", so you don't have to get the block size.
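Based on that definition, the allocated size could be computed as in the sketch below. The function name allocated_bytes is hypothetical. st_blocks is only available on Unix-like systems, so the sketch returns None where the attribute is missing.

```python
import os

def allocated_bytes(path):
    """Return the bytes allocated on disk for path, or None if st_blocks is unavailable."""
    info = os.stat(path)
    blocks = getattr(info, "st_blocks", None)
    if blocks is None:
        return None  # e.g. on Windows
    return blocks * 512  # st_blocks counts 512-byte units
```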

The other answers work for real files, but if you need something that works for "file-like objects", try this:

    import os

    # f is a file-like object.
    f.seek(0, os.SEEK_END)
    size = f.tell()

It works for real files and StringIO's, in my limited testing. (Python 2.7.3.) The "file-like object" API isn't really a rigorous interface, of course, but the API documentation suggests that file-like objects should support seek() and tell() .

Another difference between this and os.stat() is that you can stat() a file even if you don't have permission to read it. Obviously the seek/tell approach won't work unless you have read permission.

At Jonathon's suggestion, here's a paranoid version. (The version above leaves the file pointer at the end of the file, so if you were to try to read from the file, you'd get zero bytes back!)

    import os

    # f is a file-like object.
    old_file_position = f.tell()
    f.seek(0, os.SEEK_END)
    size = f.tell()
    f.seek(old_file_position, os.SEEK_SET)
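Wrapped in a function and tried on an in-memory stream, the same idea might look like this in Python 3. The name stream_size is made up, and the sketch assumes a seekable binary stream; for text-mode files tell() returns an opaque position rather than a byte count.

```python
import io
import os

def stream_size(f):
    """Return the total size of a seekable binary stream, restoring its position."""
    old_position = f.tell()
    try:
        f.seek(0, os.SEEK_END)
        return f.tell()
    finally:
        # Put the pointer back even if seek() or tell() raised.
        f.seek(old_position, os.SEEK_SET)

buf = io.BytesIO(b"hello world")
buf.read(5)              # move the pointer away from the start
print(stream_size(buf))  # 11
print(buf.read())        # b' world', the position was preserved
```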
