Python save file to disk

Downloading and saving the Shakespeare zip with requests

How to download a file and save it to disk, and how to properly resolve a file's full path, including its directory.

The problem

Here’s the problem we’re trying to solve – if you’re doing this as homework, see the full info for this exercise:

Write the Python commands to download the file from the following URL:

You don’t need to unzip it, just worry about downloading it and saving it to disk.

When you run b.py from the command line, the output should look like this:

0004-shakefiles $ python b.py
Downloading: http://stash.compciv.org/scrapespeare/matty.shakespeare.tar.gz
Writing file: tempdata/matty.shakespeare.tar.gz

How to write a file to disk

By now, we already know how to download a file from the Web with the Requests library:

import requests
resp = requests.get('http://www.example.com')
thetext = resp.text

However, downloading a file programmatically doesn't mean that the file has been saved (i.e. written) to our hard drive. That is a separate step.

Before we can write a file to disk, we must open a new file stream object with the open() function. This is similar to what we have to do when reading an existing file, but take notice of the second argument:

>>> newfile = open("somenewfile.txt", "w") 

That "w" string tells the open() function that we want to write to this file object rather than read from it. As you can imagine, "w" stands for write. You can think of the first argument, e.g. the string "somenewfile.txt", as the name of the file to be created.

How to accidentally destroy your work

It’s worth stopping for a moment and considering: what happens when we try to open an existing file path to write to?

The answer: if you give the open() function an existing filename with the intention of writing to it – whatever existed at that filename is permanently erased.

There is no Recycle Bin at the programming level. The Python interpreter just assumes you know what you're doing, and won't throw an error or even a warning. It wipes out the existing file's contents the moment open() is called, before you've written anything to it.

So consider this your warning to be incredibly mindful whenever you want to write a file to disk.
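If you want Python to refuse to overwrite an existing file, one option is to open it in "x" (exclusive creation) mode instead of "w". In that mode, open() raises FileExistsError when the path already exists, so the old file is left alone. Here's a minimal sketch. The helper name write_new_file and the demo file are hypothetical, and the demo runs in a throwaway temporary directory.

```python
import os
import tempfile


def write_new_file(path, text):
    """Hypothetical helper: write text to path only if path doesn't exist yet."""
    try:
        # "x" creates the file, but fails if it already exists (unlike "w")
        with open(path, "x") as f:
            f.write(text)
        return True
    except FileExistsError:
        # The existing file is left untouched
        return False


# Demo inside a throwaway directory so no real files are touched
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "somenewfile.txt")
first = write_new_file(demo_path, "original contents")
second = write_new_file(demo_path, "this would have clobbered it")

with open(demo_path) as f:
    contents = f.read()

print(first, second, contents)  # True False original contents
```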

Using the file’s write() function to write data to disk

OK, going back to the step in which we opened a file at the path somenewfile.txt:

>>> newfile = open("somenewfile.txt", "w") 

What is that newfile variable pointing to? Use the type() function to find out:

>>> type(newfile)
_io.TextIOWrapper

For simplicity's sake, I'm just going to refer to it as a "file object" (or file stream object). Let's use Tab autocompletion to get its list of methods:

>>> newfile.   # hit Tab here
newfile.buffer          newfile.isatty          newfile.readlines
newfile.close           newfile.line_buffering  newfile.seek
newfile.closed          newfile.mode            newfile.seekable
newfile.detach          newfile.name            newfile.tell
newfile.encoding        newfile.newlines        newfile.truncate
newfile.errors          newfile.read            newfile.writable
newfile.fileno          newfile.readable        newfile.write
newfile.flush           newfile.readline        newfile.writelines

You can guess that the write() method is what we want. But this object also has a read() method. That's because it's a file object, and file objects can be read from or written to. The methods exist no matter which mode we passed to open().


That said, go ahead and try to read() from newfile:

>>> newfile.read()
UnsupportedOperation: not readable

This is how Python reminds us that the file is not meant to be read from, since we called open() with the "w" argument.

Now that we have that cleared up, let’s just write to the file. You can pass in a string object as the argument, and call write() as many times as you want to:

>>> newfile.write("hello")
5
>>> newfile.write("world!")
6

The write() method returns the number of characters that were written to the file. After we've finished writing to the file, we call the close() method:

>>> newfile.close()

Now switch to your text editor and look for the file you just created. If you’ve been following this example, the filename we used is: somenewfile.txt

This is what the contents of that file should look like:

helloworld!

Or, alternatively, you could use Python to re-open the file and then read it:

(Note: when opening a file just to read it, the second argument of the open() function is optional. By default, open() assumes you want to read from the given file path. I include "r" here just to be explicit.)

>>> myfile = open("somenewfile.txt", "r")
>>> txt = myfile.read()
>>> print(txt)
helloworld!
>>> myfile.close()
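Calling close() by hand is easy to forget. A common alternative is the with statement, which closes the file automatically when its indented block ends, even if an error occurs partway through. Here's a sketch of the same write-then-read round trip using it, with the same file name and strings as above:

```python
# Same round trip as above, but "with" closes each file for us
with open("somenewfile.txt", "w") as newfile:
    newfile.write("hello")
    newfile.write("world!")

# Outside the block, the file has already been closed
print(newfile.closed)  # True

with open("somenewfile.txt", "r") as myfile:
    txt = myfile.read()

print(txt)  # helloworld!
```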

How to write newline characters to a file

Notice that hello and world! are not on two different lines. The write() method doesn't automatically add newline characters to the argument we pass in. If we want newlines, we have to add the newline character, \n, ourselves.

Let's try it now. And let's also deliberately overwrite our old file (at the path somenewfile.txt):

>>> newfile = open("somenewfile.txt", "w")
>>> newfile.write("hello\n")
>>> newfile.write("world\n")
>>> newfile.close()

If you read from somenewfile.txt, you'll see that its contents are:

hello
world
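The same rule applies to the file object's writelines() method. It writes each string in a sequence one after another, but it doesn't insert newline characters between them. Below is a minimal sketch that adds "\n" to each line itself. The list of lines is just sample data.

```python
lines = ["hello", "world"]  # sample data

with open("somenewfile.txt", "w") as newfile:
    # writelines() does not add "\n" for us, so append it to each line
    newfile.writelines(line + "\n" for line in lines)

with open("somenewfile.txt") as myfile:
    txt = myfile.read()

print(repr(txt))  # 'hello\nworld\n'
```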

How to download a text file and write it to disk

I’ve written a separate guide about writing files, but this section should contain all you need to know for this particular lesson.

Let's go back to requests.get(), from the beginning:

>>> import requests
>>> resp = requests.get("http://www.example.com")
>>> exampletxt = resp.text
>>> type(exampletxt)
str

If the download succeeded, the exampletxt variable contains the raw HTML of the page at http://www.example.com, and that raw HTML is just a str object.


This means we can pass it into a file object's write() method, just as we wrote the strings "hello" and "world" to the file:

>>> outfile = open("example.com.html", "w")
>>> outfile.write(exampletxt)
1270
>>> outfile.close()

If you use your text editor to open example.com.html (in whichever directory you saved it to), the file should contain the raw HTML of www.example.com.
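One caveat: when open() is called without an encoding argument, Python falls back on a platform-dependent default. That default is often UTF-8 on macOS and Linux, but commonly a legacy code page such as cp1252 on Windows. As a result, writing HTML that contains non-ASCII characters can raise UnicodeEncodeError on some machines. Passing encoding="utf-8" explicitly avoids that. Below is a minimal sketch. The helper name save_text is hypothetical, and the sample string stands in for resp.text so the example runs without a network connection.

```python
def save_text(text, path):
    """Hypothetical helper: write a str to path as UTF-8; returns characters written."""
    with open(path, "w", encoding="utf-8") as outfile:
        return outfile.write(text)


# Stand-in for resp.text; includes non-ASCII characters on purpose
sample_html = "<p>Shakespeare’s “Hamlet”</p>"
count = save_text(sample_html, "example.com.html")

# Read it back with the same encoding to confirm nothing was mangled
with open("example.com.html", encoding="utf-8") as infile:
    roundtrip = infile.read()

print(count == len(sample_html), roundtrip == sample_html)  # True True
```

With Requests, the call would be something like save_text(resp.text, "example.com.html").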

How to write a binary file

Not all files are text. Rather than explain in detail, for now, I will just show how the open() function needs to be called when writing a non-text file to disk – it requires a change to the second argument:

>>> zfile = open("mynewzipfile.zip", "wb") 

Think of that "wb" as standing for "write bytes".

Check the type() of zfile to see what it points to:

>>> type(zfile)
_io.BufferedWriter

Again, I think of this as a file object – but note that it is different from the previous example involving a text file, in which the object had a type of: _io.TextIOWrapper

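Whether it is a binary or text file, the same read() and write() methods exist. The difference is that in binary mode, write() expects bytes rather than a str, and read() returns bytes.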
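Here's a small sketch of that difference in practice. A file opened with "wb" accepts bytes, and handing it a str raises a TypeError. The file name and the bytes are arbitrary placeholders.

```python
data = b"hello bytes"  # a bytes literal: note the b prefix

with open("mynewbinaryfile.bin", "wb") as zfile:
    written = zfile.write(data)  # returns the number of bytes written
    try:
        zfile.write("hello str")  # a str is the wrong type for a binary file
        str_rejected = False
    except TypeError:
        str_rejected = True

with open("mynewbinaryfile.bin", "rb") as zfile:
    readback = zfile.read()  # in "rb" mode, read() returns bytes

print(written, str_rejected, type(readback))  # 11 True <class 'bytes'>
```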

How to access the contents of a downloaded binary file

But typically, we don't manually type in the bytes that we want to write to a file. Let's go back to the requests.get() function, but this time, let's download the compressed Shakespeare file from the following URL:

>>> import requests
>>> zipurl = 'http://stash.compciv.org/scrapespeare/matty.shakespeare.tar.gz'
>>> resp = requests.get(zipurl)

The content attribute of the Response object

So the resp variable contains the result of the download from the given URL. This should take noticeably longer (by a few seconds, at least) than downloading example.com, because we're downloading the entire works of Shakespeare.

However, it's his (text) works all bundled into a compressed archive, which itself is not a text file. It's a binary file full of bytes, not string characters.

The Requests library is designed so that, instead of using the text attribute (i.e. resp.text), for binary files we use the content attribute (resp.content). This is just something you have to memorize and get used to. Again, use the type() function to see what kind of object resp.content actually is (it's not a str, to hammer on this point):

>>> thedata = resp.content
>>> type(thedata)
bytes
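Because resp.content is raw bytes, you can inspect it directly. For example, gzip-compressed files such as this .tar.gz begin with the two bytes 0x1f 0x8b. A quick check of those bytes can confirm that a download really is a gzip file and not, say, an HTML error page. Here's a sketch. The function name looks_like_gzip is hypothetical. The sample data is generated locally with Python's gzip module rather than downloaded.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(data):
    """Return True if data (a bytes object) starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


fake_download = gzip.compress(b"To be, or not to be")  # stand-in for resp.content
error_page = "<html>Not Found</html>".encode("utf-8")

print(type(fake_download))             # <class 'bytes'>
print(looks_like_gzip(fake_download))  # True
print(looks_like_gzip(error_page))     # False
```

With a real download, you'd call looks_like_gzip(resp.content).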

Downloading a binary file and writing it to disk

OK, all together: downloading a zip file and then saving it to disk:

import requests

zipurl = 'http://stash.compciv.org/scrapespeare/matty.shakespeare.tar.gz'
resp = requests.get(zipurl)
zname = "matty.shakespeare.tar.gz"
zfile = open(zname, 'wb')
zfile.write(resp.content)
zfile.close()
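resp.content holds the entire file in memory at once, which is fine for a file this size. For much larger downloads, Requests can also hand you the body in pieces. You call requests.get(url, stream=True), then iterate over resp.iter_content(chunk_size=...), which yields bytes chunks. Below is a minimal sketch of the writing side only. It assumes the chunks arrive as any iterable of bytes. The helper name save_chunks is hypothetical, and the demo feeds it a plain list so it runs offline.

```python
def save_chunks(chunks, path):
    """Hypothetical helper: write an iterable of bytes chunks to path; returns total bytes."""
    total = 0
    with open(path, "wb") as zfile:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                total += zfile.write(chunk)
    return total


# Offline demo: a list of bytes stands in for resp.iter_content(chunk_size=8192)
total = save_chunks([b"first chunk, ", b"", b"second chunk"], "chunks-demo.bin")

with open("chunks-demo.bin", "rb") as f:
    saved = f.read()

print(total, saved)  # 25 b'first chunk, second chunk'
```

With a live download, that would look something like resp = requests.get(zipurl, stream=True), followed by save_chunks(resp.iter_content(chunk_size=8192), zname).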

Check whether matty.shakespeare.tar.gz was actually saved to your computer at the given path. You can even double-click it to see if it unpacks. Note that we did not programmatically unzip the file; we simply downloaded it and saved it to a path.

Using os.path.join to create a file pathname

This is going to seem exceedingly pedantic. On Mac OS X and Linux, the following file path:

tempdata/somefile.zip

means that the somefile.zip file is inside the tempdata subdirectory.

However, in Windows, that path looks like this:

tempdata\somefile.zip

Because of these differences between operating systems, it's safer to leave the building of a file path to the join() function in Python's os.path module (which becomes available when you run import os).


Here’s what that looks like:

>>> import os
>>> mydirname = 'tempdata'
>>> myfilename = 'somefile.zip'
>>> myfullfilename = os.path.join(mydirname, myfilename)
>>> print(myfullfilename)
tempdata/somefile.zip
# note that this will be different on Windows machines

Or, in a single line:

fname = os.path.join("tempdata", "somefile.zip")

Yes, that seems like a lot of code just to generate the string tempdata/somefile.zip. But besides being cross-platform compatible, this pattern is worth using because in real-world programming, paths can get fairly complicated (e.g. with deeply nested subdirectories). In the long run, it's just easier to let Python's helper functions deal with it.
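Here's a sketch of a few more things os.path.join() does for you. It accepts any number of path pieces, and it uses the current operating system's separator (available as os.sep). The sketch also shows os.path.split(), which takes a path apart again. For comparison only, it builds the same path with Python 3's pathlib module, a different, object-oriented tool that this lesson doesn't otherwise use. The nested file names are made up.

```python
import os
from pathlib import Path

# Any number of pieces can be joined; the separator matches the OS
nested = os.path.join("tempdata", "shakespeare", "tragedies", "hamlet.txt")
print(nested)  # tempdata/shakespeare/tragedies/hamlet.txt on macOS/Linux

# The path can be split back into its directory and file name
dirname, filename = os.path.split(nested)
print(filename)  # hamlet.txt

# pathlib equivalent of os.path.join("tempdata", "somefile.zip")
p = Path("tempdata") / "somefile.zip"
print(str(p) == os.path.join("tempdata", "somefile.zip"))  # True
```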

Here's our download-and-save code from the previous example again, this time using the join() function and saving the file to the tempdata directory (assuming that directory has already been created):

import requests
import os

zipurl = 'http://stash.compciv.org/scrapespeare/matty.shakespeare.tar.gz'
resp = requests.get(zipurl)
# assuming the subdirectory tempdata has been created:
zname = os.path.join('tempdata', "matty.shakespeare.tar.gz")
zfile = open(zname, 'wb')
zfile.write(resp.content)
zfile.close()
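If tempdata doesn't exist yet, open() raises FileNotFoundError. One way to drop that assumption is os.makedirs() with exist_ok=True. It creates the directory (and any missing parent directories) and does nothing if the directory is already there. Here's a minimal sketch. The helper name save_into_dir is hypothetical, and the demo writes placeholder bytes into a temporary location rather than downloading anything.

```python
import os
import tempfile


def save_into_dir(data, dirname, filename):
    """Hypothetical helper: make sure dirname exists, then write data (bytes) into it."""
    # Creates dirname and any missing parents; exist_ok=True means no error if present
    os.makedirs(dirname, exist_ok=True)
    fullpath = os.path.join(dirname, filename)
    with open(fullpath, "wb") as zfile:
        zfile.write(data)
    return fullpath


# Demo in a throwaway location; this "tempdata" doesn't exist beforehand
base = tempfile.mkdtemp()
saved_path = save_into_dir(b"placeholder bytes", os.path.join(base, "tempdata"), "demo.bin")
print(os.path.isfile(saved_path))  # True

# Calling it again is fine: the directory already exists
save_into_dir(b"more bytes", os.path.join(base, "tempdata"), "demo2.bin")
```

In the download script, that would be save_into_dir(resp.content, "tempdata", "matty.shakespeare.tar.gz").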


a full stack enthusiast interested in Javascript / Typescript / Kotlin / Python / React / React Native / (Android & iOS) native / Cloud

How to save a python object to disk and retrieve it later?

Objects are things that exist at run time. If we want to preserve an object's state across sessions, we need an intermediate technique such as serialization. With serialization, you can write objects to disk to make them persistent, and deserialize them later to convert them back from binary form into runtime objects.

1. About the procedures:

  • Serialization: Save an object to disk in binary form
  • Deserialization: Retrieve an object from a file

Python comes with a handy library called pickle which tackles this situation nicely.

2. Serialization: Save an object to disk

We need to use the pickle.dump() function:

pickle.dump(your_object, file, pickle.HIGHEST_PROTOCOL)

You can use the handy with statement to write an object to disk:

import pickle

# inside one of your class's methods, so self is the object being saved
with open("super.file", "wb") as f:
    pickle.dump(self, f, pickle.HIGHEST_PROTOCOL)

Notice that I named the file super.file rather than super.txt, to signal that it holds binary data rather than plain text.
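Since self only exists inside a method, here's a standalone sketch of the same dump step that runs as-is. The dictionary is made-up sample data; any picklable object would do.

```python
import pickle

# Made-up sample object; dicts, lists and most class instances are picklable
settings = {"username": "guest", "volume": 7, "recent": ["a.txt", "b.txt"]}

with open("super.file", "wb") as f:
    pickle.dump(settings, f, pickle.HIGHEST_PROTOCOL)

# The file holds binary data, not readable text
with open("super.file", "rb") as f:
    raw = f.read()

print(type(raw))  # <class 'bytes'>
```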

3. Deserialization: Convert the file back to a Python object

We will use the pickle.load() function, which is just as simple as pickle.dump(): it takes an open file object and returns the object reconstructed from it.

As before, here's an example using the with statement:

with open("super.file", "rb") as f:
    dump = pickle.load(f)

# Now you can use the dump object just like the original one
self.some_property = dump.some_property
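Here's the matching load step as a standalone round-trip sketch, again with made-up sample data. One caution: the official pickle documentation warns that unpickling data from an untrusted source can execute arbitrary code. Only load pickle files that you created yourself or otherwise trust.

```python
import pickle

original = {"title": "Hamlet", "acts": 5}  # made-up sample data

with open("super.file", "wb") as f:
    pickle.dump(original, f, pickle.HIGHEST_PROTOCOL)

with open("super.file", "rb") as f:
    restored = pickle.load(f)  # a new object, equal to the original

print(restored == original, restored is original)  # True False
```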

4. End

With the pickle library under your belt, you could wrap it in a small persistence layer of your own.
Note that this is just a simple intro to the pickle library. There are more functions for you to explore. The official documentation is available for Python 2.7 and Python 3.5.
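As one illustration of such a wrapper, here's a minimal pair of hypothetical helpers, save_object and load_object. load_object returns a default value when the file doesn't exist yet. The names and the default-value behavior are choices made for this sketch, not part of pickle itself.

```python
import os
import pickle


def save_object(obj, path):
    """Pickle obj to path, replacing any existing file."""
    with open(path, "wb") as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)


def load_object(path, default=None):
    """Unpickle the object stored at path, or return default if there's no file."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


# Usage: a counter that survives across runs of the script
counter_path = "counter.pickle"
count = load_object(counter_path, default=0)
save_object(count + 1, counter_path)
```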

Follow me (albertgao) on twitter, if you want to hear more about my interesting ideas.

Albert Gao

An IT guy for 10 years. A full-stack developer focused on open source tech. Now working on the React / React Native / TypeScript / Node / GraphQL stack. A former employee of Alibaba Group, now in New Zealand. MCSD for Windows apps via C#, and a Windows Phone developer with several well-received apps.

