Python Data Science Handbook: full text in Jupyter Notebooks
jakevdp/PythonDataScienceHandbook
Python Data Science Handbook
This repository contains the entire Python Data Science Handbook, in the form of (free!) Jupyter notebooks.
- Read the book in its entirety online at https://jakevdp.github.io/PythonDataScienceHandbook/
- Run the code using the Jupyter notebooks available in this repository’s notebooks directory.
- Launch executable versions of these notebooks using Google Colab
- Launch a live notebook server with these notebooks using Binder
- Buy the printed book through O’Reilly Media
The book was written and tested with Python 3.5, though other Python versions (including Python 2.7) should work in nearly all cases.
The book introduces the core libraries essential for working with data in Python: particularly IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and related packages. Familiarity with Python as a language is assumed; if you need a quick introduction to the language itself, see the free companion project, A Whirlwind Tour of Python: it’s a fast-paced introduction to the Python language aimed at researchers and scientists.
See Index.ipynb for an index of the notebooks available to accompany the text.
The code in the book was tested with Python 3.5, though most (but not all) of it will also work correctly with Python 2.7 and other older Python versions.
The packages I used to run the code in the book are listed in requirements.txt. (Some of these exact version numbers may not be available on your platform; you may have to tweak them for your own use.) To install the requirements using conda, run the following at the command line:
$ conda install --file requirements.txt
To create a stand-alone environment named PDSH with Python 3.5 and all the required package versions, run the following:
$ conda create -n PDSH python=3.5 --file requirements.txt
You can read more about using conda environments in the Managing Environments section of the conda documentation.
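Because the pinned versions may need tweaking, it can help to see how far an existing environment departs from them. The following is a minimal sketch, not part of the repository: the helper names `parse_requirements` and `compare_with_installed` are hypothetical, the sample pins are illustrative only, and it assumes an interpreter with `importlib.metadata` (Python 3.8 or newer), so it will not run inside a Python 3.5 environment.

```python
import re
from importlib import metadata  # standard library from Python 3.8 onward

# Matches "name", "name=1.0" (conda style) or "name==1.0" (pip style).
# Other specifiers such as ">=" are treated as unpinned in this sketch.
_PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9_.\-]*)\s*(?:={1,2}\s*([^\s=]+))?")


def parse_requirements(text):
    """Return {package_name: pinned_version_or_None} from requirements-style text."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        match = _PIN.match(line)
        if match:
            pins[match.group(1).lower()] = match.group(2)
    return pins


def compare_with_installed(pins):
    """Map each package to (wanted, installed); installed is None if missing."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (wanted, installed)
    return report


# Illustrative pins only, not the repository's actual requirements.txt.
sample = """\
numpy==1.11.1
pandas=0.18.1
"""

for name, (wanted, installed) in compare_with_installed(parse_requirements(sample)).items():
    print("{}: pinned {}, installed {}".format(name, wanted, installed or "not found"))
```

To check a real environment, read the contents of requirements.txt into a string and pass it to `parse_requirements` in place of `sample`.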
The code in this repository, including all code samples in the notebooks listed above, is released under the MIT license. Read more at the Open Source Initiative.
The text content of the book is released under the CC-BY-NC-ND license. Read more at Creative Commons.
This website contains the full text of the Python Data Science Handbook by Jake VanderPlas; the content is available on GitHub in the form of Jupyter notebooks.
The text is released under the CC-BY-NC-ND license, and code is released under the MIT license.
If you find this content useful, please consider supporting the work by buying the book!
Table of Contents
Preface
1. IPython: Beyond Normal Python
2. Introduction to NumPy
3. Data Manipulation with Pandas
- Introducing Pandas Objects
- Data Indexing and Selection
- Operating on Data in Pandas
- Handling Missing Data
- Hierarchical Indexing
- Combining Datasets: Concat and Append
- Combining Datasets: Merge and Join
- Aggregation and Grouping
- Pivot Tables
- Vectorized String Operations
- Working with Time Series
- High-Performance Pandas: eval() and query()
- Further Resources
4. Visualization with Matplotlib
- Simple Line Plots
- Simple Scatter Plots
- Visualizing Errors
- Density and Contour Plots
- Histograms, Binnings, and Density
- Customizing Plot Legends
- Customizing Colorbars
- Multiple Subplots
- Text and Annotation
- Customizing Ticks
- Customizing Matplotlib: Configurations and Stylesheets
- Three-Dimensional Plotting in Matplotlib
- Geographic Data with Basemap
- Visualization with Seaborn
- Further Resources
5. Machine Learning
- What Is Machine Learning?
- Introducing Scikit-Learn
- Hyperparameters and Model Validation
- Feature Engineering
- In Depth: Naive Bayes Classification
- In Depth: Linear Regression
- In-Depth: Support Vector Machines
- In-Depth: Decision Trees and Random Forests
- In Depth: Principal Component Analysis
- In-Depth: Manifold Learning
- In Depth: k-Means Clustering
- In Depth: Gaussian Mixture Models
- In-Depth: Kernel Density Estimation
- Application: A Face Detection Pipeline
- Further Machine Learning Resources
Appendix: Figure Code
© 2012-2017 Jake VanderPlas, license unless otherwise noted. Generated by Pelican.
- Publication date: 2022-07-02
- Usage: Attribution-NonCommercial-NoDerivs 4.0 International
- Topics: Python, Data Science, programming, coding, book, python books
- Language: English
For many researchers, Python is a first-class tool mainly because of its libraries for storing, manipulating, and gaining insight from data. Several resources exist for individual pieces of this data science stack, but only with the Python Data Science Handbook do you get them all: IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools.
Working scientists and data crunchers familiar with reading and writing Python code will find this comprehensive desk reference ideal for tackling day-to-day issues: manipulating, transforming, and cleaning data; visualizing different types of data; and using data to build statistical or machine learning models. Quite simply, this is the must-have reference for scientific computing in Python.