Python pandas dataframe sum

Python | Pandas dataframe.sum()

Python is a great language for data analysis, primarily because of its ecosystem of data-centric Python packages. Pandas is one of those packages, and it makes importing and analyzing data much easier.

The pandas DataFrame.sum() function returns the sum of the values along the requested axis. For the index axis (axis=0), it adds up the values in each column and returns a Series holding one sum per column. It can also skip missing values in the dataframe while computing the sum.

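Syntax: DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)

This is the signature used by pandas releases before 2.0. In pandas 2.0 and later it is DataFrame.sum(axis=0, skipna=True, numeric_only=False, min_count=0, **kwargs), without the level parameter.

Parameters :
axis : {index (0), columns (1)}. The axis to sum along.
skipna : Exclude NA/null values when computing the result.
level : If the axis is a MultiIndex (hierarchical), sum along a particular level, collapsing into a Series. Deprecated in pandas 1.3 and removed in 2.0; use groupby(level=...).sum() instead.
numeric_only : Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
min_count : The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA.

Returns : sum : Series or DataFrame (if level specified)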
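To make the effect of skipna concrete, here is a minimal sketch. The small dataframe and its column names are made up for illustration and are not taken from the article's data.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column "b" has one missing value.
df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [4.0, np.nan, 6.0]})

# Default behaviour (skipna=True): the NaN in "b" is ignored.
skipped = df.sum()

# skipna=False: a single NaN makes the whole column sum NaN.
strict = df.sum(skipna=False)

print(skipped)
print(strict)
```

With the default, column "b" sums to 10.0; with skipna=False it becomes NaN, which can be useful when a missing value should not be silently ignored.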


Example #1: Use sum() function to find the sum of all the values over the index axis.
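The CSV file used by the original example is not reproduced here, so the following is only a sketch with a small hypothetical dataframe (the column names and values are assumptions) showing what a sum over the index axis looks like.

```python
import pandas as pd

# Hypothetical stand-in for the CSV data, which is not available here.
df = pd.DataFrame(
    {"Age": [25, 31, 28], "Weight": [180.0, 235.0, 185.0]},
    index=["player_1", "player_2", "player_3"],
)

# axis=0 (the index axis) adds up the values in each column
# and returns a Series with one entry per column.
column_totals = df.sum(axis=0)
print(column_totals)
```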


pandas.DataFrame.sum

Return the sum of the values over the requested axis.

This is equivalent to the method numpy.sum.

Parameters :

axis {index (0), columns (1)}

Axis for the function to be applied on. For Series this parameter is unused and defaults to 0.

For DataFrames, specifying axis=None will apply the aggregation across both axes.

skipna bool, default True

Exclude NA/null values when computing the result.

numeric_only bool, default False

Include only float, int, boolean columns. Not implemented for Series.

min_count int, default 0

The required number of valid values to perform the operation. If fewer than min_count non-NA values are present the result will be NA.

**kwargs

Additional keyword arguments to be passed to the function.

Returns : Series or scalar
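As an illustration of numeric_only, the sketch below uses a made-up dataframe that mixes a text column with numeric ones. The column names are assumptions chosen for the example.

```python
import pandas as pd

# Hypothetical mixed-type data: one text column, two numeric columns.
df = pd.DataFrame({"name": ["x", "y"], "price": [1.5, 2.5], "qty": [3, 4]})

# numeric_only=True drops the text column before summing.
numeric_totals = df.sum(numeric_only=True)
print(numeric_totals)
```

Only "price" and "qty" appear in the result, so the text column never takes part in the aggregation.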

See also :

Series.idxmin : Return the index of the minimum.

Series.idxmax : Return the index of the maximum.

DataFrame.sum : Return the sum over the requested axis.

DataFrame.min : Return the minimum over the requested axis.

DataFrame.max : Return the maximum over the requested axis.

DataFrame.idxmin : Return the index of the minimum over the requested axis.

DataFrame.idxmax : Return the index of the maximum over the requested axis.

>>> idx = pd.MultiIndex.from_arrays([
...     ['warm', 'warm', 'cold', 'cold'],
...     ['dog', 'falcon', 'fish', 'spider']],
...     names=['blooded', 'animal'])
>>> s = pd.Series([4, 2, 0, 8], name='legs', index=idx)
>>> s
blooded  animal
warm     dog       4
         falcon    2
cold     fish      0
         spider    8
Name: legs, dtype: int64
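The Series built above can then be summed as a whole or per index level. This is a minimal sketch; the variable names total_legs and legs_by_blood are introduced here for illustration only.

```python
import pandas as pd

idx = pd.MultiIndex.from_arrays(
    [["warm", "warm", "cold", "cold"], ["dog", "falcon", "fish", "spider"]],
    names=["blooded", "animal"],
)
s = pd.Series([4, 2, 0, 8], name="legs", index=idx)

# Sum of every value in the Series.
total_legs = s.sum()

# Sum per value of the "blooded" index level, via groupby.
legs_by_blood = s.groupby(level="blooded").sum()

print(total_legs)
print(legs_by_blood)
```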

By default, the sum of an empty or all-NA Series is 0.

>>> pd.Series([], dtype="float64").sum()  # min_count=0 is the default
0.0

This can be controlled with the min_count parameter. For example, if you’d like the sum of an empty series to be NaN, pass min_count=1.

>>> pd.Series([], dtype="float64").sum(min_count=1)
nan

Thanks to the skipna parameter, min_count handles all-NA and empty series identically.

>>> pd.Series([np.nan]).sum(min_count=1)
nan
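The same min_count rule applies column by column on a DataFrame. The sketch below uses a made-up dataframe whose columns hold two, one, and zero valid values respectively.

```python
import numpy as np
import pandas as pd

# Hypothetical columns with 2, 1 and 0 non-NA values.
df = pd.DataFrame(
    {
        "full": [1.0, 2.0],
        "partial": [1.0, np.nan],
        "empty": [np.nan, np.nan],
    }
)

default_sum = df.sum()               # all-NA column sums to 0.0
at_least_one = df.sum(min_count=1)   # all-NA column becomes NaN
at_least_two = df.sum(min_count=2)   # "partial" has one valid value, so NaN too

print(default_sum)
print(at_least_one)
print(at_least_two)
```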


Pandas Sum() – Sum each Column and Row in Pandas DataFrame


Pandas sum(): In this tutorial, we will see how to use the sum() function on the columns or rows of a pandas dataframe.

Introduction

A pandas dataframe is a two-dimensional, size-mutable tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both the row and the column labels.

The pandas library for Python lets you import data and analyze it quickly.

In this tutorial, we will see how to use the sum() function from the pandas library. It returns the sum of the values along the axis passed as a parameter. We will cover the following points:

  • Use the sum() function to sum the values on the index axis (the rows)
  • Use the sum() function to sum the values on the columns axis
  • Sum the values with a multi-level index
  • Sum the values on a Series type

To illustrate these different points, we will use the following pandas dataframe:

import pandas as pd

data = (['January', 'Monday', 10000, 30000],
        ['January', 'Friday', 5000, 20000],
        ['February', 'Monday', 1000000, 2000000],
        ['February', 'Friday', 2000000, 5000000],
        ['February', 'Sunday', 5000000, 10000000],
        ['March', 'Tuesday', 4000000, 8000000])
df = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])
df = df.set_index(['Month', 'Day_Week'])

                   Income_A  Income_B
Month    Day_Week
January  Monday       10000     30000
         Friday        5000     20000
February Monday     1000000   2000000
         Friday     2000000   5000000
         Sunday     5000000  10000000
March    Tuesday    4000000   8000000

This dataframe contains the different incomes generated per month and per day.

Pandas Dataframe sum() function

Pandas sum() Syntax

The sum() function is used to sum the values on a given axis. Its syntax is the following:

# Sum() function
DataFrame.sum(axis=None, skipna=None, level=None, numeric_only=None, min_count=0, **kwargs)

The function can take 6 parameters, all of them optional:

| Name | Description | Type | Default Value | Required |
| --- | --- | --- | --- | --- |
| axis | The axis to apply the function along (0 = index, 1 = columns) | 0 or 1 | 0 | No |
| skipna | Exclude NA / NULL values | bool | True | No |
| level | If the axis is a MultiIndex (hierarchical), sum along a particular level, reducing to a Series. Removed in pandas 2.0. | int or level name | None | No |
| numeric_only | Include only float, int, boolean columns. If None, will try to use everything, then use only numeric data. Not implemented for Series. | bool | None | No |
| min_count | The required number of valid values to perform the operation. If fewer than min_count non-NA values are present, the result will be NA. | int | 0 | No |
| **kwargs | Additional arguments to be passed to the function. | | | No |

Sum each Column in Pandas DataFrame

In order to sum each column of the DataFrame, you can use the axis parameter in this way:

# Sum each column
df.sum(axis=0)

You can apply this code to our previously created dataframe:

import pandas as pd

data = (['January', 'Monday', 10000, 30000],
        ['January', 'Friday', 5000, 20000],
        ['February', 'Monday', 1000000, 2000000],
        ['February', 'Friday', 2000000, 5000000],
        ['February', 'Sunday', 5000000, 10000000],
        ['March', 'Tuesday', 4000000, 8000000])
df = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])
df = df.set_index(['Month', 'Day_Week'])

# Sum each column over all rows
print(df.sum(axis=0))

Result:

Income_A    12015000
Income_B    25050000
dtype: int64

We obtain the total of income A and the total of income B over the three months in the dataframe (January to March).
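Because the column sums come back as a Series, they can be summed once more to get a single grand total. This is a small sketch on the same data; the names column_totals and grand_total are introduced here for illustration.

```python
import pandas as pd

data = [
    ["January", "Monday", 10000, 30000],
    ["January", "Friday", 5000, 20000],
    ["February", "Monday", 1000000, 2000000],
    ["February", "Friday", 2000000, 5000000],
    ["February", "Sunday", 5000000, 10000000],
    ["March", "Tuesday", 4000000, 8000000],
]
df = pd.DataFrame(data, columns=["Month", "Day_Week", "Income_A", "Income_B"])
df = df.set_index(["Month", "Day_Week"])

# One sum per column, returned as a Series.
column_totals = df.sum(axis=0)

# Summing that Series gives the total of every value in the dataframe.
grand_total = column_totals.sum()
print(grand_total)
```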

Sum each Row in Pandas DataFrame

In order to sum each row of the DataFrame, you can use axis=1 as follows:

# Sum each row
df.sum(axis=1)

You can apply this code to our previously created dataframe:

import pandas as pd

data = (['January', 'Monday', 10000, 30000],
        ['January', 'Friday', 5000, 20000],
        ['February', 'Monday', 1000000, 2000000],
        ['February', 'Friday', 2000000, 5000000],
        ['February', 'Sunday', 5000000, 10000000],
        ['March', 'Tuesday', 4000000, 8000000])
df = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])
df = df.set_index(['Month', 'Day_Week'])

# Sum each row across the two income columns
print(df.sum(axis=1))

Result:

Month     Day_Week
January   Monday         40000
          Friday         25000
February  Monday       3000000
          Friday       7000000
          Sunday      15000000
March     Tuesday     12000000
dtype: int64

In our example, this allows us to sum the income A and B for each row.
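A common follow-up is to keep the row sums next to the original data. The sketch below is one possible way to do that, using assign() and a column name, Total, that is chosen here for illustration.

```python
import pandas as pd

data = [
    ["January", "Monday", 10000, 30000],
    ["January", "Friday", 5000, 20000],
    ["February", "Monday", 1000000, 2000000],
    ["February", "Friday", 2000000, 5000000],
    ["February", "Sunday", 5000000, 10000000],
    ["March", "Tuesday", 4000000, 8000000],
]
df = pd.DataFrame(data, columns=["Month", "Day_Week", "Income_A", "Income_B"])
df = df.set_index(["Month", "Day_Week"])

# assign() returns a new dataframe with the row sums as an extra column;
# the original df is left unchanged.
df_with_total = df.assign(Total=df.sum(axis=1))
print(df_with_total)
```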

Multi Level Index Sum

If your dataframe has a multi-level index, you can tell pandas which index level you want to sum across.

Our example dataframe contains 2 levels. Older pandas releases accepted df.sum(level=0) for this; the level argument was deprecated in pandas 1.3 and removed in 2.0, so group on the level instead. To sum according to the first level, you can use this:

import pandas as pd

data = (['January', 'Monday', 10000, 30000],
        ['January', 'Friday', 5000, 20000],
        ['February', 'Monday', 1000000, 2000000],
        ['February', 'Friday', 2000000, 5000000],
        ['February', 'Sunday', 5000000, 10000000],
        ['March', 'Tuesday', 4000000, 8000000])
df = pd.DataFrame(data, columns=['Month', 'Day_Week', 'Income_A', 'Income_B'])
df = df.set_index(['Month', 'Day_Week'])

# Equivalent to the older df.sum(level=0); sort=False keeps the original row order
print(df.groupby(level=0, sort=False).sum())

Result:

          Income_A  Income_B
Month
January      15000     50000
February   8000000  17000000
March      4000000   8000000

To sum from the second level, you can do this:

# Multi Level Index Sum (equivalent to the older df.sum(level=1))
df.groupby(level=1, sort=False).sum()

Result:

          Income_A  Income_B
Day_Week
Monday     1010000   2030000
Friday     2005000   5020000
Sunday     5000000  10000000
Tuesday    4000000   8000000
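Index levels can also be addressed by name rather than by position. The following sketch uses groupby(level=...), which works in current pandas releases where sum() no longer accepts a level argument. The names by_month and by_day are introduced here for illustration.

```python
import pandas as pd

data = [
    ["January", "Monday", 10000, 30000],
    ["January", "Friday", 5000, 20000],
    ["February", "Monday", 1000000, 2000000],
    ["February", "Friday", 2000000, 5000000],
    ["February", "Sunday", 5000000, 10000000],
    ["March", "Tuesday", 4000000, 8000000],
]
df = pd.DataFrame(data, columns=["Month", "Day_Week", "Income_A", "Income_B"])
df = df.set_index(["Month", "Day_Week"])

# Group on a named index level; sort=False keeps groups in order of appearance.
by_month = df.groupby(level="Month", sort=False).sum()
by_day = df.groupby(level="Day_Week", sort=False).sum()

print(by_month)
print(by_day)
```

Without sort=False, groupby sorts the group labels, so the months would come back in alphabetical order.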

Summing a Series

You can also use the pandas sum() function on a Series:

# Summing a Series
df['Income_A'].sum()
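On the example dataframe, a Series sum returns a single scalar. The sketch below also sums only part of the Series by first selecting one month from the index; the variable names are introduced here for illustration.

```python
import pandas as pd

data = [
    ["January", "Monday", 10000, 30000],
    ["January", "Friday", 5000, 20000],
    ["February", "Monday", 1000000, 2000000],
    ["February", "Friday", 2000000, 5000000],
    ["February", "Sunday", 5000000, 10000000],
    ["March", "Tuesday", 4000000, 8000000],
]
df = pd.DataFrame(data, columns=["Month", "Day_Week", "Income_A", "Income_B"])
df = df.set_index(["Month", "Day_Week"])

# Selecting one column gives a Series; sum() on it returns a scalar.
income_a_total = df["Income_A"].sum()

# Select the February rows of that Series first, then sum them.
february_income_a = df["Income_A"].loc["February"].sum()

print(income_a_total)
print(february_income_a)
```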

Conclusion

In this tutorial, we have seen how to use the sum() function of the pandas library. This function is very useful for analyzing data quickly and running quick calculations on the columns or rows of a dataframe.

If you have any questions about its use, don’t hesitate to ask me in comments, I’ll be happy to answer them.

See you soon for new tutorials.
