Python pivot table sort

Pandas pivot_table, sort values by columns

I am a new user of Pandas and I love it! I am trying to create a pivot table in Pandas. Once I have the pivot table the way I want, I would like to rank the values by the columns. I’ve attached an image from Excel, as it is easier to see in tabular format what I am trying to achieve. I’ve searched through Stack Overflow but am having trouble finding an answer. I tried using .sort(), but this doesn’t work. Any help will be appreciated. Thanks in advance!


This should do what you are looking for:

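In [1]: df = pd.DataFrame.from_dict([{'Country': 'A', 'Year': 2012, 'Value': 20},   # sample rows matching the output below
   ...:                              {'Country': 'B', 'Year': 2012, 'Value': 100},
   ...:                              {'Country': 'C', 'Year': 2013, 'Value': 40}])

In [2]: df_pivot = pd.pivot_table(df, index=['Country'], columns=['Year'],
   ...:                           values=['Value'], fill_value=0)

In [3]: df_pivot
Out[3]:
        Value
Year     2012 2013
Country
A          20    0
B         100    0
C           0   40

In [5]: df = df_pivot.reindex(df_pivot['Value'].sort_values(by=2012, ascending=False).index)

In [6]: df
Out[6]:
        Value
Year     2012 2013
Country
B         100    0
A          20    0
C           0   40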

Basically, it sorts the 'Value' block by the 2012 column, takes the index of the sorted result, and uses that index to reindex the original pivot table.
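Because values=['Value'] is passed as a list, the pivot table's columns form a MultiIndex with entries such as ('Value', 2012). A possibly simpler alternative to reindexing is to pass that full column tuple to sort_values directly. The sketch below assumes a pandas version that has DataFrame.sort_values and uses the three rows implied by the output above (A/2012/20, B/2012/100, C/2013/40).

```python
import pandas as pd

# The three rows implied by the pivot-table output in the answer above.
df = pd.DataFrame({"Country": ["A", "B", "C"],
                   "Year": [2012, 2012, 2013],
                   "Value": [20, 100, 40]})

df_pivot = pd.pivot_table(df, index=["Country"], columns=["Year"],
                          values=["Value"], fill_value=0)

# values=["Value"] (a list) yields MultiIndex columns such as ("Value", 2012),
# so the sort key is the full tuple, wrapped in a list to make it unambiguous.
df_sorted = df_pivot.sort_values(by=[("Value", 2012)], ascending=False)
print(df_sorted)
```

This sorts the pivot table in place of the reindex step and gives the same B, A, C order.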

Thank you for your help! Your code worked perfectly. Apologies for not posting the code itself. I will do so next time.

Thanks for this! Just a small note: in the last command, In [5], it should be df_pivot.reindex, not df.

You can sort on more than one column in the pivot table. In my case, I sort by the probability of an accident at the address and at the postcode, both in descending order, and display the results in a heatmap.

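import matplotlib.pyplot as plt
import seaborn as sns

# Mean probabilities per postcode, sorted descending by address, then by postcode
pivot = (df.pivot_table(index=['postcode'],
                        values=['probability_at_address', 'probability_at_postcode'],
                        aggfunc='mean')
           .sort_values(by=['probability_at_address', 'probability_at_postcode'],
                        ascending=False))

fig, ax = plt.subplots(figsize=(10, 20))
sns.heatmap(pivot, cmap="Blues", ax=ax)
plt.show()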
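The sort itself can be checked without the plotting step. A small sketch, using hypothetical postcodes and probabilities (not data from the post): the first two postcodes tie on probability_at_address, so the second sort key decides their order.

```python
import pandas as pd

# Hypothetical sample data; values are chosen to be exact in binary floating point.
df = pd.DataFrame({
    "postcode": ["AB1", "AB1", "CD2", "CD2", "EF3"],
    "probability_at_address": [0.25, 0.75, 0.5, 0.5, 0.125],
    "probability_at_postcode": [0.25, 0.25, 0.75, 0.75, 0.5],
})

# AB1 and CD2 both average 0.5 at the address level, so the
# postcode-level mean (0.25 vs 0.75) breaks the tie.
pivot = (df.pivot_table(index=["postcode"],
                        values=["probability_at_address", "probability_at_postcode"],
                        aggfunc="mean")
           .sort_values(by=["probability_at_address", "probability_at_postcode"],
                        ascending=False))
print(pivot)
```

Without the second key, AB1 and CD2 would be ordered only by how the tie happens to be resolved.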


Site design / logo © 2023 Stack Exchange Inc; user contributions licensed under CC BY-SA . rev 2023.7.27.43548


pandas.DataFrame.pivot_table

DataFrame.pivot_table(values=None, index=None, columns=None, aggfunc='mean', fill_value=None, margins=False, dropna=True, margins_name='All', observed=False, sort=True)

Create a spreadsheet-style pivot table as a DataFrame.

The levels in the pivot table will be stored in MultiIndex objects (hierarchical indexes) on the index and columns of the result DataFrame.

Parameters

values list-like or scalar, optional

Column or columns to aggregate.

index column, Grouper, array, or list of the previous

If an array is passed, it must be the same length as the data. The list can contain any of the other types (except list). Keys to group by on the pivot table index. If an array is passed, it is used in the same manner as column values.

columns column, Grouper, array, or list of the previous


If an array is passed, it must be the same length as the data. The list can contain any of the other types (except list). Keys to group by on the pivot table column. If an array is passed, it is used in the same manner as column values.

aggfunc function, list of functions, dict, default numpy.mean

If a list of functions is passed, the resulting pivot table will have hierarchical columns whose top level are the function names (inferred from the function objects themselves). If a dict is passed, the key is the column to aggregate and the value is a function or list of functions. If margins=True, aggfunc will be used to calculate the partial aggregates.

fill_value scalar, default None

Value to replace missing values with (in the resulting pivot table, after aggregation).

margins bool, default False

If margins=True , special All columns and rows will be added with partial group aggregates across the categories on the rows and columns.

dropna bool, default True

Do not include columns whose entries are all NaN. If True, rows with a NaN value in any column will be omitted before computing margins.

margins_name str, default 'All'

Name of the row / column that will contain the totals when margins is True.

observed bool, default False

This only applies if any of the groupers are Categoricals. If True: only show observed values for categorical groupers. If False: show all values for categorical groupers.

sort bool, default True

Specifies if the result should be sorted.
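A short sketch of two of the parameters above, margins and sort, using columns A, C and D of the example DataFrame further down. It assumes pandas 1.3 or later, since older versions do not have the sort parameter.

```python
import pandas as pd

# Columns A, C and D of the docs example DataFrame.
df = pd.DataFrame({
    "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
    "C": ["small", "large", "large", "small", "small",
          "large", "small", "small", "large"],
    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
})

# margins=True appends an "All" row and an "All" column with the totals.
with_margins = pd.pivot_table(df, values="D", index="A", columns="C",
                              aggfunc="sum", margins=True)
print(with_margins)

# sort=False keeps groups in order of first appearance,
# so "foo" comes before "bar" instead of being sorted alphabetically.
unsorted = pd.pivot_table(df, values="D", index="A", aggfunc="sum", sort=False)
print(unsorted)
```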

Returns

DataFrame

An Excel style pivot table.

See also

DataFrame.pivot

Pivot without aggregation that can handle non-numeric data.

DataFrame.melt

Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.

wide_to_long

Wide panel to long format. Less flexible but more user-friendly than melt.

Reference the user guide for more examples.

>>> df = pd.DataFrame({"A": ["foo", "foo", "foo", "foo", "foo",
...                          "bar", "bar", "bar", "bar"],
...                    "B": ["one", "one", "one", "two", "two",
...                          "one", "one", "two", "two"],
...                    "C": ["small", "large", "large", "small",
...                          "small", "large", "small", "small",
...                          "large"],
...                    "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
...                    "E": [2, 4, 5, 5, 6, 6, 8, 9, 9]})
>>> df
     A    B      C  D  E
0  foo  one  small  1  2
1  foo  one  large  2  4
2  foo  one  large  2  5
3  foo  two  small  3  5
4  foo  two  small  3  6
5  bar  one  large  4  6
6  bar  one  small  5  8
7  bar  two  small  6  9
8  bar  two  large  7  9

This first example aggregates values by taking the sum.

>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
...                        columns=['C'], aggfunc=np.sum)
>>> table
C        large  small
A   B
bar one    4.0    5.0
    two    7.0    6.0
foo one    4.0    1.0
    two    NaN    6.0

We can also fill missing values using the fill_value parameter.

>>> table = pd.pivot_table(df, values='D', index=['A', 'B'],
...                        columns=['C'], aggfunc=np.sum, fill_value=0)
>>> table
C        large  small
A   B
bar one      4      5
    two      7      6
foo one      4      1
    two      0      6

The next example aggregates by taking the mean across multiple columns.

>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
...                        aggfunc={'D': np.mean, 'E': np.mean})
>>> table
                D         E
A   C
bar large  5.500000  7.500000
    small  5.500000  8.500000
foo large  2.000000  4.500000
    small  2.333333  4.333333

We can also calculate multiple types of aggregations for any given value column.

>>> table = pd.pivot_table(df, values=['D', 'E'], index=['A', 'C'],
...                        aggfunc={'D': np.mean,
...                                 'E': [min, max, np.mean]})
>>> table
                  D   E
               mean max      mean min
A   C
bar large  5.500000   9  7.500000   6
    small  5.500000   9  8.500000   8
foo large  2.000000   5  4.500000   4
    small  2.333333   6  4.333333   2


Sorting in a Pandas pivot_table

I have been looking all over trying to figure out how to sort my pivot table correctly and I haven’t had any luck.

    client  unit   task                hours  month
0   A       DVADA  Account Management  6.50   January
1   A       DVADA  Buying              1.25   January
2   A       DVADA  Meeting / Call      0.50   January
3   A       DVADA  Account Management  3.00   January
4   A       DVADA  Billing             2.50   February
5   A       DVADA  Account Management  6.50   February
6   A       DVADA  Buying              1.25   February
7   A       DVADA  Meeting / Call      0.50   February
8   A       DVADA  Account Management  3.00   February
9   A       DVADA  Billing             2.50   February
10  A       DVADA  Billing             2.50   December
11  A       DVADA  Account Management  6.50   December
12  A       DVADA  Buying              1.25   December
13  A       DVADA  Meeting / Call      0.50   December
14  A       DVADA  Account Management  3.00   December
15  A       DVADA  Billing             2.50   December
16  A       DVADA  Account Management  6.50   August
17  A       DVADA  Buying              1.25   August
18  A       DVADA  Meeting / Call      0.50   August
19  A       DVADA  Account Management  3.00   August
20  A       DVADA  Account Management  6.50   April
21  A       DVADA  Buying              1.25   April
22  A       DVADA  Meeting / Call      0.50   April
23  A       DVADA  Account Management  3.00   April
24  B       DVADA  Account Management  6.50   January
25  B       DVADA  Buying              1.25   January
26  B       DVADA  Meeting / Call      0.50   January
27  B       DVADA  Account Management  3.00   January
28  B       DVADA  Billing             2.50   February
29  B       DVADA  Account Management  6.50   February
30  B       DVADA  Buying              1.25   February
31  B       DVADA  Meeting / Call      0.50   February
32  B       DVADA  Account Management  3.00   February
33  B       DVADA  Billing             2.50   February
34  B       DVADA  Billing             2.50   December
35  B       DVADA  Account Management  6.50   December
36  B       DVADA  Buying              1.25   December
37  B       DVADA  Meeting / Call      0.50   December
38  B       DVADA  Account Management  3.00   December
39  B       DVADA  Billing             2.50   December
40  B       DVADA  Account Management  6.50   August
41  B       DVADA  Buying              1.25   August
42  B       DVADA  Meeting / Call      0.50   August
43  B       DVADA  Account Management  3.00   August
44  B       DVADA  Account Management  6.50   April
45  B       DVADA  Buying              1.25   April
46  B       DVADA  Meeting / Call      0.50   April
47  C       DVADA  Account Management  3.00   April
48  C       DVADA  Account Management  6.50   January
49  C       DVADA  Buying              1.25   January
50  C       DVADA  Meeting / Call      0.50   January
51  C       DVADA  Account Management  3.00   January
52  C       DVADA  Billing             2.50   February
53  C       DVADA  Account Management  6.50   February
54  C       DVADA  Buying              1.25   February
55  C       DVADA  Meeting / Call      0.50   February
56  C       DVADA  Account Management  3.00   February
57  C       DVADA  Billing             2.50   February
58  C       DVADA  Billing             2.50   December
59  C       DVADA  Account Management  6.50   December
60  C       DVADA  Buying              1.25   December
61  C       DVADA  Meeting / Call      0.50   December
62  C       DVADA  Account Management  3.00   December
63  C       DVADA  Billing             2.50   December
64  C       DVADA  Account Management  6.50   August
65  C       DVADA  Buying              1.25   August
66  C       DVADA  Meeting / Call      0.50   August
67  C       DVADA  Account Management  3.00   August
68  C       DVADA  Account Management  6.50   April
69  C       DVADA  Buying              1.25   April
70  C       DVADA  Meeting / Call      0.50   April
71  C       DVADA  Account Management  3.00   April

df = pd.pivot_table(vp_clients, values='hours', index=['client', 'month'], aggfunc='sum')

This returns a pivot table with client and month in the index and hours as the value column. Each client has 12 months (January to December), and each month holds the total hours for that month.

Current output (months in alphabetical order):

                  hours
client month
A      April     203.50
       August    227.75
       December  159.75
       February  203.25
       January   199.25
B      April     203.50
       August    227.75
       December  159.75
       February  203.25
       January   199.25
C      April     203.50
       August    227.75
       December  159.75
       February  203.25
       January   199.25

Desired output (months in calendar order):

                  hours
client month
A      January   203.50
       February  227.75
       March     159.75
       April     203.25
       May       199.90
B      January   203.50
       February  227.75
       March     159.75
       April     203.25
       May       199.90
C      January   203.50
       February  227.75
       March     159.75
       April     203.25
       May       199.90
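One common way to get calendar order is to turn month into an ordered Categorical before pivoting, so that sorting follows the category order instead of the alphabet. This is a sketch of one approach, not necessarily the one from the answer mentioned below. The small vp_clients frame is hypothetical sample data in the same shape as the question's, and the MONTHS list is written out by hand.

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's vp_clients.
vp_clients = pd.DataFrame({
    "client": ["A", "A", "A", "B", "B"],
    "month": ["April", "January", "February", "April", "January"],
    "hours": [3.0, 6.5, 2.5, 1.25, 0.5],
})

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# An ordered Categorical sorts by position in MONTHS, not alphabetically.
vp_clients["month"] = pd.Categorical(vp_clients["month"],
                                     categories=MONTHS, ordered=True)

# observed=True keeps only the client/month combinations that actually occur;
# otherwise every client would get a row for all 12 months.
table = pd.pivot_table(vp_clients, values="hours", index=["client", "month"],
                       aggfunc="sum", observed=True)
print(table)
```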

The sorting issue is fixed with the answer below from Scott. Now I want to add a row for each client with the total hours used.

                   hours
client month
A      January    203.50
       February   227.75
       March      159.75
       April      203.25
       May        199.90
       Total     1000.34
B      January    203.50
       February   227.75
       March      159.75
       April      203.25
       May        199.90
       Total     1000.34
C      January    203.50
       February   227.75
       March      159.75
       April      203.25
       May        199.90
       Total     1000.34
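One way to append a per-client Total row is to sum each client's group and concatenate it back under the label 'Total'. A minimal sketch with hypothetical sample data; to stay short it keeps the default alphabetical month order, and 'Total' is simply a string placed in the month level.

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's vp_clients.
vp_clients = pd.DataFrame({
    "client": ["A", "A", "A", "B", "B"],
    "month": ["January", "January", "February", "January", "February"],
    "hours": [6.5, 1.25, 2.5, 3.0, 0.5],
})

table = pd.pivot_table(vp_clients, values="hours",
                       index=["client", "month"], aggfunc="sum")

pieces = []
for client, grp in table.groupby(level="client"):
    # One-row frame indexed (client, "Total") holding this client's summed hours.
    total_index = pd.MultiIndex.from_tuples([(client, "Total")],
                                            names=["client", "month"])
    total_row = pd.DataFrame({"hours": [grp["hours"].sum()]}, index=total_index)
    # Keep the client's monthly rows, followed by its total.
    pieces.append(pd.concat([grp, total_row]))

with_totals = pd.concat(pieces)
print(with_totals)
```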


How to sort pivot table in Pandas


The current output was posted as an image. I’d like to sort the table by the id column, so that the largest numbers appear on top, like this:

               id
month country
5     us        4
      cn        2
      ca        1


p1 = (p.reset_index()
       .sort_values(['month', 'id'], ascending=[True, False])
       .set_index(['month', 'country']))
print(p1)

               id
month country
5     us        4
      cn        2
      ca        1
6     ca        3
      ru        3
7     cn        2
      ru        1

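I used reset_index because the more direct solution below does not work 🙁 (the final sort_values reorders the whole frame by id, so the months end up interleaved):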

p1 = (p.sort_index(level='month', sort_remaining=True)
       .sort_values('id', ascending=False))
print(p1)

               id
month country
5     us        4
6     ca        3
      ru        3
5     cn        2
7     cn        2
5     ca        1
7     ru        1

@ScottBoston: it's a really hard question. I tried to find related issues on the pandas GitHub, but without success. 🙁

Option 1
This sorts by id within groups defined by the month level within the index

p.groupby(level='month', group_keys=False).apply(pd.DataFrame.sort_values, by='id', ascending=False)

               id
month country
5     us        4
      cn        2
      ca        1
6     ca        3
      ru        3
7     cn        2
      ru        1

Option 2
This first sorts the entire dataframe by id, then sorts again by the month level of the index. However, I had to use sort_remaining=False so that the country level is not sorted as well (which would undo the id ordering), and kind='mergesort' because mergesort is a stable sort and won’t disturb the pre-existing order within groups defined by the 'month' level.

(p.sort_values('id', ascending=False)
  .sort_index(level='month', sort_remaining=False, kind='mergesort'))

               id
month country
5     us        4
      cn        2
      ca        1
6     ca        3
      ru        3
7     cn        2
      ru        1

Option 3
This uses NumPy’s lexsort. It works, but I don’t like it because it depends on id being numeric, so that I can put a minus sign in front of it to get descending order. /shrug

p.iloc[np.lexsort([-p.id.values, p.index.get_level_values('month')])]

               id
month country
5     us        4
      cn        2
      ca        1
6     ca        3
      ru        3
7     cn        2
      ru        1
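To try the three options side by side, here is a self-contained sketch. The frame p is a hypothetical reconstruction from the printed output, since the original data was only posted as an image. Month 6 contains a tie (ca and ru both have id 3), so the order of those two rows is not the point of the comparison; what matters is that every option keeps months grouped and sorts id descending within each month.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of p from the printed output above.
p = pd.DataFrame({
    "month": [5, 5, 5, 6, 6, 7, 7],
    "country": ["ca", "cn", "us", "ca", "ru", "cn", "ru"],
    "id": [1, 2, 4, 3, 3, 2, 1],
}).set_index(["month", "country"])

# Option 1: sort each month group by id, descending.
opt1 = p.groupby(level="month", group_keys=False).apply(
    pd.DataFrame.sort_values, by="id", ascending=False)

# Option 2: global sort by id, then re-sort on the month level only.
opt2 = (p.sort_values("id", ascending=False)
         .sort_index(level="month", sort_remaining=False, kind="mergesort"))

# Option 3: np.lexsort treats the LAST key as the primary key (month here);
# negating id gives descending id order within each month.
opt3 = p.iloc[np.lexsort([-p["id"].to_numpy(),
                          p.index.get_level_values("month").to_numpy()])]

for name, result in [("option 1", opt1), ("option 2", opt2), ("option 3", opt3)]:
    print(name)
    print(result)
```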
