Rank in Python: what it is

How to Use the Pandas rank() Function?

The DataFrame.rank() function of pandas assigns a rank to each value of a DataFrame or Series. The values are sorted (in ascending order by default), and each value's position in that order determines the rank that is returned. By default, equal values are each assigned the average of the ranks they occupy.

Syntax

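In pandas 2.x, the DataFrame.rank() function has the following signature:

DataFrame.rank(axis=0, method='average', numeric_only=False, na_option='keep', ascending=True, pct=False)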

Parameters

All parameters are optional and have default values.

  • axis:
    • {0 or 'index', 1 or 'columns'}, default 0.
    • The axis along which to rank: with axis=0 each column is ranked down its rows, and with axis=1 each row is ranked across its columns.
    • This parameter is unused for a pandas Series and defaults to 0.
  • method:
    • {'average', 'min', 'max', 'first', 'dense'}, default 'average'.
    • How to rank a group of records that have the same value (ties):
      • average: the average rank of the group
      • min: the lowest rank in the group
      • max: the highest rank in the group
      • first: ranks are assigned in the order the values appear in the array
      • dense: like 'min', but the rank always increases by 1 between groups
  • na_option:
    • {'keep', 'top', 'bottom'}, default 'keep'.
    • How to rank NaN (null) values:
      • keep: assign NaN ranks to NaN values
      • top: give NaN values the lowest rank possible
      • bottom: give NaN values the highest rank possible
  • ascending:
    • bool, default True.
    • Whether the elements should be ranked in ascending order.
  • pct:
    • bool, default False.
    • Whether to display the returned rankings in percentile form.
  • numeric_only:
    • bool, default False (the default changed to False in pandas 2.0).
    • When True, only the numeric columns of a DataFrame are ranked.
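The axis and na_option parameters are easiest to understand side by side. The following is a minimal sketch on a small, hypothetical DataFrame (the columns "a" and "b" are invented for illustration); it ranks the same data down columns and across rows, and shows where each na_option value places a missing entry.

```python
import numpy as np
import pandas as pd

# Hypothetical data, not from the original article.
df = pd.DataFrame({"a": [3, 1, 2], "b": [10, 30, np.nan]})

# axis=0 (default): each column is ranked down its rows.
by_column = df.rank()

# axis=1: each row is ranked across its columns.
by_row = df.rank(axis=1)

# na_option controls where the NaN in column "b" ends up.
keep = df["b"].rank(na_option="keep")      # NaN stays NaN
top = df["b"].rank(na_option="top")        # NaN receives rank 1
bottom = df["b"].rank(na_option="bottom")  # NaN receives the last rank

print(by_column, by_row, keep, top, bottom, sep="\n\n")
```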

Return Type

It returns a Series or DataFrame whose values are the data ranks. The method returns the same type as the object it is called on: calling rank() on a DataFrame returns a DataFrame of ranks, and calling it on a Series (for example, a single column) returns a Series holding the rank of each index label.
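As a quick check of the "same type as the caller" rule, here is a small sketch using a hypothetical one-column DataFrame: ranking the frame yields a DataFrame, and ranking one of its columns yields a Series.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({"score": [70, 90, 80]})

ranked_frame = df.rank()            # called on a DataFrame -> DataFrame
ranked_series = df["score"].rank()  # called on a Series -> Series

print(type(ranked_frame).__name__)   # DataFrame
print(type(ranked_series).__name__)  # Series
print(ranked_series.tolist())        # [1.0, 3.0, 2.0]
```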

Examples

• Basic Ranking of Your Pandas DataFrame

The pandas DataFrame.rank() function can be applied to the entire DataFrame with all the default arguments. Let's see how to do this with our DataFrame, df, in the example below:
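The article's original dataset was not preserved, so the following sketch uses hypothetical data (the Author, Pages, and Rating columns and their values are invented). It follows the steps described in the explanation: build a dictionary, create a DataFrame from it, and call rank() on the whole frame with default arguments.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the article's lost example.
data = {
    "Author": ["Orwell", "Austen", "Tolstoy", "Austen"],
    "Pages": [328, 432, np.nan, 279],
    "Rating": [4.5, 4.5, 4.1, 4.2],
}
df = pd.DataFrame(data)

# Rank every column with the defaults: ascending order,
# ties averaged, missing values left as NaN.
ranked = df.rank()
print(ranked)
```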

Explanation: In the above code example, pandas is imported as pd, and a dictionary of data is created and stored in the data variable. A pandas DataFrame is then created from that dictionary, and the rank() function is applied to the entire DataFrame. Equal values are ranked using the average method, i.e. the ranks of the tied values are averaged and that average rank is assigned to each of them. String values are ranked alphabetically in ascending order, and missing values are left out of the ranking and receive NaN.

Now, let's see how we can rank only a single column in the example given below.
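A minimal sketch of ranking one column, assuming a hypothetical Author column (the names are invented): the ranks of Author are stored in a new Ranked_Author column, while the rest of the DataFrame is left unchanged.

```python
import pandas as pd

# Hypothetical Author data for illustration.
df = pd.DataFrame({"Author": ["Orwell", "Austen", "Tolstoy", "Borges"]})

# Rank only the Author column (alphabetically) and store it in a new column.
df["Ranked_Author"] = df["Author"].rank()
print(df)
```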

Explanation: In the above example, a new column named Ranked_Author is created, holding the ranks of the Author column computed with the pandas rank() function.

By default, DataFrame.rank() ranks values in ascending order, i.e. the smallest value receives rank 1 and larger values receive higher ranks. To rank the values in descending order instead, set the ascending parameter to False (ascending=False).

Let's see how to rank the same columns in descending order with the following examples:
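The first descending example can be sketched as follows, again on hypothetical numeric data since the original dataset was lost. With ascending=False the largest value in each column receives rank 1, ties are still averaged, and NaN remains NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data; the article's original example was not preserved.
df = pd.DataFrame({
    "Pages": [328, 432, np.nan, 279],
    "Rating": [4.5, 4.5, 4.1, 4.2],
})

# ascending=False gives rank 1 to the largest value in each column.
ranked_desc = df.rank(ascending=False)
print(ranked_desc)
```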

Explanation: In the above code example, we set the ascending parameter to False to rank the values in reverse order.
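For a single column, the descending version looks like this. The Author values are hypothetical; with ascending=False, the name that sorts last alphabetically receives rank 1.

```python
import pandas as pd

# Hypothetical Author data for illustration.
df = pd.DataFrame({"Author": ["Orwell", "Austen", "Tolstoy", "Borges"]})

# Reverse alphabetical ranking stored in a new column.
df["Ranked_Author"] = df["Author"].rank(ascending=False)
print(df)
```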

Explanation: Here, we created a new column, Ranked_Author, and stored the ranks of the Author column in reverse order by setting the ascending parameter to False.

• Pandas Rank DataFrame with Different Methods

Data with identical values is not uncommon. Usually this causes no problems, but when you rank data with pandas, you need to decide how equal values (ties) are ranked. This is what the method= argument is for: it accepts 'average' (the default), 'min', 'max', 'first', and 'dense'. Let's look at the following examples to see how each option behaves.
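A compact way to compare the tie-breaking strategies is to rank one small Series with every method and put the results next to each other. The scores below are hypothetical and contain a single tie between the second and third values.

```python
import pandas as pd

# Hypothetical scores with a tie between the 2nd and 3rd values.
s = pd.Series([10, 20, 20, 30])

# One column per tie-breaking strategy accepted by method=.
ranks = pd.DataFrame({
    method: s.rank(method=method)
    for method in ["average", "min", "max", "first", "dense"]
})
print(ranks)
```

Note how only the tied pair differs between methods: 'average' gives both 2.5, 'min' and 'max' give both 2 or both 3, 'first' breaks the tie by position, and 'dense' ranks the following value 3 instead of 4.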

Explanation: In the above example, we created one ranking per method by passing a different argument to the method parameter.

• Pandas Rank DataFrame with a Groupby (Grouped Rankings)

You can also apply the pandas .rank() function within groups. The pandas .groupby() function splits the data into groups based on one or more criteria, so that operations can be performed on each group separately. For example, you can rank the values recorded on each day separately to find the highest or lowest value of a particular day.

Let's see how to use the groupby() function with pandas .rank():
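A minimal sketch of a grouped ranking, assuming hypothetical stock data: the Date and Stocks columns follow the article's description, while the Symbol column and all values are invented. Each value is ranked only against the other values from the same date.

```python
import pandas as pd

# Hypothetical stock prices; Symbol and all values are invented.
df = pd.DataFrame({
    "Date": ["2023-01-02", "2023-01-02", "2023-01-02",
             "2023-01-03", "2023-01-03", "2023-01-03"],
    "Symbol": ["AAA", "BBB", "CCC", "AAA", "BBB", "CCC"],
    "Stocks": [101.0, 98.5, 120.0, 99.0, 102.5, 118.0],
})

# Rank Stocks separately within each Date; ascending=False
# gives rank 1 to the highest value of each day.
df["Stocks Ranked by Date"] = df.groupby("Date")["Stocks"].rank(ascending=False)
print(df)
```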

Explanation: To rank our stocks within each date, we created a new column, Stocks Ranked by Date. The data is first grouped by Date, and then the Stocks column is selected. The values in each group are then ranked in descending order by setting the ascending parameter to False, so the highest value of each day receives rank 1.
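Once the grouped ranks exist, selecting the top entry per group is a boolean filter on the rank column. This sketch rebuilds a small hypothetical dataset (Symbol and all values are invented) and keeps only the rows ranked 1 within their date.

```python
import pandas as pd

# Hypothetical stock prices; Symbol and all values are invented.
df = pd.DataFrame({
    "Date": ["2023-01-02", "2023-01-02", "2023-01-02",
             "2023-01-03", "2023-01-03", "2023-01-03"],
    "Symbol": ["AAA", "BBB", "CCC", "AAA", "BBB", "CCC"],
    "Stocks": [101.0, 98.5, 120.0, 99.0, 102.5, 118.0],
})
df["Stocks Ranked by Date"] = df.groupby("Date")["Stocks"].rank(ascending=False)

# Keep only the rows that ranked first within their date.
top_per_day = df[df["Stocks Ranked by Date"] == 1]
print(top_per_day)
```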

Explanation: In the above example, only the rows whose Stocks Ranked by Date value equals 1, i.e. the top-ranked stock of each date, are selected.

Another useful feature of the rank() method is that it can normalize the rankings to values between 0 and 1. Although this may seem minor, it lets us compare rankings across several columns regardless of how many unique values each column contains.

This normalized form of ranking is enabled with the pct parameter. Let's look at how to use it in Python and pandas:
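A small sketch of percentile ranks on hypothetical data: with pct=True and the default 'average' method, each rank is divided by the number of non-NaN values in its column, so columns with different numbers of distinct values land on the same 0-to-1 scale.

```python
import pandas as pd

# Hypothetical columns with different numbers of distinct values.
df = pd.DataFrame({
    "a": [10, 20, 30, 40],
    "b": [1, 1, 2, 2],
})

# With the default method, pct=True divides each rank by the
# number of ranked (non-NaN) values in the column.
pct = df.rank(pct=True)
print(pct)
```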

Explanation: Here, setting the pct parameter to True returns the ranks of the DataFrame as percentiles.

Source

pandas.DataFrame.rank

Compute numerical data ranks (1 through n) along axis.

By default, equal values are assigned a rank that is the average of the ranks of those values.

Parameters

axis : {0 or 'index', 1 or 'columns'}, default 0

Index to direct ranking. For Series this parameter is unused and defaults to 0.

method : {'average', 'min', 'max', 'first', 'dense'}, default 'average'

How to rank the group of records that have the same value (i.e. ties):

• average: average rank of the group
• min: lowest rank in the group
• max: highest rank in the group
• first: ranks assigned in order they appear in the array
• dense: like 'min', but rank always increases by 1 between groups.

numeric_only : bool, default False

For DataFrame objects, rank only numeric columns if set to True.

Changed in version 2.0.0: The default value of numeric_only is now False.

na_option : {'keep', 'top', 'bottom'}, default 'keep'

How to rank NaN values:

• keep: assign NaN rank to NaN values
• top: assign lowest rank to NaN values
• bottom: assign highest rank to NaN values

ascending : bool, default True

Whether or not the elements should be ranked in ascending order.

pct : bool, default False

Whether or not to display the returned rankings in percentile form.

Returns

same type as caller

Return a Series or DataFrame with data ranks as values.

See also

core.groupby.DataFrameGroupBy.rank: Rank of values within each group.

core.groupby.SeriesGroupBy.rank: Rank of values within each group.

Examples

>>> df = pd.DataFrame(data={'Animal': ['cat', 'penguin', 'dog',
...                                    'spider', 'snake'],
...                         'Number_legs': [4, 2, 4, 8, np.nan]})
>>> df
    Animal  Number_legs
0      cat          4.0
1  penguin          2.0
2      dog          4.0
3   spider          8.0
4    snake          NaN

Ties are assigned the mean of the ranks (by default) for the group.

>>> s = pd.Series(range(5), index=list("abcde"))
>>> s["d"] = s["b"]
>>> s.rank()
a    1.0
b    2.5
c    4.0
d    2.5
e    5.0
dtype: float64

The following example shows how the method behaves with the above parameters:

• default_rank: this is the default behaviour obtained without using any parameter.
• max_rank: setting method = 'max' the records that have the same values are ranked using the highest rank (e.g.: since 'cat' and 'dog' are both in the 2nd and 3rd position, rank 3 is assigned.)
• NA_bottom: choosing na_option = 'bottom', if there are records with NaN values they are placed at the bottom of the ranking.
• pct_rank: when setting pct = True, the ranking is expressed as percentile rank.

>>> df['default_rank'] = df['Number_legs'].rank()
>>> df['max_rank'] = df['Number_legs'].rank(method='max')
>>> df['NA_bottom'] = df['Number_legs'].rank(na_option='bottom')
>>> df['pct_rank'] = df['Number_legs'].rank(pct=True)
>>> df
    Animal  Number_legs  default_rank  max_rank  NA_bottom  pct_rank
0      cat          4.0           2.5       3.0        2.5     0.625
1  penguin          2.0           1.0       1.0        1.0     0.250
2      dog          4.0           2.5       3.0        2.5     0.625
3   spider          8.0           4.0       4.0        4.0     1.000
4    snake          NaN           NaN       NaN        5.0       NaN
