Python list group by count

Groupby in a list for python

Given a large dataset of one million records, I am looking for ways to do a group by. I am new to Python, but I know SQL has a GROUP BY clause, and I am guessing something similar might be applicable. What I want to achieve is this. From

I am also looking for an efficient way to calculate the average of the list of ratings. So finally the output should be:

I’ve tried an iterative approach, but the error thrown was «there was too much data to unpack». Here is my solution, which is not working for the dataset.

len = max(key for (item, key) in results)
newList = [[] for i in range(len+1)]
for item, key in results:
    newList[key].append(item)

I am looking for an efficient way to do it. Is there a way to do a group by in a list comprehension? Thanks!

The error means that your dataset is not of the form [(x, y), ...]. Are you sure that results is an iterable of (x, y) pairs?
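To make the comment above concrete, here is a minimal sketch of how one might locate the rows that break the `for item, key in results` unpacking. The helper name `find_bad_rows` and the sample rows are hypothetical, not part of the original question.

```python
def find_bad_rows(results):
    """Return (index, row) for every row that does not unpack into two values."""
    bad = []
    for i, row in enumerate(results):
        try:
            item, key = row  # same unpacking the question's loop performs
        except (TypeError, ValueError):
            bad.append((i, row))
    return bad


# Hypothetical sample: the middle row has three fields instead of two.
rows = [("a", 1), ("b", 2, "extra"), ("c", 0)]
print(find_bad_rows(rows))
```

If this returns an empty list for the real data, the unpacking error probably comes from somewhere else.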

3 Answers

I think the following would be a reasonable approach for a small dataset.

from collections import defaultdict

ls = [
    ["A", 4],
    ["B", 4],
    ["F", 3],
    ["A", 4],
    ["B", 1],
]

# Collect every value under its key
d = defaultdict(list)
for key, val in ls:
    d[key].append(val)

# Prints [['A', (4, 4)], ['B', (4, 1)], ['F', (3,)]]
print([[k, tuple(l)] for k, l in d.items()])

# Prints [['A', 4.0], ['B', 2.5], ['F', 3.0]]
print([[k, float(sum(l))/len(l)] for k, l in d.items()])  #*

* In Python 2.x, use iteritems() instead of items(); see this answer.

Slightly better, if you only care about the averages, you won’t need to store everything that a key maps to:

# Keep only a running (count, total) per key instead of every value
d = defaultdict(lambda: (0, 0))
for key, val in ls:
    cnt, total = d[key]
    d[key] = (cnt + 1, total + val)

print([[k, float(total) / cnt] for k, (cnt, total) in d.items()])

There is indeed a groupby function in itertools; just be aware that it requires the data to be sorted by the grouping key beforehand. See the documentation here: https://docs.python.org/2/library/itertools.html#itertools.groupby
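A short sketch of why the sorting matters, reusing the sample `ls` from the first answer (repeated here so the snippet stands alone). The variable names are illustrative only.

```python
from itertools import groupby
from operator import itemgetter

ls = [["A", 4], ["B", 4], ["F", 3], ["A", 4], ["B", 1]]

# Without sorting, groupby only merges *adjacent* rows with equal keys,
# so "A" and "B" each appear as two separate groups.
unsorted_keys = [key for key, _ in groupby(ls, key=itemgetter(0))]

# Sorting by the same key first yields exactly one group per key.
by_key = sorted(ls, key=itemgetter(0))
counts = {key: len(list(group)) for key, group in groupby(by_key, key=itemgetter(0))}

print(unsorted_keys)
print(counts)
```

Note that sorting costs O(n log n), whereas the dictionary-based approaches only need a single pass over the data.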

But from the code you posted, it looks like you don’t really need to group, you just want to count, right? In that case you are better off using collections.Counter. Note that it requires the items to be hashable, so you’d want to convert those lists into tuples.

>>> import collections
>>> lst = [tuple(i) for i in ls]
>>> collections.Counter(lst)
Counter({('A', 4): 2, ('F', 3): 1, ('B', 1): 1, ('B', 4): 1})

Regarding efficiency: I am not sure you’d fare very well loading the whole dataset in memory, but you could use the defaultdict approach described by Vlad with an iterator.
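One way the iterator idea might look is sketched below: the records are consumed one at a time, and only a count and a running total are kept per key. The function names and the `key,rating` line format are assumptions made for illustration; the original question does not show how the data is stored.

```python
from collections import defaultdict


def group_averages(pairs):
    """Consume (key, value) pairs lazily and return {key: mean value}."""
    totals = defaultdict(lambda: [0, 0.0])  # key -> [count, running total]
    for key, value in pairs:
        entry = totals[key]
        entry[0] += 1
        entry[1] += value
    return {key: total / count for key, (count, total) in totals.items()}


def read_ratings(lines):
    """Hypothetical loader: yields (key, rating) from 'key,rating' text lines."""
    for line in lines:
        key, rating = line.strip().split(",")
        yield key, float(rating)


# In practice `sample` could be an open file object, which is also iterated lazily.
sample = ["A,4", "B,4", "F,3", "A,4", "B,1"]
averages = group_averages(read_ratings(sample))
print(averages)
```

Memory use then grows with the number of distinct keys rather than with the number of records.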

About the averages, if you really want to use groupby then you could do something like this:

>>> import itertools
>>> def average(lst):
...     return 1.0*sum(lst)/len(lst) if lst else 0.0
...
>>> [(i[0],average([j[1] for j in i[1]])) for i in itertools.groupby(sorted(ls),key=lambda i:i[0])]
[('A', 4.0), ('B', 2.5), ('F', 3.0)]


Python List Group By Count With Code Examples

This lesson shows how to group the items of a Python list and count each group. The code below demonstrates one approach.

>>> a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
>>> d = {x: a.count(x) for x in a}
>>> d
{1: 4, 2: 4, 3: 2, 4: 1, 5: 2}
>>> a, b = list(d.keys()), list(d.values())
>>> a
[1, 2, 3, 4, 5]
>>> b
[4, 4, 2, 1, 2]

Other solutions to the same Python list group-by-count problem are shown below with code examples.

import collections

a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
counter = collections.Counter(a)
print(counter)                 # Counter({1: 4, 2: 4, 3: 2, 5: 2, 4: 1})
print(list(counter.values()))  # [4, 4, 2, 1, 2]
print(list(counter.keys()))    # [1, 2, 3, 4, 5]
print(counter.most_common(3))  # [(1, 4), (2, 4), (3, 2)]

df[['col1', 'col2', 'col3', 'col4']].groupby(['col1', 'col2']).agg(['mean', 'count'])

from collections import Counter

def frequency_table(numbers):
    table = Counter(numbers)
    print('Number\tFrequency')
    # most_common() yields (number, frequency) pairs, most frequent first
    for number in table.most_common():
        print('{0}\t{1}'.format(number[0], number[1]))
# src: Doing Math With Python

The examples above show several different ways to solve the Python list group-by-count problem.

How do you count occurrences in a list in Python?

The easiest way to count the number of occurrences of a given item in a Python list is to use the list’s .count() method. The method is called on the list and takes a single argument. It returns the number of times that argument occurs in the list. 12-Nov-2021

How do you count the number of repeated elements in a list in Python?

operator.countOf(a, b) counts the number of occurrences of the value b in a and returns that count. 11-Sept-2022
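A brief illustration of `operator.countOf` from the standard library, using the same sample list as the earlier examples. The variable names are illustrative.

```python
from operator import countOf

a = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5]

ones = countOf(a, 1)      # number of items in `a` equal to 1
missing = countOf(a, 6)   # a value that never occurs counts as 0

print(ones, missing)
```

Unlike `list.count`, `countOf` accepts any iterable, so it also works on generators and tuples.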

How do you count multiple items in a list in Python?

If you want to count multiple items in a list, you can call count() in a loop. This approach, however, requires a separate pass over the list for every count() call, which can be catastrophic for performance. Use the Counter class from the collections module instead.
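The difference between the two approaches can be sketched as follows. The sample data is made up for illustration.

```python
from collections import Counter

words = ["a", "b", "a", "c", "b", "a"]
wanted = ["a", "b", "z"]

# count() in a loop: one full pass over `words` for every wanted item.
slow = {w: words.count(w) for w in wanted}

# Counter: a single pass in total; keys that never occurred read as 0.
tally = Counter(words)
fast = {w: tally[w] for w in wanted}

print(slow)
print(fast)
```

Both dictionaries hold the same counts. Only the number of passes over the list differs.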

How do you count occurrences in a list?

Using the count() function. The “standard” way (no external libraries) to get the count of word occurrences in a list is to use the list object’s count() method. count() is a built-in list method that takes an element as its only argument and returns the number of times that element appears in the list. 28-Dec-2021

How do you count specific values in Python?

Use the sum function to count specific values in a column of a DataFrame. We can apply sum() to a condition on a specified column to count the values that satisfy it; in this case we use == to count just the rows equal to our specific data point.

How many times does a value appear in a list in Python?

The count() method in Python calculates how many times a particular value appears within a string or a list. count() accepts one required argument: the value you want to search for in the string or list. When count() is used with a string, it searches for a substring within the larger string. 02-Dec-2020

How do you use the count function in Python?

The count() method returns the number of occurrences of a substring in the given string.

count() parameters:

  • substring – string whose count is to be found.
  • start (Optional) – starting index within the string where the search starts.
  • end (Optional) – ending index within the string where the search ends.
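The three parameters listed above can be tried out as follows. The sample string is arbitrary.

```python
text = "banana bandana"

total = text.count("an")         # whole string
tail = text.count("an", 7)       # start=7: search text[7:] only
window = text.count("an", 0, 6)  # start=0, end=6: search text[0:6] only

print(total, tail, window)
```

Matches are counted without overlapping, and `start` and `end` behave like slice indices.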

How do you count elements in an array in Python?

Python’s len() function returns the total number of elements in an array or other container object.

How do you count occurrences in a string in Python?

count(): One of the built-in ways to count the number of occurrences in a string is the string .count() method. The method takes one argument, either a character or a substring, and returns the number of times it occurs in the string the method is called on. 29-Aug-2021

How do I count the number of occurrences in a column in Python?

Using the size() or count() method with pandas.DataFrame.groupby() generates the count of occurrences of the data present in a particular column of the DataFrame. 02-Jun-2021


Python group list count


Python — Group / count a list of numbers by custom ranges

I have a list of numbers which I would like to group into «bins» or «buckets» based on ranges that I define. I would like the output to display each group «name» and the total number of values that fall within that range.

my_list = [1, 123123, 12, 982023, 24, 446, 903, 2004]

  • greater than 250,000 (output: 1)
  • greater than 100,000 but less than or equal to 250,000 (output: 1)
  • greater than 10,000 but less than or equal to 100,000 (output: 0)
  • greater than 1,000 but less than or equal to 10,000 (output: 1)
  • greater than 100 but less than or equal to 1000 (output: 2)
  • less than 100 (output: 3)

I could obviously achieve this in a very crude way by writing multiple conditional if statements but I am aware that there must be a more elegant way of achieving the result.

Various searches have indicated that I could possibly achieve this using pandas.cut or numpy.digitize; however, I have so far been unsuccessful in producing the required output.

Any assistance would be much appreciated.

You are right, you can use pd.cut combined with a groupby to achieve what you want.

Step 1: Define data

import pandas as pd
import numpy as np

my_list = [1, 123123, 12, 982023, 24, 446, 903, 2004]
df = pd.DataFrame(my_list, columns=['data'])
cut_edges = np.array([-np.inf, 100, 1000, 10000, 100000, 250000, np.inf])
labels = ['less than 100', 'between 100 and 1,000', 'between 1,000 and 10,000',
          'between 10,000 and 100,000', 'between 100,000 and 250,000',
          'greater than 250,000']

Step 2: Generate the category name using pd.cut, and set index for groupby later

df['category'] = pd.cut(df['data'], cut_edges, labels=labels)
df.set_index('category', append=False, inplace=True)

Step 3: groupby to do the count

df.groupby(level='category').count() 

As pointed out in the comments numpy.histogram is another possibly more concise approach which will work. This answer used pd.cut as it was specifically mentioned in the question.
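For comparison, the same binning can be sketched with the standard library only. This is not the pandas or NumPy approach from the answer, but a swapped-in alternative based on `bisect`, which reproduces the right-closed bins ("greater than X but less than or equal to Y") that pd.cut uses by default. The names `edges`, `tally` and `counts` are illustrative.

```python
from bisect import bisect_left
from collections import Counter

my_list = [1, 123123, 12, 982023, 24, 446, 903, 2004]
edges = [100, 1000, 10000, 100000, 250000]
labels = ['less than 100', 'between 100 and 1,000', 'between 1,000 and 10,000',
          'between 10,000 and 100,000', 'between 100,000 and 250,000',
          'greater than 250,000']

# bisect_left(edges, x) returns i such that edges[i-1] < x <= edges[i],
# so a value equal to an edge falls into the lower bin.
tally = Counter(bisect_left(edges, x) for x in my_list)

# Look up every label so that empty bins are reported with a count of 0.
counts = {label: tally[i] for i, label in enumerate(labels)}

for label, n in counts.items():
    print(label, n)
```

Each value costs one binary search over the edges, so no chain of if statements is needed however many ranges are defined.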

Python group by and count distinct values in a column and create,

You can drop_duplicates first and then agg:

print(df.drop_duplicates().groupby("company", as_index=False)["product"].agg([list, "count"]))

Python | Group and count similar records

Sometimes, while working with records, we need to collect similar records and maintain a count for each of them. This kind of task is common in the web development domain. Let’s discuss certain ways in which this task can be performed.

Method #1: Using loop + Counter() + set()
A combination of the above functionalities can be used to achieve this task. We run a loop over the tuples, use a set to check whether each tuple has already been seen, and attach its count the first time it appears. The counts themselves are computed with Counter().
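The original code for this method is not preserved here, so the following is only a minimal sketch of the described combination, using made-up records.

```python
from collections import Counter

# Hypothetical records; any hashable tuples would do.
records = [("ann", "dev"), ("bob", "ops"), ("ann", "dev"),
           ("cy", "dev"), ("bob", "ops"), ("ann", "dev")]

counter = Counter(records)  # count of every distinct record
seen = set()                # records already written to the result
result = []

for record in records:
    if record not in seen:
        seen.add(record)
        # Append the count as an extra field on the first occurrence only.
        result.append(record + (counter[record],))

print(result)
```

The set keeps the output in first-seen order while making the membership check constant time.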


Python Group by count

Given a dictionary, I need some way to do the following: In the dictionary, we have names, gender, occupation, and salary. For each name I look up in the dictionary, I need to figure out whether there are no more than 5 other employees that have the same name, gender and occupation. If so, I output it. Otherwise, I remove it. Any help or resources would be appreciated! What I researched: count = Counter(tok['Name'] for tok in input_file) This counts the number of occurrences of each name (e.g. Bob: 2, Amy: 4). However, I need to add the gender and occupation to this as well (e.g. Bob, M, Salesperson: 2, Amy, F, Manager: 1).

What are the {key: value} pairs? How have you stored the data in the dict? If you have yet to attempt this problem, why don’t you try an object-oriented design?

1 Answer

Checking whether the dictionary has 5 or more (key, value) pairs in which the name, gender and occupation of the employee are the same is quite simple. Removing all such inconsistencies is trickier.

# data = {}
# key = 'UID'
# value = ('Name','Male','Accountant','20000')
# data[key] = value

def consistency(dictionary):
    # Keep only (name, gender, occupation) from each value
    temp_list_of_values_we_care_about = [(x[0],x[1],x[2]) for x in dictionary.values()]
    # Count how often each (name, gender, occupation) triple occurs
    temp_dict = {}
    for val in temp_list_of_values_we_care_about:
        if val in temp_dict:
            temp_dict[val] += 1
        else:
            temp_dict[val] = 1
    # Inconsistent if any triple occurs 5 or more times
    if max(temp_dict.values()) >= 5:
        return False
    else:
        return True

And to actually get a dictionary with those particular values removed, there are two ways.

  1. Edit and update the original dictionary. (Doing it in-place)
  2. Create a new dictionary and add only those values which satisfy our constraint.

def consistency(dictionary):
    # Keep only (name, gender, occupation) from each value
    temp_list_of_values_we_care_about = [(x[0],x[1],x[2]) for x in dictionary.values()]
    # Count how often each (name, gender, occupation) triple occurs
    temp_dict = {}
    for val in temp_list_of_values_we_care_about:
        if val in temp_dict:
            temp_dict[val] += 1
        else:
            temp_dict[val] = 1
    new_dictionary = {}
    for key in dictionary:
        value = dictionary[key]
        temp = (value[0],value[1],value[2])
        if temp_dict[temp]

P.S. I have chosen the much easier second way to do it. Choosing the first method will cause a lot of computation overhead, and we certainly would want to avoid that.
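A compact sketch of the second way, building a new dictionary, written with collections.Counter as the question itself suggests. The function name `filter_rare_groups`, the `limit` parameter and the sample data are hypothetical. The rule of keeping an entry when its group has fewer than `limit` members is an assumption inferred from the `>= 5` check in the first function.

```python
from collections import Counter


def filter_rare_groups(employees, limit=5):
    """Return a new dict keeping only entries whose (name, gender, occupation)
    group has fewer than `limit` members (assumed threshold rule)."""
    # value[:3] is the (name, gender, occupation) part of each record.
    group_sizes = Counter(value[:3] for value in employees.values())
    return {uid: value for uid, value in employees.items()
            if group_sizes[value[:3]] < limit}


# Hypothetical data in the shape the answer assumes: UID -> (name, gender, occupation, salary)
data = {
    "u1": ("Bob", "Male", "Salesperson", "20000"),
    "u2": ("Bob", "Male", "Salesperson", "22000"),
    "u3": ("Amy", "Female", "Manager", "50000"),
}

kept = filter_rare_groups(data, limit=2)
print(kept)
```

Because the result is a new dictionary, the original `data` is left untouched, which avoids the pitfalls of deleting keys while iterating.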
