Python dataframe row split

Split Pandas DataFrame by rows and columns

A DataFrame is a tabular data structure. Often we do not need the whole DataFrame and want to split it by rows or columns instead. In this post, we split a Pandas DataFrame by rows and columns using indexes, delimiters, and column names, with several approaches and examples.

1. Split Pandas dataframe column by delimiter

This Dataframe contains Mark column values that include a hyphen (-) delimiter. We split each value of the Mark column at the hyphen into two columns: Mark_ (the part before the delimiter) and Mark (the part after it).

We can use any other delimiter (such as a comma or a slash) as required.

Program Example

import pandas as pd
df_stu = pd.DataFrame([['John','Math','num-100'],['Jack','Sci','x-100'],
                       ['Max','Phy','f-99'],['Rack','Music','num-80']],
                      columns = ['Name','Subj','Mark'])
# column split based on delimiter
df_stu[['Mark_','Mark']] = df_stu['Mark'].str.split('-',expand=True)
print(df_stu)

Output

   Name   Subj Mark Mark_
0  John   Math  100   num
1  Jack    Sci  100     x
2   Max    Phy   99     f
3  Rack  Music   80   num
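
When a value may contain the delimiter more than once, the split can be limited to the first occurrence. The following is a minimal sketch of that idea; the df_codes data and the Prefix/Rest column names are made up for illustration and are not part of the original example.

```python
import pandas as pd

# Hypothetical data: one value contains two hyphens.
df_codes = pd.DataFrame({'Mark': ['num-100', 'x-y-100', 'f-99']})

# n=1 splits only at the first hyphen, so exactly two columns come back.
parts = df_codes['Mark'].str.split('-', n=1, expand=True)
parts.columns = ['Prefix', 'Rest']
print(parts)
```

Without n=1, the second row would produce three pieces and the result would have three columns, with missing values in the rows that have fewer pieces.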

2. Split Pandas dataframe column by multiple delimiters

Instead of a single delimiter, we sometimes need to split a column on several delimiters. Here we split the dataframe column Mark with multiple delimiters ('/|;|_|%|-').
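
A split on several delimiters could look roughly like the sketch below. The df_multi data is a hypothetical variant of the earlier dataframe in which each Mark value uses a different delimiter; it assumes every value contains exactly one delimiter.

```python
import pandas as pd

# Hypothetical data: each Mark value uses a different delimiter.
df_multi = pd.DataFrame(
    [['John', 'Math', 'num/100'], ['Jack', 'Sci', 'x;100'],
     ['Max', 'Phy', 'f_99'], ['Rack', 'Music', 'num-80']],
    columns=['Name', 'Subj', 'Mark'])

# A pattern longer than one character is treated as a regular expression,
# so '/|;|_|%|-' matches any one of the listed delimiters.
split_cols = df_multi['Mark'].str.split('/|;|_|%|-', expand=True)

# Keep the text part in Mark_ and the numeric part in Mark.
df_multi['Mark_'] = split_cols[0]
df_multi['Mark'] = split_cols[1].astype(int)
print(df_multi)
```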

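3. Split Pandas dataframe by rows using iloc[]

In this example, iloc[] splits the dataframe by row position: one result holds the rows from position 1 onward, and the other holds the first three rows.

Program Example

import pandas as pd
df_stu = pd.DataFrame([['John','Math',100],['Jack','Sci',100],
                       ['Max','Phy',99],['Rack','Music',80]],
                      columns = ['Name','Subj','Mark'])
# rows from position 1 to the end
res_df = df_stu.iloc[1:,:]
# first three rows (positions 0 to 2)
res_df1 = df_stu.iloc[:3,:]
print(res_df,'\n')
print(res_df1)

Output

   Name   Subj  Mark
1  Jack    Sci   100
2   Max    Phy    99
3  Rack  Music    80

   Name  Subj  Mark
0  John  Math   100
1  Jack   Sci   100
2   Max   Phy    99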

4. Pandas Split dataframe by list of indexes

In this example, we are using the Dataframe iloc[] indexer to select rows by a list of integer positions. The rows come back in the order given in the list.

Program Example

import pandas as pd
df_stu = pd.DataFrame([['John','Math',100],['Jack','Sci',100],
                       ['Max','Phy',99],['Rack','Music',80]],
                      columns = ['Name','Subj','Mark'])
# select rows at positions 2 and 0, all columns
res_df = df_stu.iloc[[2, 0], :]
print(res_df,'\n')

Output

   Name  Subj  Mark
2   Max   Phy    99
0  John  Math   100
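
To get a true two-way split, the rows that were not picked are usually needed as well. The sketch below shows one way to do this; the names picked_positions, picked_df, and rest_df are illustrative, and the approach assumes the index labels are unique.

```python
import pandas as pd

df_stu = pd.DataFrame(
    [['John', 'Math', 100], ['Jack', 'Sci', 100],
     ['Max', 'Phy', 99], ['Rack', 'Music', 80]],
    columns=['Name', 'Subj', 'Mark'])

picked_positions = [2, 0]

# Rows at the listed positions.
picked_df = df_stu.iloc[picked_positions]

# Everything else: drop the picked rows by their index labels.
rest_df = df_stu.drop(picked_df.index)

print(picked_df, '\n')
print(rest_df)
```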

5. iloc[] to split dataframe by column index

In this example, we are using the DataFrame iloc[] indexer to split the dataframe based on column positions. We select the first two columns with iloc[:, :2] (positions 0 and 1) and the first three columns with iloc[:, :3] (positions 0 to 2).

Program Example

import pandas as pd
df_stu = pd.DataFrame([['John','Math',100],['Jack','Sci',100],
                       ['Max','Phy',99],['Rack','Music',80]],
                      columns = ['Name','Subj','Mark'])
# splitting dataframe by columns: first two columns
res_df = df_stu.iloc[:, :2]
# splitting dataframe by columns: first three columns
res_df1 = df_stu.iloc[:, :3]
print(res_df,'\n')
print(res_df1)

Output

   Name   Subj
0  John   Math
1  Jack    Sci
2   Max    Phy
3  Rack  Music

   Name   Subj  Mark
0  John   Math   100
1  Jack    Sci   100
2   Max    Phy    99
3  Rack  Music    80

6. Pandas split dataframe by column Name

In this example, instead of using index positions, we split the dataframe by column labels.

Program Example

import pandas as pd
df_stu = pd.DataFrame([['John','Math',100],['Jack','Sci',100],
                       ['Max','Phy',99],['Rack','Music',80]],
                      columns = ['Name','Subj','Mark'])
# select columns by label
res_df = df_stu[['Subj','Mark']]
print(res_df,'\n')

Output

    Subj  Mark
0   Math   100
1    Sci   100
2    Phy    99
3  Music    80
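
Selecting columns by label gives one half of a split. The remaining columns can be obtained with drop(columns=...), as in this minimal sketch; the variable names are illustrative only.

```python
import pandas as pd

df_stu = pd.DataFrame(
    [['John', 'Math', 100], ['Jack', 'Sci', 100],
     ['Max', 'Phy', 99], ['Rack', 'Music', 80]],
    columns=['Name', 'Subj', 'Mark'])

selected_cols = ['Subj', 'Mark']

# One part holds the selected columns...
left_df = df_stu[selected_cols]
# ...and the other part holds whatever is left.
right_df = df_stu.drop(columns=selected_cols)

# loc[] with a label list selects the same columns as the bracket form.
same_df = df_stu.loc[:, selected_cols]

print(left_df, '\n')
print(right_df)
```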

7. Split dataframe Using Groupby

The pandas groupby() method groups the rows of a dataframe by the values in one or more columns. Each group can then be pulled out as its own dataframe, which makes groupby() a convenient way to split data by category.

In this example, we are using groupby() to group data based on “Subj” and selecting the value of subject “Math”.

import pandas as pd
df_stu = pd.DataFrame({
    'Name': ["John","Max","Rack","Tax"],
    'Subj': ["Math","Math","Sci","Math"],
    'Marks': [100,100,99,99]
})
# group by subject, then pick the "Math" group
result_df = df_stu.groupby('Subj')
print(result_df.get_group('Math'))

Output

   Name  Subj  Marks
0  John  Math    100
1   Max  Math    100
3   Tax  Math     99
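
When every group is needed rather than a single one, iterating over the groupby object yields each key together with its sub-dataframe. The sketch below collects them into a dictionary; the name groups is an illustrative choice.

```python
import pandas as pd

df_stu = pd.DataFrame({
    'Name': ["John", "Max", "Rack", "Tax"],
    'Subj': ["Math", "Math", "Sci", "Math"],
    'Marks': [100, 100, 99, 99],
})

# Iterating a groupby object yields (key, sub-dataframe) pairs.
groups = {subject: group_df for subject, group_df in df_stu.groupby('Subj')}

for subject, group_df in groups.items():
    print(subject)
    print(group_df, '\n')
```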

8. Sample() method to split dataframe in Pandas

The sample() method returns a random sample of rows from the dataframe by default, or of columns when axis=1 is passed.

In this example, frac=0.9 asks for 90% of the rows, and random_state makes the sample reproducible so we get the same rows on every run. Because the dataframe has only four rows, 90% rounds to four, so every row comes back in shuffled order.

Program Example

import pandas as pd
df_stu = pd.DataFrame([['John','Math',100],['Jack','Sci',100],
                       ['Max','Phy',99],['Rack','Music',80]],
                      columns = ['Name','Subj','Mark'])
# random sample of 90% of the rows, reproducible via random_state
result_df = df_stu.sample(frac=0.9,random_state=60)
print("split from df_student: \n")
print(result_df)

Output

split from df_student:

   Name   Subj  Mark
2   Max    Phy    99
0  John   Math   100
3  Rack  Music    80
1  Jack    Sci   100
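
A common use of sample() is to split a dataframe into two non-overlapping parts, for example a training set and a test set. The sketch below is one way to do that; the 75/25 ratio and the names train_df and test_df are assumptions made for illustration.

```python
import pandas as pd

df_stu = pd.DataFrame(
    [['John', 'Math', 100], ['Jack', 'Sci', 100],
     ['Max', 'Phy', 99], ['Rack', 'Music', 80]],
    columns=['Name', 'Subj', 'Mark'])

# Randomly take 75% of the rows (3 of 4) for the first part.
train_df = df_stu.sample(frac=0.75, random_state=60)

# The second part is every row that was not sampled.
test_df = df_stu.drop(train_df.index)

print(train_df, '\n')
print(test_df)
```

This relies on unique index labels, since drop() removes rows by label.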

Summary

In this post, we have learned how to split a Pandas DataFrame by delimiter, by row and column index, by column name, with groupby(), and with the sample() method.

How to Split Pandas DataFrame?

We can split a Pandas DataFrame by rows or columns using the DataFrame.iloc[] indexer, or by using the groupby().get_group() and sample() functions. Each returns the portion of the DataFrame that we select.

In this article, I will explain how to split a Pandas DataFrame by column or row using df.iloc[], and how to split it using the df.groupby() and df.sample() functions, with examples.

1. Quick Examples of Split Pandas DataFrame

Following are quick examples of how to split Pandas DataFrame.

 # Below are the quick examples

 # Example 1: Split the DataFrame using iloc[] by rows
 df1 = df.iloc[:2,:]
 df2 = df.iloc[2:,:]

 # Example 2: Split the DataFrame using iloc[] by columns
 df1 = df.iloc[:,:2]
 df2 = df.iloc[:,2:]

 # Example 3: Split Dataframe using groupby() &
 # Grouping by particular dataframe column
 grouped = df.groupby(df.Duration)
 df1 = grouped.get_group("35days")

 # Example 4: split DataFrame using sample()
 df1 = df.sample(frac = 0.5, random_state = 200)

Let's create a Pandas DataFrame from a Python dictionary, where the columns are 'Courses', 'Fee', 'Discount', and 'Duration'.

 import pandas as pd
 technologies = {
     'Courses':["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
     'Fee' :[22000, 25000, 23000, 24000, 26000],
     'Discount':[1000, 2300, 1000, 1200, 2500],
     'Duration':['35days', '35days', '40days', '30days', '25days']
 }
 df = pd.DataFrame(technologies)
 print(df)

 # Output:
    Courses    Fee  Discount Duration
 0    Spark  22000      1000   35days
 1  PySpark  25000      2300   35days
 2   Hadoop  23000      1000   40days
 3   Python  24000      1200   30days
 4   Pandas  26000      2500   25days

2. Use iloc[] to Split the DataFrame in Pandas

We can use the iloc[] attribute to split the given DataFrame. The iloc[] property selects rows and columns by integer position. Pandas loc[] is a related property that selects by row and column labels.
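
The difference between position-based and label-based selection is easiest to see on a DataFrame whose index is not the default 0, 1, 2. The sketch below uses a small hypothetical df_demo with string labels; it is not the DataFrame used in the rest of this article.

```python
import pandas as pd

# Hypothetical DataFrame with string index labels.
df_demo = pd.DataFrame(
    {'Courses': ['Spark', 'PySpark', 'Hadoop'], 'Fee': [22000, 25000, 23000]},
    index=['a', 'b', 'c'])

# iloc[] slices by position; the end position is excluded.
by_position = df_demo.iloc[:2, :]

# loc[] slices by label; the end label is included.
by_label = df_demo.loc['a':'b', :]

print(by_position, '\n')
print(by_label)
```

Both selections return rows 'a' and 'b', even though the iloc[] slice stops before position 2 while the loc[] slice includes its end label.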

2.1 Split DataFrame by Row

Using this property, we can select the required rows from the DataFrame. Here, I will use the iloc[] property to split the given DataFrame into two smaller DataFrames: the first two rows and the remaining rows.

 # Split the DataFrame using iloc[] by rows
 df1 = df.iloc[:2,:]
 df2 = df.iloc[2:,:]
 print(df1)
 print("---------------------------")
 print(df2)

 # Output:
    Courses    Fee  Discount Duration
 0    Spark  22000      1000   35days
 1  PySpark  25000      2300   35days
 ---------------------------
   Courses    Fee  Discount Duration
 2  Hadoop  23000      1000   40days
 3  Python  24000      1200   30days
 4  Pandas  26000      2500   25days
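
The same row slicing extends to splitting a DataFrame into several fixed-size pieces. Here is a minimal sketch, assuming a chunk size of two rows; chunk_size and chunks are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
    'Duration': ['35days', '35days', '40days', '30days', '25days'],
})

chunk_size = 2  # assumed size, chosen for illustration

# Step through the rows in strides of chunk_size; the last chunk may be shorter.
chunks = [df.iloc[start:start + chunk_size]
          for start in range(0, len(df), chunk_size)]

for chunk in chunks:
    print(chunk, '\n')
```

With five rows and a chunk size of two, this produces three DataFrames of two, two, and one row.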

2.2 Split DataFrame by Columns

In the above section, you learned how to split a DataFrame by rows using the iloc[] property. Now, we will split the DataFrame by columns. The approach is the same, but the slice goes in the second position of iloc[], after the comma.

 # Split the DataFrame using iloc[] by columns
 df1 = df.iloc[:,:2]
 df2 = df.iloc[:,2:]
 print(df1)
 print("---------------------------")
 print(df2)

 # Output:
    Courses    Fee
 0    Spark  22000
 1  PySpark  25000
 2   Hadoop  23000
 3   Python  24000
 4   Pandas  26000
 ---------------------------
    Discount Duration
 0      1000   35days
 1      2300   35days
 2      1000   40days
 3      1200   30days
 4      2500   25days

3. Split Pandas Dataframe using groupby() function

The DataFrame.groupby() function splits the DataFrame into groups that share a value in one or more columns. First we group the DataFrame with groupby(), then we select a specific group with get_group(). This approach works well when the column holds a small set of repeating values, such as categories.

 # Split Dataframe using groupby() &
 # Grouping by particular dataframe column
 grouped = df.groupby(df.Duration)
 df1 = grouped.get_group("35days")
 print(df1)

 # Output:
    Courses    Fee  Discount Duration
 0    Spark  22000      1000   35days
 1  PySpark  25000      2300   35days

The above example returns a new DataFrame containing only the rows where 'Duration' is '35days'.
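
When only one value matters, a boolean mask is an alternative to groupby() that also gives the complement in a single step. The following sketch is offered as an alternative technique, not as part of the groupby() approach above; the names mask, matching_df, and other_df are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
    'Duration': ['35days', '35days', '40days', '30days', '25days'],
})

# True for rows whose Duration equals '35days'.
mask = df['Duration'] == '35days'

matching_df = df[mask]   # rows that match the value
other_df = df[~mask]     # all remaining rows

print(matching_df, '\n')
print(other_df)
```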

4. Split the DataFrame using Pandas Shuffle Rows

The pandas.DataFrame.sample() function returns a random sample of rows, which we can use to split the DataFrame. Calling df.sample(frac=1) returns all rows in a random order. The frac keyword argument specifies the fraction of rows to return in the random sample. With frac=None and n left unset, sample() returns a single random row. frac=.5 returns a random 50% of the rows.

Let's see how to use the sample() function to split our DataFrame with random rows.

 # Split DataFrame using sample()
 df1 = df.sample(frac = 0.5, random_state = 200)
 print(df1)
 print(df1.reset_index())

 # Output (after reset_index()):
    index Courses    Fee  Discount Duration
 0      3  Python  24000      1200   30days
 1      4  Pandas  26000      2500   25days
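
Shuffling with frac=1 and then slicing by position is another way to get two random, non-overlapping parts. The sketch below assumes a roughly even split; shuffled, half, first_half, and second_half are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Courses': ["Spark", "PySpark", "Hadoop", "Python", "Pandas"],
    'Fee': [22000, 25000, 23000, 24000, 26000],
    'Discount': [1000, 2300, 1000, 1200, 2500],
    'Duration': ['35days', '35days', '40days', '30days', '25days'],
})

# frac=1 returns every row, in random order.
shuffled = df.sample(frac=1, random_state=200)

# Cut the shuffled rows at the midpoint by position.
half = len(shuffled) // 2
first_half = shuffled.iloc[:half]
second_half = shuffled.iloc[half:]

print(first_half, '\n')
print(second_half)
```

With five rows, the first part holds two rows and the second part holds three.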

5. Conclusion

In this article, I have explained how to split a Pandas DataFrame by rows and columns using the df.iloc[] attribute, and how to split it using the df.groupby() and df.sample() functions, with examples.

Python: Split a Pandas Dataframe

In this post, you’ll learn how to split a Pandas dataframe in different ways. You’ll learn how to split a Pandas dataframe by column value, how to split a Pandas dataframe by position, and how to split a Pandas dataframe by random values.

Knowing how to split a Pandas dataframe is a useful skill in many applications: machine learning to select random values, splitting data into specific records for report automation, etc.

Loading a Sample Dataframe

If you want to follow along with the tutorial, feel free to load the sample Pandas dataframe provided below. If you have your own data, feel free to use that as well, though some specific examples may need additional tailoring for your context.

Let’s get started and load some data!

import pandas as pd

df = pd.DataFrame.from_dict({
    'Name': ['Jenny', 'Matt', 'Kristen', 'Jenny', 'Matt', 'Kristen', 'Jenny', 'Matt', 'Kristen', 'Jenny', 'Matt', 'Kristen'],
    'Year': [2020, 2021, 2022, 2020, 2021, 2022, 2020, 2021, 2022, 2020, 2021, 2022],
    'Income': [10000, 11000, 9000, 12000, 13000, 11000, 14000, 15000, 13000, 12000, 14000, 13000],
    'Gender': ['F', 'M', 'F', 'F', 'M', 'F', 'F', 'M', 'F', 'F', 'M', 'F']
})

print(df)

Printing out the dataframe returns the following:

       Name  Year  Income Gender
0     Jenny  2020   10000      F
1      Matt  2021   11000      M
2   Kristen  2022    9000      F
3     Jenny  2020   12000      F
4      Matt  2021   13000      M
5   Kristen  2022   11000      F
6     Jenny  2020   14000      F
7      Matt  2021   15000      M
8   Kristen  2022   13000      F
9     Jenny  2020   12000      F
10     Matt  2021   14000      M
11  Kristen  2022   13000      F

We can see that we have four columns: Name, Year, Income, and Gender. The data holds twelve income records covering three people, along with the year and gender for each record. Let's see how we can split this dataframe!

Split a Pandas Dataframe by Column Value

Splitting a dataframe by column value is a very helpful skill to know. It can help with automating reporting or being able to parse out different values of a dataframe.

The way that you'll learn to split a dataframe by its column values is by using the .groupby() method.
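
As a rough sketch of what that can look like, the snippet below splits a shortened, hypothetical version of the sample dataframe into one dataframe per value of the Name column. The dictionary name frames_by_name is an illustrative choice, not something the tutorial defines.

```python
import pandas as pd

# Shortened, hypothetical version of the sample data.
df = pd.DataFrame.from_dict({
    'Name': ['Jenny', 'Matt', 'Kristen', 'Jenny', 'Matt', 'Kristen'],
    'Year': [2020, 2021, 2022, 2020, 2021, 2022],
    'Income': [10000, 11000, 9000, 12000, 13000, 11000],
    'Gender': ['F', 'M', 'F', 'F', 'M', 'F'],
})

grouped = df.groupby('Name')

# One dataframe per distinct name, keyed by that name.
frames_by_name = {name: grouped.get_group(name) for name in grouped.groups}

for name, frame in frames_by_name.items():
    print(name)
    print(frame, '\n')
```

Each resulting dataframe keeps the original index labels, which makes it easy to trace rows back to the source data.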
