Python add new column to dataframe

How To Add A New Column To An Existing Pandas DataFrame

Discussing 4 ways you can insert a new column into a pandas DataFrame

Introduction

In today’s short guide we will discuss four distinct ways you can add a new column into a DataFrame. Specifically, we’ll explore how to

  • insert one or multiple columns in one go
  • overwrite existing column(s)
  • add column(s) by taking into account the index
  • insert column(s) by ignoring the index
  • add column(s) with duplicate names
  • insert columns at specified location

using simple assignment and the insert(), assign() and concat() methods.

First, let’s create an example DataFrame that we’ll reference throughout this guide to demonstrate a few concepts related to adding columns to pandas frames.

import pandas as pd

df = pd.DataFrame({
    'colA': [True, False, False],
    'colB': [1, 2, 3],
})
print(df)
    colA  colB
0   True     1
1  False     2
2  False     3

And finally, let’s assume that we need to insert a new column called colC that should contain the values ‘a’, ‘b’ and ‘c’ for indices 0, 1 and 2 respectively.

s = pd.Series(['a', 'b', 'c'], index=[0, 1, 2])
print(s)
0 a
1 b
2 c
dtype: object

Using simple assignment

The easiest way to insert a new column is to simply assign the values of your Series into the existing frame:

df['colC'] = s.values
print(df)
    colA  colB colC
0   True     1    a
1  False     2    b
2  False     3    c

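Note that the example above uses s.values, which ignores the index of the Series and assigns the values by position. If you assign the Series itself, pandas aligns it with the index of the DataFrame, so the result is complete only when the two indices match; otherwise NaN is assigned to every row whose index is missing from the Series. For example,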

df['colC'] = pd.Series(['a', 'b', 'c'], index=[1, 2, 3])
print(df)
colA colB colC
0 True 1 NaN
1 False 2 a
2 False 3 b
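The difference between assigning a Series and assigning its raw values can be seen side by side. The following is a minimal sketch, not part of the original guide; the column names colC_aligned and colC_positional are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})

# A Series whose index only partially overlaps the frame's index (0, 1, 2)
shifted = pd.Series(['a', 'b', 'c'], index=[1, 2, 3])

# Assigning the Series aligns on the index: row 0 has no match and gets NaN,
# and the value labelled 3 is dropped because the frame has no such row
df['colC_aligned'] = shifted

# Assigning the underlying array ignores the index and fills by position
df['colC_positional'] = shifted.values

print(df)
```

Assigning by position is only safe when the values are already in the same row order as the frame.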

pandas.DataFrame.assign

Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.

Parameters: **kwargs : dict of {str: callable or Series}

The column names are keywords. If the values are callable, they are computed on the DataFrame and assigned to the new columns. The callable must not change input DataFrame (though pandas doesn’t check it). If the values are not callable, (e.g. a Series, scalar, or array), they are simply assigned.

Returns: a new DataFrame with the new columns in addition to all the existing columns.

Assigning multiple columns within the same assign is possible. Later items in ‘**kwargs’ may refer to newly created or modified columns in ‘df’; items are computed and assigned into ‘df’ in order.

>>> df = pd.DataFrame({'temp_c': [17.0, 25.0]},
...                   index=['Portland', 'Berkeley'])
>>> df
          temp_c
Portland    17.0
Berkeley    25.0

Where the value is a callable, evaluated on df:

>>> df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

Alternatively, the same behavior can be achieved by directly referencing an existing Series or sequence:

>>> df.assign(temp_f=df['temp_c'] * 9 / 5 + 32)
          temp_c  temp_f
Portland    17.0    62.6
Berkeley    25.0    77.0

You can create multiple columns within the same assign where one of the columns depends on another one defined within the same assign:

>>> df.assign(temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
...           temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9)
          temp_c  temp_f  temp_k
Portland    17.0    62.6  290.15
Berkeley    25.0    77.0  298.15
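The same example can be written as a standalone script. This is a small sketch that also checks that assign leaves the input frame untouched; the variable name result is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({'temp_c': [17.0, 25.0]}, index=['Portland', 'Berkeley'])

# temp_k is computed from temp_f, which is created earlier in the same call
result = df.assign(
    temp_f=lambda x: x['temp_c'] * 9 / 5 + 32,
    temp_k=lambda x: (x['temp_f'] + 459.67) * 5 / 9,
)

print(result)

# assign() returns a new frame; the original still has a single column
print(list(df.columns))
```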

Add Column to DataFrame Pandas (with Examples)

There are many things we can do with a DataFrame we have built or imported in Pandas. We can manipulate the data in various ways, including changing the columns of the data frame. If we’re reading most of the data from one source but some from another, we’ll need to know how to add columns to a Pandas DataFrame. It’s pretty simple, but there are a few different approaches, which can be perplexing for newcomers: as a beginner, you may see several alternative methods for adding a column and wonder which one to use. Don’t worry; in this article, we’ll go over four different ways to do it. So, let’s get started!

What is Pandas in Python?

Pandas is a widely used open-source Python library for data science, data analysis, and machine learning tasks. It has many functions and methods for dealing with tabular data. Pandas’ main data structure is the DataFrame, a tabular data structure with labeled rows and columns. If you are a beginner in Python, you can try these 20 pandas exercises.

Now, let us dive deep into learning Pandas DataFrames below:

What is a DataFrame?

A DataFrame represents a table of data with rows and columns and is the main data structure you work with in Pandas. Rows in a DataFrame represent observations or data points, and columns represent the properties or attributes of those observations. Consider a set of property pricing data: each row represents a house, and each column represents a characteristic of the house, such as its age, number of rooms, or price.

Using Pandas, what can you do with DataFrames?

Many of the time-consuming, repetitive processes connected to working with data are made simple with Pandas. Following are a few of the tasks that you can efficiently perform with Pandas DataFrame:

  • Data Inspection
  • Data Cleansing
  • Data Normalization
  • Data Visualization
  • Statistical Analysis

First, let’s create an example DataFrame that we’ll use to explain a few ideas related to adding columns to pandas frames throughout this article.

For example:

import pandas as pd  # importing pandas library

df = pd.DataFrame({
    'colA': [True, False, False],
    'colB': [1, 2, 3],
})  # creating the DataFrame
print(df)
    colA  colB
0   True     1
1  False     2
2  False     3

Suppose we need to add a new column named ‘colC’ containing the values ‘a’, ‘b’, and ‘c’ for the indices 0, 1, and 2, respectively. How will we do it? Let’s see!

How to Add Column to Pandas DataFrame?

Below are four methods for adding a column to a Pandas DataFrame. In each case, we’ll add ‘colC’ to the sample DataFrame created earlier in the article:

1) Using the simple assignment

You can add a new column to a DataFrame by simply assigning your Series’s data to a new column name on the existing frame. It is one of the easiest and most widely used methods among Python programmers. Note that the name of the new column is a string, so it must be enclosed in quotes inside the square brackets.

For example:

s = pd.Series(['a', 'b', 'c'], index=[0, 1, 2])
df['colC'] = s.values
print(df)
    colA  colB colC
0   True     1    a
1  False     2    b
2  False     3    c

Note that when you assign a Series directly rather than its values, pandas aligns it on the index. The assignment fills every row only if the new column’s indices match those of the DataFrame; otherwise, NaN is assigned to the rows with no matching index.

For example:

df['colC'] = pd.Series(['a', 'b', 'c'], index=[1, 2, 3])
print(df)
    colA  colB colC
0   True     1  NaN
1  False     2    a
2  False     3    b
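One way to guard against silent NaN values is to compare the two indices before assigning. The sketch below is only an illustration of that idea; the helper name add_column_checked is hypothetical and is not a pandas API, while Index.equals is an existing pandas method.

```python
import pandas as pd


def add_column_checked(frame, name, series):
    """Assign `series` as column `name` only if its index matches the frame's."""
    # Index.equals is True only when both indices hold the same labels in the same order
    if not frame.index.equals(series.index):
        raise ValueError(f"index mismatch for column {name!r}")
    frame[name] = series
    return frame


df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})

# Matching index: the column is added
add_column_checked(df, 'colC', pd.Series(['a', 'b', 'c'], index=[0, 1, 2]))
print(df)

# Mismatched index: the helper refuses instead of producing NaN
try:
    add_column_checked(df, 'colD', pd.Series(['a', 'b', 'c'], index=[1, 2, 3]))
except ValueError as exc:
    print(exc)
```

Raising an error is a design choice made for this sketch; depending on the data, filling the gaps with NaN may be exactly what you want.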

2) Using assign() method

Using the pandas.DataFrame.assign() method, you can insert multiple columns into a DataFrame in one call or modify the values of existing columns. The method returns a new DataFrame object with all of the original columns as well as the newly added ones; the original DataFrame is not modified. Note that a Series passed to assign() is aligned on the index, so to ignore the index you pass its values instead, as in the example below. Existing columns are overwritten if they are re-assigned.

For example:

e = pd.Series([1.0, 3.0, 2.0], index=[0, 2, 1])
s = pd.Series(['a', 'b', 'c'], index=[0, 1, 2])
df.assign(colC=s.values, colB=e.values)
    colA  colB colC
0   True   1.0    a
1  False   3.0    b
2  False   2.0    c
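To make the role of the index in assign() concrete, the following sketch passes the same Series once as a Series and once as raw values. The variable names by_label and by_position are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
e = pd.Series([1.0, 3.0, 2.0], index=[0, 2, 1])

# Passing the Series aligns by label: label 1 holds 2.0 and label 2 holds 3.0
by_label = df.assign(colB=e)

# Passing the raw values fills by position: 1.0, 3.0, 2.0
by_position = df.assign(colB=e.values)

print(by_label['colB'].tolist())
print(by_position['colB'].tolist())

# The original frame is unchanged because assign() returns a copy
print(df['colB'].tolist())
```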

3) Using insert() method

Apart from the above two methods, you can also use pandas.DataFrame.insert() to add columns to a DataFrame. This method comes in handy when you need to add a column at a specific position. Unlike assign(), it modifies the DataFrame in place. In the example below, len(df.columns) gives the number of existing columns, so inserting at that position adds ‘colC’ at the end of the DataFrame.

For example:

df.insert(len(df.columns), 'colC', s.values)
print(df)
    colA  colB colC
0   True     1    a
1  False     2    b
2  False     3    c

To add ‘colC’ between the two columns ‘colA’ and ‘colB’ instead, pass position 1.

For example:

df.insert(1, 'colC', s.values)
print(df)
    colA colC  colB
0   True    a     1
1  False    b     2
2  False    c     3

Note that by default the insert() method cannot be used to add a column whose name already exists in the DataFrame; a ValueError is raised in that case.

For example:

df.insert(1, 'colC', s.values)
df.insert(1, 'colC', s.values)
ValueError: cannot insert colC, already exists

Nevertheless, the DataFrame can hold two columns with the same name if you pass the argument allow_duplicates=True to the insert() method.

For example:

df.insert(1, 'colC', s.values)
df.insert(1, 'colC', s.values, allow_duplicates=True)
print(df)
    colA colC colC  colB
0   True    a    a     1
1  False    b    b     2
2  False    c    c     3
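The three insert() behaviours described in this section can be combined into one runnable sketch, starting from a fresh copy of the sample frame. It assumes the default pandas settings, under which duplicate column labels are permitted.

```python
import pandas as pd

df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})
s = pd.Series(['a', 'b', 'c'], index=[0, 1, 2])

# insert() modifies the frame in place and returns None
df.insert(1, 'colC', s.values)
print(list(df.columns))

# A second insert with the same name raises ValueError by default
try:
    df.insert(1, 'colC', s.values)
except ValueError as exc:
    print(exc)

# Opting in to duplicate column names
df.insert(1, 'colC', s.values, allow_duplicates=True)
print(list(df.columns))
```

Duplicate column names make df['colC'] return a DataFrame rather than a Series, so they are usually worth avoiding unless you have a specific need for them.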

4) Using concat() method

The pandas.concat() function can also be used to add a column to an existing DataFrame by passing axis=1. It returns a new DataFrame that includes the newly added column. concat() aligns the Series with the original DataFrame on the index. Check out the example below for a better understanding.

For example:

df = pd.concat([df, s.rename('colC')], axis=1)
print(df)
    colA  colB colC
0   True     1    a
1  False     2    b
2  False     3    c

As a rule, you should use this approach only when the indices of the objects being combined match each other. If they don’t match, the resulting DataFrame will contain every index from every object, and the cells with no corresponding value will hold NaN, as shown in the example below.

For example:

s = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])
df = pd.concat([df, s.rename('colC')], axis=1)
print(df)
     colA  colB colC
0    True   1.0  NaN
1   False   2.0  NaN
2   False   3.0  NaN
10    NaN   NaN    a
20    NaN   NaN    b
30    NaN   NaN    c
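A quick way to see the effect of index alignment in concat() is to compare the shape of the result for a matching and a non-matching Series. This is a minimal sketch; the names matching, disjoint, same and union are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'colA': [True, False, False], 'colB': [1, 2, 3]})

matching = pd.Series(['a', 'b', 'c'], index=[0, 1, 2], name='colC')
disjoint = pd.Series(['a', 'b', 'c'], index=[10, 20, 30], name='colC')

# Matching indices: one row per shared label
same = pd.concat([df, matching], axis=1)

# Disjoint indices: the result holds the union of labels, padded with NaN
union = pd.concat([df, disjoint], axis=1)

print(same.shape)
print(union.shape)
```

If the row count of the result is larger than that of the original frame, the indices did not line up.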

Conclusion

Adding columns to a DataFrame is a common data analysis and modification operation, and Pandas provides several ways to do it, four of which are covered in this article. The index is one of the most challenging aspects of adding new columns to DataFrames, so be cautious: each of the methods covered here may handle indices differently. Once you are comfortable with these methods, you are good to go for adding new columns to your DataFrames.
