Merge excel files python

3 Ways of to Merge Excel Files by openpyxl/Pandas

In this tutorial, we will show you how to merge two or more Excel files/sheets by using padas (primarily) and then also using openpyxl(somewhat).

Merging files of Excel by using Pandas and/or openpyxl

In this tutorial, we will show you how to merge two or more Excel files/sheets by using padas (primarily) and then also using openpyxl(somewhat).

For information, if you don’t know already, Pandas is a Python library for analyzing data, created by Wes McKinney.

Pandas has methods that you can use to work with Excel – including merging files – so let us look at it below section.

An example of merging two files by Pandas

In this example, we will use the concat() method of Pandas. For that, we will specify two files that we want to merge and a third file is created with the data of both. First, have a look at two files that we will merge:

File 1: test_1.xlsx

Excel-Pandas-merge

File 2: test_2.xlsx

Excel-Pandas-merge-2

The code to merge two files:

Output after code execution (File_3.xlsx):

Excel-Pandas-merged

Notice file_2.iloc[0:,0:]] in above code:

  • First 0 represents from which row to merge data from the second sheet.
  • The second0 represents column of the second file.

See another example below to understand its usage more clearly.

Specifying row and column for the merging file

We will merge only the second column and from the fifth row to the end of the test_2.xlsx file to the newly created test_3.xlsx merged file.

All data from test_1.xlsx will be copied:

Excel-partial-merged

You can see the difference between this merge and the above example.

Second way – Using Pandas/openpyxl for merging sheets

In this way, the Pandas ExcelWriter method is used to merge two or more sheets.

There, we specify openpyxl as the engine.

To understand, this is pandas.ExcelWriter syntax:

class pandas.ExcelWriter(path, engine=None, date_format=None, datetime_format=None, mode=’w’, storage_options=None, if_sheet_exists=None, engine_kwargs=None, **kwargs)

For our example, we have these two Excel Files before merging (Same as in the above example):

File 1: test_1.xlsx

Excel-Pandas-merge

File 2: test_2.xlsx

Excel-Pandas-merge-2

The example below shows using this method for merging two files of Excel:

with panda_merge . ExcelWriter ( «test_1.xlsx» , mode = «a» , engine = «openpyxl» , if_sheet_exists = «overlay» ) as writer :

data_frame . to_excel ( writer , sheet_name = «My_sheet1» , columns = None , startcol = writer . sheets [ «My_sheet1» ] . max_column , index = False )

The output after we executed this code:

Excel-merged-openpyxl

Note that this is test_1.xlsx file, unlike the above two examples where we created a new third file.

  • data_frame reading the test_2.xlsx file – file to be merged.
  • This line: data_frame=data_frame.iloc[:,0:] specifies which rows/columns to merge. We specified 0, so both columns will be merged. If we specify 1, only the second column will be merged.
  • By using ExcelWriter() method, the test_1 file is opened in the append mode.
  • Also notice, we provided the sheet name “My_sheet1”.
Читайте также:  Exception java какие бывают

Learn more about ExcelWriter() method here.

Merging only one column/fewer rows by openpyxl and pandas example

Just to show you how to specify rows/cols to merge, see the code below. Only difference is single line i.e. iloc.

See the code and output below:

with panda_merge . ExcelWriter ( «test_1.xlsx» , mode = «a» , engine = «openpyxl» , if_sheet_exists = «overlay» ) as writer :

data_frame . to_excel ( writer , sheet_name = «My_sheet1» , columns = None , startcol = writer . sheets [ «My_sheet1» ] . max_column , index = False )

Pandas-openpyxl-partia

You can see in the above image, only the second column (Status), with rows after 4 are merged.

Notice this line in the above code:

Using pandas.DataFrame.merge method to combine two files

If you have some knowledge of working with databases (MySQL, SQL Server etc.) then this merge type is like a join clause (used in databases).

In that merge method, the join is done at columns or indexes. For example, there should be one common column in both sheets that you want to merge.

Take the example of our sample sheets, if we try to join two files as shown in the above examples:

Then pandas.DataFrame.merge will produce an error as there is no common column in both files.

However, for the example, we are adding another column in the test_2.xlsx, i.e. “Product ID”, so it exists in both files to be merged.

After that, these are two files that we will merge by using pandas.DataFrame.merge method:

Excel-Pandas-merge

Pandas-merge-excel-2

So, you can see that the test_2.xlsx is different than used in the above examples i.e. it contains an additional common column “Product ID”.

Now let us perform a merge of these two files in the third one:

Источник

How to Consolidate Multiple Excel Workbooks With Python

No matter how many Excel spreadsheets you want to combine, Python lets you do it with relative ease. Here, you can learn all about it.

Python wrapped around a tree branch

Readers like you help support MUO. When you make a purchase using links on our site, we may earn an affiliate commission. Read More.

Python, as a language, is valuable beyond measure, especially when you want to work with structured data. Since people store a lot of data in Excel files, it is imperative to consolidate multiple files to save time and effort.

Python lets you do exactly that; no matter how many Excel files you want to combine, you can do it with relative ease. Given its range of libraries and third-party resources, you can import and use Python’s multi-faceted tools to do your bidding.

In this guide, you will need to install and use the Pandas libraries to import data into Python before consolidating it.

Install Pandas Libraries in Python

Pandas is a third-party library that you can install in Python. Some IDEs already have Pandas installed in them.

Читайте также:  Writing games with java

If you are using an IDE version that doesn’t come with pre-installed Pandas, rest assured, you can install it directly in Python.

Here’s how to install Pandas:

If you’re using Jupyter Notebook, you can install Pandas directly with the PIP command. Mostly, when you have installed Jupyter with Anaconda, there are high chances of already having Pandas available for direct use.

If you cannot call Pandas, you can use the above command to install them directly.

Combining Excel Files With Python

First, you need to create a folder in your preferred location with all the Excel files. Once the folder is ready, you can start writing the code to import the libraries.

You will use two variables in this code:

  1. Pandas: The Pandas library provides the data frames to store Excel files.
  2. OS: The library is beneficial for reading data from your machine’s folder

To import these libraries, use these commands:

Import Pandas as pd

Import OS
  • Import: Python syntax used to import the libraries in Python
  • Pandas: Name of the library
  • pd: Alias given to the library
  • OS: A library to access the system folder

Once you have imported the libraries, create two variables to store the input and output file path. The input file path is needed to access the files’ folder. The output file path is necessary as the combined file will be exported there.

If you are using Python, make sure you change the backslash to forward-slash (\ to /)

input_file_path = "C:/Users/gaurav/OneDrive/Desktop/Excel files/"

output_file_path = "C:/Users/gaurav/OneDrive/Desktop/"

Append the / at the end as well to complete the paths.

The folder’s files are available in a list. Create a list to store all the file references of the input folder using the listdir function from the OS library.

If you are unsure of the functions available within a library, you can use the dir function with the library name. For example, to check the exact version of the listdir function, you can use the command as follows:

The output will consist of all the associated functions available within the OS library. The listdir function is one of the many functions available within this library.

Create a new variable to store the input files from the folder.

excel_file_list = os.listdir(input_file_path) 

Print this variable to see the names of the files stored within the folder. All files stored within the folder are displayed once you use the print function.

Next, you need to add a new data frame to store each Excel file. Imagine a data frame as a container for storing data. Here’s the command for creating a data frame.

  • df: Variable to store the value of the DataFrame
  • pd: Alias for the Pandas library
  • DataFrame: Default syntax for adding a data frame

The input folder has three .xlsx files in this example. The file names are:

File1_excel.xlsx

File2_excel.xlsx

File3_excel.xlsx

Files folder

To open each file from this folder, you need to run a loop. The loop will run for each of the files in the list created above.

for excel_files in excel_file_list: 

Next, it’s necessary to check the extensions of the files since the code will open XLSX files only. To check these files, you can use an If statement.

Читайте также:  Python regex all except

Use the endswith function for this purpose, as follows:

for excel_files in excel_file_list:



if excel_files.endswith(".xlsx"):
  • excel_files: List with all the file values
  • endswith: Function to check the extension of the files
  • («.xlsx»): This string value can change, depending on what you want to search for

Now that you’ve identified the Excel files, you can create a new data frame to read and store the files individually.

for excel_files in excel_file_list:



if excel_files.endswith(".xlsx"):



df1 = pd.read_excel(input_file_path+excel_files)
  • df1: New data frame
  • pd: Pandas library
  • read_excel: Function to read Excel files within the Pandas library
  • input_file_path: Path of the folder where the files are stored
  • excel_files: Any variable which is used in the for loop

To start appending the files, you need to use the append function.

for excel_files in excel_file_list:



if excel_files.endswith(".xlsx"):



df1 = pd.read_excel(input_file_path+excel_files)

df = df.append(df1)

Finally, now that the consolidated data frame is ready, you can export it to the output location. In this case, you are exporting the data frame to an XLSX file.

df.to_excel(output_file_path+"Consolidated_file.xlsx") 
  • df: Dataframe to export
  • to_excel: Command used to export the data
  • output_file_path: Path defined for storing the output
  • Consolidated_file.xlsx: Name of the consolidated file

Now, let’s look at the final code:

#Pandas is used as a dataframe to handle Excel files

import pandas as pd

import os



# change the slash from “\” to “/”, if you are using Windows devices



input_file_path = "C:/Users/gaurav/OneDrive/Desktop/Excel files/"

output_file_path = "C:/Users/gaurav/OneDrive/Desktop/"



#create a list to store all the file references of the input folder using the listdir function from the os library.

#To see the contents of a library (like the listdir function, you can use the dir function on the library name).

#Use dir(library_name) to list contents



excel_file_list = os.listdir(input_file_path)



#print all the files stored in the folder, after defining the list

excel_file_list





#Once each file opens, use the append function to start consolidating the data stored in multiple files



#create a new, blank dataframe, to handle the excel file imports

df = pd.DataFrame()



#Run a for loop to loop through each file in the list

for excel_files in excel_file_list:

#check for .xlsx suffix files only

if excel_files.endswith(".xlsx"):

#create a new dataframe to read/open each Excel file from the list of files created above

df1 = pd.read_excel(input_file_path+excel_files)

#append each file into the original empty dataframe

df = df.append(df1)



#transfer final output to an Excel (xlsx) file on the output path

df.to_excel(output_file_path+"Consolidated_file.xlsx")

Jupyter Notebook code snippet

Using Python to Combine Multiple Excel Workbooks

Python’s Pandas is an excellent tool for beginners and advanced users alike. The library is used extensively by developers who want to master Python.

Even if you are a beginner, you can benefit immensely by learning the nuances of Pandas and how the library is used within Python.

Источник

Оцените статью