How can you filter large CSV files in Python?



pythonsevenmentor
1. Using pandas with chunksize (Best for Large Files)
Passing the chunksize parameter to pd.read_csv reads the file in smaller portions, so the whole file never has to fit in memory at once.
import pandas as pd

# Filter condition applied to each chunk
def filter_chunk(chunk):
    # Example: keep only rows where column_name exceeds 50
    return chunk[chunk["column_name"] > 50]

# Read in chunks, filter each, and combine the results
chunksize = 10000  # Adjust based on available memory
filtered_data = pd.concat(
    filter_chunk(chunk)
    for chunk in pd.read_csv("large_file.csv", chunksize=chunksize)
)

# Save filtered data
filtered_data.to_csv("filtered_file.csv", index=False)
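Note that pd.concat still builds the entire filtered result in memory before writing it out. If the filtered data itself may be large, a variant of the same idea streams each filtered chunk straight to disk instead. Below is a minimal sketch under the same assumptions as above (a hypothetical "column_name" column and the file names large_file.csv / filtered_file.csv):

```python
import pandas as pd

def filter_csv_in_chunks(src, dst, chunksize=10_000):
    """Filter src chunk by chunk and stream results to dst.

    Hypothetical helper: keeps only rows where "column_name" > 50,
    writing each filtered chunk as soon as it is produced, so the
    full filtered result never sits in memory.
    """
    first = True
    for chunk in pd.read_csv(src, chunksize=chunksize):
        filtered = chunk[chunk["column_name"] > 50]
        # Write the header only with the first chunk, then append.
        filtered.to_csv(
            dst,
            mode="w" if first else "a",
            header=first,
            index=False,
        )
        first = False
```

The mode="a" / header=first pattern is what makes this work: only the first write creates the file and emits the header row; every later chunk is appended without one.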
