How to rename files in bulk with PowerShell

As a part of one work for one of my Big Data Warehouse clients was to review and automate the ETL data pipeline to gain a better time2money efficiency in the current process the client’s data team was repeating for several months.

After running a discovery phase, I found that 36% of the time for ETL to run was spent on Raw data files preparation.

Mostly (30%) spent on renaming files. This process was just semimanual and with a high risk for human error. Wow…

Anyway, with the right level of empathy, you may guess that the data team focus was given to higher priority tasks and the time inefficiency of ETL pipelines was just tolerated compromise at that time.

No blame 🙂

Challenge description and solution

The Root Raw folder file structure was using a variant number of the nested folders in the folder hierarchy. Pretty confusing structure for someone new, to be honest.

So trying to find some sort of folder/file pattern did not sound like a viable option to me. Luckily, the raw data file names issued by 3rd party suppliers were consistent in one used naming convention and therefore using a regular expression and PowerShell was my best bet to start with first.

The objective in this exercise was to remove the PAF2_V postfix from the file name of this naming convention PAF2_V<version of data extract>_<data context type>.DAT.

In an example, this file name:

PAF2_V2022Q1V01_ABBREVIATIONS.DAT

must become:

ABBREVIATIONS.DAT

This is an example of the fragment of the file names and hierarchy:

Root |
     |- PAF2_V2022Q1V01_ABBREVIATIONS.DAT
     |- PAF2_V2022Q1V01_ALTERNATIVE_STREET_NAMES.DAT
     |- PAF2_V2022Q1V01_ALTERNATIVE_SUBURB_NAMES.DAT
     |- PAF2_V2022Q1V01_ALTERNATIVE_TOWN_CITY_NAMES.DAT
     |- 2021_Q3 |
                |- PAF2_V2021Q3V01_ABBREVIATIONS.DAT
                |- PAF2_V2021Q3V01_ALTERNATIVE_STREET_NAMES.DAT
...

Surprisingly, the PowerShell command was very easy to build up!

Get-ChildItem -Filter "PAF2_*.DAT" -Recurse | Rename-Item -NewName {$_.Name -replace '^(?:[^_]+_){2}',''}

The breakdown structure of the PowerShell command:

  1. Get-ChildItem -Filter “PAF2_*.DAT” – the command gets and filters out all files not having PAF2_ prefix and .DAT extension from the list
  2. -Recurse – parameter used here for recursive type of search over the folder/file hierarchy
  3. Rename-Item -NewName {$_.Name -replace – the command renames all files by replacing found substring in the file name by some another
  4. ^(?:[^_]+_){2} – regular expression to find the substring in the string used for file name (to be replaced by empty string [”] eventially). As you can see, the substring search expression counts number of separator character ‘_’ in string. The match become after the second occurence.

Hope you enjoyed reading this post and found it helpful, too!

/cheers.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s