How to rename files in bulk with PowerShell

As a part of one work for one of my Big Data Warehouse, clients were to review and automate the ETL data pipeline to gain a better time2money efficiency in the current process the client’s data team was repeating for several months.

After running a discovery phase, I found that 36% of the time for ETL to run was spent on the Raw data file preparation stage.

Most (30%) spent on renaming files. This process was just semimanual and with a high risk for human error. Wow…

Anyway, with the right level of empathy, you may guess that the data team focus was given to higher priority tasks, and the time inefficiency of ETL pipelines was just tolerated compromise at that time.

No blame 🙂

Challenge description and solution

The Root Raw folder file structure was using a variant number of the nested folders in the folder hierarchy. Pretty confusing structure for someone new, to be honest.

So trying to find some sort of folder/file pattern did not sound like a viable option to me. Luckily, the raw data file names issued by 3rd party suppliers were consistent in one used naming convention, and therefore using a regular expression and PowerShell was my best bet to start with first.

The objective of this exercise was to remove the PAF2_V postfix from the file name of this naming convention PAF2_V<version of data extract>_<data context type>.DAT.

For an example, this file name:

PAF2_V2022Q1V01_ABBREVIATIONS.DAT

must become:

ABBREVIATIONS.DAT

This is an example of the fragment of the file names and hierarchy:

Root |
     |- PAF2_V2022Q1V01_ABBREVIATIONS.DAT
     |- PAF2_V2022Q1V01_ALTERNATIVE_STREET_NAMES.DAT
     |- PAF2_V2022Q1V01_ALTERNATIVE_SUBURB_NAMES.DAT
     |- PAF2_V2022Q1V01_ALTERNATIVE_TOWN_CITY_NAMES.DAT
     |- 2021_Q3 |
                |- PAF2_V2021Q3V01_ABBREVIATIONS.DAT
                |- PAF2_V2021Q3V01_ALTERNATIVE_STREET_NAMES.DAT
...

Surprisingly, the PowerShell command was very easy to build up!

Get-ChildItem -Filter "PAF2_*.DAT" -Recurse | Rename-Item -NewName {$_.Name -replace '^(?:[^_]+_){2}',''}

The breakdown structure of the PowerShell command:

  1. Get-ChildItem -Filter “PAF2_*.DAT” – the command gets and filters out all files not having PAF2_ prefix and .DAT extension from the list
  2. -Recurse – parameter used here for the recursive type of search over the folder/file hierarchy
  3. Rename-Item -NewName {$_.Name -replace – the command renames all files by replacing found substring in the file name with some another
  4. ^(?:[^_]+_){2} – regular expression to find the substring in the string used for file name (to be replaced by empty string [”] eventually). As you can see, the substring search expression counts the number of separator characters ‘_’ in the string. The match becomes after the second occurrence.

Thanks for staying, subscribe to my blog, and leave me a comment below.

cheers\

Leave a comment