How to rename files in bulk with PowerShell

As a part of one work for one of my Big Data Warehouse clients was to review and automate the ETL data pipeline to gain a better time2money efficiency in the current process the client’s data team was repeating for several months.

After running a discovery phase, I found that 36% of the time for ETL to run was spent on Raw data files preparation.

Mostly (30%) spent on renaming files. This process was just semimanual and with a high risk for human error. Wow…

Anyway, with the right level of empathy, you may guess that the data team focus was given to higher priority tasks and the time inefficiency of ETL pipelines was just tolerated compromise at that time.

No blame 🙂

Challenge description and solution

The Root Raw folder file structure was using a variant number of the nested folders in the folder hierarchy. Pretty confusing structure for someone new, to be honest.

So trying to find some sort of folder/file pattern did not sound like a viable option to me. Luckily, the raw data file names issued by 3rd party suppliers were consistent in one used naming convention and therefore using a regular expression and PowerShell was my best bet to start with first.

The objective in this exercise was to remove the PAF2_V postfix from the file name of this naming convention PAF2_V<version of data extract>_<data context type>.DAT.

In an example, this file name:

PAF2_V2022Q1V01_ABBREVIATIONS.DAT

must become:

ABBREVIATIONS.DAT

This is an example of the fragment of the file names and hierarchy:

Root |
     |- PAF2_V2022Q1V01_ABBREVIATIONS.DAT
     |- PAF2_V2022Q1V01_ALTERNATIVE_STREET_NAMES.DAT
     |- PAF2_V2022Q1V01_ALTERNATIVE_SUBURB_NAMES.DAT
     |- PAF2_V2022Q1V01_ALTERNATIVE_TOWN_CITY_NAMES.DAT
     |- 2021_Q3 |
                |- PAF2_V2021Q3V01_ABBREVIATIONS.DAT
                |- PAF2_V2021Q3V01_ALTERNATIVE_STREET_NAMES.DAT
...

Surprisingly, the PowerShell command was very easy to build up!

Get-ChildItem -Filter "PAF2_*.DAT" -Recurse | Rename-Item -NewName {$_.Name -replace '^(?:[^_]+_){2}',''}

The breakdown structure of the PowerShell command:

  1. Get-ChildItem -Filter “PAF2_*.DAT” – the command gets and filters out all files not having PAF2_ prefix and .DAT extension from the list
  2. -Recurse – parameter used here for recursive type of search over the folder/file hierarchy
  3. Rename-Item -NewName {$_.Name -replace – the command renames all files by replacing found substring in the file name by some another
  4. ^(?:[^_]+_){2} – regular expression to find the substring in the string used for file name (to be replaced by empty string [”] eventially). As you can see, the substring search expression counts number of separator character ‘_’ in string. The match become after the second occurence.

Hope you enjoyed reading this post and found it helpful, too!

/cheers.

How to find out PowerShell version quickly

Every Developer/DevOps was looking for this command at least ones a year (including myself). Truth is that we all like PowerShell and sometimes tend to forget put a running engine pre-conditional check inside a batch script (as whatever script you’re producing, always make sure that it’s transfarable onto another environment) and referencing some function which is not in (default) installed version.

Anyway…take this post as a reminder reference quide.

3 ways of how to quickly find out what version of PowerShell having installed

1. $PSVersionTable.PSVersion

This is my prefered way over the others. Why? Because it works on local as well as on remote station.

2. (Get-Host).Version

3. $host.Version

Summing this up

That’s it. Hope you like this short reminder and let me know your preferred way of doing this. Cheers!