
I have a PowerShell script that imports a CSV file, filters the rows, selects two columns, concatenates a string and then exports the result to a new CSV file.

Import-Csv "redirect_and_canonical_chains.csv" |
Where { $_."Number of Redirects" -gt 1} |
Select {"Redirect 301 ",$_.Address, $_."Final Address"} |
Export-Csv "testing-export.csv" -NoTypeInformation

This all works fine; however, for the $_.Address value I want to strip the domain, subdomain, protocol, etc., using the following regex:

^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\.)?([^:\/\n]+)

This works on its own and matches as I want, but I am not sure of the best way to implement it when selecting the data (should I use -match, -replace, etc.?) or whether I should do it after importing.

Any advice greatly appreciated!

Many thanks

Mike

  • You can use a calculated property in your select to do this. Commented Feb 26, 2019 at 16:18
  • Hi James, many thanks for the reply and link. Would you be able to give me an example of how I would implement this in my script above for the scenario described? I'm not quite clear on the syntax to use from the linked page. Thanks again! Commented Feb 26, 2019 at 16:37

3 Answers


The best place to do it would be in the select clause, as in:

select Property1,Property2,@{name='NewProperty';expression={$_.Property3 -replace '<regex>',''}}

That's what a calculated property is: you give it a name and an expression that computes its value. Your regex might need revision to work with PowerShell, though.
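A minimal runnable sketch of that approach, assuming a couple of hypothetical in-memory rows in place of redirect_and_canonical_chains.csv (the values are strings, as Import-Csv would produce). The calculated property name Path is made up for illustration, and the regex is the one from the question with the unnecessary \/ escapes dropped, which does not change what it matches in .NET.

```powershell
# Hypothetical rows standing in for Import-Csv "redirect_and_canonical_chains.csv";
# Import-Csv produces strings, so the redirect counts are strings here too.
$rows = @(
    [pscustomobject] @{ Address = 'http://www.example.org/old-page.html'; 'Final Address' = 'https://www.example.org/new-page.html'; 'Number of Redirects' = '2' }
    [pscustomobject] @{ Address = 'https://example.org/other.html'; 'Final Address' = 'https://example.org/elsewhere.html'; 'Number of Redirects' = '1' }
)

# Regex from the question: protocol, optional user info, optional "www." and host.
$domainRegex = '^(?:https?://)?(?:[^@/\n]+@)?(?:www\.)?([^:/\n]+)'

# Where-Object: cast to [int] so the comparison is numeric, not string-based.
# Select-Object: the hashtable is a calculated property (a name plus a script
# block that computes the value); 'Path' is a hypothetical name.
$result = @(
    $rows |
        Where-Object { [int] $_.'Number of Redirects' -gt 1 } |
        Select-Object @{ Name = 'Path'; Expression = { $_.Address -replace $domainRegex, '' } },
                      'Final Address'
)

$result   # one row, whose Path is /old-page.html
```

Piping $result to Export-Csv would then write a Path column and a Final Address column.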


5 Comments

Ah OK, thanks! Is there any problem with doing the replace the way I have done it? It appears to give the expected result in the CSV output.
If it works, keep it; I don't foresee any problem. That being said, select normally takes a list of properties, and you feed it a scriptblock. Honestly, I don't even understand why it works! :D
Hmm, it works OK with static values, but the regex isn't picking up matches (although it works in RegExr). Is there anything in particular that I need to watch out for when using regex in PowerShell?
Did you try the PCRE flavour? That's much closer to the .NET implementation than JavaScript's (I use Regex101 myself, but PCRE is PCRE).
So, I tried, and it seems that your domain name is matched by the capture group in the regex, so it is removed along with the protocol. If you want to keep it, either put it back via the replacement string: -replace '^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\.)?([^:\/\n]+)','$1', or simply remove it from the regex: -replace '^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\.)?',''
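A small sketch of the difference between those two suggestions and the plain '' replacement, using a made-up URL (example.org is a placeholder). Note that '$1' has to be in single quotes: in double quotes PowerShell would expand $1 as a (normally empty) variable before -replace ever sees it.

```powershell
$url = 'http://www.example.org/more/stuff'   # hypothetical sample URL

# The capture group ([^:/\n]+) matches the host name, so replacing the whole
# match with '' removes protocol, "www." and host: only the path is left.
$pathOnly = $url -replace '^(?:https?://)?(?:[^@/\n]+@)?(?:www\.)?([^:/\n]+)', ''

# Referencing the group as '$1' puts the host back: only protocol and "www." go.
$hostAndPath = $url -replace '^(?:https?://)?(?:[^@/\n]+@)?(?:www\.)?([^:/\n]+)', '$1'

# Dropping the group from the pattern has the same effect as using '$1'.
$hostAndPath2 = $url -replace '^(?:https?://)?(?:[^@/\n]+@)?(?:www\.)?', ''

$pathOnly      # /more/stuff
$hostAndPath   # example.org/more/stuff
$hostAndPath2  # example.org/more/stuff
```

So which variant is right depends on whether the host should end up in the output; the question's stated goal (strip the domain) corresponds to $pathOnly.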

I've realized now that I can just use .Replace in the following way :)

Select {"Redirect 301 ",$_.Address.Replace('http://', 'testing'), $_."Final Address"}

6 Comments

This creates objects with a single property literally named "Redirect 301 ",$_.Address.Replace('http://', 'testing'), $_."Final Address" whose value is a 3-element array. Given that this won't meaningfully serialize to a CSV file with Export-Csv, I'm assuming it's not what you actually need.
Thanks for the comment. I just need to generate a single string for each row, which seems to be what's happening, so that is OK. Each row should contain something like "301 redirect /oldurl.html mydomain.com/new-url.html". The purpose of the regex rule is to strip out the domain part. If I test the rule on a static string it works fine, but it doesn't work in the select statement: Select {"Redirect 301 ",($_.Address).Replace($regEx,''), $_.'Final Address'}|
.Replace() performs literal substring replacement - it doesn't work with regexes. If you just want to return a string, don't use Select - use ForEach-Object (%) with a script block that outputs the desired string. You're looking for something like ... | % { '{0} {1} {2}' -f 'Redirect 301', ($_.Address -replace '^http://[^/]+'), $_.'Final Address' }
@mklement0 Thanks again. I have changed to using ForEach-Object but I still get the same result. Maybe I am not understanding correctly. Could you tell me what is wrong here? ForEach-Object{ Write-Host "Redirect 301 ", ($_.Address -replace '^(?:https?:\/\/)?(?:[^@\/\n]+@)?(?:www\.)?([^:\/\n]+)}', ''), $_.'Final Address'}
Just realised I had a } at the end in the wrong place, and when I removed it, it works! Thanks so much again @mklement0
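A sketch contrasting the approaches discussed in these comments, with a made-up sample address: .Replace() versus -replace, the ForEach-Object / -f pattern suggested above, and what Select-Object does with a bare script block. The exact property name printed for $selected is not spelled out here; it is simply the script block's source text.

```powershell
$address = 'http://www.example.org/old.html'   # hypothetical sample value
$regex   = '^(?:https?://)?(?:[^@/\n]+@)?(?:www\.)?([^:/\n]+)'

# String.Replace() searches for the pattern text literally; that substring
# does not occur in $address, so the input comes back unchanged.
$literal = $address.Replace($regex, '')

# The -replace operator interprets its first operand as a regular expression.
$viaRegex = $address -replace $regex, ''

# One output string per input object via ForEach-Object and the -f operator
# (-replace with a single operand replaces matches with the empty string).
$line = [pscustomobject] @{ Address = $address; 'Final Address' = 'example.org/new.html' } |
    ForEach-Object { '{0} {1} {2}' -f 'Redirect 301', ($_.Address -replace $regex), $_.'Final Address' }

# By contrast, Select-Object with a bare script block yields an object with a
# single property named after the script block's text, holding an array.
$selected = [pscustomobject] @{ Address = $address } |
    Select-Object { 'Redirect 301', $_.Address }
$selected.PSObject.Properties.Name

$literal   # http://www.example.org/old.html
$viaRegex  # /old.html
$line      # Redirect 301 /old.html example.org/new.html
```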

Based on follow-up comments, the intent behind your Select[-Object] call was to create a single string with space-separated entries from each input object.

Note that use of Export-Csv then makes no sense, because it will create a single Length column with the input strings' length rather than output the strings themselves.
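A quick way to confirm this without touching the real output file, sketched with two made-up redirect lines. ConvertTo-Csv is used because it produces the same CSV text Export-Csv would write. Writing the lines with Set-Content is one plain-text alternative; the temp-file path below is hypothetical.

```powershell
$lines = 'Redirect 301 /old.html example.org/new.html',
         'Redirect 301 /a.html example.org/b.html'   # hypothetical sample lines

# A [string] exposes Length as its only regular property, so the CSV cmdlets
# serialize that instead of the text itself.
$csv = $lines | ConvertTo-Csv -NoTypeInformation
$csv   # "Length", then "43" and "39", one per line

# To keep the strings themselves, write plain text instead (hypothetical path).
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'redirects.txt'
$lines | Set-Content -Path $outFile
$readBack = Get-Content -Path $outFile
Remove-Item -Path $outFile

$readBack   # the two Redirect lines, unchanged
```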

In a follow-up comment you posted a solution that used Write-Host to produce the output string. However, Write-Host is generally the wrong tool to use unless the intent is explicitly to write to the display only: it bypasses PowerShell's success output stream and thus the ability to send the output to other commands, capture it in a variable or redirect it to a file with >.
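A minimal sketch of the difference, with a made-up redirect line: the implicit output can be assigned to a variable, while the Write-Host version only shows up on screen.

```powershell
# Implicit output goes to the success output stream, so it can be captured...
$captured = & { 'Redirect 301 /old.html example.org/new.html' }

# ...whereas Write-Host writes to the host (in PowerShell 5+, via the
# information stream), so the text is displayed but nothing is assigned.
$notCaptured = & { Write-Host 'Redirect 301 /old.html example.org/new.html' }

$null -eq $notCaptured   # True
$captured                # Redirect 301 /old.html example.org/new.html
```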

Here's a fixed version of your command, which uses the -join operator to join the elements of a string array to output a single, space-separated string:

$sampleCsvInput = [pscustomobject] @{ 
  Address = 'http://www.example.org/more/stuff'; 
  'Final Address' = 'more/stuff2'
}

$sampleCsvInput | ForEach-Object { 
  "Redirect 301 ", 
  ($_.Address -replace '^(?:https?://)?(?:[^@/\n]+@)?(?:www\.)?([^:/\n]+)', ''), 
  $_.'Final Address' -join ' ' 
}

Note that , - PowerShell's array-construction operator - has higher precedence than the -join operator, so the -join operation indeed joins all 3 preceding array elements.

The above yields the following string:

Redirect 301  /more/stuff more/stuff2
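A few variations on the same sample values make the precedence rule easy to check, and also explain the two spaces after 301 in the output above (the trailing space in "Redirect 301 " plus the space added by -join):

```powershell
# The comma operator binds tighter than -join, so all three elements are joined.
$joined = 'Redirect 301', '/more/stuff', 'more/stuff2' -join ' '

# With the trailing space in 'Redirect 301 ', -join adds a second one.
$withTrailing = 'Redirect 301 ', '/more/stuff', 'more/stuff2' -join ' '

# Parentheses restrict -join to part of the list; the result is a 2-element array.
$partial = 'Redirect 301', ('/more/stuff', 'more/stuff2' -join ' ')

$joined         # Redirect 301 /more/stuff more/stuff2
$withTrailing   # Redirect 301  /more/stuff more/stuff2
$partial.Count  # 2
```

Dropping the trailing space from "Redirect 301 " in the command above would therefore give single spacing throughout.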
