
I'm trying to write a PowerShell script that pulls out the string between two HTML tags in an HTML file. I don't know what the value will be, but I know which tags to search for. The tags do not always appear at the start of a line (they can be in the middle of a line of text), but the tags and the string between them never break across a line.

I have the path of the file stored in a variable:

$filePath = "C:\Path\file.html"

I'm trying to find any value between <h6> and </h6> and store those values in an array.

1 Answer


Try

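$myarray = @(gc $filePath |
    % { [regex]::Matches($_, '(?<=<h6>\s*)(?!\s)(.*?)(?=\s*</h6>)') } |
    select -expa value)

This trims leading and trailing whitespace inside the tags, if there is any, and still matches tags that have no whitespace around the text (\s* rather than \s+, which would require at least one space). The (?!\s) prevents a match from starting on a space. If you want to keep that whitespace, use the pattern '(?<=<h6>)(.*?)(?=</h6>)' instead. Wrapping the pipeline in @( ) makes $myarray an array even when only one <h6> is found.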
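To see what the trimming does, here is a minimal sketch that runs a trimming and a non-trimming pattern against a single in-memory line instead of a file. The sample line and the variable names are invented for illustration; in the real script each line comes from gc $filePath.

```powershell
# Hypothetical sample line: two <h6> elements on one line, the first padded with spaces
$line = 'Intro <h6>  First heading </h6> middle text <h6>Second</h6> end'

# Trimming pattern: \s* absorbs whitespace next to the tags,
# (?!\s) keeps the match from starting on a space
$trimmed = @([regex]::Matches($line, '(?<=<h6>\s*)(?!\s)(.*?)(?=\s*</h6>)') |
    Select-Object -ExpandProperty Value)

# Non-trimming pattern: everything between the tags, whitespace included
$raw = @([regex]::Matches($line, '(?<=<h6>)(.*?)(?=</h6>)') |
    Select-Object -ExpandProperty Value)

$trimmed   # -> 'First heading', 'Second'
$raw       # -> '  First heading ', 'Second'
```

Because .*? is lazy, each match stops at the nearest </h6>, so several <h6> elements on the same line come back as separate values.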


4 Comments

Perfect! After posting this I played more with my regex and I almost had it like you have. One last question: What does the select -expa value do?
@EustaceMonk You can try the command without the pipe to select -expa value, and then with a pipe to select value, to understand the difference. Testing it is better than my English ;)
I don't see a difference between using select -expa value and leaving that completely off. I do see the difference when using just select value.
@EustaceMonk Without the pipe to select -expa value you get the full regex Match objects. Piping to select value returns an array of PSCustomObjects, each with a single Value property. Piping to select -expa value returns an array of strings.
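A small sketch of the three cases from the comment above, using a made-up input line rather than the real file; calling GetType() on the first element shows which kind of object each variant produces. The variable names are hypothetical.

```powershell
# Hypothetical one-line input standing in for a line read by gc
$line = '<p>x</p><h6>Alpha</h6><h6>Beta</h6>'
$pattern = '(?<=<h6>\s*)(?!\s)(.*?)(?=\s*</h6>)'

# 1. No Select-Object: full System.Text.RegularExpressions.Match objects
$matchObjects = @([regex]::Matches($line, $pattern))

# 2. select value: new objects that each carry only a Value property
$customObjects = @([regex]::Matches($line, $pattern) | select value)

# 3. select -expa value: the Value strings themselves
$strings = @([regex]::Matches($line, $pattern) | select -expa value)

$matchObjects[0].GetType().FullName   # -> System.Text.RegularExpressions.Match
$customObjects[0].GetType().FullName  # -> System.Management.Automation.PSCustomObject
$strings[0].GetType().FullName        # -> System.String
```

Only the third form gives a plain string array, which is usually what you want to store in $myarray.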
