
I'm using PowerShell or cmd to open a text file and extract a specific value from it.

Currently the output looks like this inside the text file:

Upload Hello.zip: 0 of 933 bytes complete

Upload Hello.zip: 933 of 933 bytes complete
{
  "scanId" : 11260434,
  "scanType" : "Static",
  "analysisStatusType" : "Pending",
  "applicationName" : "Test Application Jenkins",
  "releaseName" : "Release 4",
  "microserviceName" : "",
  "__action__" : "STARTED"
}

However, I would like to get only the value of scanId.

This is the command I use:

Select-String -Path C:\Folder\scanjson.txt ':\s*(?<digits>[0-9]+)'

But it returns output like this:

C:\Folder\scanjson.txt:2:Upload Hello.zip: 0 of 933 bytes complete
C:\Folder\scanjson.txt:4:Upload Hello.zip: 933 of 933 bytes complete
C:\Folder\scanjson.txt:6:  "scanId" : 11260434,
  • Try "scanId"\s*:\s*(?<digits>[0-9]+)? Commented Jun 25, 2024 at 18:42

1 Answer


Since your text file is mostly JSON, it is better and more robust to remove the non-JSON preamble and use ConvertFrom-Json to parse the JSON into an object whose properties you can access to get the desired information:

# -> 11260434, i.e. a *number*, due to from-JSON parsing.
(
  (Get-Content -Raw C:\Folder\scanjson.txt) -replace '(?s)^.+(?=\{)' |
    ConvertFrom-Json
).scanId
  • Get-Content -Raw reads the file as a whole into a single, multiline string.

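  • -replace '(?s)^.+(?=\{)' removes the preamble, i.e. everything before the { that opens the JSON object, leaving the { itself in place. Note that .+ is greedy, so the match actually extends up to the last { in the string; in the sample input, which contains only one {, that is the same character, but if your JSON contains nested objects, use the non-greedy .+? instead. The components of the regex are as follows: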

    • (?s) is an inline option that makes . match newline characters too.
    • ^ matches at the start of the string.
    • .+ matches any nonempty (+) run of characters (.); if your input file doesn't always have a preamble, use .* instead.
    • (?=\{) is a lookahead assertion that matches a { character without including it in the match.
    • By not specifying a substitution expression, the match is replaced with the empty string, i.e. effectively removed.
  • The resulting string is valid JSON that ConvertFrom-Json parses into [pscustomobject] instances, whose properties, such as .scanId, you can access.

    • Note that, unlike the strictly text-based regex-parsing, JSON supports a few data types, causing an unquoted token such as 11260434 to be parsed as a number.
    • An integer - such as in the case at hand - is parsed as type [int] (System.Int32) in Windows PowerShell vs. as [long] (System.Int64) in PowerShell (Core) 7.
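The steps above can be tried without a file. The following is a minimal sketch that assumes an in-memory string mirroring the question's file contents; the variable names ($text, $scan, $nested, $greedy, $lazy) and the nested sample are made up for illustration. It also shows how the greedy .+ and the non-greedy .+? differ once the JSON contains more than one { character.

```powershell
# Hypothetical sample input mirroring the file from the question:
# a plain-text preamble followed by a JSON object.
$text = @(
  'Upload Hello.zip: 0 of 933 bytes complete'
  ''
  'Upload Hello.zip: 933 of 933 bytes complete'
  '{'
  '  "scanId" : 11260434,'
  '  "scanType" : "Static",'
  '  "__action__" : "STARTED"'
  '}'
) -join "`n"

# Remove the preamble, then parse the remaining JSON text.
$scan = $text -replace '(?s)^.+(?=\{)' | ConvertFrom-Json

$scan.scanId                  # -> 11260434 (a number, not a string)
$scan.scanId.GetType().Name   # the integer type depends on the PowerShell edition
$scan.scanType                # other properties are accessible the same way

# Assumption for illustration: a payload with a nested object.
$nested = 'preamble {"outer":{"scanId":1}}'
$greedy = $nested -replace '(?s)^.+(?=\{)'    # removes up to the LAST {
$lazy   = $nested -replace '(?s)^.+?(?=\{)'   # removes up to the FIRST {
$greedy   # -> {"scanId":1}}            (no longer valid JSON)
$lazy     # -> {"outer":{"scanId":1}}   (the full JSON document)
```

For the flat JSON in the question both quantifiers give the same result; the non-greedy form only matters for nested objects.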

As for what you tried:

Your regex was too permissive; use
"scanId"\s*:\s*(?<digits>[0-9]+) instead;[1] i.e.:

# -> '11260434', i.e. a *string*, due to regex parsing.
(
  Select-String -List -Path C:\Folder\scanjson.txt '"scanId"\s*:\s*(?<digits>[0-9]+)'
).Matches[0].Groups['digits'].Value

Note the need to drill down into the Select-String output object, which is of type [Microsoft.PowerShell.Commands.MatchInfo], to get the capture-group value of interest.

-List ensures that matching stops after the first match.

  • Note: In cases where you expect multiple matches, you'd omit -List and pipe to a ForEach-Object call instead of using direct property access on the result (as shown above):

    Select-String -Path C:\Folder\scanjson.txt '"scanId"\s*:\s*(?<digits>[0-9]+)' |
      ForEach-Object { $_.Matches[0].Groups['digits'].Value }
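
To see the effect of -List side by side, here is a small sketch that assumes a throwaway temp file with two matching lines; the file contents and the variable names ($tmp, $pattern, $first, $all) are hypothetical and not part of the original scenario.

```powershell
# Hypothetical temp file containing two "scanId" lines.
$tmp = [System.IO.Path]::GetTempFileName()
Set-Content -Path $tmp -Value @(
  '{ "scanId" : 111 }'
  '{ "scanId" : 222 }'
)

$pattern = '"scanId"\s*:\s*(?<digits>[0-9]+)'

# With -List: Select-String stops at the first matching line in the file,
# so direct property access on the single result is safe.
$first = (Select-String -List -Path $tmp $pattern).Matches[0].Groups['digits'].Value

# Without -List: one MatchInfo object per matching line,
# so each one is processed in a ForEach-Object call.
$all = Select-String -Path $tmp $pattern |
  ForEach-Object { $_.Matches[0].Groups['digits'].Value }

Remove-Item $tmp   # clean up the temp file

$first   # -> 111
$all     # -> 111, 222 (both strings)
```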
    

Because the parsing is regex-based, the result is a string; simply cast the entire expression to, e.g., [int] to get a number.
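
As a minimal sketch of that cast, the following pipes a string directly to Select-String instead of reading a file; the sample line and the variable names ($line, $idText, $idNumber) are assumptions for illustration.

```powershell
# Hypothetical single line of input, piped in rather than read from a file.
$line = '  "scanId" : 11260434,'

# Drill down into the MatchInfo object to get the named capture group's text.
$idText = ($line | Select-String '"scanId"\s*:\s*(?<digits>[0-9]+)').Matches[0].Groups['digits'].Value

# Cast the captured string to a number.
$idNumber = [int] $idText

$idText.GetType().Name     # -> String
$idNumber.GetType().Name   # -> Int32
$idNumber                  # -> 11260434
```

Piping strings to Select-String like this also works for command output, which avoids having to save the data to a text file first.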


[1] For an explanation of the regex and the option to experiment with it, see this regex101.com page. Note that regex101.com's .NET support is limited to C#, which may require tweaks to PowerShell regexes, such as not using '...' and escaping " chars. as ""; see this answer for guidance.


6 Comments

Thanks! The main comment/answer worked for me. As for the one I tried, I was testing it on this website: regex101.com/r/IgaNue/1
Glad to hear it, @George; the Select-String solution should work too, though it returns a string (that you can easily cast to [int]). By contrast, the ConvertFrom-Json solution returns a number (of type [int] in Windows PowerShell vs. [long] in PowerShell (Core) 7)
I was using this before: Select-String -Path C:\Folder\scanjson.txt '"scanId":\s*(?<digits>[0-9]+)' -List | ForEach-Object { $_.Matches[0].Groups['digits'].Value }
@George: That works in principle, but the regex is broken; this works: Select-String -Path C:\Folder\scanjson.txt '"scanId"\s*:\s*(?<digits>[0-9]+)' -List | ForEach-Object { $_.Matches[0].Groups['digits'].Value }. Since only one result is expected, my solution makes do without a ForEach-Object call.
My bad, I can confirm that the Select-String method is also working now. I could also try this command directly without having to save the data into a text file. Thanks again!