
I have a working script in PowerShell:

$file = Get-Content -Path HKEY_USERS.txt -Raw

foreach($line in [System.IO.File]::ReadLines("EXCLUDE_HKEY_USERS.txt"))
{
    $escapedLine = [Regex]::Escape($line)
    $pattern = $("(?sm)^$escapedLine.*?(?=^\[HKEY)")
    
    $file -replace $pattern, ' ' | Set-Content HKEY_USERS-filtered.txt
    $file = Get-Content -Path HKEY_USERS-filtered.txt -Raw
}

For each line in EXCLUDE_HKEY_USERS.txt it performs some changes on file HKEY_USERS.txt. So with every loop iteration it writes to this file and re-reads the same file to pick up the changes. However, Get-Content is notorious for memory leaks, so I wanted to refactor the script to use StreamReader and StreamWriter, but I'm having a hard time making it work.

As soon as I do:

$filePath = 'HKEY_USERS-filtered.txt';
$sr = New-Object IO.StreamReader($filePath);
$sw = New-Object IO.StreamWriter($filePath);

I get:

New-Object : Exception calling ".ctor" with "1" argument(s): "The process cannot access the file 
'HKEY_USERS-filtered.txt' because it is being used by another process."

So it looks like I cannot use StreamReader and StreamWriter on same file simultaneously. Or can I?

4 Comments

  • Note that [System.IO.File]::ReadLines( ) is the way to go in this case. Get-Content without the -Raw switch doesn't lead to memory leaks; it just adds, honestly unneeded, ETS properties to each line (each object of the file) - that's why it's slow. Commented Mar 20, 2022 at 1:54
  • Adding to my previous comment, there is no performance difference between [System.IO.File]::ReadAllText( ) and Get-Content -Raw. Commented Mar 20, 2022 at 2:07
  • As an aside: It's best to avoid pseudo method syntax: instead of New-Object SomeType(arg1, ...), use New-Object SomeType [-ArgumentList] arg1, ... - PowerShell cmdlets, scripts and functions are invoked like shell commands, not like methods. That is, no parentheses around the argument list, and whitespace-separated arguments (, constructs an array as a single argument, as needed for -ArgumentList). However, method syntax is required if you use the PSv5+ [SomeType]::new() constructor-call method. See this answer. Commented Mar 20, 2022 at 2:32
  • @SantiagoSquarzon ReadLines indeed consumes much less memory than Get-Content, but it has a problem with regexes. Commented Mar 20, 2022 at 7:50

1 Answer


tl;dr

  • Get-Content -Raw reads a file as a whole and is fast and consumes little unwanted memory.

  • [System.IO.File]::ReadLines() is a faster and more memory-efficient alternative to line-by-line reading with Get-Content (without -Raw), but you need to ensure that the input file is passed as a full path, because .NET's working directory usually differs from PowerShell's.

    • Convert-Path resolves a given relative path to a full, file-system-native one.

    • A PowerShell-native alternative to using [System.IO.File]::ReadLines() is the switch statement with the -File parameter, which performs similarly well while avoiding the working-directory discrepancy pitfall, and offers additional features.

  • There is no need to save the modified file content to disk after each iteration - just update the $file variable, and, after exiting the loop, save the value of $file to the output file.

$fileContent = Get-Content -Path HKEY_USERS.txt -Raw

# Be sure to specify a *full* path.
$excludeFile = Convert-Path -LiteralPath 'EXCLUDE_HKEY_USERS.txt'

foreach($line in [System.IO.File]::ReadLines($excludeFile)) {
    $escapedLine = [Regex]::Escape($line)
    $pattern = "(?sm)^$escapedLine.*?(?=^\[HKEY)"
    # Modify the content and save the result back to variable $fileContent
    $fileContent = $fileContent -replace $pattern, ' '
}

# After all modifications have been performed, save to the output file
$fileContent | Set-Content HKEY_USERS-filtered.txt
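As a sketch of the switch -File alternative mentioned above (file names as in the question; this assumes $fileContent has already been read with Get-Content -Raw):

```powershell
# switch -File reads the file line by line; each line is available as $_,
# and relative paths are resolved against PowerShell's current location,
# avoiding the .NET working-directory pitfall.
switch -File 'EXCLUDE_HKEY_USERS.txt' {
    default {
        $escapedLine = [Regex]::Escape($_)
        $fileContent = $fileContent -replace "(?sm)^$escapedLine.*?(?=^\[HKEY)", ' '
    }
}
```

The default branch matches every line, so the body runs once per input line, just like the foreach loop over [System.IO.File]::ReadLines().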

Building on Santiago Squarzon's helpful comments:

  • Get-Content does not cause memory leaks, but it can consume a lot of memory that isn't garbage-collected until an unpredictable later point in time.
    • The reason is that - unless the -Raw switch is used - it decorates each line read with PowerShell ETS (Extended Type System) properties containing metadata about the file of origin, such as its path (.PSPath) and the line number (.ReadCount).
    • This both consumes extra memory and slows the command down - GitHub issue #7537 asks for a way to opt out of this wasteful decoration, as it typically isn't needed.
    • However, reading with -Raw is efficient, because the entire file content is read into a single, multi-line string, which means that the decoration is only performed once.
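The decoration can be observed directly; a quick sketch (assuming HKEY_USERS.txt exists in the current directory):

```powershell
# Each line emitted by Get-Content (without -Raw) carries ETS metadata properties.
$firstLine = Get-Content -Path HKEY_USERS.txt | Select-Object -First 1
$firstLine.PSPath      # provider path of the file of origin
$firstLine.ReadCount   # line number within the file
```

With -Raw, the whole file arrives as one string, so this per-object decoration happens only once.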

So it looks like I cannot use StreamReader and StreamWriter on same file simultaneously. Or can I?

No, you cannot. You cannot simultaneously read from a file and overwrite it.

To update / replace an existing file you have two options (note that, for a fully robust solution, all attributes of the original file (except the last write time and size) should be retained, which requires extra work):

  • Read the old content into memory in full, perform the desired modification in memory, then write the modified content back to the original file, as shown in the top section.

    • There is a slight risk of data loss, however, namely if the process of writing back to the file gets interrupted.
  • More safely, write the modified content to a temporary file and, upon successful completion, replace the original file with the temporary one.
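A minimal sketch of the safer, temporary-file approach (the destination file name is taken from the question; attribute preservation is deliberately omitted here):

```powershell
# Write to a temp file first; only on success replace the original.
$tempFile = [System.IO.Path]::GetTempFileName()
try {
    Set-Content -LiteralPath $tempFile -Value $fileContent
    Move-Item -LiteralPath $tempFile -Destination 'HKEY_USERS-filtered.txt' -Force
}
catch {
    # Clean up the temp file if anything went wrong, then surface the error.
    Remove-Item -LiteralPath $tempFile -ErrorAction Ignore
    throw
}
```

If the write to the temp file fails, the original file is left untouched, which is the point of this pattern.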


2 Comments

For me Get-Content wastes 4+ GB of RAM (even with your code optimization), whereas [System.IO.File]::ReadLines at peak took only 1.5 GB RAM and memory was freed from PowerShell ISE process once the script stopped, which wasn't the case with Get-Content.
@van_folmert, you can't compare the two, because Get-Content -Raw reads the entire file into a single string, as [System.IO.File]::ReadAllText() would - which you need, since you want to match across line boundaries. The memory isn't reclaimed until some time after the $fileContent variable goes out of scope or is manually removed. By contrast, [System.IO.File]::ReadLines() reads line by line, lazily (as Get-Content without -Raw does in the pipeline, albeit with much more overhead), and in a loop the strings created may become eligible for garbage collection after each iteration.
