I'm trying to count the words in my text file. However, when I use Get-Content, the array seems to be read letter by letter, so I can't compare the text word by word. I hope you can help me out!

Clear-Host

#Functions

function Get-Articles (){

 foreach($Word in $poem){
    if($Articles -contains $Word){
       $Counter++
    }
}
    write-host "The number of Articles in your sentence: $counter"
}

#Variables

$Counter = 0

$poem = $line
$Articles = "a","an","the"

#Logic

$fileExists = Test-Path "text.txt"

if($fileExists) {
    $poem = Get-Content "text.txt"
    }
else
    {
    Write-Output "The file SamMcGee does not exist"  
    exit(0) 
    }

$poem.Split(" ")

Get-Articles

2 Answers

What your script does, edited down a bit:

$poem = $line                    # set poem to $null (because $line is undefined)
$Articles = "a","an","the"       # $Articles is an array of strings, ok

                                 # check file exists (I skipped, it's fine)

$poem = Get-Content "text.txt"   # Load content into $poem, 
                                 # also an array of strings, ok

$poem.Split(" ")                 # Apply .Split(" ") to the array.
                                 # PowerShell does that once for each line.
                                 # You don't save it with $xyz = 
                                 # so it outputs the words onto the 
                                 # pipeline.
                                 # You see them, but they are thrown away.

Get-Articles                     # Call a function (with no parameters)


function Get-Articles (){        

                                 # Poem wasn't passed in as a parameter, so
 foreach($Word in $poem){        # Pull poem out of the parent scope. 
                                 # Still the original array of lines. unchanged.
                                 # $word will then be _a whole line_.

    if($Articles -contains $Word){    # $articles will never contain a whole line
       $Counter++
    }
}
    write-host "The number of Articles in your sentence: $counter"  # 0 every time
}

You probably wanted to do $poem = $poem.Split(" ") to make it an array of words instead of lines.
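
A minimal sketch of that fix, assuming a made-up two-line sample in place of the file contents (the sample text and the $count variable are only there for illustration, not taken from the question):

```powershell
# Stand-in for Get-Content "text.txt": an array with one string per line (made-up sample).
$poem = @(
    'There are strange things done in the midnight sun',
    'By the men who moil for gold'
)
$Articles = 'a', 'an', 'the'

# Save the result of Split; without the assignment the words are only written to the output.
$poem = $poem.Split(' ')

$count = 0
foreach ($Word in $poem) {
    # -contains compares case-insensitively by default, so "The" would count as well.
    if ($Articles -contains $Word) { $count++ }
}
Write-Host "The number of articles in your text: $count"
```

After the assignment, $Word is a single word on each pass through the loop, so the -contains test can succeed.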

Or you could have passed $poem words into the function with

function Get-Articles ($poem) {
...

Get-Articles $poem.Split(" ")
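
Filled in, the parameter version might look roughly like this. It is only a sketch: the parameter names $Words and $Articles, the returned count, and the sample lines are my own assumptions, not part of the original script.

```powershell
# Sketch: the words are passed in explicitly instead of being read from the parent scope.
function Get-Articles {
    param(
        [string[]] $Words,                               # hypothetical parameter name
        [string[]] $Articles = @('a', 'an', 'the')       # default list of articles
    )
    $count = 0
    foreach ($Word in $Words) {
        if ($Articles -contains $Word) { $count++ }
    }
    return $count
}

# Made-up sample standing in for the lines returned by Get-Content.
$poem = @('The cat sat on a mat', 'and ate an apple')
$n = Get-Articles -Words ($poem.Split(' '))
Write-Host "The number of articles in your text: $n"
```

Returning the count rather than printing it inside the function keeps the counter local and leaves the output formatting to the caller.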

And you could make use of the PowerShell pipeline with:

$Articles = "a","an","the"

$poemArticles = (Get-Content "text.txt").Split(" ") | Where {$_ -in $Articles}
$counter = $poemArticles | Measure | Select -Expand Count
write-host "The number of Articles in your sentence: $counter"
1 Comment

Scriptblock in Where will be executed on each word, and since scriptblock invocation overhead is huge in PowerShell coupled with the slowness of pipelining in PS, this is the slowest solution by far. The fastest one is mentioned in a comment: (Select-String '\b(a|an|the)\b' text.txt -AllMatches).Matches.Count. The original code in the question is almost as fast provided it's fixed by using split on each line or on the entire text content string.
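
For reference, the Select-String one-liner from the comment can be tried out like this. The temporary file and its two sample lines are made up for the sketch and stand in for text.txt; no timing is measured here.

```powershell
# Write a made-up sample to a temporary file (stand-in for text.txt).
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value 'The cat sat on a mat', 'and ate an apple'

# -AllMatches reports every match on a line, not only the first one.
# .Matches gathers the match objects of all matching lines; .Count totals them.
$count = (Select-String -Pattern '\b(a|an|the)\b' -Path $path -AllMatches).Matches.Count
Write-Host "The number of articles in your text: $count"

# Clean up the temporary file.
Remove-Item $path
```

Select-String matches case-insensitively unless -CaseSensitive is given, so "The" at the start of a line is counted too.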
TessellatingHeckler's helpful answer explains the problem with your approach well.

Here's a radically simplified version of your command:

$counter = (-split (Get-Content -Raw text.txt) -match '^(a|an|the)$').count
write-host "The number of articles in your sentence: $counter"

The unary form of the -split operator is key here: it splits the input into words by any run of whitespace between words, resulting in an array of individual words.

-match then matches the resulting array of words against a regex that matches words a, an, or the: ^(a|an|the)$.

The result is the filtered subarray of the input array containing only the words of interest, and .count simply returns that subarray's count.
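
To see the two steps separately, here is a small sketch with an in-memory string instead of the file (the sample text and the variable names $text, $words and $found are assumptions for illustration):

```powershell
# Made-up sample standing in for (Get-Content -Raw text.txt): one multi-line string.
$text = "The  quick fox`nsaw an owl and a hen"

# Unary -split: splits on any run of whitespace (spaces, tabs, newlines).
$words = -split $text

# With an array on the left-hand side, -match acts as a filter
# and returns the elements that match the regex.
$found = $words -match '^(a|an|the)$'

Write-Host "Words: $($words.Count), articles: $($found.Count)"
```

Note that -match is case-insensitive by default; -cmatch would be the case-sensitive variant.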

4 Comments

You'd think Select-String would be shorter than -split, Get-Content and -match combined, eh? But no: (Select-String '\b(a|an|the)\b' text.txt -AllMatches).Matches.Count. And ((Get-Content -Raw text.txt) -replace '.*?\b(a|an|the)\b.*?' | measure -word).Words is also not shorter. :-/
@TessellatingHeckler: It would be interesting to see how your variations compare in terms of performance, however.
Impromptu testing, I just happen to have saved the PoSh help locally earlier, 1.4MB of text. Changing the file selector to *.txt, your approach takes 0.5s and finds 20,409 articles, my select-string takes 0.35s and finds 20,953 articles, and my -replace takes 5.8s and finds 83,712. Probably discount that last one. But my word boundary regex is possibly finding things like "the which your space split would miss.
@TessellatingHeckler: Good to know, and good point re "the, thanks. Select-String is powerful, but you don't see it used that often.
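
The "the point from the comments can be reproduced with a short sketch. The sample sentence is made up, and [regex]::Matches is used here in place of Select-String so that no file is needed:

```powershell
# Made-up sample containing a quoted article.
$text = 'He said "the end" and left the room'

# Whitespace split + anchored match: '"the' is one token and is not counted.
$bySplit = @((-split $text) -match '^(a|an|the)$').Count

# Word-boundary regex: \b sits between '"' and 't', so the quoted "the" is counted too.
$byBoundary = [regex]::Matches($text, '\b(a|an|the)\b', 'IgnoreCase').Count

Write-Host "split + match: $bySplit, word boundary: $byBoundary"
```

Which count is the right one depends on whether articles next to punctuation or quotes should be included; the word-boundary version is the more forgiving of the two.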
