
OK, PowerShell may not be the best tool for the job, but it's the only one available to me.

I have a bunch of .csv data files with 600K+ rows each. Some of them have delimiter errors, e.g. a " in the middle of a text field or "" at the start of one. They are too big to open and fix manually (even in UltraEdit), even if I wanted to, which I don't!

Because of the doubled "" delimiter at the start of some text fields and the rogue " delimiter in the middle of others, I haven't used a header row to define the columns: the affected rows appear to have an extra column because of the extra delimiter.

I need to parse the file, looking for "" instead of " at the start of a text field and for " in the middle of a text field, and remove the extra quotes.
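For comparison, the requirement above can also be sketched as a per-line regular-expression replace. This is an alternative heuristic, not the loop used in this question: it assumes every field is quoted and that field text never contains a comma directly next to a quote. The function name Remove-RogueQuote and the sample row are made up for illustration.

```powershell
# Hypothetical helper: strips quotes that sit strictly inside a field.
function Remove-RogueQuote([string]$Line) {
    # A quote is treated as rogue when a character exists on both sides
    # of it and neither of those characters is a comma. Quotes at the
    # start or end of the line, or next to a comma, are left alone.
    $Line -replace '(?<=[^,])"(?=[^,])', ''
}

# Made-up row with a doubled quote at a field start and a quote mid-field.
$sample   = '"65",""652||999","6"78",""'
$repaired = Remove-RogueQuote $sample
```

Because the replace works one line at a time, it could be applied to each line coming out of Get-Content and piped straight to Set-Content, which avoids holding a character array in memory. Rows where a field legitimately contains a quote next to a comma would be misjudged by this rule.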

I have managed to write the code to do this (after a fashion) by basically reading the whole file into an array, looping through it and adding output characters to an output array.

What I haven't managed to do is successfully write this output array to a file.

I have read every part of https://learn.microsoft.com/en-us/powershell/module/Microsoft.PowerShell.Utility/out-file?view=powershell-5.1 that seemed relevant. I've also trawled through about 10 similar questions on this site and attempted various code gleaned from them.

The output array prints perfectly to the screen using Write-Host, but I can't get the data back into a file for love or money. I have a total of 1.5 days of PowerShell experience so far! All suggestions gratefully received.

Here is my code to read the file and identify rogue delimiters (not pretty at all; see the explanation above of the data and the technology constraints):

$ContentToCheck = Get-Content 'myfile.csv' | foreach { $_.ToCharArray() }
$ContentOutputArray = @()

for ($i = 0; $i -lt $ContentToCheck.Count; $i++)
{
    if (!($ContentToCheck[$i] -match '"'))
    {   # not a quote
        if (!($ContentToCheck[$i] -match ','))
        {   # not a comma, i.e. another char that could be enclosed in ""
            if ($ContentToCheck[$i-1] -match '"')
            {   # previous char is a "; check it is not a rogue delimiter (allow for start-of-file exception, i>1?)
                if (!($ContentToCheck[$i-2] -match ',') -and !($ContentToCheck[$i-3] -match '"'))
                {   # not preceded by ",
                    Write-Host 'Delimiter error' $i
                    $ContentOutputArray += ''
                }
            }
            else
            {   # previous char not a ", so move on
                $ContentOutputArray += $ContentToCheck[$i]
            }
        }
        else
        {   # a comma, include it
            $ContentOutputArray += $ContentToCheck[$i]
        }
    }
    else
    {   # a quote, so just append it to the output array
        $ContentOutputArray += $ContentToCheck[$i]
    }
}
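Two side effects of the loop above are worth noting. Get-Content returns each line without its line terminator, so the flattened character array no longer contains any row breaks, and `+=` on an array copies the whole array on every append, which gets slow on large files. A minimal sketch of one way around both, assuming the per-character checks are dropped into the marked spot; the sample rows are made up and stand in for the output of Get-Content:

```powershell
# Made-up rows standing in for: Get-Content 'myfile.csv'
$lines = '"65","99"', '"1"'

# StringBuilder appends in place instead of copying an array each time.
$sb = New-Object System.Text.StringBuilder

foreach ($line in $lines) {
    foreach ($ch in $line.ToCharArray()) {
        # Placeholder for the delimiter checks; here every char is kept.
        [void]$sb.Append($ch)
    }
    # Restore the row break that Get-Content removed.
    [void]$sb.Append("`r`n")
}

$fixed = $sb.ToString()
```

The resulting `$fixed` is already a single string, so it can be passed to Set-Content without any further conversion.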

So far so good, if inelegant. If I do a simple

Write-Host $ContentOutputArray 

the data displays nicely: " 6 5 " , " 652 | | 999 " , " 99 " , " " , " 678 | | 1 " ..... Furthermore, when I check the size of the array (based on a cut-down version of one of the problem files)

   $ContentOutputArray.count

I get an array length of 2507 characters. Happy out. However, I then variously tried:

   $ContentOutputArray | Set-Content 'myfile_FIXED.csv' 

creates blank file

   $ContentOutputArray | out-file 'myfile_FIXED.csv' -encoding ASCII 

creates blank file

   $ContentOutputArray | export-csv 'myfile_FIXED.csv' 

gives only '#TYPE System.Char' in file

   $ContentOutputArray | Export-Csv 'myfile_FIXED.csv' -NoType 

gives empty file

   $ContentOutputArray >> 'myfile_FIXED.csv' 

gives blanks separated by ,

What else can I try to write an array of characters to a flat file? It seems such a basic question but it has me stumped. Thanks for reading.

2 Comments
  • Which version of PowerShell are you using? Commented Sep 13, 2017 at 15:45
  • Version 2.0 Matthias Commented Sep 13, 2017 at 15:53

1 Answer


Convert (or cast) the char array to a string before exporting it.

(New-Object string (,$ContentOutputArray)) |Set-Content myfile_FIXED.csv
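If the constructor call above fails, a likely reason is that an array built with `@()` and `+=` is an Object[] rather than a Char[], so New-Object cannot match it to the String(char[]) constructor. A minimal sketch of the same idea with an explicit cast, using a made-up array in place of the real data:

```powershell
# Made-up data: build the array the same way the question does.
$ContentOutputArray = @()
foreach ($ch in '"65","99"'.ToCharArray()) { $ContentOutputArray += $ch }

# The array is Object[] even though every element is a char.
$typeName = $ContentOutputArray.GetType().Name

# Cast to a real char[] first, then hand it to the string constructor.
$charArray = [char[]]$ContentOutputArray
$asString  = New-Object string (,$charArray)

# $asString | Set-Content 'myfile_FIXED.csv'
```

The cast only succeeds when every element is a single character. Empty-string entries, such as the ones the loop in the question appends on a delimiter error, would make `[char[]]` throw.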

1 Comment

Hi, that threw an error so I modified it to [string]$ContentOutputArray |Set-Content 'myfile_FIXED.csv' and it worked a charm. Thanks so much.
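One caveat about the `[string]` cast in the comment above: when PowerShell converts an array to a string it joins the elements with the value of `$OFS`, which is a single space unless that variable has been set. The file may therefore end up with a space between every character. The unary `-join` operator (available from PowerShell 2.0) concatenates with no separator. A small sketch with made-up data:

```powershell
# Made-up array of single-character elements.
$chars = '"', '6', '5', '"'

# Cast: elements are joined with $OFS (a space by default).
$spaced = [string]$chars

# Unary -join: elements are concatenated with nothing in between.
$joined = -join $chars

# $joined | Set-Content 'myfile_FIXED.csv'
```

Unlike a cast to `[char[]]`, `-join` also accepts empty-string elements, which simply contribute nothing to the result.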
