OK, PowerShell may not be the best tool for the job, but it's the only one available to me.
I have a bunch of .csv data files with 600K+ rows each. Some of them have delimiter errors, e.g. a " in the middle of a text field or "" at the start of one. They are too big to open in an editor (even UltraEdit) and fix manually, even if I wanted to, which I don't!
Because of the doubled "" delimiter at the start of some text fields and the rogue " delimiter in the middle of others, I haven't used a header row to define the columns: the extra delimiter makes these rows look as if they contain an extra column.
I need to parse the file, looking for "" instead of " at the start of a text field and for " in the middle of a text field, and remove the extra quotes.
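To make the intended repair concrete, here is a minimal per-line sketch of that rule. It rests on assumptions that are not confirmed for the real files: every field is quoted, fields are separated by exactly `","`, and no field legitimately contains a quote or the sequence `","`. The function name `Repair-QuotedCsvLine` is hypothetical, and the sample line is made up to show both kinds of error.

```powershell
# Hypothetical helper. Assumes every field is quoted and that the
# sequence "," never occurs inside a field's text.
function Repair-QuotedCsvLine {
    param([string]$Line)

    # Drop the outermost opening and closing quote of the line.
    $inner = $Line -replace '^"|"$', ''

    # Split on the "," sequence that separates two quoted fields.
    $fields = $inner -split '","'

    # Remove any quote left inside a field, then re-quote the field.
    $clean = foreach ($field in $fields) {
        '"' + ($field -replace '"', '') + '"'
    }

    $clean -join ','
}

# Made-up line with a doubled quote at a field start and a rogue quote mid-field.
Repair-QuotedCsvLine '"65",""652||999","9"9","","678||1"'
# → "65","652||999","99","","678||1"
```

Working on whole lines rather than single characters keeps empty fields such as `""` intact, because the split only ever removes the `","` separators.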
I have managed to write the code to do this (after a fashion) by basically reading the whole file into an array, looping through it and adding output characters to an output array.
What I haven't managed to do is successfully write this output array to a file.
I have read every part of https://learn.microsoft.com/en-us/powershell/module/Microsoft.PowerShell.Utility/out-file?view=powershell-5.1 that seemed relevant. I've also trawled through about 10 similar questions on this site and attempted various code gleaned from them.
The output array prints perfectly to the screen with Write-Host, but I can't get the data back into a file for love or money. I have a total of 1.5 days of PowerShell experience so far! All suggestions gratefully received.
Here is my code to read the file and identify rogue delimiters (not pretty at all; see the explanation above of the data and the technology constraints):
    $ContentToCheck = get-content 'myfile.csv' | foreach { $_.ToCharArray() }
    $ContentOutputArray = @()
    for ($i = 0; $i -lt $ContentToCheck.count; $i++)
    {
        if (!($ContentToCheck[$i] -match '"')) { #not a quote
            if (!($ContentToCheck[$i] -match ',')) { #not a comma i.e. other char that could be enclosed in ""
                if ($ContentToCheck[$i-1] -match '"') { #check not rogue " delimiter in previous char allow for start of file exception i>1?
                    if (!($ContentToCheck[$i-2] -match ',') -and !($ContentToCheck[$i-3] -match '"')) {
                        Write-Host 'Delimiter error' $i
                        $ContentOutputArray += ''
                    } #endif not preceded by ",
                } #endif"
                else { #previous char not a " so move on
                    $ContentOutputArray += $ContentToCheck[$i]
                }
            } #endifnotacomma
            else
            { #a comma, include it
                $ContentOutputArray += $ContentToCheck[$i]
            } #endacomma
        } #endifnotaquote
        else
        { #a quote so just append it to the output array
            $ContentOutputArray += $ContentToCheck[$i]
        } #endaquote
    } #endfor
So far so good, if inelegant. If I do a simple
Write-Host $ContentOutputArray
the data displays nicely: " 6 5 " , " 652 | | 999 " , " 99 " , " " , " 678 | | 1 " ..... Furthermore, when I check the size of the array (based on a cut-down version of one of the problem files) with
$ContentOutputArray.count
I get an array length of 2507 characters. Happy out. However, I then variously tried the following:
$ContentOutputArray | Set-Content 'myfile_FIXED.csv'
creates a blank file
$ContentOutputArray | out-file 'myfile_FIXED.csv' -encoding ASCII
creates a blank file
$ContentOutputArray | export-csv 'myfile_FIXED.csv'
gives only '#TYPE System.Char' in the file
$ContentOutputArray | Export-Csv 'myfile_FIXED.csv' -NoType
gives an empty file
$ContentOutputArray >> 'myfile_FIXED.csv'
gives blanks separated by ,
What else can I try to write an array of characters to a flat file? It seems such a basic question but it has me stumped. Thanks for reading.
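One direction that may be worth trying is to join the characters back into strings before writing, one string per line. Get-Content returns each line without its line terminator, so flattening the whole file into a single character array loses the row boundaries. The sketch below is not a confirmed fix for the real files. It uses made-up demo data and hypothetical file names in the temp folder, and leaves a placeholder where the character-level checks would go.

```powershell
# Hypothetical demo files in the temp folder (not the real data).
$inPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'myfile_demo.csv'
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'myfile_demo_FIXED.csv'
Set-Content -Path $inPath -Value '"65","652||999"', '"99",""' -Encoding ASCII

Get-Content -Path $inPath | ForEach-Object {
    # Per-line character array, so row boundaries are not lost.
    $chars = $_.ToCharArray()

    # ... character-level fixes on $chars would go here ...

    # Unary -join glues the characters back into a single string.
    -join $chars
} | Set-Content -Path $outPath -Encoding ASCII
```

Set-Content receives one string per input line and writes each on its own row, so the output keeps the same number of rows as the input.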