Hi, I have a Macro Scheduler script that dumps data from an Oracle database via sqlplus, and for some reason it puts a funky character at the start of the file that makes the contents unreadable.
-I042,I042:I(A),AMPS,IA,4,07/21/2013 06:00 AM,88.812897
I042,I042:I(B),AMPS,IB,4,07/21/2013 06:00 AM,75.329751
(The leading - is really an extended ASCII character, code 254 or something like it.)
I subsequently extended the script to read the file, strip out this character, and write the result to a new file. That works while the file is small, but as the data grows in size it's not efficient!
How can I just delete/replace this ONE char without all the read/write overhead?
Thanks
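For reference, the read-and-rewrite approach described above might look roughly like the sketch below (placeholder file names; it assumes the stray byte is simply the first character of line 1). As far as I know, ReadLn> with an explicit line number has to scan the file from the top on each call, which is why this style slows down badly as the file grows.
Code:
Let>InputFile=%DESKTOP_DIR%\temp.txt
Let>OutputFile=%DESKTOP_DIR%\out.txt
//read line 1, drop its first character, write it out
ReadLn>InputFile,1,line
MidStr>line,2,999999,line
WriteLn>OutputFile,wres,line
//copy the remaining lines one at a time
Let>k=1
Label>ReadLoop
Let>k={%k%+1}
ReadLn>InputFile,k,line
If>line=##EOF##,Done
WriteLn>OutputFile,wres,line
Goto>ReadLoop
Label>Done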
Have you tried Windows PowerShell? I've tested the script below with 100K and 1 million lines:
100K lines took roughly 18 seconds.
1 Million lines took roughly 174 seconds.
Maybe someone has a faster solution.
Code:
Timer>StartTimer
Let>InputFile=%DESKTOP_DIR%\temp.txt
Let>OutputFile=%DESKTOP_DIR%\out.txt
Let>RP_WINDOWMODE=0
Let>RP_WAIT=1
//'-' stands in for the offending character; -replace removes it from every line
Run>powershell.exe Get-Content %InputFile% | ForEach-Object {$_ -replace '-', ''} | Set-Content %OutputFile%
Timer>EndTimer
Let>SecElapsed={(%EndTimer%-%StartTimer%)/1000}
mdl>SecElapsed
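If PowerShell 3.0 or later is available, a variant of the same idea (a sketch, not benchmarked here) would read the whole file in one go with -Raw and drop only the first character, instead of running -replace on every line:
Code:
Let>InputFile=%DESKTOP_DIR%\temp.txt
Let>OutputFile=%DESKTOP_DIR%\out.txt
Let>RP_WINDOWMODE=0
Let>RP_WAIT=1
//-Raw loads the file as one string; Substring(1) strips just the first character
Run>powershell.exe -command "(Get-Content '%InputFile%' -Raw).Substring(1) | Set-Content '%OutputFile%'"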
This took 52 seconds on a million-plus (1,075,076) line file. It could be a better approach, or it could just be a difference between computers. I can't benchmark Rain's version because I can't find powershell.exe on this machine.
It reads the first line of the file, uses MidStr> to remove the first character, and writes that line to a new output file. It then uses DOS "type | find" to write the rest of the input file to the output file.
Code:
Timer>StartTimer
Let>InputFile=%DESKTOP_DIR%\temp.txt
Let>OutputFile=%DESKTOP_DIR%\out.txt
//read line 1 and drop its first character
ReadLn>InputFile,1,res
MidStr>res,2,999999,res
//write the cleaned first line to the new file
WriteLn>OutputFile,wres,res
Let>RP_Windowmode=0
Let>RP_Wait=1
//append every line that does NOT contain the cleaned first line,
//i.e. everything except the original first line
RunProgram>cmd /c type "%InputFile%" | find /v "%res%" >> "%OutputFile%"
Timer>EndTimer
Let>SecElapsed={(%EndTimer%-%StartTimer%)/1000}
mdl>SecElapsed
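One caveat with the find /v step (my reading of it, not stated in the post): it keeps every line that does not contain the cleaned first line as a substring, so any later line that happened to contain that same text would be dropped too. Skipping the first line by position avoids that; a drop-in variant of the same script using cmd's more +1 is sketched below. Worth verifying the +1 line-skip behaviour on your system, and note that more expands tabs to spaces, which is harmless for comma-separated data like this but could matter elsewhere.
Code:
Let>InputFile=%DESKTOP_DIR%\temp.txt
Let>OutputFile=%DESKTOP_DIR%\out.txt
Let>RP_Windowmode=0
Let>RP_Wait=1
//write the cleaned first line, then append lines 2 onwards by position
ReadLn>InputFile,1,res
MidStr>res,2,999999,res
WriteLn>OutputFile,wres,res
RunProgram>cmd /c more +1 "%InputFile%" >> "%OutputFile%"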
Hi, I was curious to see whether RegEx could solve it. I'm not sure if there are any upper limits when the file gets much larger, but for one million lines it completes in around 1 second.
Code:
Let>InputFile=C:\Users\Christer\Documents\testfile.txt
Let>OutputFile=C:\Users\Christer\Documents\resfile.txt
Timer>StartTimer
//read the whole file into a single string
ReadFile>InputFile,strInput
//(?s) plus the lookbehind (?<=-) matches everything after the unwanted character ('-' stands in for it)
RegEx>(?s)(?<=-).+,strInput,0,Matches,NumMatches,0,,
//write the match (the file minus its first character) to the output file
WriteLn>OutputFile,nWLNRes,Matches_1
Timer>EndTimer
Let>SecElapsed={(%EndTimer%-%StartTimer%)/1000}
mdl>SecElapsed
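For comparison, the same whole-file read could also be trimmed positionally with MidStr>, without relying on matching the literal character. A sketch along the same lines (same hypothetical paths, timing omitted):
Code:
Let>InputFile=C:\Users\Christer\Documents\testfile.txt
Let>OutputFile=C:\Users\Christer\Documents\resfile.txt
//read the whole file into one string
ReadFile>InputFile,strInput
//take everything from character 2 to the end
Length>strInput,lenInput
MidStr>strInput,2,lenInput,strOutput
WriteLn>OutputFile,nWLNRes,strOutput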