I need to process a huge text blob (~30000 lines), which will take quite some time with the user looking on. I can use a progress bar to give the user some feedback while this is happening... but to do that I need to know how many lines are in the text blob, so I can update the progress bar as lines are processed.
The problem is, I need that line count really fast... or the user will think something is broken or locked up.
The text blob will be available in a dialog memo field. My first attempt to count the lines involved writing the blob to a file and then reading it back line by line and counting... but that was too slow.
So I tried three other ways of counting the lines without writing to a file:
- Separate>
- RegEx with an EasyPattern
- plain Regex
10 points to anyone (except Marcus) who can count the lines faster than the plain RegEx method in the code below.
Rules are:
- The only code you can change is the code between the two //========================================= lines in the posted code below
- Having a faster PC than mine is not a solution
- Compiling the code below is not a solution
Maybe there's a Win32 API call that can beat the RegEx solution?
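For illustration only, here is the underlying counting idea sketched in Python rather than Macro Scheduler script. It shows two ways to count CRLF separators without splitting the blob into an array: a direct substring count, and the length-difference trick (remove every CRLF and divide the number of characters lost by 2). The function names are made up for this sketch, and whether an equivalent built from Macro Scheduler's own string commands would beat RegEx> is an untested assumption.

```python
def count_separators_direct(blob: str, sep: str = "\r\n") -> int:
    """Count non-overlapping occurrences of sep without building a list."""
    return blob.count(sep)


def count_separators_by_length(blob: str, sep: str = "\r\n") -> int:
    """Count sep by comparing lengths before and after removing it."""
    if not sep:
        raise ValueError("separator must not be empty")
    stripped = blob.replace(sep, "")
    # Each removed separator shortens the text by len(sep) characters.
    return (len(blob) - len(stripped)) // len(sep)


# Build a blob shaped like the one in the script: 300 lines repeated 100 times.
blob_300_lines = "".join(f"text line {n}\r\n" for n in range(1, 301))
blob_30000_lines = blob_300_lines * 100

print(count_separators_direct(blob_30000_lines))
print(count_separators_by_length(blob_30000_lines))
```

Both functions count separators, so a blob whose last line has no trailing CRLF would report one fewer than its number of lines.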
Marcus, if you have a way, please hold off for a few days... at least until the end of the month.
Thanks all and take care
Code:
/*
Given a very large text blob entered into a dialog memo field
and each "line" is separated by CRLF... what is the absolute
fastest way to determine how many lines it contains?
Performance Data from my home PC:
Method: Separate> command
Elapsed time in seconds: 94.11474609375
Lines: 30000
Method: RegEx> with EasyPattern
Elapsed time in seconds: 4.7568359375
Lines: 30000
Method: plain RegEx>
Elapsed time in seconds: 4.7177734375
Lines: 30000
Plain RegEx is the fastest so far... but can you beat it?
*/
VBSTART
VBEND
Dialog>MyDialog
Caption=Text Blob Line Counter Speed Challenge
Width=324
Height=321
Top=133
Left=109
Max=0
Min=0
Close=1
Resize=1
Memo=msMemo1,10,33,295,200, Still Initializing - Please be patient... 30000 lines will appear here soon
Label=Memo field below contains 30000 lines,8,8,true
Button=Count the Lines and Time how long it took,10,247,223,25,3
Button=Exit,248,247,57,25,2
EndDialog>MyDialog
//Initialize a 30000 line text blob
Let>blob_300_lines=
Let>blob_30000_lines=
//build blob_300_lines
Let>line_num=0
Repeat>line_num
Let>line_num=line_num+1
ConCat>blob_300_lines,text line%SPACE%%line_num%%CRLF%
Until>line_num=300
//build blob_30000_lines
Let>line_num=0
Repeat>line_num
Let>line_num=line_num+1
ConCat>blob_30000_lines,blob_300_lines
Message>Initializing... we'll be done at 100: %line_num%
Until>line_num=100
//Close Message> box
Press Enter
Show>MyDialog
Let>MyDialog.msMemo1=%blob_30000_lines%
ResetDialogAction>MyDialog
Label>ActionLoop
GetDialogAction>MyDialog,result
If>result=2,End
If>result=3,Go
Goto>ActionLoop
Label>Go
VBEval>Timer,startSeconds
//Method: Separate> command
//Separate>%MyDialog.msMemo1%,%CRLF%,returnvar
//Method: RegEx> with EasyPattern
//RegEx>[CRLF],MyDialog.msMemo1,1,matches_array,num_matches,0
//=========================================
//Method: plain RegEx>
RegEx>\r\n,MyDialog.msMemo1,0,matches_array,num_matches,0
//=========================================
VBEval>Timer-%startSeconds%,elapsedSeconds
//MDL>Method: Separate> command%CRLF%%CRLF%Elapsed time in seconds:%elapsedSeconds%%CRLF%%CRLF%Lines: %returnvar_count%
//MDL>Method: RegEx> with EasyPattern%CRLF%%CRLF%Elapsed time in seconds:%elapsedSeconds%%CRLF%%CRLF%Lines: %num_matches%
MDL>Method: plain RegEx>%CRLF%%CRLF%Lines Counted: %num_matches%%CRLF%Elapsed time in seconds:%elapsedSeconds%%CRLF%%CRLF%Can you make it count them any faster?
Label>End