Using RegEx to search text file and create variable array
Moderators: Dorian (MJT support), JRL
Hi, I have what seems to be a simple requirement, but scripting it is beyond my knowledge.
Here's what I'm trying to do:
1. Read the entire contents of a text file into the variable "xmlText".
2. Search "xmlText" with a regular expression pattern to pick out only selected URLs and put them into an array "urlText". (I already have the regular expression written but don't know how to script the search in Macro Scheduler.)
Note: There will be anywhere from 1 to 5 URLs within the "xmlText" variable. Never fewer than 1 and never more than 5. This is why I was thinking I need to put them into a dynamic array, but again, I'm a beginner so I don't really know how to script this action.
3. Next, I need to write the "urlText" array into a text file called URLS.txt, with each URL on a separate line (so they don't all run together).
4. Finally, I need to copy URLS.txt into a specific directory and overwrite an existing file of the same name.
I have tried to state this requirement as clearly as possible; please let me know if you have more specific questions.
I'd appreciate any help from anybody... Thanks in advance!
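For orientation, the four steps above can be sketched end to end. This is only an illustration in Python rather than Macro Scheduler script, so that it can be run on its own. The pattern in URL_PATTERN is a placeholder (the poster's real expression is not shown in the thread), and the function names and paths are hypothetical.

```python
import re
import shutil
from pathlib import Path

# Placeholder pattern (assumption): the poster's own expression is not given.
URL_PATTERN = r"https?://[^\s<>\"']+"


def extract_urls(xml_text, pattern=URL_PATTERN):
    """Step 2: return every full match of pattern, in order of appearance."""
    return [m.group(0) for m in re.finditer(pattern, xml_text, flags=re.IGNORECASE)]


def write_urls(urls, out_path):
    """Step 3: write one URL per line so they do not run together."""
    Path(out_path).write_text("".join(u + "\n" for u in urls), encoding="utf-8")


def publish(src_path, dest_dir):
    """Step 4: copy into dest_dir, overwriting any file of the same name."""
    dest = Path(dest_dir) / Path(src_path).name
    shutil.copyfile(src_path, dest)
    return dest


def run(source_file, work_dir, dest_dir):
    xml_text = Path(source_file).read_text(encoding="utf-8")  # step 1
    url_text = extract_urls(xml_text)                         # step 2
    out_file = Path(work_dir) / "URLS.txt"
    write_urls(url_text, out_file)                            # step 3
    publish(out_file, dest_dir)                               # step 4
    return url_text
```

A list grows to whatever number of matches is found, so the "1 to 5 URLs" requirement needs no special handling in this sketch.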
- Marcus Tettmar
Something like this:
Insert your regular expression in place of REGEX_PATTERN. The VBScript function will return a semicolon delimited list of matches. We can then loop through this list with Separate and Repeat/Until and do whatever you need to do to them.
Code:
//A VBScript Function to search a string for a regex pattern
//returns a list of matches separated by semicolons
VBSTART
Function regExSearch(patrn,str)
  Set regEx = New RegExp ' Create regular expression.
  regEx.Pattern = patrn ' Set pattern.
  regEx.IgnoreCase = True ' Make case insensitive. Default=False
  Set matches = regEx.Execute(str)
  List = ""
  For each match in matches
    List = List & match.value & ";"
  Next
  regExSearch = Mid(List,1,Len(List)-1)
End Function
VBEND
//Read the file contents into a variable
ReadFile>YourFile.txt,FileData
//replace CRLF chars with VBScript equivalents
StringReplace>FileData,CR," & vbCR & ",FileData
StringReplace>FileData,LF," & vbLF & ",FileData
//Double quote any quotes for VBScript
StringReplace>FileData,","",FileData
//Perform the regex search
VBEval>regExSearch("REGEX_PATTERN","%FileData%"),URLList
//We now have a semicolon delimited list of URLs. We could explode this into an array:
Separate>URLList,;,URLS
If>URLS_COUNT>0
  Let>k=1
  Repeat>k
    Let>ThisURL=URLS_%k%
    MessageModal>ThisURL
    //we could write it to a file:
    WriteLn>outputfile,result,ThisURL
    Let>k=k+1
  Until>k=URLS_COUNT
Endif
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar
Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?
I created an empty script file in Macro Scheduler, pasted in your code, and got the following error when I clicked Run:
"Microsoft VBScript runtime error :5
Invalid procedure call or argument: 'Mid'
line 13, Column 2 "
I've specified the path to my file d:\mytextfile.xml and it still gives the error. Does it matter that it's an xml file versus text file?
JRL wrote:That's exactly the error I get if I don't supply a data file. Try replacing the "yourfile" in the line: ReadFile>YourFile.txt,FileData with the path and file name that your data resides within. Something like:
ReadFile>c:\URLS.txt,FileData
Does that help?
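A likely reading of the error, consistent with JRL's observation: when no data is read, the pattern matches nothing, the list stays empty, and Mid(List,1,Len(List)-1) is then called with a length of -1, which VBScript rejects. The sketch below shows the same "semicolon-delimited list" idea in Python (not Macro Scheduler or VBScript), built so that the empty case is harmless. The names regex_search and split_list are hypothetical.

```python
import re


def regex_search(pattern, text):
    """Return all full matches joined by semicolons, or "" when nothing matches."""
    matches = [m.group(0) for m in re.finditer(pattern, text, flags=re.IGNORECASE)]
    # Joining only between items leaves no trailing ";" to trim afterwards,
    # so an empty match list simply yields an empty string.
    return ";".join(matches)


def split_list(url_list):
    """Split the semicolon list back into items; empty input gives no items."""
    return url_list.split(";") if url_list else []
```

The same effect could be had in the posted function by trimming the trailing semicolon only when the list is non-empty.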
replace line 37
replace line 37 - WriteLn>outputfile,result,ThisURL
with something like
WriteLn>c:\URLS.txt,result,FileData
Hope this helps
Aaron
Re: replace line 37
I still get the following error even after making the suggested change below. And I was careful to replace c:\urls.txt with the actual path to my file.
"Microsoft VBScript runtime error :5
Invalid procedure call or argument: 'Mid'
line 13, Column 2 "
Aaron wrote:replace line 37 - WriteLn>outputfile,result,ThisURL
with something like
WriteLn>c:\URLS.txt,result,FileData
Hope this helps
suggestion
Why don't you post your XML file along with the script?
I will be happy to give it a try on my end.
Aaron
Re: suggestion
OK. I had screwed up specifying the source file to read from... my bad...
now it works...
HOWEVER, when we get to the Separate part of the code below, it seems to be dropping one of the four URLs created in the VBEval "regex" step (full script above).
I know there are 4 URLs because I did a Message> on the "URLList" variable just before the Separate command and it showed 4 URLs.
But again, for some reason, after it executes the code below it's only writing 3 of the 4 URLs to D:\urls.txt
HMMMMMMMmm...... ANY IDEAS????
Code:
//We now have a semicolon delimited list of URLs. We could explode this into an array:
Separate>URLList,;,URLS
If>URLS_COUNT>0
  Let>k=1
  Repeat>k
    Let>ThisURL=URLS_%k%
    //write it to a file:
    WriteLn>D:\urls.txt,result,ThisURL
    Let>k=k+1
  Until>k=URLS_COUNT
Endif
I have an idea. Start with let>k=0 and move the let>k=k+1 line to the start of the repeat loop. As you have it, the 3rd time through the loop you set k=4 and so the loop stops before it has a chance to write the fourth URL.
Code:
//We now have a semicolon delimited list of URLs. We could explode this into an array:
Separate>URLList,;,URLS
If>URLS_COUNT>0
  Let>k=0
  Repeat>k
    Let>k=k+1
    Let>ThisURL=URLS_%k%
    //write it to a file:
    WriteLn>D:\urls.txt,result,ThisURL
  Until>k=URLS_COUNT
Endif
This is worth a try... I actually need it to list as many as 5 URLs. There will never be fewer than 1 URL and never more than 5.
Looking at your suggestion below, however, I'm not sure I understand, because the Let>k=k+1 is already at the first line underneath the Repeat.
JRL wrote:I have an idea. Start with let>k=0 and move the let>k=k+1 line to the start of the repeat loop. As you have it, the 3rd time through the loop you set k=4 and so the loop stops before it has a chance to write the fourth URL.
Code:
//We now have a semicolon delimited list of URLs. We could explode this into an array:
Separate>URLList,;,URLS
If>URLS_COUNT>0
  Let>k=0
  Repeat>k
    Let>k=k+1
    Let>ThisURL=URLS_%k%
    //write it to a file:
    WriteLn>D:\urls.txt,result,ThisURL
  Until>k=URLS_COUNT
Endif
Yes, but as you had it, Let>k=k+1 was at the end of the repeat loop. k would therefore become equal to URLS_COUNT one loop too early: as soon as the Until> statement is read and k is equal to URLS_COUNT, the loop ends.
Step through your code in the editor with Let>k=k+1 at the end of the repeat loop and watch what happens. The loop will run through three times (rather than four) and then quit.
On the other hand if Let>k=k+1 is at the start of the loop, k will become equal to URLS_COUNT and then the next lines will execute before the Until> is read and the loop ends. Thus the WriteLn> function will occur the required four times.
Hope this is more clear.
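The difference between the two loop shapes can be checked with a small simulation. This is a Python sketch of the control flow only, not Macro Scheduler code. It assumes Repeat/Until runs the body first and then stops when k equals the count exactly, as described above. Both function names are hypothetical.

```python
def increment_last(urls):
    """Original shape: write, then k=k+1, then Until>k=URLS_COUNT.

    Returns (written, terminated). terminated is False when k steps past the
    count without ever equalling it, where an equality test would never stop.
    """
    count, k, written = len(urls), 1, []
    while True:
        if k > count:
            return written, False
        written.append(urls[k - 1])   # WriteLn> for URLS_k
        k += 1                        # Let>k=k+1 at the end of the body
        if k == count:                # Until>k=URLS_COUNT
            return written, True


def increment_first(urls):
    """Suggested shape: k starts at 0 and is incremented before the write."""
    count, k, written = len(urls), 0, []
    if count == 0:                    # mirrors If>URLS_COUNT>0
        return written
    while True:
        k += 1                        # Let>k=k+1 at the start of the body
        written.append(urls[k - 1])   # WriteLn> for URLS_k
        if k == count:                # Until>k=URLS_COUNT
            return written
```

Under these assumptions the increment-first shape writes every item for any count from 1 upward, so 5 URLs need no change. The increment-last shape always stops one item short, and with a single URL it steps past the count without ever meeting the exit condition.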
Thanks JRL... I figured out what you meant shortly after posting. However, what if I want it to loop as many as 5 times? As I stated above, the file will always have at least 1 URL and up to 5 URLs, BUT NEVER MORE.
Am I still okay with your repeat settings? If not, how do I modify it to include a possible 5th URL?
THANKS!