Array From A ReadLn

General Macro Scheduler discussion

Moderators: Dorian (MJT support), JRL

User avatar
jpuziano
Automation Wizard
Posts: 1085
Joined: Sat Oct 30, 2004 12:00 am

Post by jpuziano » Fri Mar 13, 2009 12:36 am

mtettmar wrote:Here's the RegEx solution:

So full code to create an array from a line of CSV and loop through each field:

Code: Select all

//Example CSV line
Let>text=sally,"1,2500",fred

//RegEx to extract each field to an array
Let>pattern="[^"\r\n]*"|[^,\r\n]*
RegEx>pattern,text,0,fields,num,0

//Loop through each field
Let>k=0
Repeat>k
  Let>k=k+1
  Let>the_field=fields_%k%

  //RegEx to remove starting and ending quotes if present
  RegEx>^"|"$,the_field,0,matches,n,1,,the_field

  //Display the field
  MessageModal>the_field
Until>k=num
Hi Marcus,

Thanks again for the above RegEx solution... its like Separate> on steroids.

I translated your RegEx into an EasyPattern and it works just as well. I even added some more challenging data like empty values ,,, (nothing between successive commas) and an empty double quoted value ,"", and both your original and the one below performed as expected:

Code: Select all

//Example CSV line
Let>text=sally,"1,2500",fred,"111,222,333",,"",red,green,done

//EasyPatterns translation
Let>pattern=[('"' (zeroOrMore (not doubleQuote or return or linefeed)) '"') or (zeroOrMore (not comma or return or linefeed))]
RegEx>pattern,text,1,fields,num,0

//Loop through each field
Let>k=0
Repeat>k
  Let>k=k+1
  Let>the_field=fields_%k%

  //RegEx to remove starting and ending quotes if present
  RegEx>^"|"$,the_field,0,matches,n,1,,the_field

  //Display the field
  MessageModal>the_field
Until>k=num
Then, in an attempt to make one RegEx call do it all, I replaced the literal '"' values (singlequote doublequote singlequote) with EasyPatterns mustBeginWith(doubleQuote) and mustEndWith(doubleQuote) keywords... see modified code below:

Code: Select all

//Example CSV line
Let>text=sally,"1,2500",fred,"111,222,333",,"",red,green,done

//EasyPatterns - trying to use mustBeginWith and mustEndWith to discard the double quotes
Let>pattern=[(mustBeginWith(doubleQuote) (zeroOrMore (not doubleQuote or return or linefeed)) mustEndWith(doubleQuote)) or (zeroOrMore (not comma or return or linefeed))]
RegEx>pattern,text,1,fields,num,0

//Loop through each field
Let>k=0
Repeat>k
  Let>k=k+1
  Let>the_field=fields_%k%

  //RegEx to remove starting and ending quotes if present
  //RegEx>^"|"$,the_field,0,matches,n,1,,the_field

  //Display the field
  MessageModal>the_field
Until>k=num
The EasyPatterns reference for those keywords says that "When a match is found, it must be/must not be preceded by what is in the brackets. The bracket contents are NOT included in the actual match."

That way, I hoped to discard the leading and trailing double quotes... without having to use a second RegEx to do it later in the loop.

Well it doesn't work. If you run the example above, you'll see all kinds of strange matches going on. I am wondering though... is it because there's a problem with how I've written the EasyPattern... or is discarding the double quotes in this way just not possible?
Last edited by jpuziano on Fri Mar 13, 2009 3:58 am, edited 1 time in total.
jpuziano

Note: If anyone else on the planet would find the following useful...
[Open] PlayWav command that plays from embedded script data
...then please add your thoughts/support at the above post - :-)

User avatar
Bob Hansen
Automation Wizard
Posts: 2475
Joined: Tue Sep 24, 2002 3:47 am
Location: Salem, New Hampshire, US
Contact:

Post by Bob Hansen » Fri Mar 13, 2009 3:49 am

I needed to put in my RegEx solution, sorry for the delay:

Code: Select all

Let>vHaystack=red,white,blue,"555,000,000",green,"23,000",yellow
Let>vNeedle1="([^,]*),*([^,]*),*([^,]*),*([^,]*)"
Let>vNewNeedle1=$1~$2~$3~$4
Let>vNeedle2=~*,
Let>vNewNeedle2=,
Let>vNeedle3=~*\n
Let>vNewNeedle3=\n
Let>vComma=,

// Remove double quotes, and replace commas with temp tildes
RegEx>%vNeedle1%,%vHaystack%,0,vMatchString,vMatchNumber,1,%vNewNeedle1%,vResult
// Remove extra trailing tildes in center of string
RegEx>%vNeedle2%,%vResult%,0,vMatchString,vMatchNumber,1,%vNewNeedle2%,vResult
// Remove extra trailing tildes at end of string
RegEx>%vNeedle3%,%vResult%,0,vMatchString,vMatchNumber,1,%vNewNeedle3%,vResult

// Next line will not work with %vComma% - why not?
Separate>%vResult%,vComma,vField
If>%vField_count%=0,End

// Replace temp tildes with commas
Let>vCount=0
Repeat>vCount
    Let>vCount=vCount+1
    // Could insert "IF" here and only process if tilde is in the field
    StringReplace>vField_%vCount%,~,%vComma%,vTemp
    Let>vField_%vCount%=vTemp
Until>vCount=%vField_count%
I made a simple source string, vHaystack, with numbers of varying length. This solution should handle a number with up to three commas inside it.

But, I have run across two problems, though. The solution works, but I had to play tricks with the code to make it happen.

Problem 1: Cannot use %vComma% in the Separate command. But it accepted "vComma" with no problem. I think the variable with % should be good here, don't understand why not. %vComma% did work in StringReplace.

Problem 2: For some reason I was unable to use "vField_%vCount% in the result of StringReplace command. The value of %vCount% was not converting to the value of the variable. So, I used vTemp to make the command work. I tried with and without the %, made no difference. And I just used a simple "k", with and without %, and still made no difference. I closed Macro Scheduler and restarted it to clear any transient values, still would not work.
Using Macro Scheduler 11.1.05 with XP PRO SP2.

Don't mean to hijack this thread, so maybe these problems should be moved into a new topic?
Last edited by Bob Hansen on Fri Mar 13, 2009 2:21 pm, edited 2 times in total.
Hope this was helpful..................good luck,
Bob
A humble man and PROUD of it!

User avatar
Marcus Tettmar
Site Admin
Posts: 7395
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK
Contact:

Post by Marcus Tettmar » Fri Mar 13, 2009 7:21 am

Bob, what you describe is correct behaviour.

You need to use Let to resolve a complex embedded variable:

Let>this_field=vField_%k%

You will note all my examples use this. Otherwise you get only vField_1 etc.

John: nice work with the Easy Patterns. Afraid I can't answer your final question. Probably just misinterpreted what Easy Patterns said it will do. I'll ask Simon at DataMystic.
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar

Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?

User avatar
Marcus Tettmar
Site Admin
Posts: 7395
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK
Contact:

Post by Marcus Tettmar » Fri Mar 13, 2009 10:02 am

John,

I had a look at your attempt with Easy Patterns, and studied the Easy Patterns reference (I'm learning here too!) and realised your problem was partly one of boolean logic. Here's my solution:

Code: Select all

Let>text="a,b",Fred,"12,500",blue

Let>pattern=[(mustBeginWith(doubleQuote) zeroOrMore(not doubleQuote or return or linefeed) (not comma) mustEndWith(doubleQuote)) or ( ( mustBeginWith(comma) or mustBeginWith(lineStart)) zeroOrMore(not comma or doubleQuote or return or linefeed) (mustEndWith(comma) or mustEndWith(lineEnd)) )]

RegEx>pattern,text,1,fields,num,0
So just one line of code now splits the text perfectly to:

a,b
Fred
12,500
blue

There have to be some assumptions however, and I don't think it's possible to make one line cover every possible situation. The above won't work if you have a field in double quotes that ends with a comma, e.g. ",,,". I think the safer solution is my original two parse version. We have to tell it to ignore ", ... ," otherwise in the above sample text it will see ",Fred," as a field. It also won't be able to cope with non quoted fields containing a quote where, again, the two parse solution will.

So the safer solution is my original two parse version. For 100% reliability I don't see any getting away from it.

However, this is a nice exercise for learning about Easy Patterns!

For the brave among you here's the Perl RegEx equivalent:

Code: Select all

Let>pattern=(?:(?:(?:(?:(?<=(?:\"))(?:(?:(?:[^\"\r\n])))*(?:(?:[^\,]))(?=(?:\")))))|(?:(?:(?:(?:(?:(?:(?<=(?:\,)))|(?:(?<=(?:^)))))(?:(?:(?:[^\,\"\r\n])))*(?:(?:(?:(?=(?:\,)))|(?:(?=(?:$)))))))))
Ouch. Gotta love Easy Patterns! Kudos to DataMystic.
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar

Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?

User avatar
Bob Hansen
Automation Wizard
Posts: 2475
Joined: Tue Sep 24, 2002 3:47 am
Location: Salem, New Hampshire, US
Contact:

Post by Bob Hansen » Fri Mar 13, 2009 2:02 pm

Marcus, excellent job on your RegEx solution.
And John and jpuziano have also provided some great RegEx work. Thanks to all.

And Marcus, thanks for clearing up the vVariable_%vCount% issue for me. I must have forgotten that, but it does not sound the least bit familiar. I am glad I was able to stumble into the correct usage. That takes care of problem 2.

But I still have problem 1. What about the "%vComma%" not working in Separate, but the "vComma" was OK?

Code: Select all

Let>vComma=,

// Next line will work with vComma
Separate>%vResult%,vComma,vField

// Next line will not work with %vComma% - why not?
Separate>%vResult%,%vComma%,vField
// expected "vField_Count" becomes ",vField_Count"
And on Marcus' single RegEx solution, with original perl syntax, I am up to character 18, figure it will take me another weeks to fully make sense. Thanks for the great example, I guess I will start to look at EasyPatterns vs. traditional.....
Last edited by Bob Hansen on Fri Mar 13, 2009 2:38 pm, edited 1 time in total.
Hope this was helpful..................good luck,
Bob
A humble man and PROUD of it!

User avatar
Marcus Tettmar
Site Admin
Posts: 7395
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK
Contact:

Post by Marcus Tettmar » Fri Mar 13, 2009 2:21 pm

Separate>%vResult%,%vComma%,vField

is the same as saying:

Separate>%vResult%,,,vField

And it will then get all parsed wrong.

Furthermore, I advocate NOT using % symbols when specifying the variable on it's own.
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar

Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?

User avatar
Bob Hansen
Automation Wizard
Posts: 2475
Joined: Tue Sep 24, 2002 3:47 am
Location: Salem, New Hampshire, US
Contact:

Post by Bob Hansen » Fri Mar 13, 2009 2:49 pm

Separate>%vResult%,%vComma%,vField

is the same as saying:

Separate>%vResult%,,,vField

And it will then get all parsed wrong.
I thought of that. But using the same logic:
StringReplace>vField_%vCount%,~,%vComma%,vTemp

is the same as saying:

StringReplace>vField_%vCount%,~,,,vTemp

But it did not get all parsed wrong.
Isn't that the purpose of usiing the vComma variable, to prevent the string from being misinterpreted? This seems to be inconsistent. And I usually use the % around my variables so I can see the actual values in the log files. This says sometimes I can use the % and sometimes I cannot. Neither of these cases is assigning a value to vComma, they are both calling its value.
Hope this was helpful..................good luck,
Bob
A humble man and PROUD of it!

User avatar
Marcus Tettmar
Site Admin
Posts: 7395
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK
Contact:

Post by Marcus Tettmar » Fri Mar 13, 2009 3:57 pm

Ok. I'll check the code out to see if I can work out why there's a difference and aim to resolve the inconsistency - one way or the other.
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar

Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?

User avatar
Marcus Tettmar
Site Admin
Posts: 7395
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK
Contact:

Post by Marcus Tettmar » Fri Mar 13, 2009 4:19 pm

Ok. Found the problem. It's a bug in Separate. You're quite right %vComma% should work. I've just got used to not using % symbols for variables on their own. Now fixed in dev. Let me know if you notice other commands with the same problem.
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar

Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?

User avatar
jpuziano
Automation Wizard
Posts: 1085
Joined: Sat Oct 30, 2004 12:00 am

Post by jpuziano » Fri Mar 13, 2009 5:21 pm

mtettmar wrote:Ok. Found the problem. It's a bug in Separate. You're quite right %vComma% should work. I've just got used to not using % symbols for variables on their own. Now fixed in dev. Let me know if you notice other commands with the same problem.
Thanks for posting about this Bob... and thanks Marcus... looking forward to the fix in the next maintenance release.

Bob, this other thing you said got me thinking...
Bob Hansen wrote:And I usually use the % around my variables so I can see the actual values in the log files.
I always try to "get away with" not surrounding my variables with % and only add them if absolutely necessary to make things work... so I agree with Marcus when he said...
mtettmar wrote:Furthermore, I advocate NOT using % symbols when specifying the variable on it's own.
...my reasoning being, that's the way the commands were originally designed... so if things work as originally designed... go with that. The % thing was added later... so if I don't need it to make things work, I don't use it. I suppose that would make my code more compatible with older versions of MS before % was added.

However, at one point, it was discovered (posted on the forums) that if you surrounded your variables with %... the log file would record the values of the variables... instead of just the variable names.

%var% = var values logged in logfile
var = var names only logged in logfile

It seems to me that the concept of %variables% was not created to control what gets logged in the log files... however you have changed your coding practices to take advantage of the logging benefit (side-effect?).

I also noticed in this post: http://www.mjtnet.com/forum/viewtopic.php?t=2538
Marcus talks about the new (at that time) _DUMP_VARS system variable.
mtettmar wrote:Much easier to use TimeStamp for this. Also in v8.0 you can dump out all vars between each step just by setting _DUMP_VARS to 1 :-)
Useful... but no doubt it would produce HUGE logfiles huh Bob? I can see why you'd prefer to just always use %variables% to see the values.

IDEA: :idea:

To allow us to "unhook" how variables are logged... from how we code them... how about creating a new SYSTEM variable?
proposed new system variable wrote:_VAR_VALUE_LOGGING=0
this would be default behavior, what happens today

_VAR_VALUE_LOGGING=1
log var name and value regardless of whether var was coded as var or %var%
This would mean:
  • we could choose the logging style we want at the top of the script... i.e. just "set it and forget it"
  • since logging style would no longer be dependant on how we coded the variable names... we could go back to using just var as intended... and only use %var% when it is actually needed for correct operation of the code
  • logging aficionados could even dynamically change the value of the new (proposed) system variable dynamically throughout a script if they wanted... though I'd think that if they were trouble-shooting... setting _DUMP_VARS to 1 would be the ultimate way to smoke out the problem... but again, huge logfiles no doubt.
Bob, Marcus, your thoughts? Would anyone out there like to be able to set the variable logging style in this way?

Thanks for listening and take care
jpuziano

Note: If anyone else on the planet would find the following useful...
[Open] PlayWav command that plays from embedded script data
...then please add your thoughts/support at the above post - :-)

User avatar
Marcus Tettmar
Site Admin
Posts: 7395
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK
Contact:

Post by Marcus Tettmar » Fri Mar 13, 2009 5:51 pm

Erm ... use of % symbols is perfectly acceptable and if you are using VAR_EXPLICIT it is also necessary. It's personal preference. Bob should continue coding the way he prefers.

I don't buy your logic about not using %var% - they have been supported since at least March 1999 - possibly earlier!!

As for DUMP_VARS - you CAN switch it OFF as well as ON you know! So log files don't have to be "huge".

I think we're in danger of losing the real value of this post now (reading CSV data) so perhaps we should close the thread and move other topics elsewhere.
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar

Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?

User avatar
Bob Hansen
Automation Wizard
Posts: 2475
Joined: Tue Sep 24, 2002 3:47 am
Location: Salem, New Hampshire, US
Contact:

Post by Bob Hansen » Fri Mar 13, 2009 7:24 pm

I agree with Marcus. I think existing variable tools are OK.

Glad you identified the bug in Separate.

Sorry I hijacked the thread. Subject closed for me. Thanks to all.
Hope this was helpful..................good luck,
Bob
A humble man and PROUD of it!

DataMystic
Newbie
Posts: 2
Joined: Mon Mar 16, 2009 5:23 am

Post by DataMystic » Mon Mar 16, 2009 5:30 am

With EasyPatterns, the best way of handling CSV fields is using

Code: Select all

[ lineStart, CSVField, zeroOrMore ( comma, CSVField ), lineEnd ]
EP takes care of fields with double quotes, single quotes or no quotes for you. The regex is horrendous! We don't try to discard the quotes.

If you need to split out each field and capture it separately, use:

Code: Select all

[ lineStart, 
  capture(CSVField), comma,
  capture(CSVField), comma,
  capture(CSVField), comma,
  capture(CSVField), comma,
  capture(CSVField),
; etc etc (this line is an EasyPattern comment)
lineEnd ]
Regards,

Simon Carter, DataMystic Melbourne, Australia
http://www.DataMystic.com +61.3.9913.0595 (GMT+10 hours)
TextPipe: Text Conversion, Reformatting and Extraction Workbench
PCMAG - "The ultimate text conversion and manipulation tool"

User avatar
Marcus Tettmar
Site Admin
Posts: 7395
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK
Contact:

Post by Marcus Tettmar » Mon Mar 16, 2009 9:58 am

Thanks Simon,

I can't believe I missed the CSVField keyword :oops:
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar

Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?

User avatar
jpuziano
Automation Wizard
Posts: 1085
Joined: Sat Oct 30, 2004 12:00 am

Post by jpuziano » Mon Mar 16, 2009 10:38 am

DataMystic wrote:If you need to split out each field and capture it separately, use:

Code: Select all

[ lineStart, 
  capture(CSVField), comma,
  capture(CSVField), comma,
  capture(CSVField), comma,
  capture(CSVField), comma,
  capture(CSVField),
; etc etc (this line is an EasyPattern comment)
lineEnd ]
Thanks for that Simon. I was hoping you'd jump in.

Yes, I believe the goal is to parse the elements of the CSV record/line into a macroscript variable array. I tried the code below to separately capture the nine elements of the CSV line:

Code: Select all

//Example CSV line
Let>text=sally,"1,2500",fred,"111,222,333",,"",red,green,done

//EasyPattern from Simon at DataMystic
Let>pattern=[ lineStart, capture(CSVField) as 'my_varname', comma, capture(CSVField), comma, capture(CSVField), comma, capture(CSVField), comma, capture(CSVField), comma, capture(CSVField), comma, capture(CSVField), comma, capture(CSVField), comma, capture(CSVField), lineEnd ]
RegEx>pattern,text,1,fields,num,0
The EasyPatterns reference at:
http://www.datamystic.com/easypatterns_reference.html
says:
EasyPatterns wrote:[capture(...)],

[capture(...) as 'varname' ]

Assigns the contents of the group to a variable which can be referred to later in both the search pattern ([group#] e.g. [group6] ,# can range from 1 to 26) and in the replacement string ($# e.g. $6, # can range from 1-9, a-z. $0 represents the entire matched string). If specified, the text can also be stored in the global variable 'varname' in addition to the positional variables $1, $2 etc.
Marcus, I single stepped through the above macro in the debugger... and after the RegEx> command ran, there were no MS variables created for the capture variables... i.e. no $1 through $9 and no my_varname variable. Though these variables would be understood if used in the "Replace String"... they are not visible from MS itself so we can't use them directly to populate an array.

So what I did was create a replacement string like this:

$1|$2|$3|$4|$5|$6|$7|$8|$9

the result being, the CSV line was translated into a new line where the "element separating commas" were replaced by a different character, I chose vertical bar |.

Next I used Separate> with the vertical bar character | as the delimiter and your original code to trim the leading or trailing " characters if present:

Code: Select all

//Example CSV line
Let>text=sally,"1,2500",fred,"111,222,333",,"",red,green,done

//EasyPattern from Simon (at Datamystic) 
Let>pattern=[ lineStart, capture(CSVField) as 'varname', comma, capture(CSVField), comma, capture(CSVField), comma, capture(CSVField), comma, capture(CSVField), comma, capture(CSVField), comma, capture(CSVField), comma, capture(CSVField), comma, capture(CSVField), lineEnd ]
RegEx>pattern,text,1,fields,num,1,$1|$2|$3|$4|$5|$6|$7|$8|$9,new_text

Separate>new_text,|,fields

//Loop through each field
Let>element=0
Repeat>element
  Let>element=element+1
  Let>the_field=fields_%element%

  //RegEx to remove starting and ending quotes if present
  RegEx>^"|"$,the_field,0,matches,n,1,,the_field

  //Display the field
  MessageModal>the_field
Until>element=fields_count
The above would work fine and long as the vertical bar character | did not appear anywhere in the CVS data.

Two questions at this point:

1) Marcus, if it is possible, would you consider creating MS variables to expose the EasyPattern capture group variables $0 through $26 and the optional 'varname' variable... if they were defined in the EasyPattern?

And I suppose capture groups could also have been defined in a regular RegEx pattern as well.

This may be an impossible or unreasonable request so no worries... just throwing the idea out there to see what you think.

2) If the element values used single quotes ' instead of double quotes " to surround values with embedded commas... its clear we could just substitute ' for " in the second RegEx> command above... however, we'd have to know ahead of time that that's how the data was. Is there a set standard for this? i.e. are double quotes " always or almost always used?
Last edited by jpuziano on Wed Mar 18, 2009 2:12 am, edited 1 time in total.
jpuziano

Note: If anyone else on the planet would find the following useful...
[Open] PlayWav command that plays from embedded script data
...then please add your thoughts/support at the above post - :-)

Post Reply
cron
Sign up to our newsletter for free automation tips, tricks & discounts