How to separate this

timle
Pro Scripter
Posts: 96
Joined: Tue Apr 20, 2004 5:53 am

How to separate this

Post by timle » Mon Sep 22, 2008 9:36 pm

This is the content of "dirsize.txt"
-rwxrwxrwx 1 owner group 563682 Sep 19 11:22 filename1.pdf
-rwxrwxrwx 1 owner group 729569 Sep 19 12:40 filename2.pdf

How do I use Separate> to get the "filename.pdf" part for use in my script?

thank you


ReadFile>%dir1%\dirsize.txt,data
Separate>data,Dates,filename

Me_again
Automation Wizard
Posts: 1101
Joined: Fri Jan 07, 2005 5:55 pm
Location: Somewhere else on the planet

Post by Me_again » Mon Sep 22, 2008 10:49 pm

Something like this should work for you:

Code: Select all

ReadFile>c:\temp\dirsize.txt,dirdata
//First separate the file into lines
Separate>dirdata,%CRLF%,dirfiles
If>dirfiles_count=0,end
//Loop through the lines
Let>k=0
Repeat>k
Let>k=k+1
//Then separate the line using a space for separator
Separate>dirfiles_%k%, ,filenames
//The file name is the 9th data element
MDL>filenames_9
Until>k,dirfiles_count
Label>end


Inconsistent result

Post by timle » Fri Sep 26, 2008 5:27 pm

-rwxrwxrwx 1 owner group 12456361 Sep 19 11:22 filename1.pdf
-rwxrwxrwx 1 owner group 729569 Sep 19 12:40 filename2.pdf

The filename is not always the 9th data element when the spacing before the 5th element is inconsistent.
Is there a way to collapse all the extra spaces into a single space before we separate the fields?
(It did not show in my post, but the real file has more spaces between the word "group" and the numbers.)

thanks

Marcus Tettmar
Site Admin
Posts: 7395
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK

Post by Marcus Tettmar » Fri Sep 26, 2008 6:06 pm

Filename is always the last item. Therefore:

Code: Select all

Let>str=-rwxrwxrwx 1 owner group   563682 Sep 19 11:22   filename1.pdf
Separate>str,SPACE,parts
Let>filename=parts_%parts_count%
MessageModal>filename
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar


Bob Hansen
Automation Wizard
Posts: 2475
Joined: Tue Sep 24, 2002 3:47 am
Location: Salem, New Hampshire, US

Post by Bob Hansen » Fri Sep 26, 2008 6:24 pm

Nice solution Marcus.

Here is another option:
Let>str=-rwxrwxrwx 1 owner group 563682 Sep 19 11:22 filename1.pdf
Position>filename1.pdf,%str%,1,StartPos
MidStr>%str%,%StartPos%,20,filename
MessageModal>%filename%

But your solution is much better, since it does not need to know the filename in advance.
Hope this was helpful..................good luck,
Bob
A humble man and PROUD of it!


Thankyou that works for filename

Post by timle » Fri Sep 26, 2008 9:33 pm

But what if I also want to get the byte count that comes after "group"?
How can I do that?

thanks


Post by Me_again » Sat Sep 27, 2008 1:20 am

If you know that the problem is double spaces just use StringReplace> to replace any double spaces with single spaces.
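
The replace-until-clean idea can be checked outside Macro Scheduler. Below is a minimal Python sketch (not Macro Scheduler code) that repeats the double-space replacement until none are left. The function name `collapse_double_spaces` and the sample line with five spaces are assumptions made for illustration.

```python
def collapse_double_spaces(text):
    """Replace double spaces with single spaces until none remain."""
    # One pass only halves each run of spaces, so keep going until clean.
    while "  " in text:
        text = text.replace("  ", " ")
    return text


# Sample line with five spaces after "group" (made up for this sketch).
line = "-rwxrwxrwx 1 owner group     12456361 Sep 19 11:22 filename1.pdf"
fields = collapse_double_spaces(line).split(" ")

# With single spaces, the size is the 5th field and the name is the 9th.
print(fields[4], fields[8])
```

Note that this only suits listings where the filenames themselves contain no runs of spaces, since those would be collapsed too.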


Post by Bob Hansen » Sat Sep 27, 2008 1:21 am

Use the code from Marcus, with a simple change. With single spaces between the fields, the byte count is the 5th element:

Let>str=-rwxrwxrwx 1 owner group 563682 Sep 19 11:22 filename1.pdf
Separate>str,SPACE,parts
Let>bytes=%parts_5%
MessageModal>%bytes%

Check out the Help section for the Separate command to understand how the segments are numbered and counted as variables.

jpuziano
Automation Wizard
Posts: 1085
Joined: Sat Oct 30, 2004 12:00 am

Post by jpuziano » Sat Sep 27, 2008 2:20 am

mtettmar wrote:Filename is always the last item. Therefore:

Code: Select all

Let>str=-rwxrwxrwx 1 owner group   563682 Sep 19 11:22   filename1.pdf
Separate>str,SPACE,parts
Let>filename=parts_%parts_count%
MessageModal>filename
There's a problem. If there are spaces within the filenames like this...
  • -rwxrwxrwx 1 owner group 563682 Sep 19 11:22 file name 1.pdf
    -rwxrwxrwx 1 owner group 729569 Sep 19 12:40 file name 2.pdf
...then the above code would return "1.pdf" and "2.pdf" instead of the full filename.

Here's one solution...

Assuming there will never be a colon character : in the filename, we can break up the line using : as the delimiter instead of SPACE. Separate> drops the delimiter itself, so the right-most piece will always start with the minutes from the time value, followed by the spaces and the filename... like this for example:
  • 22   file name 1.pdf
So now all we have to do is throw away the first few characters (here two digits plus three spaces, hence the start position of 6)... and what's left is the filename. Here's the code:

Code: Select all

Let>str=-rwxrwxrwx 1 owner group   563682 Sep 19 11:22   file name 1.pdf
Separate>str,:,parts
Let>filename=parts_%parts_count%
MidStr>filename,6,1000,just_the_filename
Message>just_the_filename
Note I didn't bother to get the length of the string for use in the MidStr> command, I just plunked in 1000. This is assuming any filename ever encountered will be less than 1000 chars long which I'd say is a safe bet... and the command doesn't complain.
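
For anyone who wants to try the split-on-colon idea outside Macro Scheduler, here is a minimal Python sketch (not Macro Scheduler code). Instead of a fixed start position it strips however many spaces follow the minutes. The function name `filename_after_time` is hypothetical. The sketch assumes every line has an HH:MM time, that the filename contains no colon, and that it does not begin with a space.

```python
def filename_after_time(line):
    """Return the text after the HH:MM time stamp, minus leading spaces."""
    # rpartition splits on the LAST colon, so tail starts with the minutes.
    _, _, tail = line.rpartition(":")
    # Drop the two minute digits, then any run of spaces before the name.
    return tail[2:].lstrip(" ")


sample = "-rwxrwxrwx 1 owner group   563682 Sep 19 11:22   file name 1.pdf"
print(filename_after_time(sample))
```

Because the leading spaces are stripped rather than counted, the same function handles one space or several between the time and the filename.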

Now as for parsing the "bytes after the group", here's one solution...

Code: Select all

Let>str=-rwxrwxrwx 1 owner group   563682 Sep 19 11:22   file name 1.pdf
Separate>str,group   ,parts
Let>line_starting_with_bytes=parts_%parts_count%
Separate>line_starting_with_bytes,SPACE,parts
Let>bytes=parts_1
MessageModal>bytes
Note in the first Separate> command, there are three spaces after group... so your delimiter is "group   ". That saves you from having to Trim the result string... i.e. it will start with the actual bytes value. It also assumes all your lines will have three spaces between the end of "group" and the beginning of the bytes value. As long as that is always true, this will always work.

After that, you just separate again, using SPACE as the delimiter only this time, what you are after is the first piece... not the last. Try it out, see if it works for you and let us know.

Can anyone find a shorter solution to the above?

P.S. Thanks Marcus for the Blog post on MS version 11 - much appreciated and looking forward to the beta! :D
Last edited by jpuziano on Sat Sep 27, 2008 6:29 am, edited 1 time in total.
jpuziano

Note: If anyone else on the planet would find the following useful...
[Open] PlayWav command that plays from embedded script data
...then please add your thoughts/support at the above post - :-)


Post by Bob Hansen » Sat Sep 27, 2008 3:27 am

Just an idea, no code here yet.....

Two methods of dealing with the spaces:
1. Use VB RegEx to replace multiple spaces with a single space, will take care of any number of spaces between strings
OR
2. Use multiple lines of StringReplace, replacing double spaces with a single space. Maybe three lines in a row to take care of four or five spaces in a row.

Then use code from Marcus
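
Method 1 can be sketched in a few lines. The example below uses Python's `re` module rather than VBScript, purely to illustrate the logic. The function name `split_fields` and the sample line are assumptions, and it only applies when filenames contain no spaces.

```python
import re


def split_fields(line):
    """Collapse runs of spaces to one space, then split into fields."""
    collapsed = re.sub(r" {2,}", " ", line.strip())
    return collapsed.split(" ")


# Sample line with uneven spacing (made up for this sketch).
line = "-rwxrwxrwx 1 owner group      563682 Sep 19 11:22   filename1.pdf"
fields = split_fields(line)

# The size is the 5th field; the filename is always the last field.
size, name = fields[4], fields[-1]
print(size, name)
```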


Post by jpuziano » Sat Sep 27, 2008 3:49 am

Bob Hansen wrote:Just an idea, no code here yet.....

Two methods of dealing with the spaces:
1. Use VB RegEx to replace multiple spaces with a single space, will take care of any number of spaces between strings
OR
2. Use multiple lines of StringReplace, replacing double spaces with a single space. Maybe three lines in a row to take care of four or five spaces in a row.

Then use code from Marcus
Hi Bob,

What would happen if there were three spaces in a row within the filename... like this for example:

file^^^name.txt

...where each ^ represents a space

If you do either 1 or 2 above, you would be altering the actual filename and perhaps not able to parse out the real filename which is "file^^^name.txt" (contains 3 spaces) not "file^name.txt" (contains only 1 space).
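
One way to avoid altering the filename is to stop splitting once the eight leading fields have been consumed. The Python sketch below (not Macro Scheduler code) shows this, with the hypothetical function name `parse_listing_line`, assuming the usual nine-column listing shown in this thread.

```python
def parse_listing_line(line):
    """Return (size_in_bytes, filename) from one listing line."""
    # split(None, 8) splits on runs of whitespace at most 8 times,
    # so the 9th piece is the rest of the line, left unchanged.
    parts = line.split(None, 8)
    return int(parts[4]), parts[8]


line = "-rwxrwxrwx 1 owner group   563682 Sep 19 11:22   file   name.txt"
size, name = parse_listing_line(line)
print(size, repr(name))
```

The three spaces inside the filename survive because only the separators between the first eight fields are consumed.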

The code I posted also works for filenames that contain multiple spaces in a row:

Code: Select all

Let>str=-rwxrwxrwx 1 owner group   563682 Sep 19 11:22   filenamecontains   3spaces.pdf
Separate>str,:,parts
Let>filename=parts_%parts_count%
MidStr>filename,6,1000,just_the_filename
Message>just_the_filename


Post by Marcus Tettmar » Sat Sep 27, 2008 7:49 am

Regex to get the bytes:

Code: Select all

//DOES THE REGEX
VBSTART
Function regExSearch(strPattern,str)
  Set regEx = New RegExp ' Create regular expression.
  regEx.Pattern = strPattern ' Set pattern.
  regEx.IgnoreCase = True ' Make case insensitive. Default=False
  Set matches = RegEx.Execute(str)
  List = ""
  For each match in matches
    List = List & match.value & ";"
  Next
  if List <> "" then
    regExSearch = Mid(List,1,Len(List)-1)
  end if
End Function
VBEND

Let>line=-rwxrwxrwx 1 owner group 563682 Sep 19 11:22 filename1.pdf
VBEval>regExSearch("group[\x20]*([0-9]{1,})\b","%line%"),bytes
StringReplace>bytes,group,,bytes
StringReplace>bytes, ,,bytes
MDL>bytes
This of course is a *nix directory listing. How is it being produced? If you have control over this, you could specify the ls parameters to return the list in a different format that may not require parsing. E.g. the "ls" command can be set to return only the filenames. If this is from an FTP listing, FTPGetDirList can also be set to return only filenames.
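
As a cross-check of the pattern used in the regex code above, here is a minimal Python sketch (not VBScript or Macro Scheduler code). It uses a capture group, so the "group" text and the spaces do not need to be stripped afterwards. The name `size_from_line` is hypothetical, and like the original pattern it assumes the group column literally reads "group".

```python
import re

# Literal "group", any number of spaces, then the digits we want to capture.
SIZE_AFTER_GROUP = re.compile(r"group *([0-9]+)\b", re.IGNORECASE)


def size_from_line(line):
    """Return the byte count following 'group', or None if not found."""
    match = SIZE_AFTER_GROUP.search(line)
    return int(match.group(1)) if match else None


line = "-rwxrwxrwx 1 owner group   563682 Sep 19 11:22 filename1.pdf"
print(size_from_line(line))
```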


Post by Bob Hansen » Sat Sep 27, 2008 5:17 pm

For jpuziano:

jpuziano wrote:Hi Bob,

What would happen if there were three spaces in a row within the filename...

You are right that this breaks if the filename can contain spaces. But I did not see any indication from timle, or in his samples, that the filename would ever have a space character in the middle. My assumption was that filename.ext would not be split. Thanks for pointing out what happens when that assumption is wrong. If the assumption holds, I think the multiple StringReplace lines would work OK.


thank you for your help

Post by timle » Tue Sep 30, 2008 4:00 pm

I used tips from everyone to make a script that checks the FTP site for incoming files, waits until a file stops growing, downloads it to the local drive, and then removes it from the FTP server.
Thanks for all your help
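
The actual script was not posted. As a rough illustration of the "wait until it stops growing" step only, here is a minimal Python sketch (not Macro Scheduler code). The function `wait_until_stable` and its `get_size` callback are hypothetical. In a real script `get_size` would parse the byte count out of a fresh directory listing, and the sizes below are faked.

```python
import time


def wait_until_stable(get_size, interval=30.0, stable_polls=2):
    """Poll get_size() until it returns the same value stable_polls times in a row."""
    last = get_size()
    same = 1
    while same < stable_polls:
        time.sleep(interval)
        current = get_size()
        # Reset the counter whenever the size changes.
        same = same + 1 if current == last else 1
        last = current
    return last


# Fake readings standing in for successive directory listings.
fake_sizes = iter([100, 250, 400, 400])
final_size = wait_until_stable(lambda: next(fake_sizes), interval=0)
print(final_size)
```

A longer interval or a higher `stable_polls` makes it less likely that a slow upload is mistaken for a finished one.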
