Getting tangled in loops

Rick0825
Newbie
Posts: 16
Joined: Sun Mar 05, 2023 5:30 pm

Post by Rick0825 » Mon May 08, 2023 5:54 pm

Many hours later... I'm trying to copy text from an 18-page PDF. The text I'm going after is spread throughout the PDF. My thought is to count the image/text matches and then use the OCR Wizard to copy and paste them to an Excel doc. For example, WSHP-1 would be on page 1 and WSHP-2 might be on page 3. I'm using image find on the WSHP part and looping through the PDF, then using the OCR Wizard on the WSHP-1 part to copy and paste to an Excel doc. Clear as mud, right? Is my approach even correct? I know the code is wrong because it's only finding 1 image on the first page.

Code:

Let>NumFound=0
repeat>NumFound
 Wait>1.0
 //Find and Do Nothing Center of 
FindImagePos>%BMP_DIR%\image_3.bmp,SCREEN,0.7,1,XArr,YArr,NumFound,CCOEFF
 Press Page Down
Until>NumFound=5

Grovkillen
Automation Wizard
Posts: 1009
Joined: Fri Aug 10, 2012 2:38 pm
Location: Bräcke, Sweden

Re: Getting tangled in loops

Post by Grovkillen » Mon May 08, 2023 6:34 pm

You're not doing a loop which is counting. NumFound will most likely never be 5. Right?
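
To make the counting point concrete, here is a minimal sketch in Python rather than Macro Scheduler script. It assumes each search call reports only the matches on the page currently on screen, so a grand total has to be kept in a separate variable. The per-page numbers are made-up sample data and the function name is hypothetical.

```python
# Hypothetical per-page results: how many matches one search call reported
# on each of 7 pages (made-up data, not from a real PDF).
matches_per_page = [1, 0, 1, 0, 0, 0, 1]


def count_total(per_page):
    """Add up the per-page counts in a separate running total."""
    total = 0
    for found in per_page:
        # 'found' is replaced on every pass; 'total' keeps accumulating.
        total += found
    return total


total_found = count_total(matches_per_page)
print(total_found)
```

The loop can then stop when there are no more pages, instead of waiting for a single search to report 5 matches.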

Dorian (MJT support)
Automation Wizard
Posts: 1348
Joined: Sun Nov 03, 2002 3:19 am

Post by Dorian (MJT support) » Mon May 08, 2023 6:41 pm

I think your loop is okay, but this is why you're only finding one image:

FindImagePos: Each method has benefits and drawbacks. CCOEFF is more intelligent and more tolerant but is slower and will return only one match. EXACT is faster and can return multiple matches but is precise and therefore less portable and will not cope with changes.

That said, you might want to investigate some of the PDFtoText command line utilities out there. They can extract text and images, and you can use Macro Scheduler to run them. They might be another option (but of course will have a learning curve). A link to get you started: XPDF Download.

You'd use RunProgram. It would look something like this (I've cannibalized these lines from an old script of mine):

Code:

Let>FilePath=c:\Users\xb360\Documents\Macrofiles\xpdfbin-win-4.04\bin64
RunProgram>%FilePath%\pdftotext -table -layout "d:\my.pdf"
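
Once the text is out of the PDF, picking out the WSHP tags is a small parsing job. Here is a minimal sketch in Python rather than Macro Scheduler script. It assumes that pdftotext separates pages with a form feed character and that the tags always look like WSHP- followed by digits. The function name and the sample text are hypothetical.

```python
import re

# Assumed tag format: "WSHP-" followed by one or more digits.
TAG_PATTERN = re.compile(r"WSHP-\d+")


def find_tags_by_page(text):
    """Return (page_number, tag) pairs from pdftotext-style output.

    Assumes pages are separated by a form feed character.
    """
    results = []
    for page_number, page in enumerate(text.split("\f"), start=1):
        for tag in TAG_PATTERN.findall(page):
            results.append((page_number, tag))
    return results


# Made-up sample standing in for the extracted text of a 3-page PDF.
sample = "Unit WSHP-1 schedule\fNo tags here\fUnit WSHP-2 schedule"
print(find_tags_by_page(sample))
```

Because this works on the extracted text, it does not depend on where the tag sits on the screen.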
Yes, we have a Custom Scripting Service. Message me or go here

Post by Rick0825 » Mon May 08, 2023 6:44 pm

You are correct. I just put a 5 in there to try to get different results. I've changed the code so many times I've lost count. Would it be simpler to just use the OCR Wizard to get the text as a string to move over to Excel? But then how does it loop through and get all the rest? Example:
Page 1 - WSHP-1
Page 3 - WSHP-2
Page 7 - WSHP-3

Post by Dorian (MJT support) » Mon May 08, 2023 7:15 pm

Who is this in reply to?

I think your FindImagePos loop is fine if you're looking for 5 images at the same time (rather than fewer than 5 at a time, but 5 in total) and you change to EXACT. How about the PDFtoText suggestion? I have a feeling you saw Grovkillen's reply and missed mine (?).

Post by Rick0825 » Mon May 08, 2023 7:20 pm

Is it possible, then, to use the OCR Wizard in a loop to return multiple strings? I'm just curious about all the possibilities of MS. I'm trying to learn more about moving data from one place to another. Am I getting close with this one? It's currently returning the text area of the last page only.

Code:

Let>k=0
Label>start
  Let>k=k+1
  Wait>1.0
  OCRArea>887,179,943,199,strText
  Wait>1.0
  Press Page Down
  If>k<10
    Goto>start
  Endif

MessageModal>strText

Post by Rick0825 » Mon May 08, 2023 7:40 pm

Thank you, Dorian. I have not tried the software you're suggesting, but I've tried others with not-so-good results. I just want to focus on MS code. I'm loving it. The struggle is finding the right resources for direct questions that I have. I'm used to Googling everything, but that's not been much help for MS code. It's been fun nevertheless.

Post by Dorian (MJT support) » Tue May 09, 2023 9:35 am

Rick0825 wrote:
Mon May 08, 2023 7:20 pm
It's currently returning the text area of the last page only.

Code:

Let>k=0
Label>start
  Let>k=k+1
  Wait>1.0
  OCRArea>887,179,943,199,strText
  Wait>1.0
  Press Page Down
  If>k<10
    Goto>start
  Endif

MessageModal>strText

Each time your loop runs, the last value of strText will be replaced with the new value.
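
The same effect, as a minimal sketch in Python rather than Macro Scheduler script. The page texts are made-up stand-ins for what OCRArea might return on each page.

```python
# Made-up stand-ins for the text captured on each page.
page_texts = ["WSHP-1", "WSHP-2", "WSHP-3"]

# One variable: every pass replaces the previous value.
last_only = ""
for text in page_texts:
    last_only = text

# One slot per pass: every value is kept.
collected = []
for text in page_texts:
    collected.append(text)

print(last_only)
print(collected)
```

After the first loop only the final page's text is left, which matches what a single MessageModal after the loop shows.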

You'd see a difference if you put your MessageModal inside your loop. That's the simplest solution. The two-loop solution below is a lot more convoluted but may be more desirable.

I think learning about the debugger and reading about loops might help you.

Another option is for each cycle of the loop to add whatever it finds to an array, then to have a second loop which does something with that array of data.

So:

Code:

OCRArea>887,179,943,199,strText

Would become:

Code:

OCRArea>887,179,943,199,strText_%k%

A simulated demonstration of the two-loop version. One loop to extract data (or in my example create pseudo data), and a second loop to do something with it:

Code:

//Simulate "extracting data" in a loop, creating an array as it goes. 10 cycles
Let>inloop=0
repeat>inloop
  Let>inloop=inloop+1

  //Creates an array of pseudo data, 100,200,300, and so on.  
  Let>strText_%inloop%={(%inloop%*100)}
Until>inloop,10

//How many items are in the array?
ArrayCount>strText,items
mdl>This array contains %items% items.  We will loop that many times.

//"Do something with the data", looping %items% number of times 
Let>outloop=0
repeat>outloop
  let>outloop=outloop+1
  mdl>strText_%outloop%
Until>outloop,items
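
Since the end goal is Excel, one way to "do something with the data" is to write one row per item to a CSV file, which Excel can open. Here is a minimal sketch in Python rather than Macro Scheduler script. The function name, the column headers, and the sample values are hypothetical.

```python
import csv
import io


def write_rows(values, out):
    """Write a header row plus one (item, text) row per collected value."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["item", "text"])
    for item, text in enumerate(values, start=1):
        writer.writerow([item, text])


# Made-up sample data standing in for the collected array.
buffer = io.StringIO()
write_rows(["100", "200", "300"], buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass a file opened with open("out.csv", "w", newline="") in place of the StringIO object.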

Post by Dorian (MJT support) » Tue May 09, 2023 9:56 am

Rick0825 wrote:
Mon May 08, 2023 7:40 pm
The struggle is finding the right resources for direct questions that I have.

Here's a list of resources, in no particular order.

1. Search the forum. Almost everything has been discussed here over the years - although I don't think this one specifically has. (I like to "display results as topics").

2. The KnowledgeBase. Type a word and, if there's an article, it will show in the list (suggestion to newcomers: start with "wizard", then "loop", then "debug").

3. Command Reference. An alphabetical list of every Macro Scheduler command, each linked to the help file. At the bottom you'll also see a handy list of system variables.

4. The manual. Complete with search feature, and most importantly an in-depth explanation of exactly how each command works.

5. The command reference. Every Macro Scheduler command grouped into categories. A very handy starting point for newcomers.

6. Context sensitive help. In the editor, type a command and press F1. This will bring up context sensitive help, which usually has usage examples.

7. Email us, over at support (I know you know how to find us) :D

I totally appreciate the learning curve with Macro Scheduler. While the usage examples are excellent for learning single principles, it's quite another thing learning how to combine them.

Post by Rick0825 » Tue May 09, 2023 4:47 pm

Thanks so much for the help, Dorian. I have become very familiar with the links you provided over the last few months. I've learned so much. Now I feel like I just need to practice writing code. I also understand that I paid for the software and not a teacher, so if I'm stepping over that line then please let me know.

Using your sample above, I wrote this, which is giving me very close to the text that I'm after.

Code:

Let>k=0
Repeat>k
  Let>k=k+1
  Wait>1.0
  OCRArea>887,179,943,199,strText_%k%
  Wait>1.0
  Press Page down
  Let>strText_%k%={(%k%*1)}
Until>k,7

MDL>strText_%k%

The MDL only shows the last text I'm copying. I read in another post that you can't paste an array, and that you can't store strings, so it makes sense that it's giving me the last known OCR text copy. The interesting part is that it shows all my copied text in the Watch List but doesn't show up in the MDL. I think it has something to do with the line Let>strText_%k%={(%k%*1)}, but I don't understand that line of code.

Post by Dorian (MJT support) » Tue May 09, 2023 5:19 pm

You're almost there.

You either want this (untested):

Code:

Let>k=0
Repeat>k
  Let>k=k+1
  Wait>1.0
  OCRArea>887,179,943,199,strText
  Wait>1.0
  Press Page down
  //Let>strText_%k%={(%k%*1)}
  MDL>strText
Until>k,7

...or this (also untested):

Code:

Let>k=0
Repeat>k
  Let>k=k+1
  Wait>1.0
  OCRArea>887,179,943,199,strText_%k%
  Wait>1.0
  Press Page down
  //Let>strText_%k%={(%k%*1)}
Until>k,7

//How many items are in the array?
ArrayCount>strText,items
mdl>This array contains %items% items.  We will loop that many times.
//"Do something with the data", looping %items% number of times 
Let>outloop=0
repeat>outloop
  let>outloop=outloop+1
  mdl>strText_%outloop%
Until>outloop,items

My Let>strText_%k%={(%k%*1)} was just there to create a sample data array. We don't need that as you're doing that with OCRArea>887,179,943,199,strText_%k% in the second example above, and not using an array at all in the first example above.

I should point out that OCRArea>887,179,943,199,strText_%k% relies on the text you want being in exactly the right place every single time you press Page Down. That in itself may be quite some feat (and is why I would be using the PDF extraction methods previously mentioned - although I accept they are not suitable for every scenario).
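
One way to catch pages where the capture area missed is to check each captured string against the pattern you expect. Here is a minimal sketch in Python rather than Macro Scheduler script. It assumes the tags always look like WSHP- followed by digits. The function name and the sample captures are made up.

```python
import re

# Assumed tag format: the whole string is "WSHP-" plus digits.
EXPECTED = re.compile(r"WSHP-\d+")


def split_valid(captures):
    """Separate captures that match the expected format from those that don't."""
    good, bad = [], []
    for index, text in enumerate(captures, start=1):
        cleaned = text.strip()
        if EXPECTED.fullmatch(cleaned):
            good.append((index, cleaned))
        else:
            bad.append((index, text))
    return good, bad


# Made-up captures: the second one missed the target area.
good, bad = split_valid(["WSHP-1", "edule 4", " WSHP-2\n"])
print(good)
print(bad)
```

Anything that lands in the second list points to a page worth checking by hand.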

You're not overstepping any boundaries at all, and I appreciate you being considerate. I like to go the extra mile for those who are trying to learn and understand. Where I struggle is with the very occasional person who doesn't want to learn, wants us to do it all for them, and can't be bothered to visit or read any of the links I point them to - well, y'know. Luckily I encounter those people very rarely. Besides, the forum was a major part of how I learned all this over 20 years ago.

Post by Dorian (MJT support) » Tue May 09, 2023 5:23 pm

You probably already know this, but it may help newcomers...

Have you ever noticed that when you start typing a command, Macro Scheduler auto-suggests commands for you on your third letter? And then you get the little syntax helper at the bottom of the editor? It's a godsend for knowing what parameters Macro Scheduler is expecting.