getting text location

adroege · Post by **adroege** » Wed May 19, 2010 8:24 pm

Check this out. This may also help you.

WebRecorder for Macro Scheduler

http://www.mjtnet.com/webrecorder.htm

cyberiguana · Post by **cyberiguana** » Thu May 20, 2010 4:39 am

This sounds like a solution that could make my life a lot easier. However, I have no idea how this website handles POSTs and GETs. I looked at the page source and it has a lot of JavaScript functions. How do I know what to send and what to expect back? Where do I start?

gdyvig · Post by **gdyvig** » Thu May 20, 2010 2:21 pm

Hi cyberiguana,

The HTTPRequest method suggested by adroege may end up being more reliable, but I want to address a couple of questions on the image recognition approach.

> I believe image recognition is the way here, but I can't find a way for the macro to "remember" which images he already scanned.

Try using ScreenCapture to capture the line at the bottom of the scrollable area before you scroll. After scrolling, use that captured image as the needle file: the place where it is found marks the top of the region that still has to be scanned for keyword images.
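The scroll-and-resume idea can be sketched outside Macro Scheduler. The Python below is only an illustration, assuming each screen capture has already been turned into a list of comparable pixel rows; `find_strip`, `first_unscanned_row`, and the toy data are hypothetical names, not Macro Scheduler commands. The bottom strip of the previous capture is the needle, and the row just below its match in the new capture is where scanning should resume.

```python
def find_strip(haystack, needle):
    """Return the index of the first row where needle (a list of rows) matches, or -1."""
    n = len(needle)
    for top in range(len(haystack) - n + 1):
        if haystack[top:top + n] == needle:
            return top
    return -1


def first_unscanned_row(previous, current, strip_height=1):
    """Return the first row of `current` that was not visible in `previous`."""
    if not previous:
        return 0  # nothing scanned yet, so the whole capture is new
    needle = previous[-strip_height:]  # bottom strip of the earlier capture
    top = find_strip(current, needle)
    if top == -1:
        return 0  # no overlap found: treat the whole capture as new
    return top + len(needle)


# Toy captures: each string stands in for one row of pixels (assumed data).
before = ["row-a", "row-b", "row-c", "row-d"]
after = ["row-c", "row-d", "row-e", "row-f"]

start = first_unscanned_row(before, after)
print(after[start:])
```

If the page did not move at all, the needle is found at the very bottom and nothing new is left to scan, which is one way to tell that the last page has been reached. A one-row strip can match too early on pages with repeated rows, so a taller strip (larger `strip_height`) is safer.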

Another possibility: if you have Microsoft Office, and therefore MODI OCR, installed, read this article:

http://www.mjtnet.com/blog/2009/08/11/f ... with-modi/

Gale

cyberiguana · Post by **cyberiguana** » Thu May 20, 2010 3:03 pm

Hi Gale. I haven't quite understood your suggestion for the remembering problem. Can you elaborate?

Using OCR would require me to know which word I'm looking for, and I don't. I want to get the names of all the products on the webpage, whatever they are. As I said to adroege, I submit each name on another page of the same site, get a result specific to that name, and paste it into the price field of the product on the first page.

My worry is that doing this with web requests will require a great deal of HTML and JavaScript knowledge, as the site involved is full of JS functions.

gdyvig · Post by **gdyvig** » Thu May 20, 2010 4:40 pm

Hi cyberiguana,

There are several things you may be wanting to remember.

My first thought was that the last page of products and prices may contain items you have already scanned, and you don't want to process them a second time. My suggestion was a way to detect that you had reached the last page and to find where you left off. But that may not have been your real question.

You are scanning by an image of the word "Price:" so you can find the product name just above it. You can capture the name and store it in an array so you can remember it. Examples:
Product_1=Soap
Product_2=Toothpaste

You can use arrays with the same subscripts to remember the corresponding prices, scroll page numbers, and screen coordinates.
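As a rough illustration of the same-subscript idea, here is a Python sketch rather than Macro Scheduler script. The helper names, the price, and the coordinates are made up for the example; only the product names come from the list above.

```python
# Parallel lists: index i in every list describes the same product.
products = []
prices = []
scroll_pages = []
positions = []


def remember(name, price, page, xy):
    """Store one product and return the subscript shared by all lists."""
    products.append(name)
    prices.append(price)
    scroll_pages.append(page)
    positions.append(xy)
    return len(products) - 1


def already_scanned(name):
    """True if this product name was captured on an earlier pass."""
    return name in products


# Price is unknown when the name is first captured, so store None for now.
i = remember("Soap", None, 1, (120, 340))
j = remember("Toothpaste", None, 1, (120, 410))

# Later, once the price has been looked up, fill it in by subscript.
prices[i] = "1.99"

print(products[i], prices[i], scroll_pages[i], positions[i])
```

Checking `already_scanned` before storing a name is what lets the macro skip products it has seen on an overlapping scroll page.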

You can also use ScreenCapture to capture an image of the product name so you can scan for it to find where you left off.

As for the MODI solution, MODI keeps an array of every word detected by the OCR and its coordinates within the captured image, so you can find every occurrence of "Price:" and work backwards to get the associated product name. RegEx would probably work on it.
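Working backwards from "Price:" might look something like the Python sketch below. It does not call MODI; it assumes the OCR output has already been copied into a list of words with `left` and `top` coordinates (a hypothetical structure), that the product name sits on the text line directly above its "Price:" label, and that the page is a single column. The sample words are invented.

```python
def lines_by_top(words, tolerance=5):
    """Group words into text lines by their top coordinate."""
    lines = []
    for w in sorted(words, key=lambda w: (w["top"], w["left"])):
        # A word joins the current line if it is close to that line's first word.
        if lines and abs(w["top"] - lines[-1][0]["top"]) <= tolerance:
            lines[-1].append(w)
        else:
            lines.append([w])
    return lines


def names_above_price(words, label="Price:"):
    """Return the text of the line directly above each line containing label."""
    lines = lines_by_top(words)
    names = []
    for idx, line in enumerate(lines):
        if idx > 0 and any(w["text"] == label for w in line):
            above = sorted(lines[idx - 1], key=lambda w: w["left"])
            names.append(" ".join(w["text"] for w in above))
    return names


# Invented OCR result: each entry is one word with its coordinates.
words = [
    {"text": "Soap", "left": 10, "top": 100},
    {"text": "Price:", "left": 10, "top": 120},
    {"text": "Mint", "left": 10, "top": 161},
    {"text": "Toothpaste", "left": 50, "top": 160},
    {"text": "Price:", "left": 10, "top": 180},
]

print(names_above_price(words))
```

Because the names are read from whatever words sit above the label, this does not need to know the product names in advance.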

While all of this should work, the approach suggested by adroege is likely to be more accurate and reliable if it has access to all the info you need.

Gale

cyberiguana · Post by **cyberiguana** » Thu May 20, 2010 5:36 pm

It looks like MODI works the same way as FindImagePos, except that it uses a picture as its source. I wish I had a clue about those web requests; this imaging route is tedious.
