How do I pick numbers off a webpage?

Technical support and scripting issues

Moderators: Dorian (MJT support), JRL

Kwhiz
Junior Coder
Posts: 35
Joined: Wed Jan 12, 2005 6:19 pm

Post by Kwhiz » Tue Feb 14, 2006 5:02 pm

I'm a stock trader, and I would like to take the current earnings estimates from the following Yahoo research page:

http://finance.yahoo.com/q/ae?s=ups

and paste these numbers directly onto my stock charts. I know how to do the pasting, of course, but I can't figure out how to get the numbers off the webpage.

Any help?

Thanks so much,
Kwhiz

Marcus Tettmar
Site Admin
Posts: 7380
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK

Post by Marcus Tettmar » Tue Feb 14, 2006 6:16 pm

The easiest way is to use WebRecorder and the ExtractTag function. You can use WebRecorder's Tag Extraction Wizard to find the cells.

However, there's a problem with doing this on finance.yahoo.com: those nasty adverts make the HTML dynamic. Each time the page loads there is a different advert, or sometimes no advert at all, and each advert has its own HTML. So the number of table elements changes from load to load, which means the start position of the data table changes too and cannot be known in advance.

The solution is to search through the table cells looking for the start of the table. Once you find it you know its starting position and can work out the positions of the data elements from there. I hope this is making sense. Here is some code which hopefully explains it: it finds the start of the table, retrieves the first two values and simply displays them in a couple of message boxes.


// Generated by MacroScript WebRecorder 1.68
// Recorded on Tuesday, February 14, 2006, at 05:55 PM
LibLoad>IEAuto.dll,hIE
If>hIE=0
MessageModal>Could not load IEAuto.dll, make sure it is in the path or edit the LibLoad line.
Goto>end_script
EndIf

//Move the mouse cursor out of harm's way so mouseover events don't interrupt the script
MouseMove>0,0
Let>delay=1

LibFunc>hIE,CreateIE,IE[0],0

LibFunc>hIE,Navigate,r,%IE[0]%,http://finance.yahoo.com/q/ae?s=UPS
LibFunc>hIE,WaitIE,r,%IE[0]%
Wait>delay

//Find the start of the "Earnings Est" table by checking each TD cell in turn,
//starting just after cell 30
Let>ndx=30
Label>FindEELoop
Let>ndx=ndx+1
//Give up after cell 300 (an arbitrary limit) so the loop cannot run forever
If>ndx>300,NotFound
//Modify buffer size if required ...
Let>CELL_SIZE=4098
LibFunc>hIE,ExtractTag,r,%IE[0]%,,TD,ndx,0,CELL
MidStr>r_6,1,r,CELL
If>CELL=Earnings Est,FoundEarnings
Goto>FindEELoop

Label>NotFound
MessageModal>Could not find the "Earnings Est" table on the page.
Goto>cleanup

Label>FoundEarnings
//We've found the start of the table: the "Earnings Est" title is in cell ndx.
//To get any other value, count the cells from "Earnings Est" to the one you
//want and add that count to ndx to get its cell index.

//Get the first data element (Current Qtr) - it is in cell ndx+6
Let>Item1=ndx+6
Let>CELL_SIZE=4098
LibFunc>hIE,ExtractTag,r,%IE[0]%,,TD,Item1,0,CELL
MidStr>r_6,1,r,CELL

//Display it in a message box
MessageModal>Avg Est, Current Qtr: %CELL%

//Get the second data element (Next Qtr) - it is in cell ndx+7 (Item1+1)
Let>Item2=Item1+1
Let>CELL_SIZE=4098
LibFunc>hIE,ExtractTag,r,%IE[0]%,,TD,Item2,0,CELL
MidStr>r_6,1,r,CELL

//Display it in a message box
MessageModal>Avg Est, Next Qtr: %CELL%

Label>cleanup
LibFree>hIE
Label>end_script


You should be able to follow this code well enough to add lines for the other values you are interested in. It's just a case of copying the ExtractTag lines and changing the index, working each one out as an offset from the discovered start position.

You will need WebRecorder for this script to work.
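The anchor-and-offset idea in the script above does not depend on WebRecorder. As a language-neutral illustration (written in Python, not Macro Scheduler), here is a minimal sketch that assumes the page's TD cell texts have already been collected into a list in document order. The function names and the cell values are made up; offsets 6 and 7 mirror the script's ndx+6 (Current Qtr) and the cell after it (Next Qtr), and would need recounting if Yahoo changed the table layout.

```python
def find_anchor(cells, label):
    """Return the index of the first cell whose trimmed text equals label, or -1 if absent."""
    for i, text in enumerate(cells):
        if text.strip() == label:
            return i
    return -1


def cells_at_offsets(cells, label, offsets):
    """Find the anchor cell, then return the trimmed cells at each offset from it.

    Returns None if the anchor is missing or any offset runs past the end of the list,
    so a missing table is reported instead of being searched for indefinitely.
    """
    anchor = find_anchor(cells, label)
    if anchor < 0:
        return None
    values = []
    for offset in offsets:
        index = anchor + offset
        if index >= len(cells):
            return None
        values.append(cells[index].strip())
    return values


# Made-up cells: the number of advert cells before the table varies between loads.
table = ["Earnings Est", "Current Qtr", "Next Qtr", "Current Year", "Next Year",
         "Avg. Estimate", "1.00", "2.00"]
load_1 = ["ad"] * 2 + table
load_2 = ["ad"] * 5 + table

print(cells_at_offsets(load_1, "Earnings Est", [6, 7]))  # ['1.00', '2.00']
print(cells_at_offsets(load_2, "Earnings Est", [6, 7]))  # ['1.00', '2.00']
```

Because the search is anchored on the "Earnings Est" text rather than a fixed index, both simulated page loads return the same values even though the number of advert cells differs.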
Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar

Did you know we are now offering affordable monthly subscriptions for Macro Scheduler Standard?

Marcus Tettmar

Post by Marcus Tettmar » Tue Feb 14, 2006 6:19 pm

Another way to do it without WebRecorder is to use HTTPRequest to fetch the page and parse the returned HTML with string manipulation, or to use VBScript to control IE and parse the HTML that way. Both are more advanced approaches that require more code.
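As an illustration of the parse-the-HTML route (in Python rather than Macro Scheduler), here is a minimal sketch that assumes the page has already been downloaded into a string. It pulls the visible text out of every <td>...</td> cell with regular expressions, strips inner tags and decodes entities using the standard html module. It is deliberately naive: it assumes cells are never nested, and a messier page might need a real HTML parser such as Python's html.parser. The function and variable names are hypothetical.

```python
import html
import re

# Naive patterns: they assume <td> cells are never nested inside one another.
TD_PATTERN = re.compile(r"<td\b[^>]*>(.*?)</td\s*>", re.IGNORECASE | re.DOTALL)
TAG_PATTERN = re.compile(r"<[^>]+>")


def td_cells(page_html):
    """Return the visible text of every <td> cell in page_html, in document order."""
    cells = []
    for match in TD_PATTERN.finditer(page_html):
        inner = TAG_PATTERN.sub("", match.group(1))  # drop inner tags such as <b> or <span>
        cells.append(html.unescape(inner).strip())   # decode entities like &nbsp; and trim
    return cells


sample = ('<table><tr><TD class="hdr"><b>Earnings Est</b></TD>'
          '<td>Current Qtr</td></tr>\n<tr><td>1.00&nbsp;</td></tr></table>')
print(td_cells(sample))  # ['Earnings Est', 'Current Qtr', '1.00']
```

The resulting list can then be searched for the "Earnings Est" cell and indexed by offset, the same way the WebRecorder script counts cells from its discovered start position.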

Kwhiz

Post by Kwhiz » Tue Feb 14, 2006 7:49 pm

I'm new to the HTTPRequest command, so this is my first experiment with it. Here is the short program I wrote. When I display the result with the msg>stuff command, it gives me a "404 error connecting to host" message, even though I can connect to the website with IE quickly and easily via cable broadband.

httprequest>http://finance.yahoo.com/q/ae?s=c,,get,,stuff
wait>13
msg>stuff


What am I doing wrong, if anything? I've tried waiting 30 seconds to no avail.
Thanks,
Kwhiz

Marcus Tettmar

Post by Marcus Tettmar » Tue Feb 14, 2006 10:47 pm

Are you behind a proxy server? In Internet Explorer, go to Tools > Internet Options and look at the Connections tab. If a proxy server is set up there, you will need to configure it in Macro Scheduler too for the HTTPRequest command to work.

Kwhiz

Post by Kwhiz » Wed Feb 15, 2006 12:42 pm

Would LAN settings matter?

Under tools/internet options/connections/settings, none of those three boxes are checked, which tells me that I don't have any proxy settings.

But under tools/internet options/connections/LAN settings, the first box, which says "Automatically configure settings", is checked. I'm on a home network with three other computers. I wouldn't know how to configure these LAN settings manually. Do these settings have anything to do with the proxy settings you are referring to?

Thanks again for your help,
Kwhiz

Marcus Tettmar

Post by Marcus Tettmar » Wed Feb 15, 2006 5:53 pm

Hi,

Sorry, I've just realised the problem is with your syntax: GET must be in upper case. The method can be either GET or POST. Try this:

HttpRequest>http://finance.yahoo.com/q/ae?s=c,,GET,,stuff
MessageModal>stuff
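The upper-case convention matches HTTP itself: RFC 9110 defines the request method token as case-sensitive, with the standard methods written in upper case. As a rough analogy outside Macro Scheduler, here is a minimal Python sketch using the standard urllib.request module; the helper name build_request is hypothetical. It rejects anything other than GET or POST spelled exactly in upper case, then builds (but does not send) the request.

```python
import urllib.request

# The two methods mentioned in the reply above, spelled exactly as required.
ALLOWED_METHODS = ("GET", "POST")


def build_request(url, method):
    """Build, but do not send, a urllib Request for url.

    Raises ValueError unless method is exactly "GET" or "POST" (case-sensitive).
    """
    if method not in ALLOWED_METHODS:
        raise ValueError(f"method must be GET or POST in upper case, got {method!r}")
    return urllib.request.Request(url, method=method)


req = build_request("http://finance.yahoo.com/q/ae?s=c", "GET")
print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen(req) would actually fetch the page, while build_request(url, "get") raises ValueError up front instead of sending a lowercase method.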

Kwhiz

Post by Kwhiz » Wed Feb 15, 2006 6:56 pm

Great, thanks, it works now! Very helpful. Thank you.

Kwhiz
