IE_ExtractTag access violation module ntdll

Optimus
Newbie
Posts: 12
Joined: Wed Nov 22, 2006 12:25 pm
Location: Australia

IE_ExtractTag access violation module ntdll

Post by Optimus » Thu Mar 11, 2010 7:29 am

Hi,

I have a script (v11.1) that trawls through a WebSphere-developed site
and extracts data from an HTML table. The script has been running fine
for about a year but in the last 24 hours has started failing with
this error:

Access Violation at xxx in module ntdll.dll. Read of address
yyy.

I have also tried it on a different machine (both Windows XP SP3) with the
latest version v11.1.22 and found the same behaviour. I'm guessing
something has changed in the HTML input that is causing the script to barf.

Using the debugger, the error seems to occur when using IE_ExtractTag. I have wrapped this function in a subroutine as follows, where
Get_Cell_Text_Var_1 is the cell number I wish to capture:

Code:

SRT>Get_Cell_Text
  IE_ExtractTag>%IE[0]%,,TD,Get_Cell_Text_Var_1,0,cell_text,r
END>Get_Cell_Text

Four cells from the WebSphere-generated HTML table are included
below. The first and second TD text values are captured correctly.
The attempt to capture the third cell actually returns the value from
the second cell. The call to get the fourth cell results in the
access violation.

12/03/10 09:00:00

131091

Flyer - 1Pp Or 2Pp

CLK3 offset colour between 1,000 and 10,000

Interestingly, I've also looked at the HTML table with WebRecorder and
the tag extraction tool, and they seem to identify the TD values
unambiguously.

Any ideas?

Thanks.

IE_ExtractTag string or buffer limit?

Post by Optimus » Thu Mar 11, 2010 11:38 pm

Can anyone confirm if there is a string or buffer limit when working with IE_ExtractTag? If so, is there a way to increase it?

Looking at the second table cell from my previous post, and comparing it with a similar instance from about a year ago, I calculated the new cell is 5 characters longer.

New cell value: 849 characters (923 if TD tags included)
Old cell value: 844 characters (918 if TD tags included)

I have no idea if this is the cause of the memory violation.

Marcus Tettmar
Site Admin
Posts: 7380
Joined: Thu Sep 19, 2002 3:00 pm
Location: Dorset, UK

Post by Marcus Tettmar » Fri Mar 12, 2010 8:50 am

Set the buffer size like this:

Let>cell_text_SIZE=1024
IE_ExtractTag>%IE[0]%,,TD,Get_Cell_Text_Var_1,0,cell_text,r

That will set the buffer size to 1024 characters.
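To check whether a chosen size is large enough, one option is to measure what actually comes back. The following is only a sketch and has not been run against the real page: the cell index 1, the 4098 size and the variable name cell_len are assumptions for illustration, and IE[0] is assumed to already reference the open browser as in the script above.

```
// Sketch: report how many characters IE_ExtractTag returns for one cell.
// Index 1 and cell_len are illustrative assumptions, not from the original script.
Let>cell_text_SIZE=4098
// Clear any value left over from a previous call
Let>cell_text=
IE_ExtractTag>%IE[0]%,,TD,1,0,cell_text,r
Length>cell_text,cell_len
MessageModal>Cell returned %cell_len% characters
```

If the reported length sits at or very near the size set in cell_text_SIZE, a truncated value is a reasonable suspect and a larger size is worth trying.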

I can't see your HTML source properly as it has messed up the forum. Could you try again, this time placing it inside [code] .... [/code] tags (use the Code button) and disable HTML in the post.

Marcus Tettmar
http://mjtnet.com/blog/ | http://twitter.com/marcustettmar

Sample HTML

Post by Optimus » Fri Mar 12, 2010 1:16 pm

Sample of HTML input posted again, as requested.

Code:

<TD name="ColumnData" valign="top" width="200" align="Center"><SPAN name="RFQCloseDate" class="outputData">12/03/10 09:00:00</SPAN><SPAN name="RFQCloseDate_ValidationError" class="ValidationErrorText"></SPAN></TD>

<TD name="ColumnData" valign="top" width="200" align="Center"><a name="RFQNumber" class="outputData" href="RFQ_Details_Method" onclick="var _f=(this.form || _bst_locateForm_Supplier_00215SQM_00515Model_005151274bd36ce6_005151842());var _els = _f.elements; _els["RfqNo"].value = "131091"; _f.action = "/wps/myportal/streamsolsportal/!ut/p/c1/04_SB8K8xLLM9MSSzPy8xBz9CP0os_ggZx9HCydDRwMLM1MXAyMXA8sgIydHY3cnY6B8JJK8QZi_uYGRqYGTi0GQs7GXkwkB3eEg-_DrB8kb4ACOBvp-Hvm5qfoFuREGWSaOigDuPdir/dl2/d1/L0lDU0NTSUpKZ2tLQ2xFQSEvb01vUUFBSVFKQUFNWXhpbE1RWndYQk00L1lCSkp3NDU0NTAtNUY0a3N0eWp3LzdfNVJPUUFCMUEwTzhHMDAyQlFQTDJGUDA0SjcvYUE0WUYxMDUvYmZfYWN0aW9uL19nZW5fY2FsbF8xX1JGUV9EZXRhaWxzX01ldGhvZA!!/#7_5ROQAB1A0O8G002BQPL2FP04J7";_f.target = "_self";if (!_f.onsubmit || _f.onsubmit()) _f.submit();return false">131091</a><SPAN name="RFQNumber_ValidationError" class="ValidationErrorText"></SPAN></TD>

<TD name="ColumnData" valign="top" width="200" align="Center"><SPAN name="RFQTitle" class="outputData">Flyer - 1Pp Or 2Pp</SPAN><SPAN name="RFQTitle_ValidationError" class="ValidationErrorText"></SPAN></TD>

<TD name="ColumnData" valign="top" width="200" align="Center"><SPAN name="ProductTypeDesc" class="outputData">CLK3 offset colour between 1,000 and 10,000</SPAN><SPAN name="ProductTypeDesc_ValidationError" class="ValidationErrorText"></SPAN></TD>


Post by Marcus Tettmar » Fri Mar 12, 2010 1:44 pm

When I click on each item with the tag extractor I get SPAN elements returned. I think you should be using SPAN rather than TD. Each TD contains a SPAN anyway and has no text of its own because the text is within the SPAN. So either you use SPAN, or you set IE_ExtractTag to return all HTML and then parse it yourself. You may as well go with SPAN.

The last "cell" in the table gives me:

Let>SPAN5_SIZE=4098
IE_ExtractTag>%IE[0]%,,SPAN,5,0,SPAN5,r
MidStr>r_6,1,r,SPAN5

Span 0 contains: 12/03/10 09:00:00
Span 1 is empty
Span 2 contains: link 131091
Span 3 contains: Flyer - 1Pp Or 2Pp
Span 4 is empty
Span 5 contains: CLK3 offset colour between 1,000 and 10,000

Tag Extractor

Post by Optimus » Fri Mar 12, 2010 2:41 pm

I agree that when you use the tag extractor and click on each item you get the SPAN element returned except for the second cell. I can't test this right now but the link in the second cell is NOT within a SPAN (I can't remember what it returns, probably TD). However, if you click in the white space around each item (but within its cell) you get a TD element. From a coding perspective, this seemed less obscure. Iterating through the SPAN elements should work but it means you have to filter out the (hidden) empty SPAN elements used for validation which means extra coding.
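The extra coding for skipping the empty validation SPANs might look something like the sketch below. It is untested and rests on assumptions: the count of 6 comes from the six SPANs (0 to 5) listed for the sample HTML above, the names k, span_text and span_len are made up for illustration, and IE[0] is assumed to already reference the open browser.

```
// Sketch: walk SPAN 0 to 5 and report only the non-empty ones.
// k, span_text and span_len are hypothetical names.
Let>k=0
Repeat>k
  Let>span_text_SIZE=4098
  // Clear the previous value so an empty SPAN is not mistaken for the last one
  Let>span_text=
  IE_ExtractTag>%IE[0]%,,SPAN,%k%,0,span_text,r
  Length>span_text,span_len
  If>span_len>0
    MessageModal>Span %k%: %span_text%
  Endif
  Let>k=k+1
Until>k=6
```

On a live page the number of SPANs would not be known in advance, so the fixed upper bound would need replacing with whatever stop condition suits the table.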

I could rework my code to parse the SPAN elements but how can I be sure I won't run into the same memory problem? I just don't feel I've nailed the cause of the problem yet. Remember I've been successfully parsing on TD for nearly a year now. IE_ExtractTag is essentially a black box and I cannot see what it is doing.

Starting to ramble ... I'm going to go away now and have a think of what has been put forward.

Thanks.

Initialise variables!

Post by Optimus » Wed Mar 17, 2010 3:24 am

My memory error mysteriously disappeared at some stage but I was still not able to read past the first link. I guess you take what you get when dealing with a 3rd party site.

Following from Marcus' suggestion to use SPAN, I modified my code to extract the link (A) when trying to parse the second cell only.

Subsequent attempts to read later links in the table simply returned the first link text value.

In addition to the size assignment Marcus suggested earlier, he also said to set the return value to nothing before using. This fixed the parsing problem and I've been running my script successfully for more than 24 hours now.

Code:

SRT>Get_Link_Text
  Let>link_text_SIZE=4098
  Let>link_text=
  IE_ExtractTag>%IE[0]%,,A,Get_Link_Text_Var_1,0,link_text,r
END>Get_Link_Text

Thanks to Marcus again.
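For anyone reusing the subroutine, a caller could look roughly like this. It is a sketch only: the loop bound of 3 and the counter name k are assumptions for illustration, and the Get_Link_Text subroutine above is assumed to be defined in the same script.

```
// Sketch: fetch the first three links on the page, one call per link.
// The value passed after the subroutine name arrives as Get_Link_Text_Var_1.
Let>k=0
Repeat>k
  GoSub>Get_Link_Text,%k%
  MessageModal>Link %k%: %link_text%
  Let>k=k+1
Until>k=3
```

Because the subroutine resets link_text on every call, a link that cannot be read should come back empty rather than repeating the previous link's text.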
