Accessing OuterHtml for unclosed elements throws an ArgumentOutOfRangeException #149

Closed
zsims opened this issue Feb 27, 2018 · 5 comments

zsims commented Feb 27, 2018

Accessing OuterHtml (or InnerHtml) for an element that is unclosed will throw an ArgumentOutOfRangeException.

Here is a small reproduction. I can confirm it fails with HtmlAgilityPack 1.4.9, 1.5.0, and 1.7.0:

using System;
using HtmlAgilityPack;

namespace html_agility_pack_repro
{
    class Program
    {
        static void Main(string[] args)
        {
            var document = new HtmlDocument();
            document.LoadHtml("<ul><li>item<span></li></ul>");
            var span = document.DocumentNode.SelectSingleNode("//span");
            if(span == null)
            {
                throw new Exception("Failed to find span element");
            }
            // Accessing OuterHtml throws: Unhandled Exception: System.ArgumentOutOfRangeException: Length cannot be less than zero.
            // at System.String.Substring(Int32 startIndex, Int32 length)
            Console.WriteLine(span.OuterHtml);
        }
    }
}

The stack trace:

Unhandled Exception: System.ArgumentOutOfRangeException: Length cannot be less than zero.
Parameter name: length
   at System.String.Substring(Int32 startIndex, Int32 length)
   at html_agility_pack_repro.Program.Main(String[] args) in /home/zac/projects/html-agility-pack-repro/Program.cs:line 20

Some interesting properties of the span above:

_innerLength [int]: -19
_innerStartIndex [int]: 18
_outerLength [int]: -13
_outerStartIndex [int]: 12
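
The negative lengths above are enough to explain the exception on their own. The following is a minimal sketch, not HtmlAgilityPack's actual code: it feeds the reported `_outerStartIndex` and `_outerLength` values straight into `String.Substring` on the reproduction markup. `TrySlice` is a hypothetical helper added only to illustrate a bounds check; it is not part of the library and is not necessarily how the library guards this case. The snippet assumes C# 9 or later (top-level statements).

```csharp
using System;

// Markup from the reproduction; "<span>" starts at index 12.
const string html = "<ul><li>item<span></li></ul>";

// Values copied from the debugger output in the report.
const int outerStartIndex = 12;
const int outerLength = -13;

// Hypothetical helper (not a HtmlAgilityPack API): slices only when the
// requested range is valid, instead of letting Substring throw.
static bool TrySlice(string source, int start, int length, out string slice)
{
    if (start < 0 || length < 0 || start > source.Length - length)
    {
        slice = string.Empty;
        return false;
    }

    slice = source.Substring(start, length);
    return true;
}

// A negative length is rejected by Substring itself.
string caught = "(no exception)";
try
{
    _ = html.Substring(outerStartIndex, outerLength);
}
catch (ArgumentOutOfRangeException ex)
{
    caught = ex.ParamName ?? "(unknown parameter)";
}

Console.WriteLine($"Substring rejected parameter: {caught}");

// The guarded version reports failure rather than throwing.
bool ok = TrySlice(html, outerStartIndex, outerLength, out string outer);
Console.WriteLine(ok ? outer : "(invalid range)");

// With a valid length, the same start index yields the opening tag.
TrySlice(html, outerStartIndex, 6, out string openingTag);
Console.WriteLine(openingTag);
```

The start indexes look right (12 is where `<span>` begins, 18 is just past it), so the problem appears to be confined to how the lengths are computed when no closing tag is found.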

JonathanMagnan self-assigned this Feb 27, 2018

@JonathanMagnan
Member

Hello @zsims,

Thank you for reporting this.

We will look into it.

Best Regards,

Jonathan

@JonathanMagnan
Member

Hello @zsims,

v1.7.1 has been released.

This issue should now be fixed.

Best Regards,

Jonathan

@JonathanMagnan
Member

Hello @zsims,

This issue will be closed since it has been resolved.

Feel free to reopen it if you feel otherwise.

Best Regards,

Jonathan


TheRealKraytonian commented Jul 14, 2024

This issue is not resolved. I still get the exception and cannot solve it myself. It only happens when dealing with a list.

//C# Scraper
        try
        {
            _htmlDoc.LoadHtml(htmlString);
            
            var spans = _htmlDoc.DocumentNode.SelectNodes(
                  "//section[@class='container svelte-ezk9pj']"
            );
            var stockName = _htmlDoc.DocumentNode.SelectSingleNode("//h1[@class='svelte-3a2v0c']").InnerHtml;
            var stockPrice = _htmlDoc.DocumentNode.SelectSingleNode("//fin-streamer[@class='livePrice svelte-mgkamr']/span").InnerHtml;
            var stockChange = _htmlDoc.DocumentNode.SelectSingleNode("//fin-streamer[@class='priceChange svelte']/span").InnerHtml;
            var stockUnList =  _htmlDoc.DocumentNode.SelectNodes("//ul/li");

            foreach (var item in stockUnList)
            {
                Console.WriteLine(item.OuterHtml);
            }
            //var latest_pattern = string.Empty;
            //var stockOneYearTarget = _htmlDoc.DocumentNode.SelectSingleNode("//html/body/div[1]/main/section/section/section/article/div[2]/ul/li[16]/span[2]/fin-streamer").InnerHtml;

            foreach (var span in spans)
            {
                string htmlSpan = $"{stockName}: {stockPrice}, {stockChange}";

                Console.WriteLine(htmlSpan);

               // i = i + 1;
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
//HTML Code I Want to Scrape: https://finance.yahoo.com/quote/AMZN/
<ul style="--rows-sm: 8; --rows-md: 6; --rows: 4; --max-col: 4" class="svelte-tx3nkj">
   <li class=" svelte-tx3nkj">
      <span class="label svelte-tx3nkj">Previous Close</span> 
      <span class="value svelte-tx3nkj">
         <fin-streamer data-symbol="AMZN" data-value="195.05" data-trend="none" active="" data-field="regularMarketPreviousClose" class="svelte-tx3nkj">195.05</fin-streamer>
      </span>
   </li>
   <li class=" svelte-tx3nkj">
      <span class="label svelte-tx3nkj">Open</span> 
      <span class="value svelte-tx3nkj">
         <fin-streamer data-symbol="AMZN" data-value="194.51" data-trend="none" active="" data-field="regularMarketOpen" class="svelte-tx3nkj">194.51</fin-streamer>
      </span>
   </li>
   <li class=" svelte-tx3nkj"><span class="label svelte-tx3nkj">Bid</span> <span class="value svelte-tx3nkj">194.13 x 100</span> </li>
   <li class="last-lg svelte-tx3nkj"><span class="label svelte-tx3nkj">Ask</span> <span class="value svelte-tx3nkj">194.27 x 100</span> </li>
   <li class=" svelte-tx3nkj">
      <span class="label svelte-tx3nkj">Day's Range</span> 
      <span class="value svelte-tx3nkj">
         <fin-streamer data-symbol="AMZN" data-value="193.83 - 196.47" data-trend="none" active="" data-field="regularMarketDayRange" class="svelte-tx3nkj">193.83 - 196.47</fin-streamer>
      </span>
   </li>
   <li class="last-md svelte-tx3nkj">
      <span class="label svelte-tx3nkj">52 Week Range</span> 
      <span class="value svelte-tx3nkj">
         <fin-streamer data-symbol="AMZN" data-value="118.35 - 201.20" data-trend="none" active="" data-field="fiftyTwoWeekRange" class="svelte-tx3nkj">118.35 - 201.20</fin-streamer>
      </span>
   </li>
   <li class=" svelte-tx3nkj">
      <span class="label svelte-tx3nkj">Volume</span> 
      <span class="value svelte-tx3nkj">
         <fin-streamer data-symbol="AMZN" data-value="29,759,483" data-trend="none" active="" data-dfield="longFmt" data-field="regularMarketVolume" class="svelte-tx3nkj">29,759,483</fin-streamer>
      </span>
   </li>
   <li class="last-sm last-lg svelte-tx3nkj">
      <span class="label svelte-tx3nkj">Avg. Volume</span> 
      <span class="value svelte-tx3nkj">
         <fin-streamer data-symbol="AMZN" data-value="41,762,956" data-trend="none" active="" data-field="averageVolume" class="svelte-tx3nkj">41,762,956</fin-streamer>
      </span>
   </li>
   <li class=" svelte-tx3nkj">
      <span class="label svelte-tx3nkj">Market Cap (intraday)</span> 
      <span class="value svelte-tx3nkj">
         <fin-streamer data-symbol="AMZN" data-value="2.024T" data-trend="none" active="" data-field="marketCap" class="svelte-tx3nkj">2.024T</fin-streamer>
      </span>
   </li>
   <li class=" svelte-tx3nkj"><span class="label svelte-tx3nkj">Beta (5Y Monthly)</span> <span class="value svelte-tx3nkj">1.15</span> </li>
   <li class=" svelte-tx3nkj">
      <span class="label svelte-tx3nkj">PE Ratio (TTM)</span> 
      <span class="value svelte-tx3nkj">
         <fin-streamer data-symbol="AMZN" data-value="54.33" data-trend="none" active="" data-field="trailingPE" class="svelte-tx3nkj">54.33</fin-streamer>
      </span>
   </li>
   <li class="last-md last-lg svelte-tx3nkj">
      <span class="label svelte-tx3nkj">EPS (TTM)</span> 
      <span class="value svelte-tx3nkj">
         <fin-streamer data-symbol="AMZN" data-value="3.58" data-trend="none" active="" data-field="trailingPE" class="svelte-tx3nkj">3.58</fin-streamer>
      </span>
   </li>
   <li class=" svelte-tx3nkj"><span class="label svelte-tx3nkj">Earnings Date</span> <span class="value svelte-tx3nkj">Aug 1, 2024 - Aug 5, 2024</span> </li>
   <li class=" svelte-tx3nkj"><span class="label svelte-tx3nkj">Forward Dividend &amp; Yield</span> <span class="value svelte-tx3nkj">--</span> </li>
   <li class=" svelte-tx3nkj"><span class="label svelte-tx3nkj">Ex-Dividend Date</span> <span class="value svelte-tx3nkj">--</span> </li>
   <li class="last-sm last-lg svelte-tx3nkj">
      <span class="label svelte-tx3nkj">1y Target Est</span> 
      <span class="value svelte-tx3nkj">
         <fin-streamer data-symbol="AMZN" data-value="208.25" data-trend="none" active="" data-field="targetMeanPrice" class="svelte-tx3nkj">208.25</fin-streamer>
      </span>
   </li>
</ul>

Error message below:
[image]

I did not want to create a new ticket.

@JonathanMagnan
Member

Hello @TheRealKraytonian,

Could you give us a runnable project that reproduces the issue? Either on Fiddle or in a new project that only contains the current issue.

I tried to reproduce it, but your HTML is not complete: https://dotnetfiddle.net/vEMlFU

I would prefer to look at this further once we can reproduce it. When I fixed your code to make it work, I didn't get any error, so we are surely missing something.

By the way, in case you are not aware of this: for stock market data, you should use a Yahoo Finance API library such as OoplesFinance.YahooFinanceAPI instead of trying to parse the page with HAP. But maybe you have a specific purpose.

Best Regards,

Jon
