Getting HTML from web pages that use AJAX

I want to know how to scrape web pages that use AJAX to fetch the content they display. A plain HTTP GET for such a page typically returns only the initial HTML with the JavaScript embedded in it, not the content those scripts load afterwards. Is it possible to query such pages programmatically (preferably in Java), simulating a web browser, so that I get the HTML as it looks after the AJAX calls have completed?
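To make the problem concrete, here is a minimal sketch using only the JDK (Java 11+). It assumes a hypothetical page whose `<div id="out">` is filled in by a script that calls a hypothetical `/data` endpoint. A local `HttpServer` serves that page, and a plain `HttpClient` GET shows that the response still contains the empty div, because nothing executes the script.

```java
import com.sun.net.httpserver.HttpServer;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class PlainGetDemo {
    // Hypothetical page: the visible content only appears after the script fetches /data.
    static final String PAGE =
        "<html><body><div id=\"out\"></div><script>"
      + "fetch('/data').then(r => r.text())"
      + ".then(t => { document.getElementById('out').textContent = t; });"
      + "</script></body></html>";

    // Serves PAGE on a free local port and returns what a plain HTTP GET sees.
    public static String fetchRawHtml() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            byte[] body = PAGE.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            exchange.getResponseBody().write(body);
            exchange.close();
        });
        server.start();
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            HttpClient client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
            HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
            return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        String html = fetchRawHtml();
        // The div is still empty: the script never ran, so /data was never requested.
        System.out.println(html.contains("<div id=\"out\"></div>"));
    }
}
```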

Answers


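You may want to look at HtmlUnit, a GUI-less browser for Java programs that executes a page's JavaScript, including its AJAX requests.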
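A minimal HtmlUnit sketch, assuming HtmlUnit 3.x or later is on the classpath (package `org.htmlunit`; 2.x used `com.gargoylesoftware.htmlunit`). `renderedHtml` loads a page, lets its scripts and AJAX calls run, and returns the resulting DOM as markup. The `demo` method runs offline against a `MockWebConnection`; the `example.test` URLs, the `/data` endpoint and the 5-second wait are illustrative assumptions.

```java
import java.net.URL;
import org.htmlunit.MockWebConnection;
import org.htmlunit.NicelyResynchronizingAjaxController;
import org.htmlunit.WebClient;
import org.htmlunit.html.HtmlPage;

public class HtmlUnitAjaxDemo {
    // Loads url, lets its JavaScript (including XHRs) run, and serializes the current DOM.
    static String renderedHtml(WebClient client, URL url) throws Exception {
        client.getOptions().setJavaScriptEnabled(true);
        client.getOptions().setCssEnabled(false);                  // CSS is not needed for scraping
        client.getOptions().setThrowExceptionOnScriptError(false);  // tolerate broken scripts on real sites
        client.setAjaxController(new NicelyResynchronizingAjaxController()); // resynchronize async XHRs
        HtmlPage page = client.getPage(url);
        client.waitForBackgroundJavaScript(5_000); // allow timers and late requests up to 5 s
        return page.asXml();
    }

    // Offline demo: a mocked page whose <div> is filled by an XHR to a mocked /data endpoint.
    static String demo() throws Exception {
        URL pageUrl = new URL("http://example.test/");
        MockWebConnection conn = new MockWebConnection();
        conn.setResponse(pageUrl,
            "<html><body><div id='out'></div><script>"
          + "var xhr = new XMLHttpRequest();"
          + "xhr.open('GET', '/data');"
          + "xhr.onload = function() {"
          + "  document.getElementById('out').textContent = xhr.responseText; };"
          + "xhr.send();"
          + "</script></body></html>");
        conn.setResponse(new URL("http://example.test/data"), "loaded via AJAX", "text/plain");
        try (WebClient client = new WebClient()) {
            client.setWebConnection(conn);
            return renderedHtml(client, pageUrl);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length > 0) {
            // Real usage: pass the URL of the AJAX-driven page as the first argument.
            try (WebClient client = new WebClient()) {
                System.out.println(renderedHtml(client, new URL(args[0])));
            }
        } else {
            System.out.println(demo());
        }
    }
}
```

The string `loaded via AJAX` appears only in the mocked `/data` response, so finding it in the serialized DOM shows the XHR actually ran.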


In The Productive Programmer, author Neal Ford suggests that the functional testing tool Selenium can also be used for non-testing tasks. Inspecting HTML after client-side DOM manipulation has taken place falls into this category. Selenium can also automate interactions with the browser, so if buttons need to be clicked to fire AJAX events, you can script that too. Selenium works by using a browser plugin and a Java-based server. Selenium test code (or, in your case, non-test code) can be written in a variety of languages, including Java, C# and other .NET languages, PHP, Perl, Python and Ruby.
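A minimal Selenium sketch in Java. Note that it uses the Selenium 4 WebDriver API with ChromeDriver rather than the plugin-and-server setup described above, and it assumes `selenium-java` is on the classpath and Chrome is installed. The page URL and the CSS selectors passed to `main` are hypothetical. The method optionally clicks a trigger element, waits until the AJAX-loaded element appears, and returns the current DOM.

```java
import java.time.Duration;
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class SeleniumAjaxDemo {
    /**
     * Opens url in headless Chrome, clicks trigger (if non-null) to fire an AJAX request,
     * waits up to 10 s for an element matching ready, and returns the current DOM as HTML.
     */
    static String renderedHtml(String url, By trigger, By ready) {
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless=new");
        WebDriver driver = new ChromeDriver(options);
        try {
            driver.get(url);
            if (trigger != null) {
                driver.findElement(trigger).click();
            }
            new WebDriverWait(driver, Duration.ofSeconds(10))
                .until(ExpectedConditions.presenceOfElementLocated(ready));
            return driver.getPageSource();
        } finally {
            driver.quit(); // always shut down the browser process
        }
    }

    public static void main(String[] args) {
        // Hypothetical usage: <url> <css selector of button> <css selector of loaded content>
        System.out.println(renderedHtml(args[0], By.cssSelector(args[1]), By.cssSelector(args[2])));
    }
}
```

Waiting on a specific element, rather than sleeping for a fixed time, returns as soon as the content is present and fails with a timeout if it never arrives.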


Why choose when you can have both? TestPlan supports both Selenium and HtmlUnit as backends. It also has a really simple language for the most common tasks, and extensions can be written in Java if needed (which is actually rare).

