Tuesday, 1 October 2013

C++ LibCURL: Get a page's "full source"

A quick question on libcurl with C++. I've got libcurl fetching a page's
source from the web, going through it, and picking data out.
Everything is working great bar one page. I had the same problem during
offline testing when using ifstream with the page source saved to a .html
file. Basically, what I think is happening is that the web page renders the HTML
plus the data I want through JS calls (not 100% sure of this), so the data is not
directly present in the source.
How I got around this in offline testing was to download the full web page
as an offline-mode file in Safari; I believe it was called a .webarchive
file. That way, when I viewed its source code, the HTML and the data were
both rendered in the source.
I've trawled the internet for an answer but can't seem to find one. Can
anyone help me here with a setting in curl to download the webpage in its
"fullness"?
Here are the options I use currently:
curl_easy_setopt(this->curl, CURLOPT_URL, url);
curl_easy_setopt(this->curl, CURLOPT_FOLLOWLOCATION, 1L);         // follow redirects
curl_easy_setopt(this->curl, CURLOPT_USERAGENT,
                 "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:24.0) Gecko/20100101 Firefox/24.0");
curl_easy_setopt(this->curl, CURLOPT_COOKIEFILE, "cookies.txt");  // read cookies from here
curl_easy_setopt(this->curl, CURLOPT_COOKIEJAR, "cookies.txt");   // write cookies back here
curl_easy_setopt(this->curl, CURLOPT_POSTFIELDS, postData);       // if needed
curl_easy_setopt(this->curl, CURLOPT_WRITEFUNCTION, this->WriteCallback);
curl_easy_setopt(this->curl, CURLOPT_WRITEDATA, &readBuffer);
res = curl_easy_perform(this->curl);
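
For reference, WriteCallback just appends each chunk of the response body into
readBuffer; roughly along these lines (a minimal sketch, assuming readBuffer is
a std::string, since the callback body isn't shown above):

#include <string>
#include <curl/curl.h>

// libcurl hands the response body to this callback in chunks; we append each
// chunk to the std::string passed via CURLOPT_WRITEDATA and report how many
// bytes we consumed.
static size_t WriteCallback(char* contents, size_t size, size_t nmemb, void* userp)
{
    size_t total = size * nmemb;
    static_cast<std::string*>(userp)->append(contents, total);
    return total; // returning a different value makes libcurl abort the transfer
}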
Thanks in advance for your time! Regards, Matt
