Creating a Web Page Scraper in C#

bulletproof · 2025-02-27T04:24:40+0000

Introduction:
This tutorial will teach you how to make a web scraper in C# on the .NET Framework.

Theory:
Here are the steps we will follow:
Get webpage source
Dissect source
Output results

Getting the Source:
So first we need to get the web page source. Our target URL is going to be the home page of sourcecodester.com. We create a basic HttpWebRequest to the site, receive the response, and read it into a string, which we return to the caller...

// Requires: using System.IO; using System.Net;
static string getSource()
{
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create("http://www.sourcecodester.com/");
    req.UserAgent = "curl"; // mimics the user agent of the Linux curl command
    req.Method = "GET";
    // Dispose the response and the reader once the body has been read.
    using (HttpWebResponse res = (HttpWebResponse)req.GetResponse())
    using (StreamReader reader = new StreamReader(res.GetResponseStream()))
    {
        return reader.ReadToEnd();
    }
}

Dissecting the Source:
Now that we have the source, we want to dissect it. As a side note, here is what the Main function, where we call everything from, looks like...

static void Main(string[] args)
{
    string src = getSource();
}

So first we want to look for patterns in the source. You can either save the webpage from your browser and open the saved document in a text editor on your PC, or you can use a file stream to save the HTTP response from our program.
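
For the second option, a minimal sketch might look like the following. It uses File.WriteAllText rather than an explicit FileStream, since the whole source is already in a string. The class name SourceDump and the default file name source.html are assumptions made for illustration, not part of the original program.

```csharp
using System.IO;

static class SourceDump
{
    // Writes the page source to disk so it can be opened in a text editor.
    // "source.html" is an assumed default file name, not taken from the tutorial.
    public static string Save(string src, string path = "source.html")
    {
        File.WriteAllText(path, src);
        // Return the absolute path so the caller can report where the file went.
        return Path.GetFullPath(path);
    }
}
```

From Main, this could be called as SourceDump.Save(src) right after the call to getSource().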

Looking at the source, we can see that all the articles are surrounded by divs with the class of ''. About three classes into the div, we can see that the one I have selected is a 'node-book'. There are other types such as 'source-code', so we are going to use only the classes that are shared by all the articles.
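
The output step iterates over a collection named articles, so something has to cut the source into one chunk per article. One way to do that, sketched here with plain string searching rather than a real HTML parser, is to split on the opening div tag that every article shares. The class name ArticleSplitter and the marker parameter are hypothetical; the marker must be the tag text you actually find in the saved source.

```csharp
using System;
using System.Collections.Generic;

static class ArticleSplitter
{
    // Returns one chunk of source per occurrence of 'marker'.
    // Each chunk runs from a marker to just before the next marker,
    // or to the end of the source for the last one.
    public static List<string> Split(string src, string marker)
    {
        if (string.IsNullOrEmpty(marker))
            throw new ArgumentException("marker must not be empty", "marker");

        List<string> articles = new List<string>();
        int start = src.IndexOf(marker, StringComparison.Ordinal);
        while (start >= 0)
        {
            int next = src.IndexOf(marker, start + marker.Length, StringComparison.Ordinal);
            int end = next >= 0 ? next : src.Length;
            articles.Add(src.Substring(start, end - start));
            start = next; // -1 ends the loop after the last chunk
        }
        return articles;
    }
}
```

In Main, the result of ArticleSplitter.Split(src, marker) would be stored in a List<string> named articles. Note that the last chunk also contains whatever follows the final article on the page, so it may need trimming.
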
Outputting the Results:
All done, now we can simply output the resulting containers...

foreach (string s in articles)
{
    Console.WriteLine(s);
}

Of course, this was just a simple demonstration; we could then dissect the information further and extract the titles and other pieces of information from the divs.
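
As a rough illustration of that next step, the sketch below pulls the text of the first link out of a container. It assumes, without having checked the site's markup, that an article's title is the first link inside its div. The class name TitleExtractor is hypothetical, and a regular expression is only a quick approximation of real HTML parsing.

```csharp
using System.Net;
using System.Text.RegularExpressions;

static class TitleExtractor
{
    // Matches the first <a ...>text</a> pair; Singleline lets '.' span newlines.
    private static readonly Regex FirstLink = new Regex(
        @"<a\b[^>]*>(.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the decoded link text, or null when the container has no link.
    public static string FirstLinkText(string container)
    {
        Match m = FirstLink.Match(container);
        if (!m.Success) return null;
        // Strip any nested tags, then decode entities such as &amp;.
        string inner = Regex.Replace(m.Groups[1].Value, "<[^>]+>", "");
        return WebUtility.HtmlDecode(inner).Trim();
    }
}
```

Inside the output loop, Console.WriteLine(TitleExtractor.FirstLinkText(s)) would then print one candidate title per container instead of the raw HTML.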

Finished!
