ingerul
Introduction:
Welcome to a tutorial on how to make a Visual Basic .NET program that downloads a given web page, extracts every piece of text that lies between two given markers, and saves the results as a list in a text file.
Pre-Creation:
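My form has the following controls:
TextBox1: Extract From (the text that marks the start of each item)
TextBox2: Extract To (the text that marks the end of each item)
TextBox3: Page to extract from (the URL of the page)
Button1: Begin extraction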
Steps of Creation:
Step 1:
First, we need some imports and a helper function. The function extracts every piece of text that lies between the two given markers.
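Imports System.IO
Imports System.Text.RegularExpressions
Imports System.Net

' Returns every piece of text in Source that lies between Str1 and the next Str2.
Private Function GetBetweenAll(ByVal Source As String, ByVal Str1 As String, ByVal Str2 As String) As String()
    Dim Results As New List(Of String)
    Dim T As New List(Of String)
    ' Split the page at every start marker. Regex.Escape makes the marker match
    ' literally, so characters such as ( . ? [ do not break the pattern.
    T.AddRange(Regex.Split(Source, Regex.Escape(Str1)))
    ' The first piece is the text before the first start marker, so drop it.
    T.RemoveAt(0)
    For Each I As String In T
        ' Keep only the text up to the first end marker in each piece.
        Results.Add(Regex.Split(I, Regex.Escape(Str2))(0))
    Next
    Return Results.ToArray()
End Function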
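If you would rather avoid regular expressions entirely, the same idea can be expressed with plain String.IndexOf. The sketch below is an alternative, not the tutorial's method, and GetBetweenAllLiteral is a hypothetical name. It can return different results from GetBetweenAll when markers are unpaired or repeated: a start marker with no end marker after it is skipped instead of returning the rest of the page.

```vbnet
Option Strict On

Imports System
Imports System.Collections.Generic

Module LiteralExtractSketch
    ' Hypothetical alternative to GetBetweenAll: finds each startMarker, then the
    ' next endMarker after it, and keeps the text in between. Markers are
    ' compared literally (ordinal), so no regex escaping is needed.
    Function GetBetweenAllLiteral(source As String, startMarker As String, endMarker As String) As String()
        If String.IsNullOrEmpty(startMarker) OrElse String.IsNullOrEmpty(endMarker) Then
            Throw New ArgumentException("Both markers must be non-empty.")
        End If

        Dim results As New List(Of String)
        Dim pos As Integer = 0
        Do
            Dim startIdx As Integer = source.IndexOf(startMarker, pos, StringComparison.Ordinal)
            If startIdx < 0 Then Exit Do

            Dim contentStart As Integer = startIdx + startMarker.Length
            Dim endIdx As Integer = source.IndexOf(endMarker, contentStart, StringComparison.Ordinal)
            ' An unpaired start marker ends the search instead of
            ' returning the rest of the page.
            If endIdx < 0 Then Exit Do

            results.Add(source.Substring(contentStart, endIdx - contentStart))
            ' Resume searching just after the end marker that was used.
            pos = endIdx + endMarker.Length
        Loop
        Return results.ToArray()
    End Function

    Sub Main()
        Dim html As String = "<b>one</b> and <b>two</b> and <b>three"
        ' Prints "one" and "two"; "three" has no closing </b> and is skipped.
        For Each item As String In GetBetweenAllLiteral(html, "<b>", "</b>")
            Console.WriteLine(item)
        Next
    End Sub
End Module
```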
Step 2:
Next, we write the code that starts the process. First we check that all three text boxes are filled in; if they are, we show a SaveFileDialog so the user can choose where to save the .txt output.
If Not String.IsNullOrEmpty(TextBox1.Text) AndAlso
   Not String.IsNullOrEmpty(TextBox2.Text) AndAlso
   Not String.IsNullOrEmpty(TextBox3.Text) Then
    Using fo As New SaveFileDialog()
        fo.Filter = "Text Files|*.txt"
        fo.FilterIndex = 1
        fo.Title = "Save Path"
        ' Continue only if the user clicked Save rather than Cancel.
        If fo.ShowDialog() = DialogResult.OK Then
            ' Download, extract and save here (Step 3).
        End If
    End Using
End If
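The check above only confirms that the boxes are not empty, so a URL typed without http:// would only fail later, when the request is created. A minimal sketch of a stricter check, assuming you only want to accept absolute http or https addresses; InputsAreValid is a hypothetical helper, not part of the tutorial.

```vbnet
Option Strict On

Imports System

Module InputCheckSketch
    ' Hypothetical helper: True when both markers are non-empty and the URL is
    ' an absolute http:// or https:// address. It could stand in for the three
    ' empty-text-box checks at the top of the button handler.
    Function InputsAreValid(startMarker As String, endMarker As String, url As String) As Boolean
        If String.IsNullOrEmpty(startMarker) OrElse String.IsNullOrEmpty(endMarker) Then
            Return False
        End If

        Dim parsed As Uri = Nothing
        If Not Uri.TryCreate(url, UriKind.Absolute, parsed) Then Return False
        Return parsed.Scheme = Uri.UriSchemeHttp OrElse parsed.Scheme = Uri.UriSchemeHttps
    End Function

    Sub Main()
        Console.WriteLine(InputsAreValid("<b>", "</b>", "https://example.com/"))  ' True
        Console.WriteLine(InputsAreValid("<b>", "", "https://example.com/"))      ' False
        Console.WriteLine(InputsAreValid("<b>", "</b>", "ftp://example.com/"))    ' False
    End Sub
End Module
```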
Step 3:
Once the inputs are checked and a save path is chosen, we download the page's source code from the URL, extract the data, and write each result on its own line in the save file. Below is the full button code.
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    ' Only run when the start marker, end marker and URL are all filled in.
    If Not String.IsNullOrEmpty(TextBox1.Text) AndAlso
       Not String.IsNullOrEmpty(TextBox2.Text) AndAlso
       Not String.IsNullOrEmpty(TextBox3.Text) Then
        Using fo As New SaveFileDialog()
            fo.Filter = "Text Files|*.txt"
            fo.FilterIndex = 1
            fo.Title = "Save Path"
            If fo.ShowDialog() = DialogResult.OK Then
                ' Download the page source; Using closes the response and reader.
                Dim r As HttpWebRequest = DirectCast(WebRequest.Create(TextBox3.Text), HttpWebRequest)
                Dim src As String
                Using re As HttpWebResponse = DirectCast(r.GetResponse(), HttpWebResponse),
                      reader As New StreamReader(re.GetResponseStream())
                    src = reader.ReadToEnd()
                End Using
                ' Extract every match and write one per line.
                Dim srcs As String() = GetBetweenAll(src, TextBox1.Text, TextBox2.Text)
                Using sw As New StreamWriter(fo.FileName)
                    For Each s As String In srcs
                        sw.WriteLine(s)
                    Next
                End Using
            End If
        End Using
    End If
End Sub
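The handler above downloads the page on the UI thread, so the window stops responding until the response arrives. A sketch of an asynchronous alternative, assuming a .NET version that includes System.Net.Http.HttpClient (.NET Framework 4.5 or later, or .NET Core/.NET 5+). ScrapeToFileAsync is a hypothetical helper; GetBetweenAll is a copy of the Step 1 function with both markers passed through Regex.Escape, and the URL and markers in Main are placeholders.

```vbnet
Option Strict On

Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Net.Http
Imports System.Text.RegularExpressions
Imports System.Threading.Tasks

Module AsyncScrapeSketch
    ' One HttpClient shared by every request instead of one per click.
    Private ReadOnly Client As New HttpClient()

    ' Step 1's GetBetweenAll, with both markers escaped so they match literally.
    Function GetBetweenAll(source As String, str1 As String, str2 As String) As String()
        Dim results As New List(Of String)
        Dim pieces As New List(Of String)(Regex.Split(source, Regex.Escape(str1)))
        pieces.RemoveAt(0) ' text before the first start marker
        For Each piece As String In pieces
            results.Add(Regex.Split(piece, Regex.Escape(str2))(0))
        Next
        Return results.ToArray()
    End Function

    ' Hypothetical helper: download url without blocking the caller, extract the
    ' matches, write one per line to savePath and return how many were written.
    Async Function ScrapeToFileAsync(url As String, startMarker As String,
                                     endMarker As String, savePath As String) As Task(Of Integer)
        Dim html As String = Await Client.GetStringAsync(url)
        Dim matches As String() = GetBetweenAll(html, startMarker, endMarker)
        File.WriteAllLines(savePath, matches)
        Return matches.Length
    End Function

    Sub Main()
        ' Placeholder URL and markers. A console app has no UI thread, so it
        ' can simply wait for the task to finish.
        Dim savePath As String = Path.Combine(Path.GetTempPath(), "scrape-output.txt")
        Dim count As Integer = ScrapeToFileAsync("https://example.com/", "<title>", "</title>",
                                                 savePath).GetAwaiter().GetResult()
        Console.WriteLine($"{count} match(es) written to {savePath}")
    End Sub
End Module
```

Inside the form, Button1_Click could be declared `Private Async Sub` and call `Await ScrapeToFileAsync(TextBox3.Text, TextBox1.Text, TextBox2.Text, fo.FileName)` in place of the HttpWebRequest and StreamWriter code, so the window stays responsive while the page downloads.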
Project Complete!
Below is the full source code of Form1.
Imports System.IO
Imports System.Text.RegularExpressions
Imports System.Net

Public Class Form1

    ' Returns every piece of text in Source that lies between Str1 and the next Str2.
    Private Function GetBetweenAll(ByVal Source As String, ByVal Str1 As String, ByVal Str2 As String) As String()
        Dim Results As New List(Of String)
        Dim T As New List(Of String)
        ' Split at every start marker (escaped so it matches literally)
        ' and drop the text before the first one.
        T.AddRange(Regex.Split(Source, Regex.Escape(Str1)))
        T.RemoveAt(0)
        For Each I As String In T
            ' Keep only the text up to the first end marker.
            Results.Add(Regex.Split(I, Regex.Escape(Str2))(0))
        Next
        Return Results.ToArray()
    End Function

    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        ' Only run when the start marker, end marker and URL are all filled in.
        If Not String.IsNullOrEmpty(TextBox1.Text) AndAlso
           Not String.IsNullOrEmpty(TextBox2.Text) AndAlso
           Not String.IsNullOrEmpty(TextBox3.Text) Then
            Using fo As New SaveFileDialog()
                fo.Filter = "Text Files|*.txt"
                fo.FilterIndex = 1
                fo.Title = "Save Path"
                If fo.ShowDialog() = DialogResult.OK Then
                    ' Download the page source; Using closes the response and reader.
                    Dim r As HttpWebRequest = DirectCast(WebRequest.Create(TextBox3.Text), HttpWebRequest)
                    Dim src As String
                    Using re As HttpWebResponse = DirectCast(r.GetResponse(), HttpWebResponse),
                          reader As New StreamReader(re.GetResponseStream())
                        src = reader.ReadToEnd()
                    End Using
                    ' Extract every match and write one per line.
                    Dim srcs As String() = GetBetweenAll(src, TextBox1.Text, TextBox2.Text)
                    Using sw As New StreamWriter(fo.FileName)
                        For Each s As String In srcs
                            sw.WriteLine(s)
                        Next
                    End Using
                End If
            End Using
        End If
    End Sub

End Class