• We just launched and are currently in beta. Join us as we build and grow the community.

Visual Basic Email Extractor

AhmedEvil

Robotic Systems Architect
Divine
A Rep
0
0
0
Rep
0
A Vouches
0
0
0
Vouches
0
Posts
198
Likes
160
Bits
2 MONTHS
2 2 MONTHS OF SERVICE
LEVEL 1 200 XP
Introduction:
Today we will be making a simple program to extract all emails within a web page source. We will be doing it manually rather than using Regex.

Notes:
This will require a webpage which contains one or more emails. I will be using this list for testing: http://pastebin.com/KBzSZVgh

Steps of Creation:
Step 1:
First we are going to create a new project with:
1 Text-boxes - URL
1 Button - Begin new thread (which will begin the extractor script)
1 Listbox - Contain emails

Step 2:
Next we need to Import four packages. One for creating a request and receiving a response to and from the web page, another to read the response and the final one to create a new thread and the last for Regex.

  1. Imports System.IO
  2. Imports System.Net
  3. Imports System.Text.RegularExpressions
  4. Imports System.Threading

Step 3:
The next step we want to add a function which we will use to get the text between all the HTML tags. We will be splitting the source by a space to get all the words. Then check each word if it contains appropriate signs ("@", ".") and if it does it may contain tags so we need to remove them.
  1. Private

    Function

    GetBetweenAll(ByVal

    Source As

    String

    , ByVal

    Str1 As

    String

    , ByVal

    Str2 As

    String

    ) As

    String

    ()
  2. Dim

    Results, T As

    New

    List(Of String

    )
  3. T.AddRange(Regex.Split(Source, Str1))
  4. T.RemoveAt(0)
  5. For

    Each

    I As

    String

    In

    T
  6. Results.Add(Regex.Split(I, Str2)(0))
  7. Next
  8. Return Results.ToArray
  9. End

    Function

Step 4:
In the button one click event we are going to create a new thread which will start a function named "extract".

  1. Private

    Sub

    Button1_Click(sender As

    Object

    , e As

    EventArgs) Handles Button1.Click
  2. Dim

    trd As

    thread = New

    thread(AddressOf extract)
  3. trd.isbackground = True
  4. trd.start()
  5. End

    Sub

Step 5:
Now lets check the URL in textbox1 to see if it is valid, if it is lets create a request, receive the response and read it to gain the web page source.

  1. Private

    Function

    extract()
  2. If

    (TextBox1.Text.StartsWith("http://"

    ) Or

    TextBox1.Text.StartsWith("https://"

    )) Then
  3. Dim

    r As

    HttpWebRequest = HttpWebRequest.Create(TextBox1.Text)
  4. r.KeepAlive = True
  5. r.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.2 Safari/537.36"
  6. Dim

    re As

    HttpWebResponse = r.GetResponse()
  7. Dim

    src As

    String

    = New

    StreamReader(re.GetResponseStream()).ReadToEnd()
  8. Else

    : MsgBox("That is not a valid link!"

    )
  9. End

    If
  10. End

    Function

Step 6:
Once we have got the source lets split it by a space to get each "word"/tag and check to see if each could be an email by looking for the "@" and "." signs.

  1. Private

    Function

    extract()
  2. If

    (TextBox1.Text.StartsWith("http://"

    ) Or

    TextBox1.Text.StartsWith("https://"

    )) Then
  3. Dim

    r As

    HttpWebRequest = HttpWebRequest.Create(TextBox1.Text)
  4. r.KeepAlive = True
  5. r.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.2 Safari/537.36"
  6. Dim

    re As

    HttpWebResponse = r.GetResponse()
  7. Dim

    src As

    String

    = New

    StreamReader(re.GetResponseStream()).ReadToEnd()
  8. Dim

    words As

    String

    () = src.Split(" "

    )
  9. For

    Each

    word As

    String

    In

    words
  10. If

    (word.Contains("@"

    ) And

    word.Contains("."

    )) Then
  11. End

    If
  12. Next
  13. Else

    : MsgBox("That is not a valid link!"

    )
  14. End

    If
  15. End

    Function

Step 7:
Now lets check to see if that particular word contains a "" and ">" which means it could be a tag. We don't want the tags so lets get the String between ">" and "".

  1. Private

    Function

    extract()
  2. If

    (TextBox1.Text.StartsWith("http://"

    ) Or

    TextBox1.Text.StartsWith("https://"

    )) Then
  3. Dim

    r As

    HttpWebRequest = HttpWebRequest.Create(TextBox1.Text)
  4. r.KeepAlive = True
  5. r.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.2 Safari/537.36"
  6. Dim

    re As

    HttpWebResponse = r.GetResponse()
  7. Dim

    src As

    String

    = New

    StreamReader(re.GetResponseStream()).ReadToEnd()
  8. Dim

    words As

    String

    () = src.Split(" "

    )
  9. For

    Each

    word As

    String

    In

    words
  10. If

    (word.Contains("@"

    ) And

    word.Contains("."

    )) Then
  11. If

    (word.Contains("<"

    ) And

    word.Contains(">"

    )) Then
  12. Dim

    toAdd As

    New

    List(Of String

    )
  13. Dim

    noTags As

    String

    () = GetBetweenAll(word, ">"

    , "<"

    )
  14. Else
  15. ListBox1.Items.Add(word)
  16. End

    If
  17. End

    If
  18. Next
  19. Else

    : MsgBox("That is not a valid link!"

    )
  20. End

    If
  21. End

    Function

Step 8:
Finally, lets check how many emails there are. If there are more than one in the "toAdd" List, lets just add it to our listbox, otherwise lets iterate through them all and add them all to the listbox.

  1. Private

    Function

    extract()
  2. If

    (TextBox1.Text.StartsWith("http://"

    ) Or

    TextBox1.Text.StartsWith("https://"

    )) Then
  3. Dim

    r As

    HttpWebRequest = HttpWebRequest.Create(TextBox1.Text)
  4. r.KeepAlive = True
  5. r.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.2 Safari/537.36"
  6. Dim

    re As

    HttpWebResponse = r.GetResponse()
  7. Dim

    src As

    String

    = New

    StreamReader(re.GetResponseStream()).ReadToEnd()
  8. Dim

    words As

    String

    () = src.Split(" "

    )
  9. For

    Each

    word As

    String

    In

    words
  10. If

    (word.Contains("@"

    ) And

    word.Contains("."

    )) Then
  11. If

    (word.Contains("<"

    ) And

    word.Contains(">"

    )) Then
  12. Dim

    toAdd As

    New

    List(Of String

    )
  13. Dim

    noTags As

    String

    () = GetBetweenAll(word, ">"

    , "<"

    )
  14. For

    Each

    w As

    String

    In

    noTags
  15. If

    (w.Contains("@"

    ) And

    w.Contains("."

    ) And

    Not

    w.Contains("="

    )) Then
  16. If

    (w.EndsWith(","

    ) Or

    w.EndsWith("."

    )) Then
  17. toAdd.Add(w.Substring(0, w.Length - 1))
  18. Else
  19. toAdd.Add(w)
  20. End

    If
  21. End

    If
  22. Next
  23. If

    (toAdd.Count > 0) Then
  24. If

    (toAdd.Count > 1) Then
  25. For

    Each

    t As

    String

    In

    toAdd
  26. ListBox1.Items.Add(t)
  27. Next
  28. Else
  29. ListBox1.Items.Add(toAdd(0))
  30. End

    If
  31. End

    If
  32. Else
  33. ListBox1.Items.Add(word)
  34. End

    If
  35. End

    If
  36. Next
  37. Else

    : MsgBox("That is not a valid link!"

    )
  38. End

    If
  39. End

    Function

Important!
To be able to access the controls from a new thread we need to set CheckForIllegalCrossThreadCalls to False in the Form1_load event.

Remove Duplicates Function
Lets add one last feature to remove all duplicate emails. This is very simple and just iterates through each item in the listbox1 and checks against a newly created list. If it is already in the list it won't add it otherwise it will. Then it simply iterates through the new list to add them all back in to the listbox1 (After clearing it first of course!)

Of course, you could add this as a new thread as well so it doesn't temporarily crash the UI while processing.

  1. Private

    Sub

    Button3_Click(sender As

    Object

    , e As

    EventArgs) Handles Button3.Click
  2. Dim

    items As

    New

    List(Of String

    )
  3. For

    Each

    i As

    String

    In

    ListBox1.Items
  4. Dim

    isNew As

    Boolean

    = True
  5. For

    Each

    it As

    String

    In

    items
  6. If

    (it = i) Then

    isNew = False
  7. Next
  8. If

    (isNew) Then

    items.Add(i)
  9. Next
  10. ListBox1.Items.Clear()
  11. For

    Each

    i As

    String

    In

    items
  12. ListBox1.Items.Add(i)
  13. Next
  14. End

    Sub

Project Completed!
That's it! Here is the finished source:

  1. Imports System.IO
  2. Imports System.Net
  3. Imports System.Text.RegularExpressions
  4. Imports System.Threading
  5. Public

    Class Form1

  6. Private

    Function

    GetBetweenAll(ByVal

    Source As

    String

    , ByVal

    Str1 As

    String

    , ByVal

    Str2 As

    String

    ) As

    String

    ()
  7. Dim

    Results, T As

    New

    List(Of String

    )
  8. T.AddRange(Regex.Split(Source, Str1))
  9. T.RemoveAt(0)
  10. For

    Each

    I As

    String

    In

    T
  11. Results.Add(Regex.Split(I, Str2)(0))
  12. Next
  13. Return Results.ToArray
  14. End

    Function

  15. Private

    Sub

    Button1_Click(sender As

    Object

    , e As

    EventArgs) Handles Button2.Click
  16. Dim

    trd As

    Thread = New

    Thread(AddressOf extract)
  17. trd.IsBackground = True
  18. trd.Start()
  19. End

    Sub

  20. Private

    Function

    extract()
  21. If

    (TextBox1.Text.StartsWith("http://"

    ) Or

    TextBox1.Text.StartsWith("https://"

    )) Then
  22. Dim

    r As

    HttpWebRequest = HttpWebRequest.Create(TextBox1.Text)
  23. r.KeepAlive = True
  24. r.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.2 Safari/537.36"
  25. Dim

    re As

    HttpWebResponse = r.GetResponse()
  26. Dim

    src As

    String

    = New

    StreamReader(re.GetResponseStream()).ReadToEnd()
  27. Dim

    words As

    String

    () = src.Split(" "

    )
  28. For

    Each

    word As

    String

    In

    words
  29. If

    (word.Contains("@"

    ) And

    word.Contains("."

    )) Then
  30. If

    (word.Contains("<"

    ) And

    word.Contains(">"

    )) Then
  31. Dim

    toAdd As

    New

    List(Of String

    )
  32. Dim

    noTags As

    String

    () = GetBetweenAll(word, ">"

    , "<"

    )
  33. For

    Each

    w As

    String

    In

    noTags
  34. If

    (w.Contains("@"

    ) And

    w.Contains("."

    ) And

    Not

    w.Contains("="

    )) Then
  35. If

    (w.EndsWith(","

    ) Or

    w.EndsWith("."

    )) Then
  36. toAdd.Add(w.Substring(0, w.Length - 1))
  37. Else
  38. toAdd.Add(w)
  39. End

    If
  40. End

    If
  41. Next
  42. If

    (toAdd.Count > 0) Then
  43. If

    (toAdd.Count > 1) Then
  44. For

    Each

    t As

    String

    In

    toAdd
  45. ListBox1.Items.Add(t)
  46. Next
  47. Else
  48. ListBox1.Items.Add(toAdd(0))
  49. End

    If
  50. End

    If
  51. Else
  52. ListBox1.Items.Add(word)
  53. End

    If
  54. End

    If
  55. Next
  56. Else

    : MsgBox("That is not a valid link!"

    )
  57. End

    If
  58. End

    Function

  59. Private

    Function

    removeTags(ByVal

    w As

    String

    )
  60. Dim

    toReturn As

    New

    List(Of String

    )
  61. Dim

    noTags As

    String

    () = GetBetweenAll(w, ">"

    , "<"

    )
  62. For

    Each

    word As

    String

    In

    noTags
  63. If

    (word.Contains("@"

    ) And

    word.Contains("."

    ) And

    Not

    word.Contains("="

    )) Then
  64. toReturn.Add(word)
  65. End

    If
  66. Next
  67. Return toReturn
  68. End

    Function

  69. Private

    Sub

    Form1_Load(sender As

    Object

    , e As

    EventArgs) Handles Me.Load
  70. CheckForIllegalCrossThreadCalls = False
  71. End

    Sub

  72. Private

    Sub

    Button3_Click(sender As

    Object

    , e As

    EventArgs) Handles Button3.Click
  73. Dim

    items As

    New

    List(Of String

    )
  74. For

    Each

    i As

    String

    In

    ListBox1.Items
  75. Dim

    isNew As

    Boolean

    = True
  76. For

    Each

    it As

    String

    In

    items
  77. If

    (it = i) Then

    isNew = False
  78. Next
  79. If

    (isNew) Then

    items.Add(i)
  80. Next
  81. ListBox1.Items.Clear()
  82. For

    Each

    i As

    String

    In

    items
  83. ListBox1.Items.Add(i)
  84. Next
  85. End

    Sub
  86. End

    Class


Download
You must upgrade your account or reply in the thread to view hidden text.
 

452,292

323,341

323,350

Top