Visual Basic Twitter Feed Scraper

alphanz

Introduction:
Welcome to my tutorial on how to create a Twitter profile tweet scraper. First, create a form that contains a textbox for the profile username and a button to begin the process.

Steps of Creation:
Step 1:
Add the following two imports so we can fetch the profile page source and manipulate it:

    Imports System.Net
    Imports System.Text.RegularExpressions

Step 2:
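Now, add two functions: GetBetween and GetBetweenAll. We will be using these Regex functions to extract our tweets from our web page source.

    Private Function GetBetween(ByVal Source As String, ByVal Str1 As String, ByVal Str2 As String, Optional ByVal Index As Integer = 0) As String
        ' Regex.Escape makes the delimiters match literally, not as patterns.
        Dim parts As String() = Regex.Split(Source, Regex.Escape(Str1))
        ' Return an empty string when the opening delimiter is not found.
        If (Index + 1 >= parts.Length) Then Return ""
        Return Regex.Split(parts(Index + 1), Regex.Escape(Str2))(0)
    End Function

    Private Function GetBetweenAll(ByVal Source As String, ByVal Str1 As String, ByVal Str2 As String) As String()
        Dim Results As New List(Of String)
        Dim T As New List(Of String)
        ' Split on the opening delimiter; the first piece precedes any match.
        T.AddRange(Regex.Split(Source, Regex.Escape(Str1)))
        T.RemoveAt(0)
        For Each I As String In T
            ' Keep only the text before the closing delimiter.
            Results.Add(Regex.Split(I, Regex.Escape(Str2))(0))
        Next
        Return Results.ToArray()
    End Function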
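To see what the two helpers return, here is a minimal sketch that runs them against a small hand-written HTML string. The sample markup and the Demo module name are assumptions made for illustration, not real Twitter output, and the helpers are simplified versions of the ones in this step with Regex.Escape applied so the delimiters match literally.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module Demo
    ' Returns the text between the (Index + 1)-th Str1 and the next Str2.
    Function GetBetween(ByVal Source As String, ByVal Str1 As String, ByVal Str2 As String, Optional ByVal Index As Integer = 0) As String
        Return Regex.Split(Regex.Split(Source, Regex.Escape(Str1))(Index + 1), Regex.Escape(Str2))(0)
    End Function

    ' Returns every piece of text that sits between Str1 and Str2.
    Function GetBetweenAll(ByVal Source As String, ByVal Str1 As String, ByVal Str2 As String) As String()
        Dim Results As New List(Of String)
        Dim Pieces As New List(Of String)(Regex.Split(Source, Regex.Escape(Str1)))
        Pieces.RemoveAt(0) ' text before the first Str1 is not a match
        For Each Piece As String In Pieces
            Results.Add(Regex.Split(Piece, Regex.Escape(Str2))(0))
        Next
        Return Results.ToArray()
    End Function

    Sub Main()
        ' Hypothetical markup, made up for this example.
        Dim html As String = "<ul><li>first</li><li>second</li></ul>"

        Dim items As String() = GetBetweenAll(html, "<li>", "</li>")
        Console.WriteLine(items.Length)                          ' 2
        Console.WriteLine(String.Join(", ", items))              ' first, second
        Console.WriteLine(GetBetween(html, "<li>", "</li>", 1))  ' second
    End Sub
End Module
```

The Index argument of GetBetween picks which occurrence to return, counting from zero, so an index of 1 gives the second list item here.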

Step 3:
In the button click event we send a request to the profile page of the username entered in TextBox1 and read the response (the page source):

    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        ' Request the profile page for the username entered in TextBox1.
        Dim r As HttpWebRequest = DirectCast(WebRequest.Create("http://www.twitter.com/" & TextBox1.Text), HttpWebRequest)
        Dim src As String
        ' Using blocks close the response and reader once the page is read.
        Using re As HttpWebResponse = DirectCast(r.GetResponse(), HttpWebResponse)
            Using reader As New System.IO.StreamReader(re.GetResponseStream())
                src = reader.ReadToEnd()
            End Using
        End Using
        If (String.IsNullOrEmpty(src)) Then
            MsgBox("Error. Src is null")
        Else
            ' Each tweet lives in its own <li> stream item.
            Dim tweets As String() = GetBetweenAll(src, "<li class=""js-stream-item stream-item stream-item expanding-stream-item"" data-item-id=""", "</div></div></li>")
            If (tweets.Length > 0) Then
                Dim tweetcount As Integer = 0
                ' Tweets are saved under <current directory>/<username>/.
                Dim folder As String = CurDir() & "/" & TextBox1.Text
                If (Not My.Computer.FileSystem.DirectoryExists(folder)) Then My.Computer.FileSystem.CreateDirectory(folder)
                For Each tweet As String In tweets
                    Using sw As New System.IO.StreamWriter(folder & "/Tweet " & tweetcount & ".txt")
                        tweetcount += 1
                        Dim msg As String = GetBetween(tweet, "<p class=""js-tweet-text tweet-text"">", "</p>")
                        msg = clearTags(msg)
                        sw.Write(msg)
                    End Using
                Next
            End If
        End If
    End Sub

Once we have read the page source, we extract all the loaded tweets using the GetBetweenAll function we already added. Then, as long as we have tweets, we iterate through each one and write it to a text file at Current Directory > Profile Username > Tweet *TweetCount*.txt. Before we write the tweets we need to clean them of HTML tags...
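The output path in this step is built by joining strings with "/". A slightly more defensive sketch, assuming you want to guard against characters that are not legal in file names, could lean on System.IO.Path instead. The BuildTweetPath helper and the PathDemo module are hypothetical names introduced only for this example.

```vbnet
Imports System
Imports System.IO

Module PathDemo
    ' Hypothetical helper: returns <baseDir>/<username>/Tweet <n>.txt,
    ' replacing characters that are invalid in file names with "_".
    Function BuildTweetPath(ByVal baseDir As String, ByVal username As String, ByVal tweetNumber As Integer) As String
        Dim safeName As String = username
        For Each bad As Char In Path.GetInvalidFileNameChars()
            safeName = safeName.Replace(bad, "_"c)
        Next
        ' Path.Combine inserts the correct directory separator for the platform.
        Return Path.Combine(baseDir, safeName, "Tweet " & tweetNumber & ".txt")
    End Function

    Sub Main()
        Dim target As String = BuildTweetPath(Directory.GetCurrentDirectory(), "some/user", 0)
        Console.WriteLine(Path.GetFileName(target))                         ' Tweet 0.txt
        Console.WriteLine(Path.GetFileName(Path.GetDirectoryName(target)))  ' some_user
    End Sub
End Module
```

The three-argument Path.Combine overload used here requires .NET Framework 4.0 or later.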

Step 4:
Ok, so now that we have our tweets, we need to clean them up so we aren't left with things like &quot; instead of a quotation mark ("). We are already running the "msg" through our clearTags function, so let's create it:

    Private Function clearTags(ByVal s As String) As String
        Dim toreturn As String = s
        If (s.Contains("<") AndAlso s.Contains(">")) Then
            ' Copy only the characters that fall outside <...> tags.
            toreturn = ""
            Dim shouldadd As Boolean = True
            For Each c As Char In s
                If (c = "<"c) Then shouldadd = False
                If (c = ">"c) Then shouldadd = True
                If (c <> "<"c AndAlso c <> ">"c AndAlso shouldadd) Then
                    toreturn &= c
                End If
            Next
        End If
        ' Decode the HTML entities seen in tweet text, whether or not tags were present.
        toreturn = toreturn.Replace("&#39;", "'")
        toreturn = toreturn.Replace("&nbsp;", " ")
        toreturn = toreturn.Replace("&quot;", """")
        Return toreturn
    End Function

Note: I might not have caught all the replacements, but these are the only ones I could see. If you spot any more, just add more replacements to the function above.
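If you would rather not maintain the replacement list by hand, one alternative (not what the function in this step does) is to let the framework decode entities for you. The sketch below assumes .NET Framework 4.0 or later, where System.Net.WebUtility.HtmlDecode is available, and uses a simple regular expression to strip tags. CleanTweet and CleanDemo are hypothetical names, and the sample string is made up.

```vbnet
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

Module CleanDemo
    ' Hypothetical alternative to clearTags: strip tags, then decode entities.
    Function CleanTweet(ByVal s As String) As String
        ' Remove anything shaped like a tag: "<", then non-">" characters, then ">".
        Dim noTags As String = Regex.Replace(s, "<[^>]*>", "")
        ' Decode named and numeric entities such as &quot;, &#39; and &amp;.
        Dim decoded As String = WebUtility.HtmlDecode(noTags)
        ' &nbsp; decodes to a non-breaking space (U+00A0); map it to a normal space.
        Return decoded.Replace(ChrW(&HA0), " "c)
    End Function

    Sub Main()
        Dim raw As String = "He said &quot;hi&quot; &amp; it&#39;s <b>fine</b>"
        Console.WriteLine(CleanTweet(raw)) ' He said "hi" & it's fine
    End Sub
End Module
```

Tags are stripped before decoding so that an escaped angle bracket in the tweet text, such as &lt;, is not mistaken for markup once it has been decoded.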

Project Complete!
Below you will find the complete source code along with a download of the full project:

    Imports System.Net
    Imports System.Text.RegularExpressions

    Public Class Form1

        Private Function GetBetween(ByVal Source As String, ByVal Str1 As String, ByVal Str2 As String, Optional ByVal Index As Integer = 0) As String
            ' Regex.Escape makes the delimiters match literally, not as patterns.
            Dim parts As String() = Regex.Split(Source, Regex.Escape(Str1))
            ' Return an empty string when the opening delimiter is not found.
            If (Index + 1 >= parts.Length) Then Return ""
            Return Regex.Split(parts(Index + 1), Regex.Escape(Str2))(0)
        End Function

        Private Function GetBetweenAll(ByVal Source As String, ByVal Str1 As String, ByVal Str2 As String) As String()
            Dim Results As New List(Of String)
            Dim T As New List(Of String)
            ' Split on the opening delimiter; the first piece precedes any match.
            T.AddRange(Regex.Split(Source, Regex.Escape(Str1)))
            T.RemoveAt(0)
            For Each I As String In T
                ' Keep only the text before the closing delimiter.
                Results.Add(Regex.Split(I, Regex.Escape(Str2))(0))
            Next
            Return Results.ToArray()
        End Function

        Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
            ' Request the profile page for the username entered in TextBox1.
            Dim r As HttpWebRequest = DirectCast(WebRequest.Create("http://www.twitter.com/" & TextBox1.Text), HttpWebRequest)
            Dim src As String
            ' Using blocks close the response and reader once the page is read.
            Using re As HttpWebResponse = DirectCast(r.GetResponse(), HttpWebResponse)
                Using reader As New System.IO.StreamReader(re.GetResponseStream())
                    src = reader.ReadToEnd()
                End Using
            End Using
            If (String.IsNullOrEmpty(src)) Then
                MsgBox("Error. Src is null")
            Else
                ' Each tweet lives in its own <li> stream item.
                Dim tweets As String() = GetBetweenAll(src, "<li class=""js-stream-item stream-item stream-item expanding-stream-item"" data-item-id=""", "</div></div></li>")
                If (tweets.Length > 0) Then
                    Dim tweetcount As Integer = 0
                    ' Tweets are saved under <current directory>/<username>/.
                    Dim folder As String = CurDir() & "/" & TextBox1.Text
                    If (Not My.Computer.FileSystem.DirectoryExists(folder)) Then My.Computer.FileSystem.CreateDirectory(folder)
                    For Each tweet As String In tweets
                        Using sw As New System.IO.StreamWriter(folder & "/Tweet " & tweetcount & ".txt")
                            tweetcount += 1
                            Dim msg As String = GetBetween(tweet, "<p class=""js-tweet-text tweet-text"">", "</p>")
                            msg = clearTags(msg)
                            sw.Write(msg)
                        End Using
                    Next
                End If
            End If
        End Sub

        Private Function clearTags(ByVal s As String) As String
            Dim toreturn As String = s
            If (s.Contains("<") AndAlso s.Contains(">")) Then
                ' Copy only the characters that fall outside <...> tags.
                toreturn = ""
                Dim shouldadd As Boolean = True
                For Each c As Char In s
                    If (c = "<"c) Then shouldadd = False
                    If (c = ">"c) Then shouldadd = True
                    If (c <> "<"c AndAlso c <> ">"c AndAlso shouldadd) Then
                        toreturn &= c
                    End If
                Next
            End If
            ' Decode the HTML entities seen in tweet text, whether or not tags were present.
            toreturn = toreturn.Replace("&#39;", "'")
            toreturn = toreturn.Replace("&nbsp;", " ")
            toreturn = toreturn.Replace("&quot;", """")
            Return toreturn
        End Function

    End Class

Note: Due to the size or complexity of this submission, the author has submitted it as a .zip file to shorten your download time. After downloading it, you will need a program like Winzip to decompress it.

Virus note: All files are scanned once-a-day by SourceCodester.com for viruses, but new viruses come out every day, so no prevention program can catch 100% of them.

FOR YOUR OWN SAFETY, PLEASE:

1. Re-scan downloaded files using your personal virus checker before using it.

2. NEVER, EVER run compiled files (.exe's, .ocx's, .dll's etc.)--only run source code.

