
I'm writing something like a web crawler whose engine follows these steps:

  1. Read the RSS link (passed in as an argument)
  2. Build a List(Of) RSS items from it
  3. Check whether each link already exists in the database (SQL Server) with a separate query
  4. If the link is new, insert its fields into the DB with separate queries

    Public Sub MyTickHandler()
        Dim NewItems As New List(Of Structures.RSSItem)
        Dim founded As Boolean = False

        ' Parse the RSS feed into a list of items
        NewItems = RssReader.ParseRssFile(RssURL)

        Dim connString = Configs.NewsDBConnection
        Dim myConnection As SqlConnection = New SqlConnection("Server=localhost;Database=db;Integrated Security=SSPI;Connection Timeout=45;Max Pool Size=300")
        myConnection.Open()

        For Each item In NewItems
            ' One SELECT per item to check whether the link already exists
            Dim cmdString As String = "SELECT id FROM posts with (nolock) WHERE link LIKE '" & item.link.Trim.ToLower & "'"
            Dim TheCommand As SqlCommand = New SqlCommand(cmdString, myConnection)
            Dim result = TheCommand.ExecuteScalar()
            If result Is Nothing Then
                ' Two INSERTs per new item
                TheCommand = New SqlCommand("INSERT INTO posts (link) VALUES ('" & item.link.ToLower.Trim & "')")
                TheCommand.Connection = myConnection
                TheCommand.ExecuteNonQuery()

                TheCommand = New SqlCommand("INSERT INTO queue (link,descrip,site,title,category) VALUES ('" & item.link.ToLower.Trim & "','" & StringToBase64(item.description) & "','" & RssSite & "','" & StringToBase64(item.title) & "','" & RssCategory & "')")
                TheCommand.Connection = myConnection
                TheCommand.ExecuteNonQuery()
            End If
            TheCommand.Dispose()
        Next

        myConnection.Close()
        SqlConnection.ClearPool(myConnection) ' clear the pool before disposing the connection
        myConnection.Dispose()

    End Sub
    

This works perfectly for a single call, but I have about 150 RSS links and need to check each of them every 2 minutes by threading (roughly as sketched below). As the number of SQL queries grows, both this process and SQL Server stop responding and the application crashes!
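
For context, the tick handlers are kicked off something like this (an illustrative sketch only; the real scheduling code isn't shown here, and the FeedCrawler class and timer setup below are just placeholders):

    ' One crawler object per feed, each with its own 2-minute timer,
    ' so up to ~150 MyTickHandler calls can run at the same time.
    Private ReadOnly Timers As New List(Of Threading.Timer)

    Public Sub StartCrawling(crawlers As List(Of FeedCrawler))
        For Each crawler In crawlers
            Dim c = crawler ' capture the loop variable for the lambda
            Timers.Add(New Threading.Timer(
                Sub(state) c.MyTickHandler(),
                Nothing,
                TimeSpan.Zero,
                TimeSpan.FromMinutes(2)))
        Next
    End Sub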

I tried a few tips, like increasing the SQL Server response timeout, but it didn't help at all.

Any better way or tips for this process?
Thanks

  • Could be your LIKE query that's killing performance. Run a SQL trace to find out. If that's not it, run the app under a profiler to find out where it spends its time. Commented Feb 7, 2015 at 17:33
  • @500-InternalServerError I profiled a single run before and all the commands looked normal, but the profiler won't let the process run normally with 150 busy threads! Thanks Commented Feb 7, 2015 at 17:53

2 Answers

  • Only do one single fetch, outside the for-each-loop:

SELECT id, link FROM posts with (nolock) WHERE link in (@listOfLowerCaseLinks)

Dim myListOfLinks As New List(Of String)
...
TheCommand.Parameters.AddWithValue("@listOfLowerCaseLinks", myListOfLinks)
  • Wrap the whole set of inserts (the whole for-each loop) in a SQL transaction. That way, the database won't have to commit in between (see the sketch below).
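
A minimal sketch of both points, reusing the table, column, and variable names from the question (everything else here is an assumption, and it presumes Imports System.Data.SqlClient). Note that SqlClient does not expand a list bound to one parameter into an IN clause, so the sketch adds one parameter per link instead:

Dim existingLinks As New HashSet(Of String)(StringComparer.OrdinalIgnoreCase)

Using myConnection As New SqlConnection(Configs.NewsDBConnection)
    myConnection.Open()

    ' 1) Single fetch: one parameter per link, all checked in a single IN clause.
    If NewItems.Count > 0 Then
        Using fetchCommand As New SqlCommand()
            fetchCommand.Connection = myConnection
            Dim paramNames As New List(Of String)
            For i = 0 To NewItems.Count - 1
                Dim name = "@link" & i
                paramNames.Add(name)
                fetchCommand.Parameters.AddWithValue(name, NewItems(i).link.Trim.ToLower)
            Next
            fetchCommand.CommandText =
                "SELECT link FROM posts WHERE link IN (" & String.Join(",", paramNames) & ")"
            Using reader = fetchCommand.ExecuteReader()
                While reader.Read()
                    existingLinks.Add(reader.GetString(0))
                End While
            End Using
        End Using
    End If

    ' 2) One transaction around all inserts, so the database commits once.
    Using tran = myConnection.BeginTransaction()
        For Each item In NewItems
            If Not existingLinks.Contains(item.link.Trim.ToLower) Then
                Using insertCommand As New SqlCommand(
                        "INSERT INTO posts (link) VALUES (@link)", myConnection, tran)
                    insertCommand.Parameters.AddWithValue("@link", item.link.Trim.ToLower)
                    insertCommand.ExecuteNonQuery()
                End Using
                ' The INSERT INTO queue (...) command would go here in the same way.
            End If
        Next
        tran.Commit()
    End Using
End Using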

7 Comments

I'll test a SQL transaction on the inserts. Thank you ;)
The bulk fetch is also very important. It reduces the number of fetch queries every two minutes from 150 to 1. Provided that your link column is indexed, the majority of the time will be spent round-tripping to/from the SQL server rather than on the data itself, so a speedup of almost 150x can be expected.
Yeah! I hadn't noticed that. It will help me a lot! As for transactions, I don't have much experience with them, and I know a transaction will lock out other threads for a period of time. Is it correct to use one under concurrency?
Check for an example of transactions usage - stackoverflow.com/a/298253/95976
If you don't use a transaction scope, SQL Server still creates a transaction for every query, so locking happens anyway. By grouping queries into a single transaction, you help SQL Server do one single flush to the database file. Granted, try to keep the transaction scope short. In other words, do all the heavy calculations first; then, once you have all the data in lists, have only your insert commands within the transaction scope. Also, SQL Server doesn't like transactions with loads of CUD behaviour (say, more than 1000 inserts/updates/deletes in a single transaction).

I suggest you pass a table-valued parameter to a stored procedure for this task. That will allow the entire list to be inserted in a single call. Below is an example you can tweak for your actual column lengths. It is important to have an index on the link column of the posts table. I assume link is unique in this example.

T-SQL to create table type and proc:

CREATE TYPE dbo.linkInfo AS TABLE(
     link varchar(255) NOT NULL PRIMARY KEY
    ,descrip varchar(255)
    ,title varchar(255)
    );
GO

CREATE PROC dbo.usp_InsertRssItems
     @site varchar(255)
    ,@category varchar(255)
    ,@linkInfo dbo.linkInfo READONLY
AS

SET NOCOUNT ON;

DECLARE @InsertedPosts TABLE(link varchar(255));

INSERT INTO dbo.posts(link)
OUTPUT inserted.link INTO @InsertedPosts
SELECT link
FROM @linkInfo AS li
WHERE NOT EXISTS(
    SELECT *
    FROM dbo.posts AS p
    WHERE p.link = li.link
    );

INSERT INTO dbo.queue(link,descrip,site,title,category)
SELECT li.link, li.descrip, @site, li.title, @category
FROM @linkInfo AS li
WHERE EXISTS(
    SELECT *
    FROM @InsertedPosts AS ip
    WHERE ip.link = li.link
    );
GO

Sample VB.NET code:

Sub MyTickHandler()

    Dim NewItems As New List(Of Structures.RssItem)
    Dim founded As Boolean = False

    NewItems = RssReader.ParseRssFile(RssURL)

    Dim dt = getNewRssItemDataTable(NewItems)

    Dim connString = Configs.NewsDBConnection
    Dim myConnection As SqlConnection = New SqlConnection("Server=localhost;Database=db;Integrated Security=SSPI;Connection Timeout=45;Max Pool Size=300")
    Dim TheCommand As SqlCommand = New SqlCommand("dbo.usp_InsertRssItems", myConnection)
    TheCommand.Parameters.Add(New SqlParameter("@site", SqlDbType.VarChar, 255)).Value = "z"
    TheCommand.Parameters.Add(New SqlParameter("@category", SqlDbType.VarChar, 255)).Value = "z"
    TheCommand.Parameters.Add(New SqlParameter("@linkInfo", SqlDbType.Structured)).Value = dt
    TheCommand.CommandType = CommandType.StoredProcedure

    myConnection.Open()
    TheCommand.ExecuteNonQuery()

    myConnection.Close()
    myConnection.Dispose()

End Sub

Private Function getNewRssItemDataTable(NewRssItems As List(Of Structures.RssItem)) As DataTable

    Dim dt As New DataTable
    dt.Columns.Add("link", GetType(String)).MaxLength = 255
    dt.Columns.Add("descrip", GetType(String)).MaxLength = 255
    dt.Columns.Add("title", GetType(String)).MaxLength = 255

    For Each NewRssItem In NewRssItems
        Dim row = dt.NewRow
        dt.Rows.Add(row)
        row(0) = NewRssItem.link
        row(1) = NewRssItem.description
        row(2) = NewRssItem.title

    Next NewRssItem

    Return dt

End Function

EDIT:

I see you mentioned you would like a SqlBulkCopy example. If inserts are unconditional, you can use this technique:

Sub executeBulkInsert(connectionString As String, site As String, category As String, NewRssItems As List(Of Structures.RssItem))

    Dim dt As New DataTable

    dt.Columns.Add("link", GetType(String)).MaxLength = 255
    dt.Columns.Add("descrip", GetType(String)).MaxLength = 255
    dt.Columns.Add("site", GetType(String)).MaxLength = 255
    dt.Columns.Add("title", GetType(String)).MaxLength = 255
    dt.Columns.Add("category", GetType(String)).MaxLength = 255

    For Each NewRssItem In NewRssItems
        Dim row = dt.NewRow
        dt.Rows.Add(row)
        row(0) = NewRssItem.link
        row(1) = NewRssItem.description
        row(2) = site
        row(3) = NewRssItem.title
        row(4) = category

    Next NewRssItem

    Dim bcp = New SqlBulkCopy(connectionString)
    bcp.DestinationTableName = "dbo.queue"

    bcp.WriteToServer(dt)

End Sub
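
One optional safeguard for the sketch above: by default SqlBulkCopy maps columns by ordinal position, so if the column order of dbo.queue ever differs from the DataTable (its column names are assumed here to match the question's INSERT statement), explicit name-based mappings keep the copy correct:

Using bcp As New SqlBulkCopy(connectionString)
    bcp.DestinationTableName = "dbo.queue"

    ' Map DataTable columns to destination columns by name rather than by position.
    bcp.ColumnMappings.Add("link", "link")
    bcp.ColumnMappings.Add("descrip", "descrip")
    bcp.ColumnMappings.Add("site", "site")
    bcp.ColumnMappings.Add("title", "title")
    bcp.ColumnMappings.Add("category", "category")

    bcp.WriteToServer(dt)
End Using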

3 Comments

Thanks for your solution and +1. As @taoufik said, bulk insert (via SqlBulkCopy) is the way to do the inserting in a single call without needing something like "dbo.usp_InsertRssItems". I'm now testing my code with SqlBulkCopy, so if you know any better ways or tips for SqlBulkCopy, I'd be grateful if you could describe them.
Table-valued parameters are bulk copied into a temp table in tempdb by SqlClient. SqlBulkCopy is also very good for performance but does not allow conditional inserts.
In this case I don't need conditional inserts, so I think SqlBulkCopy is the best choice for the inserts. But for batching the SELECT statements, I'm not sure whether the approach @taoufik described gives the best performance. Do you know a good way to batch SELECT statements that differ only in their argument values?
