Code Focused

How to Get Oriented with a New Database

On VB author Joe Kunk shows how you can quickly discover the orientation of an unknown database using dynamic queries.

As a consultant, I am often asked to produce an application or report that utilizes a SQL Server database that I am completely unfamiliar with. On a good day I am fortunate enough to be provided some documentation regarding the database, ideally an entity relationship diagram and a data dictionary. Most often I am provided an administrator login and left to discover the rest for myself. Sound familiar to anyone?

In this article, I present a sample program to perform dynamic queries against a SQL Server 2008 database to provide an orientation to the database and guide further investigation of its characteristics. The sample program uses SQL Server 2008 Express and queries the AdventureWorks2008 sample database available here. I use the term "SQL" to refer to Transact SQL (T-SQL) statements that can be executed on a SQL Server database.

Dynamic Queries
Almost all my local database interaction involves some form of database interface utility with my current favorite being LINQ-to-SQL. The convenience of strongly typed tables and columns, the security of generated SQL statements, and the brevity of code required to interact with the database is hard to beat. These database interfaces assume a prior knowledge of the database and are not optimized for free-form queries. When raw SQL is needed, I fall back to an ADO.NET library named LibADO that I wrote many years ago when ADO.NET still had its just-released showroom shine.

The LibADO function QuerySingleResult takes three parameters: the database connection string, the SQL statement to be executed, and a new DataTable instance. It returns the integer number of rows placed in the DataTable and an updated DataTable parameter (ByRef) with the rows of the SQL query result. Columns are automatically created based on the result column's data type and name. With this, a dynamic SQL query is as simple as supplying a database connection string, the SQL statement, and a new DataTable to hold the results. Simply iterate through the DataTable to process the results, or bind it to any control that supports a DataTable as a data source. The code for QuerySingleResult is shown in Listing 1.

        Public Function QuerySingleResult(ByVal strconn As String, 
ByVal strSQL As String, ByRef dt As DataTable) As Integer
'Returns a DataTable filled with the results of the query
'Function returns the count of records in the datatable
'----- dt (datatable) needs to be empty & no schema defined

'Verify the input parameter of connection string
If (strconn.Trim.Length = 0) Then
If strConnection.Length > 0 Then strconn = strConnection
If strconn.Length = 0 Then Return 0
End If

If (strSQL.Trim.Length = 0) Then Return 0

Dim sconQuery As New SqlConnection
Dim scmdQuery As New SqlCommand
Dim srdrQuery As SqlDataReader = Nothing
Dim intRowsCount As Integer = 0

Try
'Open the SQL connnection to the database
sconQuery.ConnectionString = strconn
sconQuery.Open()

'Execute the SQL command against the database & return a resultset
scmdQuery.Connection = sconQuery
scmdQuery.CommandText = strSQL
srdrQuery = scmdQuery.ExecuteReader(Data.CommandBehavior.SingleResult)
dt.Clear() 'Clears all data from the datatable

If srdrQuery.HasRows Then

'Determine if the incoming datatable has an existing schema or not
Dim dtSchema As New Data.DataTable
dtSchema = srdrQuery.GetSchemaTable
If dt.Columns.Count = 0 Then
For i As Integer = 0 To dtSchema.Rows.Count - 1
Dim c As New DataColumn
c.ColumnName = dtSchema.Rows(i).Item("ColumnName").ToString
c.DataType = CType(dtSchema.Rows(i).Item("DataType"), System.Type)
If c.DataType Is GetType(String) Then
c.MaxLength = CType(dtSchema.Rows(i).Item("ColumnSize"), Int32)
End If
Try
dt.Columns.Add(c)
Catch
'Prevent abort of program if column name is a duplicate
'Append a "2" to the column name if a duplicate
c.ColumnName = c.ColumnName & "_" & i.ToString
End Try
Next
End If
Do While srdrQuery.Read
intRowsCount += 1
Dim dr As DataRow
dr = dt.NewRow
For i As Integer = 0 To dt.Columns.Count - 1
dr.Item(i) = srdrQuery.Item(i)
Next i
dt.Rows.Add(dr)
Loop

'Return zero rows of data if only nulls returned in a single row result
If (intRowsCount = 1) Then
Dim blnNonNull As Boolean = False
For i As Integer = 0 To dt.Columns.Count - 1
If Not IsDBNull(dt.Rows(0).Item(i)) Then blnNonNull = True : Exit For
Next
If Not blnNonNull Then
dt.Reset() 'Reset the data table
Return 0
End If
End If
End If
Catch ex As Exception
Err.Raise(-1000, , "MasterLibrary:LibAdo:QuerySingleResult Error = '" & _
ex.Message & " ': sql = " & strSQL)
Finally
If Not IsNothing(srdrQuery) Then _
If Not srdrQuery.IsClosed Then srdrQuery.Close()
scmdQuery.Dispose()
sconQuery.Close()
sconQuery.Dispose()
End Try
Return intRowsCount
End Function

Listing 1. QuerySingleResult function to perform dynamic SQL queries

The danger with dynamic queries, including those processed by QuerySingleResult, is that they will happily execute whatever SQL statements you provide, even if the SQL statement is not what you intended it to be. This is how attacks such as SQL injection and cross-site scripting can be initiated. In this article we are using dynamic queries that do not involve user input so the risk is negligible. If you choose to use dynamic queries with user input, be sure to check the user input carefully before including it in your SQL statement. Since the semicolon character is the statement separator in both Transact SQL and JavaScript, prohibiting that character in user input can assist in providing database security.

Obtaining SQL Server metadata
Now that we can easily perform dynamic queries against a database with an unknown structure, the first question when looking at a new database is to get a good idea of the table names and row counts and how many columns are in each table. I have encountered some databases with a very large number of tables only to find that most of the tables have no data and few columns. A SQL query that partially produces the desired information is shown in Listing 2.

        SELECT name AS object_name,SCHEMA_NAME(schema_id) AS schema_name
,0 as RowsCount, 0 as ColumnCount
,create_date, modify_date, object_id
FROM sys.objects
WHERE type_desc = 'USER_TABLE'

Listing 2. Transact SQL query to return basic table information.

The query above does not give you any row counts or column counts for the listed tables, although it does reserve a spot for those counts in the resulting DataTable by including the zero valued RowsCount and ColumnCount columns in the query.

To provide those counts, we execute a dynamic query against each of the tables that counts the number of rows and updates the DataTable that was returned by the QuerySingleResult function. Similarly, we execute a dynamic query for each table to get the column count using the LibADO. Query_Integer() function and update the DataTable. That result can then be the data source for a display grid with sortable columns as shown in the "Tables" tab of the sample application. The full source code for this process is shown in Listing 3.

        Dim dt As New DataTable
Dim sql As String =
"SELECT name AS object_name " &
" ,SCHEMA_NAME(schema_id) AS schema_name " &
" ,0 as RowsCount " &
" ,0 as ColumnCount " &
" ,create_date " &
" ,modify_date " &
" ,object_id " &
"FROM sys.objects " &
"WHERE type_desc = 'USER_TABLE' " &
"ORDER BY name "

Dim RowCount As Integer = libADO.QuerySingleResult(My.Settings.DBConnStr, sql, dt)
If (Not dt Is Nothing AndAlso RowCount > 0) Then
For i As Integer = 0 To dt.Rows.Count - 1
Dim RowsSql = String.Format(
"SELECT Count(*) FROM [{0}].[{1}]", dt.Rows(i).Item(1), dt.Rows(i).Item(0))
Dim RowsCount As Integer = libADO.Query_Integer(RowsSql, , -1)
dt.Rows(i).Item(2) = RowsCount
Dim ColsSQL = String.Format(
"SELECT Count(*) FROM sys.columns WHERE object_id={0}", dt.Rows(i).Item(6))
Dim ColsCount As Integer = libADO.Query_Integer(ColsSQL, , -1)
dt.Rows(i).Item(3) = ColsCount
Next
Me.dgvTables.DataSource = dt
End If

Listing 3. Visual Basic code to list SQL user tables including row and column counts

Now that we have tables and their row and column counts, the next thing is to get the column information for each table, and if available, the description property for each column. See Listing 4.

Including a description property in the column definition screen in SQL Management Studio is very useful, but it annoys me that there is not an equivalent description field for tables. The purpose of the table can and should be documented in a custom extended property named Description, but I rarely see database designers make that effort.

A list of columns in the database user tables, including the Description field, is created by the code in Listing 4.

        Dim dt As New DataTable
Dim sql As String =
"SELECT " &
" SCHEMA_NAME(ST.schema_id) AS [Schema Name] " &
" ,Object_name(COL.Object_ID) AS [Table Name] " &
" ,COL.name as [Column Name] " &
" ,ST.name as [Type Name] " &
" ,ST.max_length as [Max Length] " &
" ,COL.precision as Precision " &
" ,COL.scale as Scale " &
" ,EP.value as [Description] " &
" ,ST.is_user_defined as [Is User Defined] " &
" ,ST.is_assembly_type as [Is Assembly Type] " &
"FROM sys.all_columns COL " &
" INNER JOIN sys.types ST on ST.user_type_id = COL.user_type_id " &
" INNER JOIN sys.objects OBJ on OBJ.object_id = COL.object_id " &
" LEFT OUTER JOIN sys.extended_properties EP on EP.major_id = COL.object_id " &
" AND EP.minor_id = COL.column_id " &
" AND EP.name = 'MS_Description' " &
"WHERE " &
" OBJ.type_desc = 'USER_TABLE' AND " &
" (OBJECTPROPERTY(COL.object_id,'IsMsShipped') = 0) " &
"ORDER BY " &
" SCHEMA_NAME(ST.schema_id) " &
" ,OBJECT_NAME(COL.Object_id) " &
" ,COL.name "

Dim RowCount As Integer = libADO.QuerySingleResult(My.Settings.DBConnStr, sql, dt)
Me.dgvColumns.DataSource = dt

Listing 4. Visual Basic code to list SQL columns for user tables including descriptions

Next it is useful to see a list of the other database objects such as views, stored procedures, table valued functions, scalar valued functions, users and roles. These are retrieved using SQL statements similar to what we have already seen and some can be seen in the sample program.

Conclusion
In this article we saw a way to quickly get oriented to the structure and data volume in a newly encountered database. There are commercial applications that provide far more detailed views of a SQL Server database schema and its data, but the techniques in this article provide a good first look.

The LibADO function QuerySingleResult provides an easy way to issue and consume free-form SQL queries, especially useful where prior knowledge of the needed database structures is limited or unavailable.

If you have other techniques for quickly getting oriented to an unfamiliar database, please share them with me and the other readers by commenting on this article on the VisualStudioMagazine.com Web site.

About the Author

Joe Kunk is a Microsoft MVP in Visual Basic, three-time president of the Greater Lansing User Group for .NET, and developer for Dart Container Corporation of Mason, Michigan. He's been developing software for over 30 years and has worked in the education, government, financial and manufacturing industries. Kunk's co-authored the book "Professional DevExpress ASP.NET Controls" (Wrox Programmer to Programmer, 2009). He can be reached via email at [email protected].

comments powered by Disqus

Featured

Subscribe on YouTube