T-SQL Tuesday #040: File and Filegroup Wisdom (Quick Introduction to FILESTREAM)


TSql2sDay

T-SQL Tuesday #040: hosted by MidNightDBA

This month the T-SQL Tuesday invitation is from Jen&Sean McCown (B|T), asking us to talk about FileGroups or a related area. I thought FILESTREAMs was a good choice to talk about because it needs its own file group. Let’s get rolling then:

What is FILESTREAM?

FILESTREAM was a new SQL Server feature (in SQL2008) that lets you store unstructured BLOB data directly in the file system in a set of folders, access to which is provided via SQL*Server and a special file system driver.

Thumbs-Up:

  • The files created as part of the FILESTREAM are managed by the SQL Server itself in their own file group which can be backed up and restored along with other SQL Server data.
  • Reading and writing these files is part of the database transaction.

Bleh!:

  • FILESTREAM data can be stored only on the local server drive or a local mount point.
  • Database snapshopts are not supported for the FILESTREAM containers (folders).
  • Database Mirroring is not supported.
  • Transparent Data Encryption does not encrypt the FILESTREAM data.

When do I use it?

“(…) objects smaller than 256K are best stored in a database while objects larger than 1M are best stored in the filesystem. Between 256K and 1M, the read:write ratio and rate of object overwrite or replacement are important factors.”[1]

So, the general recommendation is to use it when your BLOBs are greater than 1MB and you have additional logic that needs to be performed on the BLOB (like scanning the barcodes in the document before sending the document out) and fast read access is required. If all the fore mentioned conditions are not met, you are better off storing the BLOB in the database.

How does it work?

To put it very simply:

  • Binary data stored as individual files outside the database.
  • These individual binary files are accessed through WIN32 API for file operations.
  • T-SQL will be used to query database to get file handle using PathName().
  • T-SQL will be used to get current transaction token using GET_FILESTREAM_TRANSACTION_CONTEXT().
  • Both the details are passed to ASP.Net application for read/write to files.

Wait, how did they do this before filestream?

The first method was to use the file system i.e. store the paths in the DB and the files on the NTFS shares. Some of the problems with approach were that the backup of the files is not always synchronized with the backup of the database, transactional consistency issues, support for FT Search is hard and finally creating a consistent DR plan was a nightmare.

The second method was to store the file into a BLOB, now called a VarBinary(max) where you have the benefits of the backups, the full text search but you are limited to 2GB per file. Operations on the database (backups and other maintenance tasks) take more time.

How do I set it up?

First things first, you need to enable FILESTREAM at windows level. You do this on the ‘FILESTREAM’ tab in the SQL*Server configuration manager for the SQL instances whose properties you want to change.

Then at the SQL*Server instance level: By running sp_configure with ‘filestream_access_level’ configuration option.

Then you create a new FileGroup for the FILESTREAM container. There is a 1:1 mapping between the FILESTREAM FileGroups and the FILESTREAM container. You can validate this by using the below query.

SELECT file_id AS fileid,
type_desc AS filetype,
name AS name,
physical_name AS physicalname
FROM sys.database_files
WHERE type_desc = 'FILESTREAM'
GO

Example:

CREATE DATABASE DiscountItems ON
PRIMARY (
NAME = DiscountItemsData1,
FILENAME = 'D:\Sql\MSSQL\Data\DiscountItemsData1.mdf'),
FILEGROUP DiscountItemsDB2(
NAME = DiscountItemsData2,
FILENAME = 'D:\Sql\MSSQL\Data\DiscountItemsData2.ndf'),
FILEGROUP DiscountItemsFS1 CONTAINS FILESTREAM DEFAULT(
NAME = DiscountItemsFS1,
FILENAME = 'D:\Sql\MSSQL\Data\DiscountItemsFS1'),
LOG ON (
NAME = DiscountItemsLOG,
FILENAME = 'D:\Sql\MSSQL\Data\DiscountItemsLOG.ldf')
GO

Once the database has a FILESTREAM filegroup, tables can be created that contain FILESTREAM columns. As mentioned earlier, a FILESTREAM column is defined as a varbinary (max) column that has the FILESTREAM attribute.[2]

Example:

CREATE TABLE [ForSaleItems](
[ItemID] [int] IDENTITY(1,1) PRIMARY KEY
[ItemGuid] [uniqueidentifier] ROWGUIDCOL NOT NULL UNIQUE ON [PRIMARY],
[ItemDescription] [varchar](50) NULL,
[ItemImage] [varbinary](max) FILESTREAM NULL
)
FILESTREAM_ON DiscountItemsFS1
GO

As a requirement, you need to have a ROWGUID column like the ItemID column. If you try to create a plain old Identity column, the table won’t be created and you will get an error (Msg 5505, Level 16, State 1, Line 1 – A table with FILESTREAM column(s) must have a non-NULL unique ROWGUID column).

The column that will be used as a pointer to the real file on the disk needs to be created as a VarBinary(max) with the FileStream attribute. In our case this is the ItemImage.

The following query will insert a new row into our table and it will also create a file into our folder.

INSERT INTO ForSaleItems([ItemGuid], [ItemDescription],  [ItemImage])
VALUES(NEWID(), 'Something random somebody said', CAST('Hope is a heuristicSearch' AS VARBINARY(MAX)))

The path to the file is not immediately available, you need to use Pathname() function to get the path. But, remember that PathName() returns only a file handle so you cannot navigate to the path.

SELECT [ItemDescription], pathname = [ItemImage].PathName() FROM ForSaleItems

If you really want to see the files, you need to goto the FILESTREAM container (the folder we specified for FILESTREAM when creating the DB). Also, when you delete files that are no longer needed, they are removed by a garbage collection process (an automatic background process).

What about Backups?

All backup and recovery methods are available with FILESTREAM data. If you already have a backup process in place then remember that the FILESTREAM data is backed up with the structured data in the database.
If you do not want to back up FILESTREAM data with relational data, you can use a partial backup to exclude FILESTREAM filegroups.

Conclusion:

So, hopefully this has given you a very quick introduction to FILESTREAMs in SQL*Server.

Most of the stuff I presented in the post came from the following WhitePaper (Or at least, what I understood from the paper): FILESTREAMStorage by Paul Randal.

If there are any mistakes, they are my own and should not be attributed to anyone else. Comments, suggestions and telling off is always welcome.

About these ads
About

By profession, I’m a SQL Server Database Administrator. I love to poke my nose into different corners and see how stuff looks in there. I keep looking for new things to do as my mind refuses to settle on one topic.

Tagged with: , , , ,
Posted in SQL Server
3 comments on “T-SQL Tuesday #040: File and Filegroup Wisdom (Quick Introduction to FILESTREAM)
  1. rieshellin says:

    Very detailed post…Thanks a ton

  2. […] Hemanth Damecharla: T-SQL Tuesday #040: File and Filegroup Wisdom (Quick Introduction to FILESTREAM) […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 88 other followers

%d bloggers like this: