This month the T-SQL Tuesday invitation is from Jen&Sean McCown (B|T), asking us to talk about FileGroups or a related area. I thought FILESTREAMs was a good choice to talk about because it needs its own file group. Let’s get rolling then:
What is FILESTREAM?
FILESTREAM was a new SQL Server feature (in SQL2008) that lets you store unstructured BLOB data directly in the file system in a set of folders, access to which is provided via SQL*Server and a special file system driver.
- The files created as part of the FILESTREAM are managed by the SQL Server itself in their own file group which can be backed up and restored along with other SQL Server data.
- Reading and writing these files is part of the database transaction.
- FILESTREAM data can be stored only on the local server drive or a local mount point.
- Database snapshopts are not supported for the FILESTREAM containers (folders).
- Database Mirroring is not supported.
- Transparent Data Encryption does not encrypt the FILESTREAM data.
When do I use it?
“(…) objects smaller than 256K are best stored in a database while objects larger than 1M are best stored in the filesystem. Between 256K and 1M, the read:write ratio and rate of object overwrite or replacement are important factors.”
So, the general recommendation is to use it when your BLOBs are greater than 1MB and you have additional logic that needs to be performed on the BLOB (like scanning the barcodes in the document before sending the document out) and fast read access is required. If all the fore mentioned conditions are not met, you are better off storing the BLOB in the database.
How does it work?
To put it very simply:
- Binary data stored as individual files outside the database.
- These individual binary files are accessed through WIN32 API for file operations.
- T-SQL will be used to query database to get file handle using PathName().
- T-SQL will be used to get current transaction token using GET_FILESTREAM_TRANSACTION_CONTEXT().
- Both the details are passed to ASP.Net application for read/write to files.
Wait, how did they do this before filestream?
The first method was to use the file system i.e. store the paths in the DB and the files on the NTFS shares. Some of the problems with approach were that the backup of the files is not always synchronized with the backup of the database, transactional consistency issues, support for FT Search is hard and finally creating a consistent DR plan was a nightmare.
The second method was to store the file into a BLOB, now called a VarBinary(max) where you have the benefits of the backups, the full text search but you are limited to 2GB per file. Operations on the database (backups and other maintenance tasks) take more time.
How do I set it up?
First things first, you need to enable FILESTREAM at windows level. You do this on the ‘FILESTREAM’ tab in the SQL*Server configuration manager for the SQL instances whose properties you want to change.
Then at the SQL*Server instance level: By running sp_configure with ‘filestream_access_level’ configuration option.
Then you create a new FileGroup for the FILESTREAM container. There is a 1:1 mapping between the FILESTREAM FileGroups and the FILESTREAM container. You can validate this by using the below query.
SELECT file_id AS fileid, type_desc AS filetype, name AS name, physical_name AS physicalname FROM sys.database_files WHERE type_desc = 'FILESTREAM' GO
CREATE DATABASE DiscountItems ON PRIMARY ( NAME = DiscountItemsData1, FILENAME = 'D:\Sql\MSSQL\Data\DiscountItemsData1.mdf'), FILEGROUP DiscountItemsDB2( NAME = DiscountItemsData2, FILENAME = 'D:\Sql\MSSQL\Data\DiscountItemsData2.ndf'), FILEGROUP DiscountItemsFS1 CONTAINS FILESTREAM DEFAULT( NAME = DiscountItemsFS1, FILENAME = 'D:\Sql\MSSQL\Data\DiscountItemsFS1'), LOG ON ( NAME = DiscountItemsLOG, FILENAME = 'D:\Sql\MSSQL\Data\DiscountItemsLOG.ldf') GO
Once the database has a FILESTREAM filegroup, tables can be created that contain FILESTREAM columns. As mentioned earlier, a FILESTREAM column is defined as a varbinary (max) column that has the FILESTREAM attribute.
CREATE TABLE [ForSaleItems]( [ItemID] [int] IDENTITY(1,1) PRIMARY KEY [ItemGuid] [uniqueidentifier] ROWGUIDCOL NOT NULL UNIQUE ON [PRIMARY], [ItemDescription] [varchar](50) NULL, [ItemImage] [varbinary](max) FILESTREAM NULL ) FILESTREAM_ON DiscountItemsFS1 GO
As a requirement, you need to have a ROWGUID column like the ItemID column. If you try to create a plain old Identity column, the table won’t be created and you will get an error (Msg 5505, Level 16, State 1, Line 1 – A table with FILESTREAM column(s) must have a non-NULL unique ROWGUID column).
The column that will be used as a pointer to the real file on the disk needs to be created as a VarBinary(max) with the FileStream attribute. In our case this is the ItemImage.
The following query will insert a new row into our table and it will also create a file into our folder.
INSERT INTO ForSaleItems([ItemGuid], [ItemDescription], [ItemImage]) VALUES(NEWID(), 'Something random somebody said', CAST('Hope is a heuristicSearch' AS VARBINARY(MAX)))
The path to the file is not immediately available, you need to use Pathname() function to get the path. But, remember that PathName() returns only a file handle so you cannot navigate to the path.
SELECT [ItemDescription], pathname = [ItemImage].PathName() FROM ForSaleItems
If you really want to see the files, you need to goto the FILESTREAM container (the folder we specified for FILESTREAM when creating the DB). Also, when you delete files that are no longer needed, they are removed by a garbage collection process (an automatic background process).
What about Backups?
All backup and recovery methods are available with FILESTREAM data. If you already have a backup process in place then remember that the FILESTREAM data is backed up with the structured data in the database.
If you do not want to back up FILESTREAM data with relational data, you can use a partial backup to exclude FILESTREAM filegroups.
So, hopefully this has given you a very quick introduction to FILESTREAMs in SQL*Server.
Most of the stuff I presented in the post came from the following WhitePaper (Or at least, what I understood from the paper): FILESTREAMStorage by Paul Randal.
If there are any mistakes, they are my own and should not be attributed to anyone else. Comments, suggestions and telling off is always welcome.