The .NET Framework 4.5 Gets ZIP
On VB columnist Joe Kunk shows you how to create and extract "almost" .ZIP files in .NET Framework 4 and standard .ZIP files in .NET Framework 4.5.
The ability to compress multiple files from one or more directories into a single, smaller file is useful for reducing storage and data bandwidth requirements, creating backups, archiving multiple versions of files and detecting file modifications. Unfortunately, working with the commonly used public domain .ZIP File Format has never been easy in Microsoft .NET. Enhancements to .NET Framework 4.5 finally make working with ZIP files easier.
Microsoft lists "ZIP compression improvements to reduce the size of a compressed file" as one of the .NET Framework 4.5 core new features in the MSDN Library article, "What's New in the .NET Framework 4.5 Beta," but the enhancements go beyond a better compression algorithm.
In this article, I'll show you how to implement ZIP compression in .NET Framework 4 for a folder and its contents (including sub-folders), and then I'll demonstrate an easier method to do the same with standard .ZIP files in .NET Framework 4.5.
The sample code is based on the .NET Framework 4.5 Beta. Microsoft has issued a Go Live license for the Visual Studio 11 Beta and .NET Framework 4.5 Beta, which assures developers that there will be no changes in the API compared to the release version.
A Short History
The ZIP file compression technology was developed by the late Phil Katz in the 1980s and is still distributed by PKWare, the company he founded. The .ZIP File Format specification has been available since its inception; Phil Katz published the original .ZIP File Format Specification as a documentation file in early versions of PKZIP, and his company later posted it online. The current version of the specification is available on the PKWare Web site. This openness helped quickly establish the .ZIP format as a popular and widely used compressed file format. It has been available in Windows as compressed folders since 1998. The ability to interact with standard .ZIP files is clearly a desirable and useful capability for Microsoft .NET developers.
The System.IO.Compression namespace was introduced in .NET Framework 2.0 with the GZipStream class (.GZ extension) to compress and decompress streams. GZipStream is not compatible with the .ZIP File Format because it lacks support for the ZIP file headers. The System.IO.Packaging.ZipPackage class was added in .NET Framework 3.0, and it's compatible with ZIP files, with one major exception.
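To make the limitation concrete, here is a minimal sketch of GZipStream in use. It compresses exactly one stream into a .GZ file and has no notion of multiple named entries, which is why it cannot produce a .ZIP archive. The module and routine names, and the SourcePath and TargetPath parameters, are hypothetical names chosen for this illustration. Stream.CopyTo assumes .NET Framework 4 or later.

```vb
Imports System.IO
Imports System.IO.Compression

Module GZipSketch
    ' Compress a single file into a .gz file.
    ' SourcePath and TargetPath are hypothetical parameter names.
    Sub CompressToGZip(SourcePath As String, TargetPath As String)
        Using inStream As FileStream = File.OpenRead(SourcePath)
            Using outStream As FileStream = File.Create(TargetPath)
                ' GZipStream wraps the output stream and compresses what is written to it.
                Using gzip As New GZipStream(outStream, CompressionMode.Compress)
                    inStream.CopyTo(gzip)   ' Stream.CopyTo needs .NET Framework 4 or later
                End Using
            End Using
        End Using
    End Sub
End Module
```

A call such as CompressToGZip("sample.txt", "sample.txt.gz") would yield one compressed stream, with no directory of entries for a ZIP tool to read.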
ZIP Files (Almost)
The ZipViaFramework40 project of the code download demonstrates use of the ZipPackage class to create a ZIP-compatible file of a folder with all its files and subfolders. When you extract the resulting file, you'll notice an extra file in the root folder named [Content_Types].xml:
<?xml version="1.0" encoding="utf-8"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
  <Default Extension="pdf" ContentType="application/zip" />
  <Default Extension="png" ContentType="application/zip" />
  <Default Extension="jpg" ContentType="application/zip" />
  <Default Extension="txt" ContentType="application/zip" />
  <Default Extension="docx" ContentType="application/zip" />
  <Default Extension="snippet" ContentType="application/zip" />
</Types>
This file lists the extensions included within the .ZIP file. When uncompressing files with the ZipPackage class, [Content_Types].xml must be present in the ZIP and must contain the file's extension in order for the file to be extracted. Any files with other file extensions, such as a .GIF file, would not be extracted. Since virtually all .ZIP files lack the [Content_Types].xml file unless prepared with the ZipPackage class, the class is, for all practical purposes, unable to extract standard .ZIP files. This is the issue referred to when others have said that the .NET Framework does not support .ZIP files.
If you have an internal application that produces .ZIP files using ZipPackage and the [Content_Types].xml file requirement is not an issue, then what's required to compress and extract a directory with all its contents? The code in Listing 1 compresses a folder and its contents. It's interesting to note that there are no folders in a .ZIP file, simply a collection of compressed files with URIs that contain relative folder paths that are honored at extraction, for example, "/temp/sample.txt." The Optimization parameter represents the desired System.IO.Packaging.CompressionOption enumeration value: NotCompressed, Normal, Maximum, Fast or SuperFast. A reference to WindowsBase.dll is required to access the System.IO.Packaging namespace.
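For readers without the download at hand, the approach can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the code from Listing 1: the module and routine names are hypothetical, the parameter names are borrowed from the article's text, and the "application/zip" content type simply mirrors the [Content_Types].xml sample above. File names containing unusual characters, and the case where the target file sits inside the folder being compressed, are not handled.

```vb
Imports System
Imports System.IO
Imports System.IO.Packaging   ' requires a reference to WindowsBase.dll

Module PackageCompressSketch
    ' Hypothetical routine: adds every file under FolderPath to a ZipPackage.
    Sub CompressFolder(FolderPath As String, ZipFullFilename As String,
                       Optimization As CompressionOption)
        Dim root As String = Path.GetFullPath(FolderPath).TrimEnd(Path.DirectorySeparatorChar)
        Using pkg As Package = Package.Open(ZipFullFilename, FileMode.Create)
            For Each filePath As String In Directory.GetFiles(root, "*", SearchOption.AllDirectories)
                ' Turn "C:\root\sub\sample.txt" into the part URI "/sub/sample.txt".
                Dim relative As String = filePath.Substring(root.Length).Replace("\"c, "/"c)
                Dim partUri As Uri = PackUriHelper.CreatePartUri(New Uri(relative, UriKind.Relative))
                ' The content type is an assumption taken from the sample XML.
                Dim part As PackagePart = pkg.CreatePart(partUri, "application/zip", Optimization)
                Using source As FileStream = File.OpenRead(filePath)
                    Using target As Stream = part.GetStream()
                        source.CopyTo(target)   ' copy the file bytes into the package part
                    End Using
                End Using
            Next
        End Using
    End Sub
End Module
```

Because each part carries its relative path in its URI, no separate folder entries are written, which matches the observation that a .ZIP file contains no folders.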
Extracting the compressed file is shown in Listing 2. It assumes that the extraction folder does not already contain files with the same names as those being extracted.
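The extraction side can be sketched in a similar way. Again, this is a hedged illustration rather than the code from Listing 2: the module and routine names are hypothetical, and the sketch assumes the package contains only file parts (a package that defines relationships would also expose its relationship parts through GetParts). FileMode.CreateNew is used so that an existing file raises an exception instead of being overwritten, in line with the assumption stated above.

```vb
Imports System
Imports System.IO
Imports System.IO.Packaging   ' requires a reference to WindowsBase.dll

Module PackageExtractSketch
    ' Hypothetical routine: writes every part of a ZipPackage under FolderPath.
    Sub ExtractPackage(ZipFullFilename As String, FolderPath As String)
        Using pkg As Package = Package.Open(ZipFullFilename, FileMode.Open, FileAccess.Read)
            For Each part As PackagePart In pkg.GetParts()
                ' Part URIs look like "/sub/sample.txt"; map them back to a file path.
                Dim relative As String = Uri.UnescapeDataString(part.Uri.OriginalString).TrimStart("/"c)
                Dim target As String = Path.GetFullPath(
                    Path.Combine(FolderPath, relative.Replace("/"c, Path.DirectorySeparatorChar)))
                ' Recreate the folder structure encoded in the part URI.
                Directory.CreateDirectory(Path.GetDirectoryName(target))
                ' CreateNew throws an IOException if the file already exists.
                Using fileOut As New FileStream(target, FileMode.CreateNew)
                    Using partStream As Stream = part.GetStream(FileMode.Open, FileAccess.Read)
                        partStream.CopyTo(fileOut)
                    End Using
                End Using
            Next
        End Using
    End Sub
End Module
```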
Figure 1 shows the expanded System.IO.Compression namespace in .NET Framework 4.5. The new features are highlighted in yellow.
Figure 1. Improvements in the System.IO.Compression namespace
While the code in Listing 1 and Listing 2 is not difficult, it's non-trivial and involves working with streams and byte arrays. By comparison, compressing a folder and its contents in .NET Framework 4.5 Beta takes a single line that calls a static method of the ZipFile class:
ZipFile.CreateFromDirectory(FolderPath, ZipFullFilename, Optimization, includeBaseDirectory:=False)
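In the compression call, Optimization is the desired value of the System.IO.Compression.CompressionLevel enumeration: Optimal, Fastest or NoCompression. The code to extract a ZIP file is also a single line, a call to the static ZipFile.ExtractToDirectory method that takes the archive path and the destination folder.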
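Wrapped in reusable routines, the two calls might look like the sketch below. The module and routine names are hypothetical; FolderPath, ZipFullFilename and Optimization follow the names used in the text. The sketch is written against the released .NET Framework 4.5 signatures of ZipFile.CreateFromDirectory and ZipFile.ExtractToDirectory.

```vb
Imports System.IO.Compression   ' reference System.IO.Compression.FileSystem.dll

Module ZipFileSketch
    ' Compress a folder and its contents into a standard .ZIP file.
    Sub CompressFolder(FolderPath As String, ZipFullFilename As String,
                       Optimization As CompressionLevel)
        ' includeBaseDirectory:=False stores the folder's contents
        ' without the folder name itself as a leading path segment.
        ZipFile.CreateFromDirectory(FolderPath, ZipFullFilename, Optimization,
                                    includeBaseDirectory:=False)
    End Sub

    ' Extract a standard .ZIP file into a folder.
    Sub ExtractZip(ZipFullFilename As String, FolderPath As String)
        ' An IOException is thrown if a file being extracted already exists.
        ZipFile.ExtractToDirectory(ZipFullFilename, FolderPath)
    End Sub
End Module
```

A call such as CompressFolder("C:\temp", "C:\backup\temp.zip", CompressionLevel.Optimal) would produce an archive that any standard ZIP tool can open, with no [Content_Types].xml file added.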
A reference to the System.IO.Compression.FileSystem assembly is required to use the ZipFile class. This assembly is not available for Metro-style applications; they must use the ZipArchive, ZipArchiveEntry, DeflateStream and GZipStream classes.
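When only streams are available, the ZipArchive and ZipArchiveEntry classes cover the same ground at a slightly lower level. The following sketch writes one text entry to an archive on a caller-supplied stream and then lists the entries of an archive. The module, routine and parameter names are hypothetical, and the code assumes the released .NET Framework 4.5 API.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.IO.Compression   ' ZipArchive is in System.IO.Compression.dll

Module ZipArchiveSketch
    ' Write a single text entry into a new ZIP archive on a writable stream.
    Sub WriteTextEntry(target As Stream, entryName As String, content As String)
        ' leaveOpen:=True keeps the caller's stream open after the archive is disposed.
        Using archive As New ZipArchive(target, ZipArchiveMode.Create, leaveOpen:=True)
            Dim entry As ZipArchiveEntry = archive.CreateEntry(entryName, CompressionLevel.Optimal)
            Using writer As New StreamWriter(entry.Open())
                writer.Write(content)
            End Using
        End Using
    End Sub

    ' Return one descriptive line per entry: name, original size, compressed size.
    Function DescribeEntries(source As Stream) As List(Of String)
        Dim lines As New List(Of String)
        Using archive As New ZipArchive(source, ZipArchiveMode.Read)
            For Each entry As ZipArchiveEntry In archive.Entries
                lines.Add(String.Format("{0}: {1} -> {2} bytes",
                                        entry.FullName, entry.Length, entry.CompressedLength))
            Next
        End Using
        Return lines
    End Function
End Module
```

Entry names use forward slashes for sub-folders, such as "temp/sample.txt", so the folder structure is again carried in the names rather than in separate folder records.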
The code download contains a solution with three projects.
The ZipViaFramework40 project contains a Windows Forms application that can compress a folder with its contents, extract a ZIP file to its files and folders, and display the directory metadata available for each ZIP file (assuming it has a proper [Content_Types].xml file). The functional code is contained in the zipfileclass.vb file for easy re-use. Many organizations will not implement .NET Framework 4.5 immediately upon its release, so this project targets the .NET Framework 4.
The ZipViaFramework45 project performs the same functions as ZipViaFramework40, and it, too, has its (much smaller) functional code in the zipfileclass.vb file for easy re-use. This project targets the .NET Framework 4.5 Beta.
The CommonUtilities project contains code shared by both projects. It targets the .NET Framework 4.5 Beta.
The download file has a "temp" directory that contains a mix of sample files for the compression routines to use. Figure 2 shows the two demo forms available in the code download.
Figure 2. Demo forms available in the code download
The lack of fully compatible ZIP file classes in the .NET Framework has encouraged alternate libraries. The dynamic language IronPython has a zipfile library. The SharpZipLib library has been released under the GPL with the GNU Classpath exception, allowing it to be used in commercial closed-source applications. The full-featured DotNetZip library is available for free on CodePlex under the Ms-PL license. Developers who prefer a commercial product may want to consider Xceed Zip for .NET.
The code download shows you how to create and extract "almost" .ZIP files in .NET Framework 4 and standard .ZIP files in .NET Framework 4.5. Using alternative libraries may mean more features, but for those looking for simple processing of standard .ZIP files, that capability is now available in the .NET Framework.
Joe Kunk is a Microsoft MVP in Visual Basic, three-time president of the Greater Lansing User Group for .NET, and developer for Dart Container Corporation of Mason, Michigan. He's been developing software for over 30 years and has worked in the education, government, financial and manufacturing industries. Kunk co-authored the book "Professional DevExpress ASP.NET Controls" (Wrox Programmer to Programmer, 2009). He can be reached via email at firstname.lastname@example.org.