
Most Network Data Sits Untouched

Statistically speaking, most data on enterprise networks rarely gets accessed after it is written to network storage, according to researchers from NetApp Inc. and the University of California (UC). Evidently, we are too busy writing new data to go back over old data.

Andrew Leung, a computer science researcher at UC, presented the findings at the USENIX conference in Boston last week. Because most data is rarely accessed, organizations might want to consider moving much of it to slower but less expensive storage, he said.
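As a rough illustration of that tiering idea, the sketch below walks a directory tree and flags files whose last-access time is older than a cutoff, as candidates for cheaper storage. It is a minimal sketch under stated assumptions: the function name `find_cold_files` and the 90-day default are hypothetical choices, not anything the researchers described, and it relies on the file system recording access times, which many systems relax or disable (for example, Linux `relatime` or `noatime` mounts).

```python
import os
import time

def find_cold_files(root, max_idle_days=90, now=None):
    """Return sorted paths under `root` not accessed in max_idle_days.

    Hypothetical helper for illustration only. It trusts st_atime, which
    may be stale if the file system does not update access times on reads.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400  # 86,400 seconds per day
    cold = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_atime < cutoff:
                    cold.append(path)
            except OSError:
                # File vanished or is unreadable between listing and stat; skip it.
                continue
    return sorted(cold)
```

A real tiering policy would also weigh file size, ownership, and the cost of recalling data, but the core test is simply "last touched before the cutoff."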

The team studied the traffic that flowed through NetApp's enterprise file servers, which manage more than 22TB of material relating to all aspects of the company's business operations.

Leung said the study is the first large-scale examination of network file system traffic patterns. "How people have been deploying network file systems has been changing over the past five to 10 years," he said. "They are being used more commonly for different kinds of things. So what we would like to know is how this affects the workloads of the network."

During the three-month period that the network was under scrutiny, more than 90 percent of the material on the servers was never accessed. The researchers captured packets encoded using the Common Internet File System (CIFS) protocol, which Microsoft Windows applications use to read and save files over a network. About 1.5TB of data was transferred.

"Compared to the full amount of allocated storage on the file servers, this represents only 10 percent of data," Leung said. "[This] means that 90 percent of the data is untouched during this three-month period."

Moreover, among the files that were opened, 65 percent were opened only once. Most of the rest were opened five or fewer times, though about a dozen files were opened 100,000 times or more.
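Statistics like these come from counting how often each distinct file is opened in a captured trace. A minimal sketch, assuming the trace has already been reduced to a list of opened file paths; the `reaccess_summary` name and the toy trace are hypothetical, and the study's actual analysis tooling is not described in this article:

```python
from collections import Counter

def reaccess_summary(open_events):
    """Summarize how often each distinct file was opened.

    open_events: iterable of file paths, one entry per open in the trace.
    Returns the fraction of distinct files opened exactly once, the fraction
    opened at most five times, and the most-opened file with its count.
    """
    counts = Counter(open_events)
    total_files = len(counts)
    if total_files == 0:
        return {"opened_once": 0.0, "opened_at_most_5": 0.0, "busiest": None}
    once = sum(1 for c in counts.values() if c == 1)
    at_most_5 = sum(1 for c in counts.values() if c <= 5)
    return {
        "opened_once": once / total_files,
        "opened_at_most_5": at_most_5 / total_files,
        "busiest": counts.most_common(1)[0],
    }

# Toy trace (made up): five distinct files, three of them opened only once.
trace = ["a.doc", "b.c", "b.c", "c.dwg", "d.xls"] + ["e.log"] * 7
print(reaccess_summary(trace))
# → {'opened_once': 0.6, 'opened_at_most_5': 0.8, 'busiest': ('e.log', 7)}
```

Counting per distinct file, rather than per open event, is what lets a handful of very hot files coexist with a majority that are touched once.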

"What this suggests, in general, is that files are infrequently re-accessed," Leung said.

The team also observed that the ratio of data read from storage to data written to storage has changed from what previous studies found. Bytes read outnumbered bytes written by only about 2-1. "Past read-write ratios saw read-to-write ratios of 4-1 or higher," Leung added.
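To see why a drop from 4-1 to about 2-1 matters, convert each read-to-write ratio into the share of all transferred bytes that are writes: with r bytes read for every byte written, writes make up 1/(r + 1) of the traffic. A small sketch of that arithmetic (the function name is illustrative, not from the study):

```python
from fractions import Fraction

def write_share(read_to_write):
    """Fraction of transferred bytes that are writes, given bytes read per byte written."""
    r = Fraction(read_to_write)
    return 1 / (r + 1)

for ratio in (4, 2):
    share = write_share(ratio)
    print(f"{ratio}-1 read-to-write: writes are {float(share):.0%} of bytes")
# → 4-1 read-to-write: writes are 20% of bytes
# → 2-1 read-to-write: writes are 33% of bytes
```

Under this reading, writes grow from about a fifth of the bytes moved to about a third.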

Developers of file systems might want to consider that their creations now handle a much larger share of writes than earlier studies found. "The workloads are becoming more write-oriented, so the decrease in read-only traffic and the increase in write traffic suggests that file systems want to be more write-oriented," Leung said.

Given those findings, file server vendors also might want to rethink their prefetching and caching algorithms to improve performance. "If we know that files aren't frequently re-accessed, what this suggests is that [caching] algorithms may not be the best for network file systems" because the material cached will probably not get retrieved again, he said.

Speaking to Government Computer News after the presentation, Leung described the roughly 10 percent of data that was accessed. Typically, it is in the file format most closely associated with the user's job: architects might use computer-aided design files, while developers use source-code files. Also, files that are higher up in a file path, closer to the user's home directory, tend to be accessed more often than those buried deeper in a hierarchy of subfolders.

More than 75 percent of the files being opened were very small, less than 20KB each, although another 12 percent were larger than 5GB each.

About the Author

Joab Jackson is the chief technology editor of Government Computer News (GCN.com).
