Use Perl scripts to reclaim lost disk space and improve user navigation
In your corporate environment, do you view an increase in disk-space consumption as a positive or negative occurrence? The general IT philosophy is to blindly restrict disk growth and slap the wrists of users or departments that use too much disk space. My company frequently uses disk quotas, for example, to limit disk utilization. Sometimes, we even use Microsoft Excel charts at management meetings to expose the identities of "offenders." Perhaps such negative reinforcement is misguided.
Of course, I'm not suggesting that you let space utilization run rampant or that you throw your disk quotas out the window. Although quotas can discourage healthy disk-utilization growth, they're essential to ensure that users don't accidentally (or even maliciously) fill up disks with garbage, thereby negatively affecting other departments and users. (For example, you wouldn't want a user to copy his or her entire C folder to centralized storage.)
If your company is experiencing increased disk-space requirements, you can be sure that your users trust your file-server disk resources. They have confidence that their files are safe, virus-free, backed up, and available when they need them. In problem environments, disk-space growth is seldom a challenge. If users sense dependability and availability problems, they simply store their work locally. Local storage results in frequent user requests for larger hard disks, driven by the fear of placing work on a file server that might not be available when users need it.
Making users feel guilty about using server storage can be dangerous. You wouldn't want your users and departments to circumvent established quotas by hosting rogue file servers or saving valuable data on desktop PCs that have no disk redundancy. Typically, these activities go undetected until disaster strikes and the secret storage area comes to light. The cost of purchasing additional storage is always less than the cost of data loss, data recovery, user downtime, and loss of productivity.
Your real enemy isn't necessarily increased disk-space utilization but rather file and folder clutter. Most disk-utilization growth occurs as a result of users uploading real business data that they access regularly. Obviously, we need to encourage this kind of healthy growth. Parallel to the accumulation of useful data, however, is the accretion of nonbusiness data and other file clutter.
File and folder clutter can make browsing file-share resources frustrating for all users. Additionally, detecting and deleting unproductive files is difficult. Although some file types clearly aren't business-related (e.g., personal music files), other personal files might not be so easy to identify. For example, locating and removing .jpeg files of a user's personal vacation is a tough task if those files are scattered among legitimate business-related .jpeg files. Here are five scripts that you can use to control file and folder clutter.
1. Search by Extension
Your first step is to root out any file types that are clearly inappropriate for server storage. A handful of user folders that each contain 500MB to 1GB worth of MP3 files, for example, can quickly eat up server storage space. MP3 files are easy to detect based on the file extension (.mp3), and as long as you have no business-oriented MP3s residing in storage, you can automate their deletion.
Table 1, page 53, lists file types that can potentially waste storage space. Of course, some of these file types can hold appropriate business-related content, so you need to carefully review your business policies and the needs of your user community before you start globally deleting files.
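Before you delete anything, it can help to see how much space each file type actually consumes. The following is a minimal sketch of such a report, not one of the article's listings: the sub name size_by_extension and the command-line form are assumptions made for illustration. It walks a folder tree with File::Find and totals the bytes used per extension, without changing any files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Walk $root and return a hash ref that maps each lowercased
# extension to the total number of bytes used by files of that type.
# (Hypothetical helper; nothing is deleted or modified.)
sub size_by_extension {
    my ($root) = @_;
    my %bytes;
    find(sub {
        return unless -f $_;            # plain files only, skip folders
        return unless /\.([^.]+)$/;     # skip names without an extension
        $bytes{ lc $1 } += (-s _) || 0; # reuse the stat data from -f
    }, $root);
    return \%bytes;
}

# Assumed invocation: perl extreport.pl D:/test
if (@ARGV) {
    my $totals = size_by_extension($ARGV[0]);
    # Largest consumers first
    for my $ext (sort { $totals->{$b} <=> $totals->{$a} } keys %$totals) {
        printf "%-8s %12d bytes\n", $ext, $totals->{$ext};
    }
}
```

A report like this gives you something concrete to compare against your business policies before you add an extension to a deletion list.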
A quick note about user-circulated game and video files: After these files make their debut through email, download, or floppy disk, they can quickly spread throughout the office and eventually invade server storage. These kinds of files seem to take on a life of their own as users copy, rename, and circulate them. Regular searches for .exe files can help cut their life cycle short.
For more common file types that you might need to include in your customized list, go to http://www.chatnetdesign.com/task/filetypes.htm. The file types that you search for will probably change as new extensions become available.
You can use scripts to automate the detection and deletion of these file types. (For more information about using scripts, see the Web-exclusive sidebar "Getting Started with Scripting," http://www.win2000mag.com, InstantDoc ID 22035.) For a simple script that deletes MP3 files residing in your D:\test folder, see the single-extension search-and-delete Perl script DeleteMP3.pl, which Listing 1 shows. The code looks complicated because of its comments and logging functions, but it's quite simple. The Unlink line deletes the .mp3 files. However, I've commented out the Unlink line and added a Print line so that the script shows you the files instead of deleting them. You could use the Windows NT shell commands Del or Erase to accomplish the deletions, but Perl creates a cleaner log file and is very fast. Perl's performance is evident in its lower CPU utilization and its speed when dealing with large numbers of files. If you're unfamiliar with Perl and could use some tips, see the Web-exclusive sidebar "Script Pseudo-Coding," http://www.win2000mag.com, InstantDoc ID 22036, for a closer look at DeleteMP3.pl's Recurse routine.
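Listing 1 isn't reproduced here. As a rough illustration of the same idea, the following sketch (not the DeleteMP3.pl code itself) uses the standard File::Find module rather than a hand-written Recurse routine. The sub name find_mp3 and the command-line argument are assumptions. As in the article's script, the Unlink line is commented out so that a run only prints the matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Return the full path of every .mp3 file under $root.
# The match is case insensitive, so SONG.MP3 is found too.
sub find_mp3 {
    my ($root) = @_;
    my @hits;
    find(sub {
        return unless -f $_ && /\.mp3$/i;  # plain files ending in .mp3
        push @hits, $File::Find::name;     # path including $root
    }, $root);
    return @hits;
}

# Assumed invocation: perl listmp3.pl D:/test
if (@ARGV) {
    for my $file (find_mp3($ARGV[0])) {
        print "$file\n";   # test run: show the file instead of deleting it
        # unlink $file or warn "Can't delete $file: $!\n";
    }
}
```

Collecting the paths first and deleting afterward keeps the deletion step separate from the tree walk, so you can review or log the list before you enable Unlink.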
You can modify the code to look for multiple file types. The multiextension search-and-delete DeleteMultiExt.pl script, which Listing 2 shows, also comments out the Unlink command and includes a Print statement that will show a list of the files that match the search criteria. The modified script will search for any file with an .asf, .asx, .ra, .ram, or .rm extension. To add file types, you can simply chain additional OR code sections, such as
|\.exe$/i
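To show how the chained OR sections fit together, here is a minimal sketch that builds the same kind of pattern from a list of extensions. It is an illustration, not the DeleteMultiExt.pl code from Listing 2; the sub names extension_regex and find_by_extension are assumptions. For the five extensions above, the pattern it builds behaves like /\.asf$|\.asx$|\.ra$|\.ram$|\.rm$/i.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Build one case-insensitive pattern that matches a file name ending
# in any of the given extensions (pass them without the leading dot).
sub extension_regex {
    my @extensions = @_;
    my $alternatives = join '|', map { quotemeta($_) } @extensions;
    return qr/\.(?:$alternatives)$/i;
}

# Return the full path of every plain file under $root whose
# extension is in the list.
sub find_by_extension {
    my ($root, @extensions) = @_;
    my $pattern = extension_regex(@extensions);
    my @hits;
    find(sub {
        push @hits, $File::Find::name if -f $_ && $_ =~ $pattern;
    }, $root);
    return @hits;
}

# Assumed invocation: perl listmulti.pl D:/test
if (@ARGV) {
    # Add 'exe' to this list to get the effect of chaining |\.exe$
    print "$_\n" for find_by_extension($ARGV[0], qw(asf asx ra ram rm));
}
```

Because each alternative is anchored to the end of the name, .ra doesn't accidentally match a name such as clip.ramx.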
Always test-run your file-deletion and folder-removal scripts with Unlink or Rmdir commented out, and observe the results before you attempt a production run. Schedule your scripts to run periodically (weekly runs are probably sufficient), and review the logs of the deletion runs. Also, monitor new file types that are being introduced. Schedule your deletion runs to follow your backups so that you can restore files if necessary.
#! perl
#
# cleantree.pl: delete all temporary and backup files from a tree.
#
#
# History:
# - ver 1.0, 08/05/2000, by P. Turelinckx
# . creation
#
# $RCSfile: scanx $$Revision: 1.0 $$Date: 08/05/2000 12:00:00 $
#
use File::stat;
use File::Basename;
use File::Find;
use Getopt::Std;
#
# sub syntax
# Exit while explaining the command line syntax
#
sub syntax
{
die <<"SYNTAX";
Delete all temporary and backup files from a tree.
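Usage: $myName [-hl] -r<root directory> [-o<output file>]
Options: -h help
         -l just list files, without deleting them (requires -o)
         -r root directory of the tree to clean
         -o file that receives the list of processed files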
SYNTAX
}
# File extensions for files that should be deleted.
# Do not add the extension separator dot.
# The extensions will be compared case insensitive
#
@extensions =
qw( obj
map
res
ilk
idb
pdb
pdc
pch
plg
tli
tlh
bak
sav
);
# Files/directories that should be excluded from the cleanup.
# This is a list of regexps for directory paths and files that will not be deleted.
# The regexps will be matched case insensitive, but directories have to be
# separated by forward slashes
#
@excludes =
qw( target
savlib
simulate
);
#----------------------------------------------------------------------------
# Start
#
# Parse command line
#
$myName = basename( $0, '.pl', '.p');
print "\n";
getopts( "hlr:o:");
syntax() if $opt_h;
if ( !$opt_r )
{
    print "No root directory defined\n";
    syntax();
}
if ( $opt_l && !$opt_o )
{
    print "No output file specified while option -l used\n";
    exit;
}
$opt_r =~ s/[\\\/]$//; # Rip off any trailing slashes
$opt_r = lc($opt_r); # make lowercase
if ( $opt_o )
{
    open(RESULT, '>'.$opt_o) || die "Can't write '$opt_o': $!\n";
    $now = localtime;
    print RESULT "Files deleted by '$myName' on $now:\n\n";
}
# Lowercase the extension list so that a plain string compare against
# the lowercased file extension is effectively case insensitive
foreach $item (@extensions) { $item = lc($item); }
# Adjust every directory separator (note the g flag) so that each one
# matches both a forward slash and a backslash
foreach $item (@excludes) { $item =~ s|/|\[\\\\\\/\]|g; }
$deleteCount = 0;
find(\&ProcessFile, $opt_r);
$infoText = "$deleteCount files ";
if ($opt_l) { $infoText .= "would be ";}
$infoText .= "deleted.";
if ($opt_o) { print RESULT "\n\n$infoText"; }
if ($opt_o) { close( RESULT); }
print STDOUT "$infoText\n";
# end
#----------------------------------------------------------------------------
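sub ProcessFile
{
    # File::Find calls this sub for directories too; only plain files qualify
    return unless -f $_;

    # Take the text after the last dot as the extension
    if ( !/\.([^.]+)$/ )
    {
        return; # no extension found
    }

    # check if the extension is on the killer list
    #
    my $extension = lc($1);
    my $found = 0;
    foreach $item (@extensions)
    {
        if ( $item eq $extension )
        {
            $found = 1;
            last;
        }
    }
    if ( !$found )
    {
        return; # no file we should delete
    }

    # check if we should exclude these from the cleanup
    #
    my $file = $File::Find::name;
    foreach $item (@excludes)
    {
        if ( $file =~ m/$item/i )
        {
            return; # this one escapes from the massacre
        }
    }

    # File::Find has already changed into the file's directory, so delete
    # by the bare name in $_; the full path in $file is used for the log.
    if ( !$opt_l && !unlink($_) )
    {
        warn "Can't delete '$file': $!\n";
        return; # don't log or count a file that is still there
    }
    if ($opt_o) { print RESULT "$file\n"; }
    $deleteCount++;
}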
Paul Turelinckx August 31, 2001