Working with Perfmon CSV logs in Powershell – Part 2


Before I moved to a new topic; I thought, I would give some background on why I wrote “Working with perfmon csv logs in PowerShell – Part 1“.

When you are working in a place where the PerfMon logs are collected religiously and always in CSV format you have to find an easier way to work with CSV files. Sometimes, these files would be as large as 400 MB. The 400MB size was because of the number of counters we collect and that is close to 2 to 3 months of data. When there is a performance issue, depending on its nature you may be looking only to analyze one or two day’s worth of information or would like to find patterns in the data and wanted something which would allow me to query this data. So, I started to investigate on ways to work with large CSV files.

 When I found ‘LogParser’, I was super excited because it supported SQL like grammar and then I started wondering how easy it would be to replicate the functionality in Perl; because I was learning it at the time to help automate/support some scripts. That started off a journey of discovery that ended up as the below script. The script is well documented because; I did not want to forget what I was learning along the way. It took me about a week to finish and helped me in understanding some Perl concepts and a lot of respect for the people who designed the modules I used.

At this time, I am not going to try to explain what the script does as that would become a series of posts on its own.  But, when I have a week or so to myself; I will definitely put them up.

 

#!/usr/bin/perl -w
use warnings;
use strict;

#----------------------------------------------------------------
# Modules used.
#----------------------------------------------------------------
use DBI;
use Getopt::Std;

#----------------------------------------------------------------
# Global Variables
#$csv_dbHandle -"DBI:CSV:f_dir=."- The '.' specifies that the csv
#                               file is in the current directory.
#----------------------------------------------------------------
my ($line_count, $main_status) = (0, 0);
my($file_name, $sql_to_sanitize)  = ("","");

#----------------------------------------------------------------
# Sub-routine declaration in alphabetical order
#----------------------------------------------------------------
sub count_lines_in_file();
sub get_prog_args();
sub main();
sub query_csv($);
sub sanitize_sql_query($);
sub show_help();
sub show_sql_help();

#----------------------------------------------------------------
# Main program call the sub called main, which is the main driver of the
# program.
#----------------------------------------------------------------
&main();

#------------------------------------------------------------------------------
#                        *** SUBROUTINES ***
#------------------------------------------------------------------------------

#------------------------------------------------------------------------------
#Purpose: Main driver of the program
#
#Input Argument: None Output: None
#------------------------------------------------------------------------------
sub main() {
    &get_prog_args();
    #&count_lines_in_file(); #- for debugging only
    #print "No of lines in file: $line_count\n"; #- for debugging only
    &query_csv($file_name);
}
#
#------------------------------------------------------------------------------
#Purpose: Get all the arguments from the command line
#
#Input Argument: None Output: None
#------------------------------------------------------------------------------
sub get_prog_args() {
    getopts('hp:Q:');
    #----------------------------------------------------------------
    # If -h command line option is given show help message
    #----------------------------------------------------------------
    if ($Getopt::Std::opt_h) {
        system 'cls';&show_help(); &show_sql_help(); exit(0);
    } if(!$Getopt::Std::opt_h){} #supress the warning messages.
    #----------------------------------------------------------------
    # If -p command line option is given set file name to process.
    #----------------------------------------------------------------
    if ($Getopt::Std::opt_p){
            $file_name = $Getopt::Std::opt_p;
    }else{
            system 'cls';&show_help(); exit(1);
    }
 #----------------------------------------------------------------
    # If -Q command line option is given get query to process.
    #----------------------------------------------------------------
 if($Getopt::Std::opt_Q){
  $sql_to_sanitize = $Getopt::Std::opt_Q;
                if ($sql_to_sanitize !~ /^SELECT/i){
                    system 'cls';
                    print "Only Select statements are supported\n";
                    print "Press any key to continue...\n";
                    <>;
                    &show_help(); exit(1);
                }
 }else{
   system 'cls';&show_help(); exit(1);
 }
}
#
#------------------------------------------------------------------------------
#Purpose: Count the number of lines in a file and assign it to the line_count
#           global variable.
#Input: None
#
#Output: None
#------------------------------------------------------------------------------
sub count_lines_in_file() {
    unless ( open(CSV_IN, "<",$file_name) ){
         die "Could not open " . $file_name ." for reading.\nError: $!.\n";
    }
    1 while <CSV_IN>;
    #$. perl variable contains the number of lines read from the file.
    $line_count = $.;
    close(CSV_IN);
}

#
#------------------------------------------------------------------------------
#Purpose: parse the csv file and summarise the results.
#
#The DBD::CSV module treats a group of comma-separated value files in a common
#directory as a database. The data source for this driver can contain a
#parameter f_dir that specifies the directory in which the files are located.
#
#At the highest level, you can call the DBI->trace() method, which enables
#tracing on all DBI operations from that point onwards. There are several valid
#tracing levels:
#0-> Disables tracing, 1-> Traces DBI method execution showing returned values
#and errors, 2-> As for 1, but also includes method entry with parameters.
#3-> As for 2, but also includes more internal driver trace information.
#4-> Levels 4, and above >data overload!<.
#
#Due to the SQL standard, field names cannot contain special characters like
#a dot (.). Following the approach of mdb_tools, all these tokens are
#translated to an underscore (_) when reading the first line of the CSV file,
#so all field names are `sanitized'. If you do not want this to happen, set
#raw_header to a true value.
#
# With the f_lock attribute, you can force locking mode (if locking is supported
#at all) for opening tables. By default, tables are opened with a shared lock for
#reading, and with an exclusive lock for writing. The supported modes are:
# 0 -> Force no locking at all.
# 1 -> Only shared locks will be used.
# 2 ->Only exclusive locks will be used.
#
#Input Argument: CSV file Name
#Output: Return 1 OK, 0 Failed
#------------------------------------------------------------------------------
sub query_csv($){
    my $pc_csv_file_name = shift;
    my $pc_query_columns;
    my $pc_csv_db_handle;
    my $pc_query_stmt_handle;
   
    $@ = "";
    eval{
            $pc_csv_db_handle =
                    DBI->connect( "DBI:CSV:f_dir=.", undef, undef,
                                    {
                                        PrintError => 1,
                                        RaiseError => 1,
                                        csv_quote_char => "\"",
                                    }
                                );
           
            #Turn on tracing with trace_level 2
            DBI->trace( 2, 'parse_csv_trace.log' );
           
            #We will be working with individual files for now. Maybe in the
            #future, we can move to using multiple files. Here we are binding
            #the file to a csv file passed as a parameter to the parse_csv
            #sub-routine.
            $pc_csv_db_handle -> {csv_tables} -> {tbl_csvquery} =
                            {
    file => $pc_csv_file_name,
    f_lock => 0,
   };
   
     my $sql_to_run = &sanitize_sql_query($sql_to_sanitize);
            #Prepare statement for exection.
            $pc_query_stmt_handle =
                    $pc_csv_db_handle -> prepare("$sql_to_run");
                   
            #Execute the statement
            $pc_query_stmt_handle -> execute();
           
            # Retrieve the returned rows of data.Need to work on this.
            while ( my @row = $pc_query_stmt_handle->fetchrow_array() ) {
                        print ("@row \n");
            }
           
            #Clean up
            $pc_query_stmt_handle -> finish(); $pc_csv_db_handle ->
            disconnect();
        };
    $@ and die "SQL database error: $@";
}
#
#------------------------------------------------------------------------------
#Purpose: Sanitize the sql query to reflect DBD::CSV sanitization.
#
#Input Argument: Insane_SQL_Query -> String.
#       Output: Sane_SQL_Query -> String.
#------------------------------------------------------------------------------
sub sanitize_sql_query($)
{
    my $sql_to_sanitize = shift;
    while($sql_to_sanitize =~ m/`.*?(?:(?:``){1,}`|(?smxg)
    {
 my $pre_match = $`;
 my $match = $&;
 my $post_match = $';
 $match =~ s/`//g; #replace the back ticks
 $match =~ s/\W/_/g; #sanitize the match
 $sql_to_sanitize = $pre_match . $match . $post_match;
    }
    return(lc($sql_to_sanitize));
}
#
#------------------------------------------------------------------------------
#Purpose: Print a help screen on std out.
#
#Input Argument: None
#       Output: None
#------------------------------------------------------------------------------
sub show_help()
{
print << 'EOT';
#******************************************************************************
#*        CSV PARSER Version 1.0.0 | Written by: SqlChow             *
#******************************************************************************
#*                                                                            *
#* Purpose: Summarise perfmon output, which is usually in the form of CSV file*
#*                                                                            *
#* Assumptions:                                                               *
#*  1-> You have to specify the -p csv file that is to be summarised.         *
#*  2-> You have to use -Q to specify the query. The query itself will be     *
#*      enclosed in double quotes(""), the column names have to be enclosed in*
#*      back ticks(``) and, dates and other identifiers will be enclosed in   *
#*      single qoutes('').                                                    *
#*  3-> The CSV file will always be mapped to table 'tbl_csvquery'.           *
#*                                                                            *
#* For example, see below:                                                    *
#* -----------------------                                                    *
#* Ex:perl csvquery -p"computer_05110201.csv" -Q"SELECT                    *
#*  AVG(`\\computer\LogicalDisk(F:)\Avg. Disk sec/Read`),                  *
#*  AVG(`\\computer\LogicalDisk(F:)\Avg. Disk sec/Transfer`),                 *
#*  AVG(`\\computer\LogicalDisk(F:)\Avg. Disk sec/Write`),                    *
#*  AVG(`\\computer\LogicalDisk(F:)\Disk Read Bytes/sec`),                    *
#*  AVG(`\\computer\LogicalDisk(F:)\Disk Reads/sec`),                         *
#*  AVG(`\\computer\LogicalDisk(F:)\Disk Write Bytes/sec`) FROM tbl_csvquery  *
#*  WHERE `(PDH-CSV 4.0) (Central Daylight Time)(300)`                        *
#*  BETWEEN ('05/11/2010 02:02:12.568', '05/11/2010 02:28:12.648')"           *
#*                                                                            *
#*                                                                            *
#******************************************************************************
#* Note: Program must be executed from within the directory it is located.    *
#*                                                                            *
#* COMMAND SYNOPSIS                                                           *
#* csvquery [-h] <-p csvfile name> <-Q "Sql Query to run">                    *
#*                                                                            *
#* Command line options:                                                      *
#* -h Writes help screen on standard output, then exits.                      *
#* -p mandatory csv file name                                                 *
#* -Q mandatory query to parse                                                *
#*                                                                            *
#*----------------------------------------------------------------------------*
#* This script can run only in three modes                                    *
#*----------------------------------------------------------------------------*
#* Example:                                                                   *
#*   1. Not specifying any command line arguments -> this prints basic help   *
#*      perl csvquery                                                         *
#*                                                                            *
#*   2. Specifying -h command line argument -> prints basic help & SQL help   *
#*      perl csvquery -h                                                      *
#*                                                                            *
#*   3. Specifying the csv file name and query to run                         *
#*      perl csvquery -p computer12.csv -Q                      *
#*                                                                            *
#******************************************************************************
#* NOTE: All column names have to be enclosed in back ticks (``) because, the *
#* DBB::CSV driver sanitizes all column names to replace special characters   *
#* with underscore(_) & we will parse the query to emulate this functionality *
#******************************************************************************
#* ERRORLOGGING: All csv operations are traced out to the following file      *
#* 'parse_csv_trace.log', which will be saved to the same folder              *
#******************************************************************************
EOT
} # End of showHelp

#------------------------------------------------------------------------------
#Purpose: Print a help screen on std out regarding SQL statement syntax that
#           is acceptable in the DBI module
#
#Input Argument: None
#       Output: None
#------------------------------------------------------------------------------
sub show_sql_help()
{
print << 'EOT';
#******************************************************************************
#*SQL Statements:                                                             *
#* Since we will only be using SELECT statement the acceptable syntax is as   *
#* below                                                                      *
#******************************************************************************
#*      SELECT <select_clause>                                                *
#*              <from_clause>                                                 *
#*              [<where_clause>]                                              *
#*              [ ORDER BY ocol1 [ASC|DESC], ... ocolN [ASC|DESC]] ]          *
#*              [ GROUP BY gcol1 [, ... gcolN] ]                              *
#*              [ LIMIT [start,] length ]                                     *
#*                                                                            *
#*Explicit Join Quantifiers:                                                  *
#*      NATURAL, INNER, OUTER, LEFT, RIGHT, FULL                              *
#*                                                                            *
#*Built-in Functions:                                                         *
#*      * Aggregate : MIN, MAX, AVG, SUM, COUNT                               *
#*      * Date/Time : CURRENT_DATE, CURRENT_TIME, CURRENT_TIMESTAMP           *
#*      * String : CHAR_LENGTH, CONCAT, COALESCE, DECODE, LOWER, POSITION,    *
#*                      REGEX, REPLACE, SOUNDEX, SUBSTRING, TRIM, UPPER       *
#*                                                                            *
#*Operators and Predicates:                                                   *
#*      = , <> , < , > , <= , >= , IS [NOT] (NULL|TRUE|FALSE) , LIKE , CLIKE ,*
#*      IN , BETWEEN ('left identifier', 'right identifier')                  *
#*                                                                            *
#*Identifiers and Aliases:                                                    *
#*      * regular identifiers are case insensitive                            *
#*      * delimited identifiers (inside double quotes) are case sensitive     *
#*      * column and table aliases are supported                              *
#*                                                                            *
#*Concatenation:                                                              *
#*      * use either ANSI SQL || or the CONCAT() function                     *
#*      * e.g. these are the same: {foo || bar} {CONCAT(foo,bar)}             *
#*                                                                            *
#*Comments:                                                                   *
#*      * comments must occur before or after statements, can't be embedded   *
#*      * SQL-style single line -- comments are supported                     *
#*      * C-style multi-line /* */ comments are supported                     *
#*                                                                            *
#*NULLs:                                                                      *
#*      * currently NULLs and empty strings are identical in non-ANSI dialect.*
#*      * use {col IS NULL} to find NULLs, not {col=''}(though both may work  *
#*                                                       depending on dialect)*
#******************************************************************************
EOT
}

# TAGS IN PERLDOC:
# *PROGNAME* - replace with program name
# *SHORT_DESCRIPTION* - one sentence description

__END__

=head1 NAME
=over 4
csvquery - Provides ability to query CSV files using SQL like grammar.
=back
=head1 SYNOPSIS

=over 4
csvquery

csvquery -h

csvquery -p computer12.csv -Q

csvquery provides ability to query CSV files using SQL like grammar. At this
time the script only allows SELECT statements to run against the CSV files. The
ability to run UPDATE and INSERT statements will be a TODO for the next version.
=back

=head1 EXAMPLES

=over 4
csvquery

csvquery -h

csvquery -p computer12.csv -Q
=back

=head1 REQUIRED ARGUMENTS

=over 4
-p mandatory csv file name
-Q mandatory query to parse
=back

=head1 OPTIONS

=over 4

=item B<-h>
Display short usage help, SQL syntax help and exit.

=item B<-p>
The csv file which we will be querying.

=item B<-Q>
The query we will be running against the CSV file.

=item B<[--no arguments]>
Display short usage help and exit

=back

=head1 DESCRIPTION

=over 4
csvquery was designed for querying perfmon logs which are in CSV format using
an SQL like grammar.
=back

=head1 DEPENDENCIES

=over 4

=item B<perl 5.6> binary build 613 or newer with standard modules, for 32-bit machines.
=item B<perl 5.8.8> binary build 820 or newer with standard modules, for 64-bit machines.

=back
=head1 BUGS AND LIMITATIONS
=over 4
You are welcome to send bug reports about csvquery to my twitter (@sqlchow).

Please attach in a bug report, informations like:
=back

=over 4
=item * Perl version (perl --version)
=item * information what you've expected as a output
=back
=head1 VERSION
=over 4
This documentation refers to csvquery version 1.0.0.
=back

=head1 AUTHOR
=over 4
@SqlChow
=back
=head1 LICENSE AND COPYRIGHT
=over 4
It is always a fascinating journey gaining knowledge. If you find anything interesting that I would benefit from, kindly share it with me. That is all.
=back
Advertisements
About

By profession, I’m a SQL Server Database Administrator. I love to poke my nose into different corners and see how stuff looks in there. I keep looking for new things to do as my mind refuses to settle on one topic.

Tagged with: , , , , ,
Posted in PowerShell
One comment on “Working with Perfmon CSV logs in Powershell – Part 2
  1. […] I started looking at Perl and got so engrossed in it that I started looking at automating summarizing perfmon logs using couple of modules from CPAN. After spending a week of sleepless days and nights on it, I had […]

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: