Perl Assignment

This assignment will give you an opportunity to work with subroutines,
modules, and regular expressions in Perl.  You should find the following
files in your Perl directory:

  helloworld.pl
  argworld.pl
  template.pl
  power.pl
  vis5d.pl
  program1.c
  program2.c

When you have finished the assignment, you should have the following files
in your Perl directory:

  helloworld.pl
  argworld.pl
  template.pl
  power.pl
  vis5d.pl
  program1.c
  program2.c
  MyModule.pm
  test_tokenize.pl
  filesystem.pl
  comments.pl

HAND IN: template.pl MyModule.pm test_tokenize.pl filesystem.pl comments.pl

These files will be collected for marking at 11:59 pm on March 18, 2011.
Make sure your final copies are in your Perl directory!

You will not need to modify program1.c and program2.c for the assignment.
Please do not modify your copies; we will be using the originals to test
your scripts.

=============================================================================
PART 1: MYMODULE

In this part, we will begin creating a new module you can use in your
scripts.

1.  Create/open MyModule.pm with your favorite editor.
2.  Put the Perl interpreter line on the first line (hint: #!/usr/bin/perl).
3.  Add a comment with <your name> <student number>.
4.  Add the 'package' line.  Your package should have the same name as
    your module file (minus the .pm extension).
5.  Add the 'BEGIN' section to your module.  You do not have any
    subroutines to export yet, so you can leave the @EXPORT empty.
6.  Add the return statement and 'END' boilerplate.
7.  Add an EOF comment to the end of your module.
8.  Save your module file.
9.  Open template.pl with your favorite editor.
10. Add a use line to include your module.  Make sure it appears after
    your lib line.
11. Save template.pl.

This creates the framework for your module and makes sure you include it
in any scripts you create using your template.pl.
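
As a rough guide, the steps above might produce a skeleton along the
lines of the sketch below.  It assumes the 'BEGIN' section follows the
usual Exporter pattern and that the 'END' boilerplate is an empty END
block; if your course notes show a different layout, follow the notes.

```perl
#!/usr/bin/perl
# <your name> <student number>

package MyModule;

use strict;
use warnings;

BEGIN {
    # Assumption: the module exports through the standard Exporter module.
    use Exporter ();
    our (@ISA, @EXPORT);
    @ISA    = qw(Exporter);
    @EXPORT = qw();    # nothing to export yet
}

# Subroutines will go here, between the BEGIN section and the return line.

END { }    # module clean-up code would go here (none needed yet)

1;    # a module must return a true value

# EOF
```

In template.pl, the matching line would be something like 'use MyModule;'
placed after the existing lib line.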

=============================================================================
PART 2: TOKEN SUBROUTINE

In this part, you are going to add a simple subroutine that takes a string
and parses it into tokens (words, numbers, symbols which are separated in
the string by spaces, tabs, or newlines).  The subroutine will return an
array of tokens.

1.  Open MyModule.pm with your favorite editor.
2.  Insert the subroutine between the BEGIN section and return line.
    Set up the subroutine with the 'sub' line and curly braces.  Call
    the subroutine 'tokenize'.  Make sure to add tokenize to the @EXPORT
    array in the BEGIN section.
3.  Inside the subroutine, declare two local variables (hint: use 'my').
    One will hold the string; the other should be an array which will hold
    the results.  Declare any other variables you think you may need as well.
4.  Store the supplied string in your local variable.  If no string is
    supplied, print "INVALID STRING" and return.
5.  Break the string into tokens, storing each one in the array.  There are
    many ways to do this, so I'll leave it up to you to decide how you want
    to break up the string.  If you parse the string yourself, push and
    unshift can be very useful for adding elements to the array.  You can
    use Perl to do the parsing for you in a single line if you prefer.
    Remember that tokens are delimited by spaces, tabs, or newlines (you
    must account for all three).
6.  Return the array.
7.  Save MyModule.pm.
8.  Copy template.pl to test_tokenize.pl.
9.  Open test_tokenize.pl with your favorite editor.
10. Add comments describing what the script is for (to test your tokenize
    subroutine).
11. test_tokenize.pl will take one argument (a string, which you will need
    to provide in double quotes (")).  Make sure you fill in the argument
    section appropriately.  Store the string in a variable.
12. Declare an array variable for holding the tokens.  Declare any other
    variables you think you might need (such as for holding a single token
    or a counter variable).
13. Go to the main section and print out your string so that you can verify
    you are getting your argument correctly.
14. Call tokenize on your string.
15. Cycle over the elements of the array to make sure tokenize broke the
    string up properly, printing each one to the screen on its own line.
16. Save test_tokenize.pl.
17. Run test_tokenize.pl with a variety of strings to make sure it works as
    expected.  Remember that if you do not quote your string, then each
    word will be counted as an argument.
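
One possible shape for tokenize, using the single-line approach, is
sketched below.  It is only an illustration: the subroutine is defined
directly in the script so the example runs on its own, whereas in the
assignment it belongs in MyModule.pm, and the test string is made up.

```perl
use strict;
use warnings;

# Sketch of tokenize: returns the whitespace-separated tokens of a string.
sub tokenize {
    my ($string) = @_;
    my @tokens;

    # No string supplied: report it and return nothing.
    unless (defined $string) {
        print "INVALID STRING\n";
        return;
    }

    # split with the special pattern ' ' breaks on any run of spaces,
    # tabs, or newlines and ignores leading whitespace.
    @tokens = split ' ', $string;
    return @tokens;
}

# Small demonstration with all three kinds of delimiter.
my @demo = tokenize("alpha  beta\tgamma\ndelta");
foreach my $token (@demo) {
    print "$token\n";
}
```

Parsing the string character by character and building the array by hand
is equally acceptable; the split version is simply the shortest.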

=============================================================================
PART 3: FILESYSTEM.PL

In this part, you will get some experience with file handling and UNIX
system commands.  You will produce a script that will list out the
files from a path you specify.  It should write to an output file
called files.txt.  The file should begin with a header containing your
name, student number, and the <path> that was passed in.

The script should show all files and whether each is a regular file (FILE)
or a directory (DIR).  For example:

>perl filesystem.pl /home/EOSC070
>cat files.txt

George Hicks 1000000 /home/EOSC070

FILE .bash_history
DIR Desktop
FILE distributor.pl
FILE index.html
DIR .ssh
DIR Perl


1.  Copy template.pl to filesystem.pl.
2.  Open filesystem.pl with your favorite editor.
3.  Fill in the comments section describing what the script does and
    what argument it expects (it should get one argument, the path
    to list out).
4.  Add a variable for the path.  Complete the arguments section and
    make sure you store the path that is passed in to your variable.
5.  Add a variable that holds the name of your output file.  Make sure
    you initialize the variable with the output filename given above.
    Go to the code section of your script and open up your output file.
    Write a header into it with your name, student number, and the path
    you have saved.
6.  Now you need to get a listing of the files in the path and print them
    out.  There are many ways this can be done, but in your case you will
    get a listing by making a UNIX call with ls.  You may need some flags
    to get all the files required and enough information about the files to
    tell whether each is a regular file or a directory.  Once you have the
    file listing saved to a variable (do not forget to declare it), use your
    tokenize subroutine to break the file listing into tokens.  You can
    store the result in an array (do not forget to declare this either).
    You can loop over the array (depending on what flags you use, you will
    have to keep in mind that not all tokens will be filenames) or you can
    calculate where in the array the filenames and the information
    describing them are, then print out the appropriate tokens.  Do not
    forget to use the FILEHANDLE when you print.  Also do not forget that
    you need to print either FILE or DIR first (treat anything that is not
    a directory as a file).
7.  Close the output file.
8.  Save filesystem.pl.
9.  Test your script.  You may not be able to list out every path on the
    system (if you don't have permission to access a directory, your script
    won't list it).  Try a few obvious ones and check to make sure the
    script is working properly with ls.
10. Remove your output file.
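
The steps above can be sketched as follows.  Several things here are
assumptions rather than requirements: the subroutine name write_listing,
the placeholder name and student number in the header, the stand-in
tokenize (yours comes from MyModule), and the choice of 'ls -A -p', which
lists hidden files and marks directories with a trailing slash.  Other
flags, such as a long listing, work too but produce extra tokens.

```perl
use strict;
use warnings;

# Stand-in for the tokenize subroutine exported by MyModule.
sub tokenize {
    my ($string) = @_;
    return split ' ', $string;
}

# Hypothetical helper: writes the listing of $path to $outfile.
sub write_listing {
    my ($path, $outfile) = @_;

    # -A includes hidden files (but not . and ..); -p appends / to directories.
    my $listing = `ls -A -p "$path"`;
    $listing = '' unless defined $listing;
    my @names = tokenize($listing);

    open(OUTPUT, '>', $outfile) or die "Cannot open $outfile: $!";
    print OUTPUT "Your Name 0000000 $path\n\n";    # placeholder header

    foreach my $name (@names) {
        if ($name =~ m{/$}) {
            $name =~ s{/$}{};            # drop the trailing slash
            print OUTPUT "DIR $name\n";
        }
        else {
            print OUTPUT "FILE $name\n"; # anything else counts as a file
        }
    }
    close(OUTPUT);
}

# Only run when a path is given on the command line.
if (@ARGV) {
    write_listing($ARGV[0], 'files.txt');
}
```

Note that splitting on whitespace means a filename containing spaces is
broken into several tokens; the sketch does not try to handle that case.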

=============================================================================
PART 4: COMMENTS.PL

This part will give you some experience dealing with regular expressions.

You will be writing a script to parse comments out of C code.  You have
been provided with two C source code files (program1.c and program2.c).
Your script will take the name of a C source code file as an argument (the
full or partial path included) and will take any comments from the file and
write them to an output file.  The name of the output file is an optional
argument to the script.  If no name is supplied, then your script will use
the filename "comments.txt" and output the file to the current directory.

It is important to remember what constitutes a comment in C code.  A comment
can begin with '//' and run to the end of the line, or a comment can be
anything between '/*' and '*/'.  For the second type of comment, remember
that the comment can span multiple lines.  For both types of comment,
remember that the comment can occur on a line with code, but we do not want
to print the code, just the comments.
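
To illustrate the two comment forms on a single line, here is a small
sketch that pulls the comment text out with a capture group.  The two
sample lines and the variable names are made up for the example, and it
deliberately ignores harder cases such as '//' appearing inside a string.

```perl
use strict;
use warnings;

# Two made-up lines of C, each with code followed by a comment.
my $line_a = 'count++;  // bump the counter';
my $line_b = 'int total = 0;  /* running sum */';

# In list context a match returns its captures, so each variable receives
# what $1 would hold (or undef when the line has no such comment).
my ($slash_comment) = $line_a =~ m{//(.*)};
my ($block_comment) = $line_b =~ m{/\*(.*?)\*/};

print "slash comment:$slash_comment\n" if defined $slash_comment;
print "block comment:$block_comment\n" if defined $block_comment;
```

Using m{...} instead of m/.../ avoids having to escape every slash in the
pattern; the '*' still needs a backslash because it is a regular
expression metacharacter.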

1.  Read through program1.c and program2.c, either in your favorite editor
    or by cat'ing the files.  Notice the difference between the comments in
    the two files.
2.  Copy template.pl to comments.pl.
3.  Open comments.pl in your favorite editor and fill in the comments
    section.  Make sure to explain what the arguments are (the first
    argument must be the C source code file, the second optional argument
    is the output path/filename).
4.  Finish the arguments section.  You will need a variable to hold both
    the source code argument and the output argument (do not forget to
    declare them).  If the output argument is not supplied, make sure your
    variable is set to "comments.txt".
5.  Go to the code section and open the output file for writing.  Because
    you are going to have two files open at the same time, you will
    need to specify a different filehandle for each.  I recommend using
    something like OUTPUT for the output file.  Print a
    header to the file in the format
    "name student number date | C code -> output file"
    You can get the date using the UNIX 'date' command.  You may need a
    variable to hold the result.  Normally the result of the UNIX 'date'
    command has a newline attached.  chomp(variable) is good at removing
    those.  Go to the end of your code section just before you exit and
    add a statement to close the output file.
6.  Now open the C code file for input (you will need a different
    filehandle than the one you used for the output file; I recommend
    INPUT).  Go to the code section just before the close statement for
    the output file and close the input file.
7.  Cycle over each line from the C code file.  Recall that we can use
    a while loop to do this easily and that the line will be stored in
    the default variable '$_'.
8.  For each line you will need to determine if it is a comment, or
    contains a comment and code, or is just code.  Any comments, and
    only comments, should be copied from the line and printed into the
    output file.  The way to do this is through regular expressions.
    Recall that you can find lines which match using
    $_ =~ m/<pattern>/
    and that you can pull particular portions of patterns by enclosing
    those portions in parentheses (they are then stored in the variables
    $1, $2, $3, etc).  At first, it is a good idea to make sure you can
    find single line comments or lines containing code and a comment.
    program1.c has these characteristics.  To do this, you'll need to
    find lines containing '//' or '/*' and '*/'.  This can be done
    with a single regular expression, but it may be easier to do two.
    The key is to make sure you correctly encapsulate the comment in
    the event it occurs on a line with code.  Test your script against
    program1.c to make sure it gets all the comments.
9.  Now comes the real trick: handling multi-line comments.  You can see
    such comments in program2.c.  They are the comments that begin with
    '/*', end with '*/' and span multiple lines.  The reason these comments
    are tricky is that
    1) any line which contains only '/*' has a comment which extends to
       the end of the line
    2) any line which contains only '*/' has a comment which extends to
       the beginning of the line
    3) there are lines which are comments that contain neither '/*' nor
       '*/'
    We can handle points (1) and (2) with a modification to our
    current regular expression for single line /* */ comments.  If we
    look for lines containing '/*' they will either be lines that
    contain complete comments, or lines like (1) above.  If we find
    such a line, we can store what we think is the comment in a variable
    and then check if it is a single line comment.  If it is not a single
    line comment, we can print what we stored in the variable; otherwise
    we print $_.  After having dealt with those two conditions, we can
    check if the line is one containing '*/' and if it is, simply print
    everything up to '*/'.  We have now solved points (1) and (2).
    Test your script against program2.c.  It should capture every comment
    line except those to which point (3) applies.
10. Dealing with point (3) will take more than a regular expression.  There
    is nothing on the lines of (3) to indicate they are comments.  How do
    you know they are comments?  From the point of view of a human
    reading the file, you know they are comments because they appear
    between '/*' and '*/' which occur on other lines bracketing those lines.
    To solve this problem, you need to know when you are tracking a
    multi-line comment.  A variable that acts as a flag indicating when
    you are in a comment and when you are not in a comment could solve
    this problem.  How will you know when to change the flag?  How do
    you test for the flag, or set the comment you want to print?  The
    undef(variable) function and defined(variable) test could be helpful.
    Test your script against program2.c and make sure you can get the full
    multi-line comments.  Also make sure it still works against program1.c.
    Finally, make sure each comment line is only printed to the output
    file once.
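
The flag idea from steps 9 and 10 can be sketched as below.  This is one
possible arrangement, not the required one: the subroutine names
extract_comments and trim are invented for the example, the flag is a
plain 0/1 value rather than the undef/defined approach mentioned above,
the header uses a placeholder name and student number, and comment
markers inside C string literals are not treated specially.

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading and trailing whitespace.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Hypothetical helper: returns the comment text found in the given lines.
sub extract_comments {
    my @lines = @_;
    my @comments;
    my $in_comment = 0;    # flag: true while inside a /* ... */ comment

    foreach my $line (@lines) {
        my $rest = $line;
        chomp $rest;
        while ($rest =~ /\S/) {
            if ($in_comment) {
                if ($rest =~ m{^(.*?)\*/(.*)}) {      # comment ends here
                    my ($text, $after) = ($1, $2);
                    push @comments, trim($text) if $text =~ /\S/;
                    $rest       = $after;
                    $in_comment = 0;
                }
                else {                                # whole line is comment
                    push @comments, trim($rest);
                    $rest = '';
                }
            }
            elsif ($rest =~ m{(//|/\*)(.*)}) {        # leftmost comment opener
                my ($opener, $after) = ($1, $2);
                if ($opener eq '//') {
                    push @comments, trim($after) if $after =~ /\S/;
                    $rest = '';
                }
                else {
                    $rest       = $after;             # keep scanning for */
                    $in_comment = 1;
                }
            }
            else {
                $rest = '';                           # just code on this line
            }
        }
    }
    return @comments;
}

# Only run when a C source file is given on the command line.
if (@ARGV) {
    my $source = $ARGV[0];
    my $output = defined $ARGV[1] ? $ARGV[1] : 'comments.txt';
    my $date   = `date`;
    chomp($date);

    open(OUTPUT, '>', $output) or die "Cannot open $output: $!";
    open(INPUT,  '<', $source) or die "Cannot open $source: $!";
    print OUTPUT "Your Name 0000000 $date | $source -> $output\n";
    print OUTPUT "$_\n" for extract_comments(<INPUT>);
    close(INPUT);
    close(OUTPUT);
}
```

The sketch reads all lines at once to keep the helper easy to test; the
same flag logic works equally well inside a while loop that reads one
line at a time into $_.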

=============================================================================
PART 5: MAKE SURE YOUR FINISHED FILES ARE IN YOUR Perl DIRECTORY!

DEADLINE: 11:59 pm March 18, 2011