PERL Assignment

This assignment will give you an opportunity to work with subroutines, modules, and regular expressions in Perl.

You should find the following files in your Perl directory:

helloworld.pl
argworld.pl
template.pl
power.pl
vis5d.pl
program1.c
program2.c

When you have finished the assignment, you should have the following files in your Perl directory:

helloworld.pl
argworld.pl
template.pl
power.pl
vis5d.pl
program1.c
program2.c
MyModule.pm
test_tokenize.pl
filesystem.pl
comments.pl

HAND IN:

template.pl
MyModule.pm
test_tokenize.pl
filesystem.pl
comments.pl

These files will be collected for marking at 11:59 pm on March 18, 2011. Make sure your final copies are in your Perl directory!

You will not need to modify program1.c and program2.c for the assignment. Please do not modify your copies; we will be using the originals for testing your scripts.

=============================================================================
PART 1: MYMODULE

In this part, we will begin creating a new module you can use in your scripts.

1. Create/open MyModule.pm with your favorite editor.
2. Put the perl identifier line on the first line (hint: #!/usr/bin/perl).
3. Add a comment with .
4. Add the 'package' line. Your package should have the same name as your module file (minus the .pm extension).
5. Add the 'BEGIN' section to your module. You do not have any subroutines to export yet, so you can leave @EXPORT empty.
6. Add the return statement and 'END' boilerplate.
7. Add an EOF comment to the end of your module.
8. Save your module file.
9. Open template.pl with your favorite editor.
10. Add a use line to include your module. Make sure it appears after your lib line.
11. Save template.pl.

This creates the framework for your module and makes sure you include it in any scripts you create from your template.pl.
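For reference, a finished Part 1 skeleton might look roughly like the sketch below. It assumes the Exporter-based BEGIN block from the standard perlmod module template; your course template may arrange the boilerplate differently. The 'use strict' and 'use warnings' lines are additions, and the final 1; is the true value every module file must return.

```perl
#!/usr/bin/perl
# MyModule.pm: personal utility module (sketch; replace with your own header comment)

package MyModule;

use strict;
use warnings;

BEGIN {
    require Exporter;
    our @ISA    = qw(Exporter);   # inherit Exporter's import() method
    our @EXPORT = qw();           # nothing to export yet; tokenize is added in Part 2
}

# Subroutines will go here, between the BEGIN section and the return value.

END { }    # module clean-up code (nothing to do yet)

1;         # a module must end by returning a true value
# EOF
```

In template.pl the matching line is simply 'use MyModule;'. It must come after the 'use lib' line, because 'use lib' is what adds your Perl directory to the list of places Perl searches for MyModule.pm.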
=============================================================================
PART 2: TOKEN SUBROUTINE

In this part, you are going to add a simple subroutine that takes a string and parses it into tokens (words, numbers, or symbols separated in the string by spaces, tabs, or newlines). The subroutine will return an array of tokens.

1. Open MyModule.pm with your favorite editor.
2. Insert the subroutine between the BEGIN section and the return line. Set up the subroutine with the 'sub' line and curly braces. Call the subroutine 'tokenize'. Make sure to add tokenize to the @EXPORT array in the BEGIN section.
3. Inside the subroutine, declare two local variables (hint: use 'my'). One will hold the string; the other should be an array which will hold the results. Declare any other variables you think you may need as well.
4. Store the supplied string in your local variable. If no string is supplied, print "INVALID STRING" and return.
5. Break the string into tokens, storing each one in the array. There are many ways to do this, so I'll leave it up to you to decide how you want to break up the string. If you parse the string yourself, push and unshift can be very useful for adding elements to the array. You can use Perl to do the parsing for you in a single line if you prefer. Remember that tokens are delimited by spaces, tabs, or newlines (you must account for all three).
6. Return the array.
7. Save MyModule.pm.
8. Copy template.pl to test_tokenize.pl.
9. Open test_tokenize.pl with your favorite editor.
10. Add comments describing what the script is for (to test your tokenize subroutine).
11. test_tokenize.pl will take one argument: a string, which you will need to enclose in double quotes ("). Make sure you fill in the argument section appropriately. Store the string in a variable.
12. Declare an array variable for holding the tokens. Declare any other variables you think you might need (such as one for holding a single token, or a counter variable).
13. Go to the main section and print out your string so that you can verify you are getting your argument correctly.
14. Call tokenize on your string.
15. Cycle over the elements of the array, printing each one to the screen on its own line, to make sure tokenize broke the string up properly.
16. Save test_tokenize.pl.
17. Run test_tokenize.pl with a variety of strings to make sure it works as expected. Remember that if you do not quote your string, each word will be passed as a separate argument.

=============================================================================
PART 3: FILESYSTEM.PL

In this part, you will get some experience with file handling and UNIX system commands. You will produce a script that lists the files in a path you specify. It should write to an output file called files.txt. The file should begin with a header containing your name, student number, and the path that was passed in. The script should show all files and whether each one is a regular file (FILE) or a directory (DIR). For example:

>perl filesystem.pl /home/EOSC070
>cat files.txt
George Hicks 1000000 /home/EOSC070
FILE .bash_history
DIR Desktop
FILE distributor.pl
FILE index.html
DIR .ssh
DIR Perl

1. Copy template.pl to filesystem.pl.
2. Open filesystem.pl with your favorite editor.
3. Fill in the comments section describing what the script does and what argument it expects (it should get one argument, the path to list out).
4. Add a variable for the path. Complete the arguments section and make sure you store the path that is passed in to your variable.
5. Add a variable that holds the name of your output file. Make sure you initialize the variable with the output filename given above. Go to the code section of your script and open your output file. Write a header into it with your name, student number, and the path you have saved.
6. Now you need to get a listing of the files in the path and print them out.
   There are many ways this can be done, but in your case you will get the listing by making a UNIX call to ls. You may need some flags to get all the required files and enough information about each file to tell whether it is a regular file or a directory. Once you have the file listing saved to a variable (do not forget to declare it), use your tokenize subroutine to break the listing into tokens. You can store the result in an array (do not forget to declare this either). You can then loop over the array (depending on which flags you use, keep in mind that not all tokens will be filenames), or you can calculate where in the array the filenames and the information describing them are, and print out the appropriate tokens. Do not forget to use the FILEHANDLE when you print. Also remember that you need to print either FILE or DIR first, depending on what the filename is (treat anything that is not a directory as a file).
7. Close the output file.
8. Save filesystem.pl.
9. Test your script. You may not be able to list out every path on the system (if you don't have permission to access a directory, your script won't list it). Try a few obvious paths and compare the output against ls to make sure the script is working properly.
10. Remove your output file.

=============================================================================
PART 4: COMMENTS.PL

This part will give you some experience dealing with regular expressions. You will be writing a script to parse comments out of C code. You have been provided with two C source code files (program1.c and program2.c). Your script will take the name of a C source code file as an argument (including the full or partial path), extract any comments from the file, and write them to an output file. The name of the output file is an optional argument to the script. If no name is supplied, your script will use the filename "comments.txt" and write the file to the current directory.
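The optional second argument only needs a default value. The sketch below wraps that logic in a hypothetical helper, resolve_args (not part of the assignment template), so it can be tried on its own; in comments.pl you would more likely read $ARGV[0] and $ARGV[1] directly in the arguments section. The usage message is also an assumption.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the C source file and the output file name,
# defaulting the output to "comments.txt" (in the current directory) when
# no second argument is supplied.
sub resolve_args {
    my @args = @_;
    die "usage: comments.pl source.c [outputfile]\n" unless @args >= 1;

    my $source = $args[0];
    my $output = defined $args[1] ? $args[1] : "comments.txt";
    return ($source, $output);
}

# In comments.pl this would be: my ($source, $output) = resolve_args(@ARGV);
my ($source, $output) = resolve_args("program1.c");
print "$source -> $output\n";    # prints "program1.c -> comments.txt"
```

Testing with defined rather than plain truth means an output name such as "0", which Perl treats as false, would still be honoured instead of being replaced by the default.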
It is important to remember what constitutes a comment in C code. A comment can begin with '//' and run to the end of the line, or a comment can be anything between '/*' and '*/'. Comments of the second type can span multiple lines. For both types, remember that a comment can occur on a line with code, but we do not want to print the code, just the comments.

1. Read through program1.c and program2.c, either in your favorite editor or by using cat. Notice the difference between the comments in the two files.
2. Copy template.pl to comments.pl.
3. Open comments.pl in your favorite editor and fill in the comments section. Make sure to explain what the arguments are (the first argument must be the C source code file; the second, optional argument is the output path/filename).
4. Finish the arguments section. You will need one variable to hold the source code argument and another to hold the output argument (do not forget to declare them). If the output argument is not supplied, make sure your variable is set to "comments.txt".
5. Go to the code section and open the output file for writing. Because you are going to have two files open at the same time, you will need a different filehandle for each. I recommend using something like OUTPUT for the output file. Print a header to the file in the format "name student number date | C code -> output file". You can get the date using the UNIX 'date' command; you may need a variable to hold the result. The output of the UNIX 'date' command normally ends with a newline, and chomp(variable) is good at removing it. Go to the end of your code section, just before you exit, and add a statement to close the output file.
6. Now open the C code file for input (you will need a different filehandle from the one you used for the output file; I recommend INPUT). Go to the code section just before the close statement for the output file and close the input file.
7. Cycle over each line of the C code file. Recall that a while loop does this easily and that each line will be stored in the default variable '$_'.
8. For each line, you will need to determine whether it is a comment, contains both a comment and code, or is just code. Any comments, and only comments, should be copied from the line and printed to the output file. The way to do this is with regular expressions. Recall that you can find matching lines using $_ =~ m// and that you can extract particular portions of a pattern by enclosing them in parentheses (they are then stored in the variables $1, $2, $3, etc.).
   At first, it is a good idea to make sure you can find single-line comments and lines containing both code and a comment; program1.c has these characteristics. To do this, you'll need to find lines containing '//', or containing both '/*' and '*/'. This can be done with a single regular expression, but it may be easier to use two. The key is to make sure you correctly capture the comment when it occurs on a line with code. Test your script against program1.c to make sure it gets all the comments.
9. Now comes the real trick: handling multi-line comments. You can see such comments in program2.c. They are the comments that begin with '/*', end with '*/', and span multiple lines. These comments are tricky because:
   1) any line which contains '/*' but no closing '*/' has a comment which extends to the end of the line;
   2) any line which contains '*/' but no opening '/*' has a comment which extends back to the beginning of the line;
   3) there are lines which are comments but contain neither '/*' nor '*/'.
   We can handle points (1) and (2) with a modification to our current regular expression for single-line /* */ comments. If we look for lines containing '/*', they will either be lines that contain complete comments or lines like (1) above. If we find such a line, we can store what we think is the comment in a variable and then check whether it is a single-line comment. If it is not a single-line comment, we can print what we stored in the variable; otherwise, we print $_. After dealing with those two conditions, we can check whether the line contains '*/' and, if it does, simply print everything up to '*/'. We have now solved points (1) and (2). Test your script against program2.c. It should capture every comment line except those to which point (3) applies.
10. Dealing with point (3) will take more than a regular expression. There is nothing on the lines of (3) to indicate they are comments. How do you know they are comments? From the point of view of a human reading the file, you know they are comments because they fall between a '/*' and a '*/' on other lines that bracket them. To solve this problem, you need to know when you are tracking a multi-line comment. A variable that acts as a flag indicating whether or not you are inside a comment could solve this problem. How will you know when to change the flag? How do you test the flag, or set the comment you want to print? The undef(variable) function and the defined(variable) test could be helpful. Test your script against program2.c and make sure you get the full multi-line comments. Also make sure it still works against program1.c. Finally, make sure each comment line is printed to the output file only once.

=============================================================================
PART 5: MAKE SURE YOUR FINISHED FILES ARE IN YOUR Perl DIRECTORY!

DEADLINE: 11:59 pm March 18, 2011