CSSE2310/CSSE7231 — Semester 1, 2024
Assignment 3 (version 1.2)
Marks: 75
Weighting: 15%
Due: 4:00pm Friday 3 May, 2024
Introduction
The goal of this assignment is to demonstrate your skills and ability in fundamental process management and communication concepts (pipes and signals), and to further develop your C programming skills with a moderately complex program.
You are to create a program (called uqfindexec) which allows users to run a specified command, or a pipeline of commands, on all files in a given directory. (This is similar to the -exec functionality available with the Linux find command.) The assignment will also test your ability to code to a particular programming style guide, and to use a revision control system appropriately.
CSSE7231 students will be required to implement additional functionality for full marks.
Student Conduct
This section is unchanged from assignment one – but you should remind yourself of the referencing requirements.
This is an individual assignment. You should feel free to discuss general aspects of C programming and the assignment specification with fellow students, including on the discussion forum. In general, questions like “How should the program behave if ⟨this happens⟩?” would be safe, if they are seeking clarification on the specification.
You must not actively help (or seek help from) other students or other people with the actual design, structure and/or coding of your assignment solution. It is cheating to look at another person’s assignment code and it is cheating to allow your code to be seen or shared in printed or electronic form by others. All submitted code will be subject to automated checks for plagiarism and collusion. If we detect plagiarism or collusion, formal misconduct actions will be initiated against you, and those you cheated with. That’s right, if you share your code with a friend, even inadvertently, then both of you are in trouble. Do not post your code to a public place such as the course discussion forum or a public code repository. (Code in private posts to the discussion forum is permitted.) You must assume that some students in the course may have very long extensions so do not post your code to any public repository until at least three months after the result release date for the course (or check with the course coordinator if you wish to post it sooner). Do not allow others to access your computer – you must keep your code secure. Never leave your work unattended.
You must follow the following code usage and referencing rules for all code committed to your SVN repository (not just the version that you submit):
You must not share this assignment specification with any person (other than course staff), organisation, website, etc. Uploading or otherwise providing the assignment specification or part of it to a third party including online tutorial and contract cheating websites is considered misconduct. The university is aware of many of these sites and many cooperate with us in misconduct investigations. You are permitted to post small extracts of this document to the course Ed Discussion forum for the purposes of seeking or providing clarification on this specification.
The teaching staff will conduct interviews with a subset of students about their submissions, for the purposes of establishing genuine authorship. If you write your own code, you have nothing to fear from this process. If you legitimately use code from other sources (following the usage/referencing requirements in the table above) then you are expected to understand that code. If you are not able to adequately explain the design of your solution and/or adequately explain your submitted code (and/or earlier versions in your repository) and/or be able to make simple modifications to it as requested at the interview, then your assignment mark will be scaled down based on the level of understanding you are able to demonstrate and/or your submission may be subject to a misconduct investigation where your interview responses form part of the evidence. Failure to attend a scheduled interview will result in zero marks for the assignment unless there are documented exceptional circumstances that prevent you from attending.
Students will be selected for interview based on a number of factors that may include (but are not limited to):
• Feedback from course staff based on observations in class, on the discussion forum, and during marking;
• An unusual commit history (versions and/or messages), e.g. limited evidence of progressive development;
• Variation of student performance, code style, etc. over time;
• Use of unusual or uncommon code structure/functions etc.;
• Referencing, or lack of referencing, present in code;
• Use of, or suspicion of undocumented use of, artificial intelligence or other code generation tools; and
• Reports from students or others about student work.
In short – Don’t risk it! If you’re having trouble, seek help early from a member of the teaching staff. Don’t be tempted to copy another student’s code or to use an online cheating service. Don’t help another CSSE2310/7231 student with their code no matter how desperate they may be and no matter how close your relationship. You should read and understand the statements on student misconduct in the course profile and on the school website: https://eecs.uq.edu.au/current-students/guidelines-and-policies-students/student-conduct.
Specification
The uqfindexec program will execute a specified command, or pipeline of commands, for each file found in a specified directory. If the string {} appears in the command(s) then it will be replaced by the name of the file being processed. For example, running
uqfindexec --directory /etc "wc -l {}"
will run the command
wc -l /etc/filename
for every file found in /etc – where filename is replaced by the name of each file in turn. (Overall, this will report the number of lines in each file in /etc.)
Running
uqfindexec --showhidden "tr a-z A-Z < {} > {}.out"
will run the command
tr a-z A-Z < filename > filename.out
for every file found in the current directory – including hidden files whose names begin with ‘.’. (Overall, this command capitalises the content of every file found in the current directory and saves that output to a new file with the .out suffix added to the name.)
The command
uqfindexec --showhidden --directory /etc 'stat {} | grep "Change: 2023" | cut -d " " -f 2'
will run the pipeline
stat /etc/filename | grep "Change: 2023" | cut -d " " -f 2
on every file found in /etc – including hidden files. (This command reports the modification dates of all files in /etc that were modified in 2023.)
Full details of the required behaviour are provided below.
Command Line Arguments
Your program (uqfindexec) is to accept command line arguments as follows:
./uqfindexec [--directory dirname] [--showhidden] [--parallel] [--statistics] [--descend] [cmd]
The square brackets ([]) indicate optional arguments (or pairs of arguments). Here dirname and cmd are placeholders for user-supplied arguments. Any or all of the options can be specified (at most once each). The command, if specified, must always be the last argument and can be assumed not to start with --. Option arguments can be in any order.
Some examples of how the program might be run include the following:
./uqfindexec
./uqfindexec --descend
./uqfindexec --directory .. "echo {}"
./uqfindexec --parallel --directory /etc --showhidden --descend 'wc -l {}'
The meaning of the arguments is as follows:
• --directory – if specified, this option argument is followed by the name of the directory whose files are to be processed. If omitted, the current directory (.) is to be used.
• --descend – if specified, this option argument indicates that all subdirectories of the given (or default) directory are to be processed recursively after files in the given/default directory are processed. Support for this functionality is only required for CSSE7231 students but the programs of all students must accept the argument without error. See CSSE7231 Functionality – Directory Recursion below for details.
• --parallel – if specified, this option argument indicates that the processing of the files must be performed in parallel. (By default, processing must take place sequentially – one file at a time.)
• --showhidden – if specified, this option argument means that hidden files (those whose names begin with .) must be processed in addition to non-hidden files. When the --descend option is also specified, then hidden subdirectories will also be processed.
• --statistics – if specified, this option argument means that statistics are output to standard error when uqfindexec finishes.
• cmd – if specified, the given command or command pipeline must be run for each file being processed. Any instances of {} in the command argument must first be replaced by the name of the file being processed. (More details are provided below about the syntax of the command or command pipeline.) If this argument is not present then the default command ("echo {}") must be used. This default command will just print the name of each file being processed.
Prior to doing anything else, your program must check the command line arguments for validity. If the program receives an invalid command line then it must print the (single line) message:
Usage: uqfindexec [--directory dirname] [--showhidden] [--parallel] [--statistics] [--descend] [cmd]
to standard error (with a following newline), and exit with an exit status of 3.
Invalid command lines include (but may not be limited to) any of the following:
• The --directory option argument is given but it is not followed by an associated value argument.
• Any of the option arguments is listed more than once.
• An unexpected argument is present.
• Any argument is the empty string.
• An argument other than the dirname argument starts with -- but isn’t one of the expected option arguments.
Checking whether the dirname and/or cmd arguments (if present) are themselves valid is not part of the usage checking (other than checking that their values are not empty). The validity of these values is checked after command line validity as described in the sections below – and in the same order as these sections are listed.
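The usage rules above can be sketched as a single pass over argv. This is an illustrative sketch only, not a reference solution: the Options structure and the parse_arguments() and set_once() names are hypothetical, and the sketch assumes that whatever follows --directory is taken as the directory name even if it starts with --.

```c
#include <stdbool.h>
#include <string.h>

// Hypothetical container for the parsed command line.
typedef struct {
    const char* directory; // NULL if --directory was not given
    const char* command;   // NULL if no cmd argument was given
    bool showHidden;
    bool parallel;
    bool statistics;
    bool descend;
} Options;

// Sets a flag, reporting failure if the option was already seen.
static bool set_once(bool* flag)
{
    if (*flag) {
        return false;
    }
    *flag = true;
    return true;
}

// Returns true if the command line is valid. On false the caller would
// print the usage message and exit with status 3.
bool parse_arguments(int argc, char* argv[], Options* options)
{
    *options = (Options){0};
    for (int i = 1; i < argc; i++) {
        const char* arg = argv[i];
        bool ok;
        if (arg[0] == '\0') {
            ok = false; // empty strings are never valid
        } else if (strcmp(arg, "--directory") == 0) {
            // Needs a non-empty value and must not be repeated.
            ok = options->directory == NULL && i + 1 < argc
                    && argv[i + 1][0] != '\0';
            if (ok) {
                options->directory = argv[++i];
            }
        } else if (strcmp(arg, "--showhidden") == 0) {
            ok = set_once(&options->showHidden);
        } else if (strcmp(arg, "--parallel") == 0) {
            ok = set_once(&options->parallel);
        } else if (strcmp(arg, "--statistics") == 0) {
            ok = set_once(&options->statistics);
        } else if (strcmp(arg, "--descend") == 0) {
            ok = set_once(&options->descend);
        } else if (strncmp(arg, "--", 2) == 0) {
            ok = false; // unknown option
        } else {
            ok = (i == argc - 1); // cmd is only valid as the last argument
            if (ok) {
                options->command = arg;
            }
        }
        if (!ok) {
            return false;
        }
    }
    return true;
}
```

A caller would substitute the defaults (. and "echo {}") for any NULL field after a successful parse.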
Directory Validity Checking
Your program must check that the nominated directory (the directory specified on the command line or otherwise the default directory (.)) exists and is readable. If it is not, then your program must print the message:
uqfindexec: directory "dirname" can not be accessed
to standard error (with a following newline) where dirname is replaced by the name of the directory from the command line (or . if no directory was specified on the command line). The double quotes must be present. Your program must then exit with an exit status of 17.
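One way to perform this check is simply to try opening the directory. The sketch below assumes that "exists and is readable" can be tested with opendir(); the function name check_directory() is hypothetical.

```c
#include <dirent.h>
#include <stdbool.h>
#include <stdio.h>

// Returns true if dirname can be opened for reading. Otherwise prints the
// message given above to standard error and returns false, in which case
// the caller would exit with status 17.
bool check_directory(const char* dirname)
{
    DIR* dir = opendir(dirname);
    if (dir == NULL) {
        fprintf(stderr,
                "uqfindexec: directory \"%s\" can not be accessed\n",
                dirname);
        return false;
    }
    closedir(dir);
    return true;
}
```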
Command Checking
If the cmd argument is specified then your program must check that it is valid. A library function has been provided to parse the string. You can use the return value from this function to determine whether the command or command pipeline is valid. See details of this provided library function on page 9.
If the cmd argument is not valid, then your program must print the message:
uqfindexec: command is invalid
to standard error (with a following newline), and exit with an exit status of 16.
Command Description
The cmd argument is a string that may contain a single command to be executed (possibly with additional arguments) or a pipeline of such commands to be executed. The format is a simplified form of a shell command with elements separated by whitespace characters. Elements in the command can be enclosed in double quotes to escape special characters such as spaces, |, < and >. It is not possible to escape double quote characters or the {} placeholder.
A single command will have a format like the following (where square brackets [ ] indicate optional elements, italics indicate text to be replaced with an appropriate argument, and an ellipsis (...) indicates that the previous element is repeatable, e.g. that multiple arguments can be given):
cmd [arg ...] [ < inputfile ] [ > outputfile ]
If a standard input redirection is not specified then the command’s standard input must be inherited from the command’s parent (uqfindexec). If a standard output redirection is not specified then the command’s standard output must be inherited from the command’s parent (uqfindexec). The redirections can be in either order.
A pipeline of commands will have two or more commands separated by the | symbol. Only the first command may have a standard input redirection. (If not present, the standard input of the first command in the pipeline will be inherited from the command’s parent - uqfindexec.) Only the last command may have a standard output redirection. (If not present, the standard output of the last command in the pipeline will be inherited from the command’s parent - uqfindexec.) A pipeline of commands will have the following format (using the notation above):
cmd [arg ...] [ < inputfile ] [ | cmd [arg ...] ]... | cmd [arg ...] [ > outputfile ]
As mentioned above, a library function is provided (see Provided Library on page 9) that will parse the command string so that you don’t have to write code to do this. (You will need to implement the filename placeholder ({}) substitution as described under Filename Substitution below.)
The standard error for all commands executed will always be inherited from the parent (uqfindexec).
Program Operation
If the given directory and command are valid then your program must iterate over all files in that directory and execute the command pipeline for that file. (The term command pipeline includes the case where just one command is to be executed.) If the filename placeholder {} is present anywhere in the command pipeline then it must be replaced by the name of the file being processed. See Filename Substitution below.
Only directory entries that are regular files or symbolic links to regular files are to be processed. Other entry types such as those for subdirectories are to be ignored. Symbolic links to targets that don’t exist or are in inaccessible directories can be either included or excluded from processing. This situation will not be tested.
Files must be processed in the same order that the command ls uses when listing filenames. ls sorts names using the strcoll() comparison function. This compares strings based on the current locale. You must use the “en_AU” collation locale for comparison purposes. Do this by calling
setlocale(LC_COLLATE, "en_AU");
somewhere in your program prior to doing any sorting. See the Hints for a function that will return directory entries in the correct order.
By default, hidden files (i.e. those whose names begin with .) must be skipped. However, if the --showhidden argument is specified on the command line then these files must also be processed. Files must be processed in the same order that ls -a uses when listing filenames. Again, only regular files or symbolic links to regular files are to be processed.
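The ordering and hidden-file rules can be illustrated with a small sketch. It sorts an array of names with qsort() and strcoll() and is an alternative to the function mentioned in the Hints, not a replacement for it. The helper names (init_collation, sort_names, should_skip) are hypothetical, and the result only matches ls if the en_AU locale is installed on the machine.

```c
#include <locale.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

// Selects the collation locale named above. Returns false if that locale
// is not available, in which case strcoll() keeps using the current one.
bool init_collation(void)
{
    return setlocale(LC_COLLATE, "en_AU") != NULL;
}

// qsort() comparator: compares two strings using the collation locale.
static int compare_names(const void* a, const void* b)
{
    const char* const* left = a;
    const char* const* right = b;
    return strcoll(*left, *right);
}

// Sorts an array of filenames in place.
void sort_names(const char* names[], size_t count)
{
    qsort(names, count, sizeof(names[0]), compare_names);
}

// Returns true if an entry must be skipped because it is hidden and
// --showhidden was not given.
bool should_skip(const char* name, bool showHidden)
{
    return !showHidden && name[0] == '.';
}
```

Checking that an entry is a regular file (or a symbolic link to one) is a separate step that this sketch does not cover.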
Filename Substitution
Any occurrence of the placeholder {} in the supplied command string must be replaced by the name of the file being processed prior to the command being executed. The placeholder may be present in (or may be the whole of) the name of an executable, an argument to an executable, the name of the file to be the standard input for the first command in the pipeline, or the name of the file to be the standard output for the last command in the pipeline. Multiple placeholders may be present in a command string.
If the --directory option is not specified on the command line then the {} placeholder must be substituted by the name of the file without any path component present.
If the --directory option is specified on the command line then the path to the file must be included in the substitution. A single slash (/) will be added between the path and the filename if the path does not have a trailing /. For example:
• if uqfindexec is run with the arguments --directory /etc then the placeholder {} will be substituted by /etc/filename for each filename in /etc;
• if uqfindexec is run with the arguments --directory /etc/ then the placeholder {} will be substituted by /etc/filename for each filename in /etc (i.e. there is no additional / between the path and the filename);
• if uqfindexec is run with the arguments --directory ./././/////./// then the placeholder {} will be substituted by ./././/////.///filename for each filename in the current directory.
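The substitution rules above amount to two string operations: building the replacement text from the directory and filename, and replacing every {} in a string. A minimal sketch follows, with hypothetical function names. It works on any string, so it could be applied either to the whole command or to each parsed element; which of those is appropriate depends on the provided library and is not assumed here.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Builds the text that replaces {}. directory is NULL when --directory was
// not given. A single / is added only if the directory lacks a trailing /.
// The caller frees the result. Returns NULL if allocation fails.
char* build_substitution(const char* directory, const char* filename)
{
    const char* dir = (directory != NULL) ? directory : "";
    size_t dirLen = strlen(dir);
    const char* sep = (dirLen > 0 && dir[dirLen - 1] != '/') ? "/" : "";
    size_t size = dirLen + strlen(sep) + strlen(filename) + 1;
    char* result = malloc(size);
    if (result != NULL) {
        snprintf(result, size, "%s%s%s", dir, sep, filename);
    }
    return result;
}

// Returns a newly allocated copy of text with every {} replaced by
// replacement. The caller frees the result.
char* replace_placeholder(const char* text, const char* replacement)
{
    size_t repLen = strlen(replacement);
    size_t count = 0;
    for (const char* p = strstr(text, "{}"); p != NULL;
            p = strstr(p + 2, "{}")) {
        count++;
    }
    // Upper bound: each placeholder actually frees two characters.
    char* result = malloc(strlen(text) + count * repLen + 1);
    if (result == NULL) {
        return NULL;
    }
    char* out = result;
    const char* cursor = text;
    const char* hit;
    while ((hit = strstr(cursor, "{}")) != NULL) {
        memcpy(out, cursor, (size_t)(hit - cursor));
        out += hit - cursor;
        memcpy(out, replacement, repLen);
        out += repLen;
        cursor = hit + 2;
    }
    strcpy(out, cursor); // copy the remainder and the terminator
    return result;
}
```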
Command Execution
For each file being processed, commands in the pipeline must be executed as follows. (The term pipeline here includes the case where there is only one command.) If there are N files then this sequence is repeated N times.
1. If an input file is specified for the first (or only) command in the pipeline then it must be opened for reading and if this fails then the command execution process for this file is aborted (none of the steps below are undertaken for this file) and the following message must be printed to standard error (with a terminating newline):
uqfindexec: cannot read "filename1" while processing "filename2"
where filename1 is replaced by the name of the file that could not be opened and filename2 is replaced by the name of the file being processed (using the same format as would result from {} placeholder substitution).
2. If an output file is specified for the last (or only) command in the pipeline then it must be opened for writing (creating the file if it does not exist, truncating the file if it does exist). If the open fails then the command execution process for this file is aborted and the following message must be printed to standard error (with a terminating newline):
uqfindexec: cannot open "filename1" for writing while processing "filename2"
where filename1 is replaced by the name of the file that could not be opened and filename2 is replaced by the name of the file being processed (using the same format as would result from {} placeholder substitution). Output files must be created with at least read and write permission for the owner (user) and no permissions for others (i.e. rw????--- permissions where the ? bits can be set or not).
3. The commands that make up the pipeline must be executed (after appropriate creation of pipes and redirection of standard input/output as required). Each command will be executed in its own child process – where each process is an immediate child of uqfindexec. Executables must be searched for in the user’s PATH. (Note that all processes in the pipeline must be created before any processes in the pipeline are reaped.)
4. The child process(es) created in the previous step must be reaped in turn from the first in the pipeline to the last, i.e. after all processes in the pipeline have been created, your program must wait for the first process in the pipeline to finish and be reaped before moving on to the second process, etc. If an execution failed (i.e. a command could not be executed, e.g. because the executable was not found in the user’s PATH) then the following message must be printed to standard error (with a terminating newline):
uqfindexec: cannot execute "cmd" when processing "filename"
where cmd is replaced by the name of the executable whose execution failed, and filename is replaced by the name of the file being processed (using the same format as would result from {} placeholder substitution). Multiple of these messages may be printed if multiple commands in a pipeline could not be executed.
5. The steps above are repeated for the next file to be processed. (It is possible that this may result in similar error messages being printed again.)
Your program must note the return status of each process to generate statistics – see Exiting (Statistics output) below.
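Steps 3 and 4 follow the usual pipe/fork/exec pattern. The sketch below illustrates that pattern only: it leaves out the < and > redirections from steps 1 and 2, does not close other inherited descriptors, and does only minimal error handling. The function name run_pipeline() is hypothetical, and so is the use of a reserved exit status (EXEC_FAILED_STATUS, here 99) to tell the parent that exec failed. That convention is an assumption of this sketch, not something stated above.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical exit status a child uses to report that exec failed.
#define EXEC_FAILED_STATUS 99

// Runs count (>= 1) commands as a pipeline. commands[i] is a
// NULL-terminated argument vector. The wait status of command i is stored
// in statuses[i]. Returns 0 on success or -1 if pipe() or fork() failed.
int run_pipeline(char** commands[], int count, int statuses[])
{
    pid_t pids[count];
    int previousRead = -1; // read end of the pipe feeding the next command

    for (int i = 0; i < count; i++) {
        int fds[2] = {-1, -1};
        if (i < count - 1 && pipe(fds) < 0) {
            return -1; // a fuller version would reap earlier children
        }
        pid_t pid = fork();
        if (pid < 0) {
            return -1;
        }
        if (pid == 0) { // child process
            if (previousRead != -1) {
                dup2(previousRead, STDIN_FILENO);
                close(previousRead);
            }
            if (fds[1] != -1) {
                dup2(fds[1], STDOUT_FILENO);
                close(fds[0]);
                close(fds[1]);
            }
            execvp(commands[i][0], commands[i]); // searches PATH
            _exit(EXEC_FAILED_STATUS); // only reached if exec failed
        }
        pids[i] = pid;
        // Parent: close the ends it no longer needs so that readers
        // eventually see end of file.
        if (previousRead != -1) {
            close(previousRead);
        }
        if (fds[1] != -1) {
            close(fds[1]);
        }
        previousRead = fds[0];
    }
    // Reap in order, first command to last, only after all were created.
    for (int i = 0; i < count; i++) {
        waitpid(pids[i], &statuses[i], 0);
    }
    return 0;
}
```

The parent can then inspect each status with WIFEXITED(), WEXITSTATUS() and WIFSIGNALED() to decide which message to print and how to count the file in the statistics.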
Parallel Mode
If --parallel is specified on the uqfindexec command line then the command execution steps above are performed in a different order. All commands must be executed (or attempted to be executed) for each file prior to any child processes being reaped. In other words, steps 1 to 3 above are performed for all files in the directory, and then step 4 is performed for the same files – i.e. for each file in turn (in the same order), the child process(es) created must be reaped in turn (from the first in the pipeline to the last).
Your program does not have to deal with the possibility of fork() failing due to creating too many processes. Directories with “reasonable” numbers of files will be used in testing.
This functionality is considered to be more advanced as you will need to create data structures to record many more process IDs to enable delayed reaping.
Interrupting the Jobs
If uqfindexec receives a SIGINT (as usually sent by pressing Ctrl-C) when running in sequential mode then it should allow the current file processing job to finish (and reap any associated child processes) and not commence processing any further files. If uqfindexec is running in parallel mode (i.e. --parallel was specified on the command line) then the SIGINT should be ignored, unless you are a CSSE7231 student implementing the --descend functionality described below.
Your program is permitted to use a single bool global variable to implement signal handling.
Note that pressing Ctrl-C on a terminal will send a signal to a whole process group – which will include the children of uqfindexec. During testing for this functionality we will send a SIGINT only to uqfindexec.
Other Requirements
Your program must free all dynamically allocated memory before exiting. (This requirement does not apply to child processes of uqfindexec.)
Child processes of uqfindexec must not inherit any unnecessary open file descriptors opened by uqfindexec. (Open file descriptors that uqfindexec inherits from its parent and that are passed to a child must remain open in the child.)
uqfindexec is not to leave behind any orphan processes (i.e. when uqfindexec exits normally then none of its children must still be running). uqfindexec is also not to leave behind any zombie processes – when processing files sequentially, all child processes from processing one file must be reaped before commands are run for the next file.
uqfindexec must not busy wait, i.e. it should not repeatedly check for something (e.g. process termination) in a loop. This means that use of the WNOHANG option when waiting is not permitted.
All commands run by uqfindexec when processing files must be direct children of uqfindexec, i.e. the use of grandchild processes is not permitted.
Exiting (Statistics output)
When uqfindexec has finished processing all the files (or has been interrupted and will not be processing further files), then, if --statistics is specified on the command line, uqfindexec must print the following to its standard error:
Attempted to process a total of N1 files
- processing succeeded for N2 files
- processing may have failed for N3 files
- processing was terminated by signal for N4 files
- pipeline not executed for N5 files
where
• N1 is replaced by the number of files that were processed;
• N2 is replaced by the number of files for which every command in the pipeline exited normally with status 0;
• N3 is replaced by the number of files for which every command in the pipeline exited normally but one or more of them exited with a non-zero exit status;
• N4 is replaced by the number of files for which some command in the pipeline exited due to being signalled; and
• N5 is replaced by the number of files for which the pipeline was not executed (due to the input or output file not being able to be opened) or for which any command in the pipeline was not able to be executed (e.g. the command was not in the user’s PATH).
Note that N1 = N2 + N3 + N4 + N5.
If --statistics is not specified on the command line then nothing is output on exit.
Whether the statistics are printed or not, your program must exit with exit status:
• 10 if any processing failed (i.e. N5 > 0)
• 12 if no processing failed but the program is exiting prior to completion of the processing due to interruption by SIGINT (i.e. not all files were processed because of the interruption)
• 0 otherwise
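The statistics output and exit status rules can be summarised in a short sketch. The Statistics structure and the function names are hypothetical. The stoppedEarly parameter is assumed to be true only when SIGINT prevented some files from being processed.

```c
#include <stdbool.h>
#include <stdio.h>

// Hypothetical counters matching N2 to N5 above.
typedef struct {
    int succeeded;   // N2
    int failed;      // N3
    int signalled;   // N4
    int notExecuted; // N5
} Statistics;

// N1 is the sum of the other four counters.
int total_files(const Statistics* stats)
{
    return stats->succeeded + stats->failed + stats->signalled
            + stats->notExecuted;
}

// Prints the statistics in the format listed above (normally to stderr).
void print_statistics(FILE* out, const Statistics* stats)
{
    fprintf(out, "Attempted to process a total of %d files\n",
            total_files(stats));
    fprintf(out, "- processing succeeded for %d files\n", stats->succeeded);
    fprintf(out, "- processing may have failed for %d files\n",
            stats->failed);
    fprintf(out, "- processing was terminated by signal for %d files\n",
            stats->signalled);
    fprintf(out, "- pipeline not executed for %d files\n",
            stats->notExecuted);
}

// Chooses the exit status: N5 > 0 takes priority over interruption.
int choose_exit_status(const Statistics* stats, bool stoppedEarly)
{
    if (stats->notExecuted > 0) {
        return 10;
    }
    if (stoppedEarly) {
        return 12;
    }
    return 0;
}
```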
CSSE7231 Functionality – Directory Recursion
CSSE2310 students are not expected to implement this functionality. No marks will be awarded if you do so.
If the --descend argument is given on the uqfindexec command line then your program must process files in subdirectories of the given (or default) directory after processing the files in that given (or default) directory.
After files in the given (or default) directory are processed as described above then your program must iterate over all the subdirectories in that directory. Subdirectories must be processed in the same order as ls lists their names. If the --showhidden argument is also given, then hidden subdirectories (those whose names begin with .) must also be included – and subdirectories will be processed in the same order as ls -a lists their names. (Note that . and .. are not subdirectories.)
After a subdirectory’s files are processed then your program must recursively descend into its subdirectories before returning to the next subdirectory of the original directory. In other words, your program must undertake a depth-first traversal of the directory hierarchy. For example, if the directory to be processed is /A and the following directories also exist: /A/B1, /A/B2, /A/B3, /A/B1/C1, /A/B1/C2, /A/B3/D1, then the directories must be processed in the following order:
• /A
• /A/B1
• /A/B1/C1
• /A/B1/C2
• /A/B2
• /A/B3
• /A/B3/D1
Note that symbolic links to directories are not considered to be subdirectories and must not be traversed.
Filename placeholder substitution will take place as described in Filename Substitution above with the addition of subdirectory information between the given/default path and the filename. Added path elements must be separated by a single /.
For example:
• if uqfindexec was run without the --directory argument and is currently processing the abc/def subdirectory of the current directory, then the placeholder {} will be substituted by abc/def/filename for each filename in that subdirectory;
• if uqfindexec was run with the --descend --directory /etc arguments and is currently processing the /etc/ssh directory, then the placeholder {} will be substituted by /etc/ssh/filename for each filename in that subdirectory.
If a subdirectory is inaccessible (i.e. can not be opened for reading) then your program must print the following message to stderr (terminated with a newline):
uqfindexec: unable to access child directory "subdir"
where subdir is replaced by the name of the subdirectory (formatted as if it were a filename after {} placeholder substitution). For example, if uqfindexec was run with the --descend --directory /etc arguments and it fails to access the nftables subdirectory of /etc, then the subdir name printed will be /etc/nftables.
If the --parallel argument is given in addition to the --descend argument then the files within a (sub)directory must be processed in parallel but each (sub)directory must be processed in turn in the order specified above. If a SIGINT signal is received while in parallel mode, then processing of all files in that (sub)directory must be completed but no further subdirectories are to be processed.