
2018.10.20

Quiz #2: File Reading, EBNF, Regular Expressions, and Python’s re Module (ICS-33, Fall 2018)

When working on this quiz, recall the rules stated on the Academic Integrity Contract that you signed. You

can download the q2helper project folder (available for Friday, on the Weekly Schedule link) in which to write

your Regular Expressions and write/test/debug your code. Submit your completed files for repattern1a.txt,

repattern1b.txt, repattern2a.txt, and your q2solution.py module online by Thursday, 11:30pm. I will

post my solutions to EEE reachable via the Solutions link on Friday morning.

For parts 1a, 1b, and 2a, use a text editor (I suggest using Eclipse’s) to write and submit a one-line file.

The line should start with the ^ character and end (on the same line) with the $ character. The contents of

that one line should be exactly what you typed-in/tested in the online Regular Expression checker.
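
The one-line file format described above can be checked before submitting. The sketch below is only an illustration: the helper name read_pattern is hypothetical (it is not part of q2helper), and the demo writes a placeholder pattern to a temporary file instead of touching a real answer file.

```python
import os
import re
import tempfile

def read_pattern(path):
    """Return the single pattern line in path (hypothetical helper, not from q2helper)."""
    with open(path) as f:
        lines = f.read().splitlines()
    # The file must hold exactly one line.
    assert len(lines) == 1, f'expected one line, found {len(lines)}'
    pattern = lines[0]
    # That line must start with ^ and end with $.
    assert pattern.startswith('^') and pattern.endswith('$'), \
        'pattern must start with ^ and end with $'
    re.compile(pattern)  # raises re.error if the pattern is not a valid RE
    return pattern

# Demo with a throwaway file standing in for repattern1a.txt.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, 'repattern1a.txt')
    with open(demo_path, 'w') as f:
        f.write('^placeholder$\n')  # placeholder contents, not a quiz answer
    print(read_pattern(demo_path))
```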

The q2helper project folder also contains bm1.txt, bm2a.txt, and bm2b.txt files (examine them) to use

for batch-matching your pattern, via the bm option in the retester.py script (included in the download).

These patterns are also tested automatically in the q2solution.py script, with similar examples in the

q2helper’s bscq2F18.txt file.

1a. (2 pts) Write a regular expression pattern that matches times on a 12-hour clock written in the format

hour:minute:second. Here hour can be any one- or two-digit number in the range 1-12 (with no leading 0

allowed); :minute is optional: if present it can be any two-digit number in the range 00-59; :second is optional: if

present, it can be any two-digit number in the range 00-59; at the end is a mandatory am/pm indicator. Here are a

few legal/illegal examples.

Legal: Should Match : 6pm, 6:23pm, 6:23:15am, 12am, 11:03am, 8:40:04pm

Illegal: Should Not Match: 6, 06pm, 14pm, 6::pm, 6:60pm, 6:111pm, 6:4pm, 6:04:7pm, 6:23:15:23am

Put your answer in repattern1a.txt.
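
As a sketch of how such a pattern can be checked against the examples above, the code below tries one possible pattern (not the instructor's posted solution) on every legal and illegal string. The names TIME_12H, legal, and illegal are introduced here only for illustration.

```python
import re

# One possible pattern: hour 1-12 with no leading 0, then up to two
# ":dd" parts in the range 00-59, then a mandatory am/pm.
TIME_12H = r'^(?:1[0-2]|[1-9])(?::[0-5]\d){0,2}(?:am|pm)$'

legal = ['6pm', '6:23pm', '6:23:15am', '12am', '11:03am', '8:40:04pm']
illegal = ['6', '06pm', '14pm', '6::pm', '6:60pm', '6:111pm',
           '6:4pm', '6:04:7pm', '6:23:15:23am']

for text in legal + illegal:
    matched = re.match(TIME_12H, text) is not None
    print(f'{text:14} {"match" if matched else "no match"}')
```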

1b. (3 pts) Write a regular expression pattern that matches the same strings described in part 1a. But in addition

for this pattern, ensure group 1 is the hour; group 2 is the minute (or None if :minute is not present); group 3 is

the second (or None if :second is not present); and group 4 is am or pm. For example, if we execute

m = re.match(the-pattern, '6:23pm') then m.groups() returns ('6', '23', None, 'pm'). There should

be no other numbered groups. Hint: (?:...) creates a parenthesized regular expression that is not numbered as a

group. You can write one regular expression for both 1a and 1b, or you can write a simpler one for 1a (ignore

groups) and then update it for 1b by including the necessary groups. Put your answer in repattern1b.txt.
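
The difference between numbered groups and (?:...) can be seen in a short sketch. The pattern below is one possible answer, not the instructor's; TIME_GROUPS is a name chosen here for illustration. Each ":dd" part is wrapped in a non-capturing (?:...) so that only the two digits inside it are numbered.

```python
import re

# Group 1: hour, group 2: minute, group 3: second, group 4: am/pm.
# The (?:...) wrappers make the colons optional without adding groups.
TIME_GROUPS = r'^(1[0-2]|[1-9])(?::([0-5]\d))?(?::([0-5]\d))?(am|pm)$'

for text in ['6:23pm', '8:40:04pm', '12am']:
    m = re.match(TIME_GROUPS, text)
    print(text, m.groups())
```

With this pattern, '6:23pm' yields ('6', '23', None, 'pm'), matching the example in the problem statement.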

2a. (4 pts) When we print computer documents, there is a common form used to specify the page numbers to

print. Generally, commas separate page specifications, where each page specification is a single page

number, or a contiguous range of pages. In a page specification, a dash or a colon can separate two numbers:

for a dash, these numbers specify the first and last pages in a range to print (inclusive); for a colon, they specify

the first page and how many subsequent pages to print (so 10:3 means 3 pages starting at 10: 10, 11, and 12).

Finally, if either of these forms is used, we can optionally write a slash followed by a number (call it n), which

means for the page specification, print every nth page in the range (so 10-20/3 means 10 through 20, but only

every 3rd page: 10, 13, 16, and 19). Write a regular expression that ensures group 1 is the first page; group 2 is

a dash or colon (or None if not present); group 3 is the number after the dash or colon (or None if not present);

group 4 is the number after the slash (or None if not present).

Write a regular expression pattern that describes a single page specification: the integers you specify here must

start with a non-0 digit. Here are examples that should match/should not match a single page specification:

Match 3 and 5-8 and 12:3 and 6:4 and 10-20/3 and 10:10/3

Not Match 03 and 5-08 and 3 4 and 3 to 8 and 4/3 and 4-:3 and 4-6:3

Put your answer in repattern2a.txt.
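
A minimal sketch of a pattern with these four groups follows. It is one possible answer, not the instructor's, and the names PAGE_SPEC, should_match, and should_not_match are introduced here for illustration. The slash part is nested inside the optional dash/colon part, so a string such as 4/3 is rejected.

```python
import re

# Group 1: first page, group 2: '-' or ':', group 3: number after it,
# group 4: number after the slash. Every integer starts with a non-0 digit.
PAGE_SPEC = r'^([1-9]\d*)(?:([-:])([1-9]\d*)(?:/([1-9]\d*))?)?$'

should_match = ['3', '5-8', '12:3', '6:4', '10-20/3', '10:10/3']
should_not_match = ['03', '5-08', '3 4', '3 to 8', '4/3', '4-:3', '4-6:3']

for text in should_match:
    print(text, re.match(PAGE_SPEC, text).groups())
for text in should_not_match:
    print(text, re.match(PAGE_SPEC, text))
```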

2b. (8 pts) Define a function named pages that takes two arguments: a str representing a list of page

specifications separated by commas, and a bool (named unique) controlling whether the pages are unique: printed only

once; it returns a list, sorted in ascending order, of all the pages (ints) in the page specifications. This function

must use the regular expression pattern you wrote for part 2a and extract (using the group function)

information to create the numbers in the page specification. For example, if we called the function as

pages('5-8,10:2,3,7:10/3',unique=True) it would return the list [3,5,6,7,8,10,11,13,16]; if we called

pages('5-8,10:2,3,7:10/3',unique=False) it would return the list [3,5,6,7,7,8,10,10,11,13,16].

Here are some more examples of arguments to pages and their meanings:

'3' page [3]

'3,5-8,12:3' pages [3,5,6,7,8,12,13,14] (3, 5 to 8, 12 and 2 more pages)

'6-10,4-8' pages [4,5,6,6,7,7,8,8,9,10] (pages are ordered; assume unique is False)

'6-10/2,4-10/2' pages [4,6,8,10] (pages are ordered; assume unique is True)

Raise an AssertionError exception (using Python’s assert statement) if any page specification fails to

match the regular expression, or if any dash separator separates a higher first number from a lower second one:

e.g., 10-8 raises/prints the exception AssertionError: pages: in page specification 10-8, 10 > 8.

The page specification 8-8 is OK: it means only page 8.

Hint: My function body is 15 lines of code (this number is not a requirement). After using split to separate the

str argument to get a list of page specifications, use the re.match function (using your regular expression

solution from Problem #2a) on each, then call the group function to extract the required information, and finally

process it. The range class is very helpful in determining the pages in each page specification.
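
The steps in the hint can be sketched as follows. This is one possible implementation under stated assumptions, not the instructor's solution: the pattern PAGE_SPEC is an assumed answer to part 2a, the wording of the first assertion message is made up here (only the 10 > 8 message appears in the problem), and the specifications are assumed to contain no spaces around the commas.

```python
import re

# Assumed part 2a pattern: first page, '-' or ':', second number, step.
PAGE_SPEC = re.compile(r'^([1-9]\d*)(?:([-:])([1-9]\d*)(?:/([1-9]\d*))?)?$')

def pages(page_specs, unique):
    result = []
    for spec in page_specs.split(','):
        m = PAGE_SPEC.match(spec)
        assert m is not None, f'pages: illegal page specification {spec}'
        first = int(m.group(1))
        step = int(m.group(4)) if m.group(4) else 1
        if m.group(2) is None:        # single page
            last = first
        elif m.group(2) == '-':       # first-last, inclusive
            last = int(m.group(3))
            assert first <= last, \
                f'pages: in page specification {spec}, {first} > {last}'
        else:                         # first:count
            last = first + int(m.group(3)) - 1
        result.extend(range(first, last + 1, step))
    # A set drops duplicates when unique is True; sort either way.
    return sorted(set(result)) if unique else sorted(result)

print(pages('5-8,10:2,3,7:10/3', unique=True))
print(pages('5-8,10:2,3,7:10/3', unique=False))
```

Using range with an inclusive upper bound of last + 1 handles all three forms the same way, so 8-8 naturally produces only page 8.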

3. (7 pts) EBNF allows us to name rules and then build complex descriptions whose right-hand sides use

these names. But Regular Expression (RE) patterns are not named, so they cannot contain the names of

other patterns. It would be useful to have named REs and use their names in other REs. In this problem,

we will represent named RE patterns by using a dict (whose keys are the names and whose associated

values are RE patterns that can contain names), and then repeatedly replace the names by their RE

patterns, to produce complicated RE patterns that contain no names.

Define a function named expand_re that takes one dict as an argument, representing various names and their

associated RE patterns; expand_re returns None, but mutates the dict by repeatedly replacing each name by its

pattern, in all the other patterns. The names in patterns will always appear between #s. For example, if p is the

dict {'digit': r'\d', 'integer': r'[+-]?#digit##digit#*'} then after calling expand_re(p), p is now

the dict {'integer': '[+-]?(?:\\d)(?:\\d)*', 'digit': '\\d'}. Notice that digit remains the same

(the raw string r'\d' prints as '\\d'), but each #digit# in integer has been replaced by its associated pattern

and put inside a pair of parentheses prefaced by ?:. Hint: For every rule in the dictionary, substitute (see the

sub function in re) all occurrences of its key (as a pattern, in the form #key#) by its associated value (always

putting the value inside parentheses), in every rule in the dictionary. The order in which names are replaced by

patterns is not important. Hint: I used re.compile for the #key# pattern (here no ^ or $ anchors!), and my

function was 4 lines long (this number is not a requirement).
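
The substitution described in the hint can be sketched as below. This is one possible implementation, not the instructor's, and it assumes no name refers to itself, directly or through other names. The replacement is passed to sub as a function instead of a string, because a replacement string containing a backslash sequence such as \d is rejected by recent Python versions; the function form inserts the text literally.

```python
import re

def expand_re(pat_dict):
    for key in pat_dict:
        # Pattern for the name as it appears in other rules: #key#
        name = re.compile('#' + re.escape(key) + '#')
        replacement = '(?:' + pat_dict[key] + ')'
        for other in pat_dict:
            # Function replacement keeps backslashes in the value literal.
            pat_dict[other] = name.sub(lambda m: replacement, pat_dict[other])

p = {'digit': r'\d', 'integer': r'[+-]?#digit##digit#*'}
expand_re(p)
print(p)
```

Only existing keys are reassigned inside the loops, so the dict never changes size while it is being iterated.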

The q2solution.py module contains the example above and two more complicated ones (and in comments,

the dicts that result when all the RE patterns are substituted for their names). These examples are tested in the

bsc.txt file as well.