Saturday, September 29, 2007

WebSpider Review

Today's assignment is WebSpider Review. I'm reviewing Ben Karsin's spider.


Installation

The package installed fine, including all the ant files. Everything verified and ant jar worked.

The one thing that gave me a little trouble was the logging. Running the program with -logging at the end of the command line didn't log anything. I was pretty confused by that until I checked the source code. All the logging code was just commented out.

Aside from that minor snag, all was well.

Code Format Fixes

All in WebSpider.java:

LineRuleRule Description
13ICS-SE-Java-4Do not include // comments at the top of the file.
40EJS-27Pluralize the names of collection references.
152EJS-36Use standard comments to hide code without removing it.

Test Cases

Black Box

The testing wasn't really being done from a black box perspective. I think there was just a decision to go with white box.

Because of that, there isn't much partitioning into equivalence classes. The JUnit tests cover a typical case (starting at http://www.hackystat.org) and the case of a bad URL for the starting page (http://). They also cover both of the command line arguments.

Some equivalence classes that could be added:

  • Pages that link to themselves.
  • Cycles of more than one page.
  • "Diamond" linking (A links to B and C, B and C both link to D).
  • Pages with links to missing pages.
  • Pages with bad URLs in their links.
  • Pages with no links.
  • Running out of pages before the maximum.

These would be tested mostly for the result rather than whether or not they throw an exception. The hackystat.org test probably already has several of these, but it's only being tested for exceptions.

White Box

Test coverage is much better from a white box perspective.

Emma reports code coverage of:

Emma Coverage summary
class:   100% (2/2)
method:  100% (7/7)
block:   95%  (346/364)
line:    94%  (68/72)

A couple of white box tests that could be added:

  • Relative links.
  • Duplicate links on the same page.

Some of the things listed under black box could also be seen as white box tests because there's specific code to handle them.

Breaking

Page with no links:

$ java -jar *jar -mostpopular 'http://www2.hawaii.edu/~jaress' 10
Exception in thread "main" java.lang.NullPointerException
       at edu.hawaii.webspider.WebSpider.main(WebSpider.java:133)

Lessons Learned

My spider actually has a very similar flaw:

$ java -jar *jar -mostpopular 'http://www2.hawaii.edu/~jaress' 10
Most Popular: http://www2.hawaii.edu/~jaress/ with null

(It should say "0" instead of "null".)

Probably the best way to fix that would be initialize the count for the start page to zero.

Seeing Ben's approach also helped me decide what I should have done differently. There were parts of my spider that I wasn't happy with, and I was thinking about how I should have done it. After seeing Ben's code I decided that the root problem was over-generalizing.

I had created a helper class that was designed as if I had no clue what it was actually going to be used for. For example, it would return objects defined in HttpUnit -- with all their error-prone methods -- in case the calling code wanted to do something wacky with it like analyze the text. It also forced the calling code to decide when it had seen enough pages, in case the decision needed to be based on some complicated logic.

That was a Bad Idea, and my helper should have:

  • Pulled the link data out of the HttpUnit object, put it into a simpler object, and returned that.
  • Kept track of the maximum number of pages.

Monday, September 24, 2007

WebSpider

Today's assignment is WebSpider. My spider is here.


All three tasks accomplished.

HttpUnit

I used HttpUnit to create fake servers for testing my spider, so I learned about the server parts of it as well as the client parts. It's a actually nicer on the server end, and I wouldn't mind using it for that again. (But it's terrible on the client side.)

Logging

I didn't spend very much time on logging. I just grabbed something from Logger.getLogger and used setUseParentHandlers to turn it on and off. That seems a bit crude, but it works.

Kevin English posted some nice code to suppress all those HttpUnit stack traces. It's not really a part of logging, but it cleared away the screen so I could see my log.

Java

I had a very interesting time with Java on this project. HttpUnit (on the client side) has a bad habit of declaring checked exceptions that it never throws, then throwing many different unchecked exceptions that crash the program.

At the same time, I was trying to wrap part of the program in an Iterator for the silly reason that it made logical sense. Java won't let you add checked exceptions when you're implementing an interface, so I couldn't just propagate the checked exceptions knowing they didn't actually happen.

I ended up with some dead code to satisfy the compiler, a lot of try/catch blocks covering different levels of nesting (if it fails in the outer loop, continue the outer loop; if it fails in the inner loop . . .) and pre-fetching to make sure that next always worked when hasNext was true.

Basically, several rules of Java that are supposed to stop things from going wrong actually helped them go wrong and then got in the way of fixing them.

Sunday, September 16, 2007

Stack

Today's assignment is stack. My stack is here.


Results

All five tasks completed successfully.

Problems

I had the same problem that Randy had with eclipse not finding classes. Thanks to Randy, it was easy to fix.

Thanks also to Jianfei Liao for pointing out that we might be expected to explicitly check for an install of JavaNCSS, even though a proper check is redundant to trying to use it.

I added the check in case it's required, but I still think it's bad. That kind of redundant checking can confuse people.

The most interesting thing I did was for the code coverage. I did add unit tests, but I also changed some of the non-test code.

ClearStack had a method called getTop that did nothing except call the method top inherited from Stack. So I changed it to be a single method defined in Stack and used directly without re-definition in ClearStack (just like push and pop). I wasn't sure whether the combined function should be called top or getTop, but I decided that getTop was more conventional. Coverage improved significantly.

Then I added a test to make sure that getTop throws an exception if the stack is empty. I also expanded the "normal" test to check getTop on a one-item stack and Stack.toString on empty and three-item stacks. That brought it to 100%.

Ant

Ant is a nice tool, but I think it sometimes tries to do too much.

It also has few rough edges. For example, if you try to expand a property it doesn't recognize, it silently uses the unexpanded form. That gives you things like:

[echo] PMD found ${pmd.failure.count} problem(s). 

Standards

I understand the motivation for the build file structuring, but I'm not pleased with it. What I'd like to do (but didn't) is:

  1. Rename build.xml to compile.build.xml (and update all the import references).
  2. Rename verify.build.xml to build.xml.
  3. Import dist into the new build.xml.

That wouldn't make any of the files harder to understand. In fact, the new build.xml would be easier to understand than the old one. But it would make everything a lot easier. You'd do ant to verify, ant dist to make a distribution, ant pmd to run just PMD, etc.

JavaNCSS vs. SCLC

I prefer JavaNCSS. It was easier to get useful information out of the JavaNCSS output. It only looks at the Java code, but it looks at it in much greater depth. The SCLC output left me wondering "OK, what do I do with that?" It's also nice to have the cyclomatic complexity.

The non-comment line counts don't match, and I think that JavaNCSS might be excluding blank lines while SCLC is including them. That would be another plus for JavaNCSS.

Tuesday, September 4, 2007

Code Ruler Redux

Today, we have Code Ruler Redux. My ruler was reviewed by Randy Cox, and I have a revised ruler here.


Team Member

Brian Jaress

Style Changes

First, of course, I fixed the class coding standards violations that Randy found.

I also changed some of the code to make better use of Java 5 features. In a couple places I'd been looping over an array and adding every element to a Set, and I changed it to a one-liner using Arrays.asList() and addAll().

Strategy Changes

I also made some adjustments to the strategy, adding a target minimum number of peasants and always attacking enemy knights when there are no other enemy pieces. That took care of two edge cases that came up surprisingly often.

Here are the updated test results against built-in rulers:

Migratingjaress
1773
0781
0783
Gang-upjaress
176668
162701
80732
Split-upjaress
180654
148653
202610

Shaoxuan Zhang, whose ruler I reviewed, checked the code of successful rulers from another school and adapted them into something that sounds similar in spirit to my strategy: figure out what types of targets you want to attack and have each knight attack the closest one.

It's not quite the same, though. His ruler attacks castles first, then attacks knights and peasants once it has all castles. My ruler attacks castles, peasants, and weak knights first. When all of those are gone, it attacks strong knights.

Lessons Learned

Arrays.asList() is a very handy function static method, and Eclipse is a lot easier now that I've hooked up my mouse.

I also realized that my ruler is evil.

Of course, all the rulers try to kill off enemy pieces and squeeze out other rulers, and my ruler isn't the most aggressive or effective. If, however, you think of the strategy as representing the behavior of an actual person, my ruler is the worst thug.

The other rulers seem to have either sophisticated strategies that optimize the local behavior of their pieces (Ben Karsin's ruler is a good example) or simple strategies like "capture the castles" or "split up into groups." My ruler's simple strategies are "target the weak" and "don't let anyone have a bigger army." Even the peasant strategy, which I think is quite effective, leads to a creepy "peasant Matrix' effect.

Saturday, September 1, 2007

Code Ruler Review

Today's assignment is Code Ruler Review. I'm reviewing the code of Shaoxuan Zhang.


Overall Impressions

A little dense in places, but generally well written. The strategy was clear and sensibly implemented.

I noticed that English does not seem to be Shaoxuan's first language. I had no problem understanding the comments, but the type of grammar tweaking that the rules insist on seemed rather pointless. I think maybe those rules should be relaxed in this case (but I listed them anyway).

The number of things to fix was surprisingly low, considering it was written before we got the rules.

Standards Check

Entries in the "Lines" column refer to the MyRuler.java file.

LinesRuleRule Description
ICS-SE-Java-0Follow EJS standards.
1ICS-SE-Java-1All code is within the package hierarchy "edu.hawaii".
1ICS-SE-Java-2Do not use the wildcard "*" in import statements.
ICS-SE-Java-3Only one Java statement per line.
ICS-SE-Java-4Do not include // comments at the top of the file.
ICS-SE-Java-5Rewrite default Eclipse template JavaDocs to be useful.
ICS-SE-Java-6Format JavaDoc summary lines correctly
ICS-SE-Java-7Do not use Vector and Hashtable classes.
ICS-SE-Java-8Do not use "raw" Collections classes
ICS-SE-Java-9Prefer the for-each control structure to the for control structure.
ICS-SE-Java-10Use the @Override annotation when overriding equals() and hashCode
ICS-SE-Eclipse-1Use Eclipse Europa 3.3
*ICS-SE-Eclipse-2Import and use the Eclipse Java Code formatting template.
ICS-SE-Eclipse-3Configure Text Editor.
EJS-1Adhere to the style of the original.
EJS-2Adhere to the Principle of Least Astonishment.
EJS-3Do it right the first time.
EJS-4Document any deviations.
EJS-5Indent nested code.
EJS-6Break up long lines.
EJS-7Include whitespace
22,23,24,*EJS-8Do not use "hard" tabs
EJS-9Use meaningful names.
EJS-10Use familiar names.
EJS-11Question excessively long names.
EJS-12Join the vowel generation.
EJS-13Capitalize only the first letter in acronyms.
EJS-14Do not use names that differ only in case.
EJS-15Use the reversed, lowercase name of your organization's Internet domain name as the root qualifier for your package names.
EJS-16Use a single, lowercase word as the root name of each package.
EJS-17NOT USED
EJS-18Capitalize the first letter of each word that appears in a class or interface name.
EJS-19Use nouns when naming classes.
EJS-20Pluralize the names of classes that group related attributes, static services or constants.
EJS-21Use nouns or adjectives when naming interfaces.
EJS-22Use lowercase for the first word and capitalize only the first letter of each subsequent word that appears in a method name.
157EJS-23Use verbs when naming methods.
EJS-24Follow the JavaBeans™ conventions for naming property and accessor methods.
EJS-25Use lowercase for the first word and capitalize only the first letter of each subsequent word that appears in a variable name.
EJS-26Use nouns to name fields.
EJS-27Pluralize the names of collection references.
EJS-28Establish and use a set of names for trivial "throwaway" variables.
92,96,100,*EJS-29Qualify field names with "this" to distinguish them from local variables.
EJS-30When a constructor or "set" assigns a parameter to a field, give that parameter the same name as the field.
EJS-31Use uppercase letters for each word and separate each pair of words with an underscore when naming constants.
EJS-32Write documentation for those who must use your code and those who must maintain it.
EJS-33Keep comments and code in sync.
12EJS-34Use the active voice and omit needless words.
22,28EJS-35Use documentation comments to describe the programming interface.
EJS-36Use standard comments to hide code without removing it.
32,36,40,*EJS-37Use one-line comments to explain implementation details.
EJS-38Describe the programming interface before you write the code.
91EJS-39Document public, protected, package, and private members.
EJS-40Provide a summary description and overview for each package.
EJS-41Provide a summary description and overview for each application or group of packages.
EJS-42Use a single consistent format and organization for all documentation comments.
EJS-43NOT USED
EJS-44Wrap code with <pre>...</pre> tags.
EJS-45Consider marking the first occurrence of an identifier with a {@link} tag.
EJS-46Establish and use a fixed ordering for Javadoc tags.
125,150,198,*EJS-47Write in the third-person narrative form.
8,124,149EJS-48Write summary descriptions that stand alone.
124,149EJS-49Omit the subject in summary descriptions of actions or services.
8EJS-50Omit the subject and the verb in summary descriptions of things.
10,152EJS-51Use "this" rather than "the" when referring to instances of the current class.
EJS-52Do not add parentheses to a method or constructor name unless you want to specify a particular signature.
EJS-53Provide a summary description for each class, interface, field and method.
91EJS-54Fully describe the signature of each method.
EJS-55Include examples.
EJS-56Document preconditions, postconditions, and invariant conditions.
EJS-57Document known defects and deficiencies.
EJS-58Document synchronization semantics.
EJS-59Add internal comments only if they will aid others in understanding your code.
EJS-60Describe why the code is doing what it does, not what the code is doing.
172,187,205,*EJS-61Avoid the use of end-line comments.
EJS-62Explain local variable creations with an end-line comment.
EJS-63Establish and use a set of keywords to flag unresolved issues.
EJS-64Label closing braces in highly nested control structures.
EJS-65Add a "fall-through" comment between two case labels, if no break statement separates those labels.
EJS-66Label empty statements.
EJS-67Consider declaring classes representing fundamental data types as final.
EJS-68Build concrete types from native types and other concrete types.
EJS-69Define small classes and methods.
EJS-70Define subclasses so they may be used anywhere there superclass may be used.
55,59,63,*EJS-71Make all fields private.
EJS-72Use polymorphism instead of instanceof.
EJS-73NOT USED?
EJS-74NOT USED
EJS-75Replace nontrivial expressions with equivalent methods.
105,109,132,*EJS-76Use block statements instead of expression statements in control flow constructs.
EJS-77Clarify the order of operations with parentheses.
EJS-78Always code a break statement in the last case of a switch statements.
EJS-79Use equals(), not ==, to test for equality of objects.
EJS-80Always construct objects in a valid state.
EJS-81Do not call nonfinal methods from within a constructor.
EJS-82Use nested constructors to eliminate redundant code.
EJS-83Use unchecked, run-time errors to report serious unexpected errors that may indicate an error in the program's logic.
EJS-84Use checked exceptions to report errors that may occur, however rarely, under normal program operation.
EJS-85Use return codes to report expected state changes.
EJS-86Only convert exceptions to add information.
EJS-87Do not silently absorb a run-time error exception.
EJS-88Use a finally block to release resources.
EJS-89Program by contract.
EJS-90NOT USED
EJS-91Use assertions to catch logic errors in your code.
EJS-92Use assertions to test pre- and post-conditions of a method.
EJS-93Use threads only where appropriate.
EJS-94Avoid synchronization.
EJS-95Use synchronized wrappers to provide synchronized interfaces.
EJS-96Do not synchronize an entire method if the method contains significant operations that do not need to be synchronized.
EJS-97Avoid unnecessary synchronization when reading or writing instance variables.
EJS-98Consider using notify() instead of notifyAll().
EJS-99Use the double-check pattern for synchronized initialization.
EJS-100Use lazy initialization.
EJS-101Avoid creating unnecessary objects.
EJS-102Reinitialize and reuse objects to avoid new object construction.
EJS-103Leave optimization for last.
EJS-104Place types that are commonly used, changed, and released together, or mutually dependent on each other into the same package.
EJS-105Isolate volatile classes and interfaces in a separates packages.
EJS-106Avoid making packages that are difficult to change dependent on packages that are easy to change.
EJS-107Maximize abstraction to maximize stability.
EJS-108Capture high-level design and architecture as stable abstractions organized into stable packages.