Showing posts with label php-tools. Show all posts
Showing posts with label php-tools. Show all posts

Migration to the nine-headed monster

After some recent activity on the psat-dev mailinglist I became aware of the (lack of) available builds for php-sat. Even though the development speed is not what I would want it to be (so much fun things to do, so little hours in a day!) I still believe it is important to release early and often.

Fortunately, Eelco Dolstra had some time to migrate php-front, php-sat and php-tools to Hydra, the new Nix-based continuous build system. After some tweaking we now again have access to unstable build for all PSAT-projects. Go Hydra!

Calculating the complexity of PHP

Not only is the question of calculating the cyclomatic complexity showing up in our mailinglist and bugtracker, it is also showing up on other mailing-lists! This shows that the time has come to introduce a solution to this problem based on PHP-Front:
          php-cyco
(indeed, yet another creative name :)

This tool is added to the PHP-Tools project as of revision 421. The implementation is not very big, the code that does the actually work is about 20 lines, but it relies heavily on PHP-Front. Thats where libraries are for, right?

So what does it do? The tool basically counts the number of branches that exists in your code. This is done for the main-flow and separately for each function. By default, each function (and the main-flow) starts with a complexity of 1 since there is always a minimal path through the code. For each construct that introduces a branch within the control-flow, for example if-, for- and while-statements, the complexity is increased by 1. In the end the number represents the number of paths that exists from the start until the end of the function (or main-flow). The higher the number, the more difficult it becomes to understand and test the block of code.

The complete list of constructs that increase the complexity can be found at the bottom of the source. I think that the list is understandable, even though it uses 'Sorts'. If this is not the case please let me know, and I will put up a list using php-syntax.

As you can see the complexity is increased by logical operators. This is because the semantics of the code:
if($foo || $bar){ //code }
is equal to the following code:
if($foo) {  
//code
} elseif($bar){
//code
}
in other words, we can rewrite a logical expression as a set of if-else-statements. Since they share the same semantics they should share the same complexity.

One last note, the tool actually calculates the minimum complexity of a function. Even though php-cyco is capable of including files, using the usual --*-inclusion flags, it might encounter situation in which it cannot resolve a filename. In this case the calculated complexity can be lower than the actual complexity because the included file could contain some of the complexity-increasing constructs.

So if you are curious to know which function within your php-project has the largest complexity I suggest you download the PHP-Tools project and start analyzing your code!

Please let me know if you have any comments or feedback.

A little migration helper

You probably already heard about it, but support for PHP version 4 is partly dropped as of 31-12-2007. And after 08-08-2008 the support ends completely.

Luckily, the PHP documentation team provides you with a set of migration guides. Going through these guides can take some time, but it enables you to upgrade your code easily to be PHP5 compatible.

To make your life even easier, the PHP-tools project is extended with the tool test-migration. This tool performs some checks that are described by the migration guide to detect whether the code can be run under version 5 of PHP. These checks include:
  • Is there a function definition with the same name as a function newly defined in PHP5?
  • Is an object created without the class being defined first?
  • Where are the functions strrpos, strripos and ip2long used?
  • Is there any place in which there is reflection within PHP that uses changed behavior?
The first two checks are rather easy to understand, PHP5 will simply halt execution with an error when these issues are detected at runtime. Therefore, the warnings that are generated for these kind of patterns are shown with a 'serious'-level.
On the other hand, the last two checks do not find constructions that can halt execution. They detect places in which certain constructions are used. These constructions where already available within PHP version 4, but their behavior changed in version 5. To easily find these constructions they are flagged by a 'minor'-warning. More details about the changed behavior can be found here.

The first version of this tool is quit basic and performs only a few checks. If you would like to see more check included, don't hesitate to drop me an email or put them in the comments.

It is getting noticed

Within the last two weeks I noticed that PHP-Front and PHP-Sat are being discovered by people that are looking for PHP-specific solutions. It is not that we are flooded with request, but I am still happy with every question :)

The first question was send to the psat-dev-mailinglist and was about the Cyclomatic complexity of PHP code. I replied that it would not get into PHP-Sat because it is not a bug-pattern. However, it would be a nice tool for the PHP-Tools project. I made a similar tool for Java because of an assignment in the past, so it is probably just a matter of renaming the Strategies to use the PHP-Front api. Unfortunately, I didn't get an answer about how the report should look like. If you have any ideas please let me know in the comments, or in the issue.

A second question was about the grammar of PHP-Front, or actually the license of this grammar. The people behind TXL have derived a PHP-grammar for TXL from the SDF-grammar in PHP-Front. Since our license does not state anything about derived work without common source, we were asked for our permission to distribute this new grammar. Naturally, this permission was given very quickly and we were also allowed to take a peak at the source. I must say that I find it interesting, but I currently do not have time to look into it all. I imagine that the definitions of the grammars and TXL itself is similar to how things work in Stratego, but I have to look into that.

The last question in this series is about defined functions. Finding out which functions are defined in a project is easy when classes are ignored, a simple grep on the project will do. When a project also includes classes it becomes trickier to get all functions defined outside of a class. The question was whether PHP-Front could help with this issue, and the answer is of course yes!
Within the reflection part of the library the list of defined functions and classes is already available. This makes it possible to write a tool to show all defined functions in just a few lines of code. Since it was also a nice tool for the PHP-Tools project I added a tool for this last Friday.

Another issue that was brought up by the last e-mail is the issue of our implementation language. Since Stratego is relatively unknown the project has a steep learning curve. On the other hand, if I had chosen a different implementation language it would have taken me way longer to implement the current features. And besides, this piece of code is not that hard to understand right?
  defined-functions-main =
include-files-complex
; get-php-environment
; get-functions
; if ?[]
then !"No functions defined."
else map(transform-to-message)
; lines
end

Where you at?

Now that the propagation of safety-types seems to go smoothly it was time to dive into another subject: accessing the location of terms. In this case, the location of a term is defined as the location of the text in the original file that was parsed to that term. Since the strategy to annotate the AST with position-info is available in the standard libraries nowadays, it should be easy to access these locations and finally solve PSAT-91 right? Lets find out!

The first thing I did was to add a separate module to handle the low-level stuff of getting the location annotations. This module contains several getter-strategies that can retrieve, for example, the number of the start-line. The location info is captured in six different numbers: start-line, end-line, start-column, end-column, offset and length. A getter-strategy is available for all of them. Furthermore, the name of the file in which the term is defined can be retrieved.

Although these getter-strategies are useful, they are not meant to be called directly. I figured that the most common use of these functions would be reporting the values in some kind of (formatted) message. In order to capture this kind of behavior the strategy format-location-string(|message) is defined. This strategy takes a message with holes in the form of [STARTLINE] as parameter and fills these holes with values from the current term. A rather useful strategy if I say so myself.

To practice with this new piece of functionality I have added an extra option to the tool input-vector of the php-tools-project. This option allows the user to choose between the normal list, or the same list with line-numbers printed for each access. More information about this option and how to add an option yourself can be found here.

After this was done I moved to php-sat to make the output more concise. It was actually pretty easy to implement. The algorithm is nothing more then get-terms-with-annotations, make-nice-output. I actually spend more time on creating a test-setup for calling php-sat through a shell-script then on generating the more concise format. The only problem was that the adding of position-info everywhere interfered with the dynamic-rules. A few well-places rm-annotations where needed to fix this. Please let me know if you like the new output, or whether something should be added.

The next applications of the location info is the tracking of where untainted data enters an application. When a function is called with a parameter $foo which is tainted, it would be nice to show when it was tainted. I think this is not too difficult to add, but bugs always seem to lurk in 'I-think-it-is-easy-to-add'-features.

A last remark about locations is a small problem without an actual solution. Eventually php-sat must support function-calls. The algorithm to analyze function-calls is not complicated, but how can bug-patterns within a function be reported? A message before each call to this function? Within the file in which the function is defined? And what about cases in which one call is flagged and the other one isn't? And can we also handle object-creation in the same way? I haven't figured out how to handle this, so if you have any ideas please let me know.