Tuesday, January 27, 2009

From requirements to architecture

In my experience, learning software architecture never flows as smoothly as reading a good novel. Often you get lost in a jungle of technical jargon such as architectural views, frameworks, styles and patterns. I have seen many techies struggle just to get started because of this complexity. In this post, I want to share some of my experience on how to get started with the architecting process when you are asked to do so.

Following is one of the definitions for the word 'architecture'.
"All architecture is design, but not all design is architecture. Architecture represents the significant design decisions that shape a system, where significant is measured by cost of change." - Grady Booch
I selected this definition because it is directly relevant to our discussion. According to Booch, it is the high 'cost-of-change' decisions that should be agreed upon in the system architecture. The question is where and how we find these decisions.

To do that, we need to start with the specified requirements for the system. Go through the functional and non-functional requirements to identify areas where explicit design definitions are required (you need to use your technical sixth sense here). Then sort the findings in order of 'cost-of-change'. Next, select the topmost items from the list to be elaborated in your architecture specification. Interestingly, it is preferable to leave the lower 'cost-of-change' decisions unspecified up front, for a few reasons:
  • problem understanding increases over the project life cycle, hence decisions taken later are generally more accurate
  • allowing the team to take design decisions increases architecture buy-in and gives the team good design exposure
  • having fewer concerns in the specification makes it less complex and more focused
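The sort-and-select step above can be sketched in a few lines of Java. This is only an illustration: the Decision class, the 1 to 10 cost scale and the sample decisions are all made up for the example, and in practice the estimates come from the architect's judgment rather than from a number in code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class DecisionRanking {
    // A candidate design decision with a rough cost-of-change estimate.
    // The 1 (cheap) to 10 (expensive) scale is a hypothetical convention.
    static final class Decision {
        final String name;
        final int costOfChange;

        Decision(String name, int costOfChange) {
            this.name = name;
            this.costOfChange = costOfChange;
        }
    }

    // Returns the names of the 'limit' decisions with the highest cost of change.
    static List<String> topDecisions(List<Decision> candidates, int limit) {
        List<Decision> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingInt((Decision d) -> d.costOfChange).reversed());
        List<String> names = new ArrayList<>();
        for (Decision d : sorted.subList(0, Math.min(limit, sorted.size()))) {
            names.add(d.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Decision> candidates = Arrays.asList(
                new Decision("database engine", 9),
                new Decision("logging format", 2),
                new Decision("service protocol", 7),
                new Decision("button colours", 1));
        // Only the top two go into the architecture specification;
        // the rest are left for the team to decide later.
        System.out.println(topDecisions(candidates, 2));
    }
}
```

With the sample data, only the database engine and the service protocol make it into the specification.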
In addition, the architect should ensure that architectural best practices are preserved during this process. For example, concerns such as the following need to be kept in mind throughout.
- keep architectural uniformity across all layers/sub-systems
- the technologies used should be long-lasting in the industry
- competency should be available for the selected technologies

Architects should also respect any given business constraints, such as the company's preferred technology. A good understanding of the product road map (explicitly defined) helps make the architecture less brittle as time passes. It is also important not to reinvent every single design decision by yourself. Always look around the industry for relevant existing architectures and blueprints. Learning from someone else's experience is much cheaper than learning from your own!

I will share more of my experiences on the matter in a future post. Interested readers can have a look at one of my previous posts on the subject here.

Thursday, January 22, 2009

Personal Firewall Development

I have been looking into "Personal Firewall" development techniques over the last couple of days out of personal interest. Despite the scarce documentation on the subject, I found several alternative methods for implementing personal firewalls during the research (with the help of my colleague Pathi). Unfortunately, only a few of these methods are actually suitable for serious product development, and most of the online articles available are not really up to it.

First of all I should say that I'm not an expert on "Windows network architecture". But I will try to explain the matter in simple terms as it is really interesting to study. Following is a diagram summarizing Windows network layers (pale blue) and available extension points (dark blue).
One of the primary functions of a firewall is to control network traffic by filtering incoming and outgoing packets (blocking ports, etc). To do this, the firewall has to intercept the network stack at some point. The important question is "What is the best place to do this interception?".

User Mode (Upper OS Layers):
Any interception in user mode is ineffective because malware can easily bypass user mode and operate in kernel mode. Using the Winsock API is one such user mode method. The Windows 2000 packet filtering API is another user mode alternative, but with the same limitations (sample implementation).

Kernel Mode (Lower OS Layers):
A popular kernel mode method is to intercept the TDI layer. Unfortunately, this is still too high in the stack: malware can bypass the TCP layer and access the NDIS layer directly if it wishes. Also, with this approach, your TCP layer is exposed to an attacker coming from an outside network. Even with these limitations, several commercial firewall products use this method for their packet filtering.

A much lower-level option is to use the extension points of the TCP layer. The Windows TCP layer provides an extension called the 'Firewall hook driver'. According to the MSDN documentation, however, the firewall hook driver has some severe limitations when operating as a firewall filter. Apparently, Microsoft has gone against its own recommendations and used it for its own Windows Firewall. You can find a sample implementation of this method here.

Another extension point of the TCP layer is the 'Filter hook driver'. Filter hook drivers are also not recommended by Microsoft, because only a single application on a machine can use this extension. You can find a sample implementation here.

With all the above considerations, the NDIS layer remains the only sustainable alternative for a commercial-grade firewall. One NDIS model is to develop an NDIS intermediate driver, which is the method recommended by Microsoft. But due to various compatibility and stability issues, most vendors have taken a different approach for their products. Rather than developing an intermediate driver, they override some of the NDIS function pointers to point at their own custom functions. This approach is called NDIS hooking and is mostly undocumented. Despite the lack of documentation, NDIS hooking seems to remain the most favored model for developing commercial-grade firewalls. Since there is no Microsoft-provided API for it, this method is liable to break on OS changes. A good discussion comparing the two methods can be found here.

After all this, what we have discussed so far covers the implementation of just one feature (packet filtering). There are many other features that cannot be implemented by tapping into NDIS alone and require tapping into the upper layers as well.

I hope you have learnt something interesting by reading this post! For interested readers, the following are some good additional reads on this subject.

Windows Network Architecture
NDIS hooking sample
Article on firewall Development
Firewall-hook driver
Alternative model in Vista
Design of a ideal firewall

Saturday, January 17, 2009

What makes a good software architecture?

I remember reading dozens of 'must have' attributes for a good software architecture in several design books. Some time back I even made a blog post with a list of such attributes. Last Friday I got to rethink this when I was in an argument with two of our hardcore techies, Sanjaya and Samudra. In my arguments, I stated that a particular architecture is good if it bears two simple qualities:
 - You shouldn't need "an architect" to understand it
 - It solves the problem in hand "but nothing more"
You shouldn't need an architect to understand it: Just because you are an architect, you don't need to use every complex design pattern in the books to impress others. Developing a simple, workable architecture is always harder than building a complex one. When a developer reads your specification, make sure he asks himself: "Why did we need an architect to design such a simple system?"

It solves the problem in hand but nothing more: What is expected from you is to solve the problem in hand, not to build a crystal ball that solves any problem. For example, if you are asked to build a LOB application, what the users require is to carry out their day-to-day LOB operations. They don't expect a 'highly extensible framework for LOB development' from you. If you try to build the latter, the result is often a system that is too complex even for its originally intended use. Design only for what is required; if scalability beyond 100k nodes is not a requirement, don't design for it. Unlimited scalability has unlimited cost.

Solving real world problems with software is naturally complex enough; we don't need to make things worse with complex architectures! 

Monday, October 20, 2008

Back on Firefox!!!

I was fascinated when I first saw Google Chrome a few months back. I fell in love with the non-disturbing browsing experience and was willing to compromise the convenience of my Firefox extensions. I continued using Firefox as a programming tool, but Chrome has been my home on the internet ever since it was released.

It was a tough decision when I decided to fall back to Firefox today. I tried reinstalling twice (despite several installer crashes) to see if I could continue using Chrome, but without much luck. The major blow to my user experience was when Chrome refused to visit some web pages.

I started getting the "This webpage is not available" message for some sites such as YouTube (which all other browsers can access without a problem). I couldn't view PDF documents either, as Chrome throws a "This file cannot be found" error. I know Chrome is very much in beta, and I will for sure recheck once a stable release is out.

Meanwhile, I would love it if Google could rethink the following. I agree that it is a great idea to have a single process per tab. It provides the greater processing power and isolation required by tomorrow's complex web applications. But when it comes to simple browsing, having dozens of pages open becomes a resource overkill. If Google can come up with a user-friendly switch that lets advanced users decide between the options, it can help heavy Googlers like me.

I have reported most of the other issues and enhancements to Google and hope to see a much more stable version soon.

Saturday, March 15, 2008

Propagating Identity Information with Thread Local Variables


Often in our web applications we need to pass the logged in user identity down to business and data layers. This is mostly required when the business processes need to behave differently for different users/roles.

The straightforward mechanism would be to pass the user identity as a parameter to every business process method, but this can quickly become verbose and painful.

Another option is to use a 'session based singleton' to store the user identity. Once the user is authenticated, we use this session based singleton to store the user identity. Business processes can then access the singleton to obtain it. Even though this works, the bi-directional dependency it introduces (as explained below) limits the usefulness of this approach.

The session based singleton uses the HTTP session for storing and retrieving information, i.e. it has a dependency on the web UI libraries. The business processes depend on the session based singleton to obtain the user identity. This makes the business processes dependent on the web UI libraries. Since web UIs naturally depend on the business layer, this causes a bi-directional dependency between the web UI and the business layer.

Thread local variables present a viable alternative design for achieving the requirement. A thread local stores information local to a particular thread. A variable set by one thread can be accessed only by that same thread (each thread can have a separate value for the variable).
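To make that isolation concrete, here is a minimal sketch using only the standard java.lang.ThreadLocal class. The class name ThreadLocalIsolationDemo and the "alice"/"bob" values are made up for the illustration and are not part of the implementation described in the rest of this post.

```java
import java.util.concurrent.atomic.AtomicReference;

class ThreadLocalIsolationDemo {
    // One shared static variable, but every thread gets its own slot.
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    static void setOnCaller(String value) {
        CURRENT_USER.set(value);
    }

    static String readOnCaller() {
        return CURRENT_USER.get();
    }

    static void clearOnCaller() {
        CURRENT_USER.remove();
    }

    // Sets a value on a separate worker thread and returns what that worker read back.
    static String readOnWorker(String value) {
        AtomicReference<String> seen = new AtomicReference<>();
        Thread worker = new Thread(() -> {
            CURRENT_USER.set(value);
            seen.set(CURRENT_USER.get());
            CURRENT_USER.remove();
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        setOnCaller("alice");
        System.out.println("worker saw: " + readOnWorker("bob"));  // worker saw: bob
        System.out.println("caller saw: " + readOnCaller());       // caller saw: alice
        clearOnCaller();
    }
}
```

The worker thread setting "bob" does not disturb the "alice" value held by the main thread, even though both go through the same static variable.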

The remaining section of the post looks into the details of a thread local variable implementation for propagating the user identity in Java. There are two main parts to the implementation: a servlet filter and a custom 'ThreadContext' class.

The servlet filter is responsible for protecting the web application from unauthenticated access. Any request without a user identity (IUser instance) in the session is redirected to "login.jsp", which places a user identity in the session after successful authentication. What is important for our discussion is that this filter is also responsible for attaching the user identity to the request thread (see below).
import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class AuthFilter implements Filter {

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest httpServletRequest = (HttpServletRequest) request;
        HttpServletResponse httpServletResponse = (HttpServletResponse) response;

        // get the user from the session
        IUser user = (IUser) httpServletRequest.getSession().getAttribute(IUser.class.getName());
        if (user == null) {
            // user is not in the session, not authenticated
            httpServletResponse.sendRedirect("login.jsp");
        } else {
            // authenticated, let's store the user in the thread context
            ThreadContext.setUser(user);
            try {
                // call the rest of the filters
                chain.doFilter(request, response);
            } finally {
                // remove the thread local variable
                ThreadContext.setUser(null);
            }
        }
    }

    public void init(FilterConfig filterConfig) throws ServletException {}
    public void destroy() {}
}


In the above code, if the user is authenticated, we call "ThreadContext.setUser(user)" to save the user information to the current thread's local store. Once the processing is over (i.e. after chain.doFilter()), we remove the user from the current thread by calling "ThreadContext.setUser(null)".

Now let's look into the other part of the implementation, i.e. the ThreadContext class. Not much coding is required here.
public class ThreadContext {
    // one IUser slot per thread
    private static final ThreadLocal<IUser> threadLocalUser = new ThreadLocal<IUser>();

    public static IUser getUser() {
        return threadLocalUser.get();
    }

    public static void setUser(IUser user) {
        if (user == null) {
            // clearing: drop this thread's entry instead of storing null
            threadLocalUser.remove();
        } else {
            threadLocalUser.set(user);
        }
    }
}

You can see that the static variable "threadLocalUser" is responsible for storing and managing the user identity for all the threads. Depending on the thread which calls the methods of the above class, the appropriate IUser instance is selected by the "ThreadLocal" class.

Even though thread local variables are a really powerful construct, you need to be very careful in using them. If the variables are not cleared at the end of the process (finally block), that memory can be leaked, since the thread may outlive the request. Also, this should not be used as an alternative to parameter passing but only for propagating contextual information.
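Forgetting the cleanup matters most when threads are pooled and reused, which is typical for request threads in a servlet container. The following is a small sketch of that effect using a single-thread executor from java.util.concurrent as a stand-in for the container's pool. The class name StaleThreadLocalDemo and the "alice" value are made up for the example.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class StaleThreadLocalDemo {
    private static final ThreadLocal<String> USER = new ThreadLocal<>();

    // Runs two "requests" back to back on the same pooled thread and returns
    // what the second request finds in the thread local before setting anything.
    static String secondRequestSees(boolean cleanUp) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable firstRequest = () -> {
                USER.set("alice");
                try {
                    // ... request processing would happen here ...
                } finally {
                    if (cleanUp) {
                        USER.remove();
                    }
                }
            };
            Callable<String> secondRequest = () -> USER.get();

            pool.submit(firstRequest).get();        // wait for the first request to finish
            return pool.submit(secondRequest).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without cleanup: " + secondRequestSees(false)); // without cleanup: alice
        System.out.println("with cleanup: " + secondRequestSees(true));     // with cleanup: null
    }
}
```

Without the finally block, the second request silently inherits the first request's identity, so the cleanup is about correctness as much as memory.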

One other cool use of this would be to propagate transactions in declarative transaction management frameworks. I'm planning to release a new version of "Easy Data Access Framework" which makes use of thread local variables to completely hide the transaction instances from developers.

The Microsoft .NET framework has built-in support for passing the user identity through the thread. "System.Threading.Thread.CurrentPrincipal" can be used to get or set the user identity on the current thread.

Monday, January 07, 2008

Eurocenter BI Framework White Paper

We at Eurocenter worked hard to build a few middleware frameworks throughout the past year. One of the most popular is the 'Eurocenter Business Intelligence - ECBI' framework. We already have a few customers using the ECBI framework and expect to enter the market strongly this year.

I was writing a product white paper on our ECBI framework during the last week (thanks to Uchitha and Samudra for the reviews). I thought of sharing the first draft through my blog. You can access the full draft here. Following is the introductory chapter of the white paper.

Challenge for Today’s Business Intelligence

Historically, business intelligence was considered largely manual back-office work performed by high-tech professionals. Data from different sources was periodically integrated into a set of defined reports and forwarded to business users for their decision making.

Due to ever-increasing competition, the business world has demanded that technology deliver much more dynamic information for faster and better decision making. Over the last 5 years, the business intelligence feature set has grown rapidly, and the definition of business intelligence has shifted to mean much more front-office activity by the information end users. During this period, numerous products appeared in the industry delivering a lot of advanced and sophisticated features.

Although these products empowered users to make better decisions, most business intelligence products have unfortunately ignored an important fact: ‘80 percent of the information users in a typical organization are often non-technical decision makers’. The complications introduced by the advanced feature set make these products overly difficult for end decision makers. According to our observations, less than 20 percent of the available features are effectively used by business users in day-to-day decision making.

Evidently, the well-known 80-20 rule has struck the business intelligence product domain negatively too. Overcoming this problem is becoming one of the most important challenges for BI products today. We believe that the business intelligence industry is now at its next maturity transition, which can take the products out of ‘feature chaos’ and towards a focus on ‘effective features’.

Proper channeling of right features to right users for right decision making is what is expected from the next generation business intelligence products. As a result, organizations are on the lookout for simpler and more focused BI products from the marketplace today.


Note: Credit for building this fantastic framework should go to the Eurocenter BI Team led by Samudra (Samudra, Ravith, Hiran, Eranga, Prashanthan, Vindya, Prasad, Janith, Lalinda, Rajive, Sandrina).

Sunday, December 30, 2007

Concerns for large scale integration architectures


During the last couple of days I was writing a report on some existing academic research papers as part of my postgraduate studies. I chose to write on one of my favorite topics, 'Integration Architectures'. Following is an abstract (Introduction Chapter) of my report. You can access the full report here.

ABSTRACT:

Most organizations today live by the slogan “Buy the best, build the rest”. The cost and risk of building from the ground up have led most organizations to consider buying commercial-off-the-shelf (COTS) products. Effective integration of these purchased IT systems has become the main responsibility of IT managers today.

Unfortunately, it is not very common for an organization to find all the required systems to be homogeneous. Often they come from different vendors, have diverse architectures and operate on different platforms. The schema of the information model can differ widely from one system to another. Despite all these complexities, seamless integration and a smooth business process flow are what is expected from the IT infrastructure.

According to Pollock (2001), a robust integration architecture should support both ‘Application Integration’ and ‘Information Integration’ against heterogeneity. ‘Application Integration’ is the process of linking different software systems to become part of a larger system. This is the technical solution that decides the level of integration (data level, application level, transaction level, process level, or human level) and the technology of communication. Therefore ‘Application Integration’ mainly deals with the transportation of data/objects/messages between heterogeneous systems.

On the other hand, ‘Information Integration’ deals with the meaning and semantics of the communication. The meta-data, business rules and domain schema of one party should be understood by the other party for the integration to be successful. Maximum exchange of meanings by transformation of one entire domain representation schema to another partially compatible domain representation schema is the challenge of ‘Information Integration’.

‘Application Integration’ aspects are primary requirements that need to be satisfied by any integration architecture. But that is still only half of the total picture. Most integration architectures fall into the trap of focusing mostly on these technical aspects while forgetting the quality aspects of ‘Information Integration’. Simple integration requirements may be fulfilled by architectures biased towards one arena, but complex integrations definitely require a lot of attention to, and a balance of, both these aspects.

Despite the number of integration technologies and patterns that exist, the extent to which the above goal is realized is debatable. This report looks into and evaluates two such integration architectures published in order to solve the integration puzzle. The two selected architectures present two dissimilar approaches to enterprise integration. The main focus of the evaluation is to study the two architectural patterns and assess their support for the ‘Application Integration’ and ‘Information Integration’ aspects as set out by Pollock (2001) in his white paper.