A research study appeared recently that analysed the impact of affirmative action (the caste-based reservation system in India) on the productivity of a PSU (Indian Railways in this case). The study was conducted by Ashwini Deshpande & Thomas E. Weisskopf in 2011. A link to the pre-published version of this paper is available (here). This blog post is a review of the paper, based strictly on the version hyperlinked here.
The objective of the study was to prove that the caste-based reservation system does not impact the productivity of an organization. A systematic study would not only silence the people raising this concern against the reservation system, but also provide valuable supporting evidence for the Supreme Court of India in deciding the several Mandal Commission related cases pending before it. Although the authors' claim may be true, does the study really produce evidence clinching enough to support such a claim in favour of the caste-based reservation system?
Here are some of my critical observations of the study :-
- Compare the statement that cites footnote 4 with what footnote 4 actually says, and you will get a good idea of how the study links its claims to its evidence.
- Firstly, at a conceptual level, arguing that the labour mix significantly affects how many passenger KMs are produced is a stretch for an asset-intensive, capital-goods-intensive industry like Railways. Secondly, Railways is a monopoly, and a certain amount of captive demand will get filled irrespective of the labour mix. This captive demand will keep increasing over time because of population growth and rising economic mobility in the country. This rise in captive demand will NOT be captured by a fixed time effect in the regression. We need a dynamic time effect that increases with time to capture it. A better proxy is the GDP of that year, or of that state (zone), used as a control variable. They have not controlled for this aspect.
- Taking passenger KM as a measure of output assumes that demand is infinite and that capacity utilization is not a function of demand. Under that assumption, only the other production factors influence how many passenger KMs are produced. I am not sure how Railways output can be modelled this way, de-linked from demand. This is a critical concern that needs to be addressed.
- Railways output in terms of passenger KM is a function of how many new trains are launched and on which routes (both of which are budget announcements). One can argue that the time variable controls for this in the regression. But a pure fixed effect regression may not adequately capture it. We need a better control variable.
- The Railways' policy on how "crowded" its lines can be has evolved with time. With better technology (and, some say, a reduction in safety standards) it has allowed more "crowding", i.e. it runs two consecutive trains on the same track with a shorter time gap. One can argue that the technology variable controls for this effect. There is another issue with the technology variable they have chosen, and we will come to it later.
- They are using FUEL QUANTITY as an independent variable. This is normally a good proxy for technology in many capital-intensive industries, e.g. Iron & Steel. But in the regression this factor should show up with a negative sign, i.e. the quantity of fuel consumed to cover the same passenger KM should decrease with time. This has not happened, which is already triggering alarm bells about how technology has been controlled for!
- Any employee "learns" on the job over time. The rate of learning may be different for different employees. If critics of the caste reservation policy are to be taken seriously, they may argue that this rate is different for different categories too. However, the model makes no effort to consider the learning effect of employees, both general and SC/ST, over time. The existing model has a secularly increasing figure of %SC/ST over time. This variable combines older employees with newer employees and can produce a spurious result due to the presence of a learning effect over time! This won't be captured by a fixed time effect, as it is an interaction variable.
- The first approach to comparison is somewhat naive and I don't lend too much credence to it. The second approach is more appropriate for studying a problem of this type. But what I don't understand is why no time adjustment has been made in the second-step regression of the second approach. That has the potential to throw up spurious results. We need to see the exact equation they have used.
- Normally such problems should be studied strictly with a "difference in difference" approach, using a dataset that compares zones/time periods where the policy was not introduced with those where it was. I understand they have data limitations in attempting anything like this. But at the same time they should acknowledge these limitations and temper their claims accordingly.
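To make the difference-in-difference idea above concrete, here is a minimal sketch on purely synthetic data. Nothing in it comes from the paper: the function names, the sample size and the effect size of -0.3 are all assumptions chosen for illustration. It only shows why a before/after comparison within the treated units picks up the common time trend, while the difference-in-difference estimate nets it out.

```python
import numpy as np


def did_estimate(y, treated, post):
    """2x2 difference-in-difference computed from the four group means."""
    y, treated, post = map(np.asarray, (y, treated, post))

    def cell(t, p):
        # Mean outcome for one (treated, post) cell.
        return y[(treated == t) & (post == p)].mean()

    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))


def simulate(n=40000, effect=-0.3, seed=1):
    """Synthetic units (hypothetical): a common time trend plus a policy effect."""
    rng = np.random.default_rng(seed)
    treated = rng.integers(0, 2, n)   # 1 = unit exposed to the policy
    post = rng.integers(0, 2, n)      # 1 = observed after the policy date
    y = (2.0
         + 0.5 * treated              # permanent level difference between groups
         + 1.0 * post                 # common time trend (e.g. demand growth)
         + effect * treated * post    # assumed policy effect, illustrative only
         + rng.normal(0.0, 1.0, n))
    return y, treated, post


y, treated, post = simulate()

# Naive before/after comparison inside the treated group only.
before_after = (y[(treated == 1) & (post == 1)].mean()
                - y[(treated == 1) & (post == 0)].mean())
did = did_estimate(y, treated, post)

print("before/after within treated units:", round(before_after, 3))
print("difference-in-difference estimate:", round(did, 3))
```

The before/after number mixes the common trend with the policy effect, whereas the difference-in-difference number should land close to the assumed effect. The real difficulty, as noted above, is finding untreated zones or periods in the Railways data at all.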
To summarize, the following red flags stand out to me :-
- Choice of industry. Public sector banks, for example, are service businesses whose quality/productivity/performance is strongly linked to the human resources they deploy. Furthermore, the banking sector is competitive, so if people are unhappy they will move to a competitor. Such an industry would make a more convincing case to study than an asset-intensive monopoly like Railways.
- The unlimited demand assumption. This is invalid and needs to be accounted for.
- Improper use of a fixed time effect, as it does not capture the dynamic surge in captive demand.
- Not controlling for Govt of India budget announcements (new trains and routes) and changes in the Railways' "crowding" policy.
- Not having a good proxy for technology. In fact, having a bad proxy.
- Not accounting for the learning effect of the employees.
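The point about fixed time effects and captive demand can be illustrated with a small simulation on synthetic data. The assumptions are mine, not the paper's: demand grows at a different rate in each zone, the SC/ST share happens to rise faster in the faster-growing zones, the true effect of the share on output is set to zero, and `demand` stands in for a hypothetical proxy such as zone-level GDP. The panel size (16 zones, 25 years) is arbitrary.

```python
import numpy as np


def simulate_panel(n_zones=16, n_years=25, seed=0):
    """Synthetic zone-year panel; every number here is an illustrative assumption."""
    rng = np.random.default_rng(seed)
    zone = np.repeat(np.arange(n_zones), n_years)
    year = np.tile(np.arange(n_years), n_zones)
    growth = rng.uniform(0.01, 0.05, n_zones)          # zone-specific demand growth
    demand = growth[zone] * year + rng.normal(0, 0.02, zone.size)
    # Assumed confounding: the share rises faster where demand grows faster.
    share = 0.10 + 0.2 * growth[zone] * year + rng.normal(0, 0.01, zone.size)
    # True effect of share on log output is zero; output is driven by demand.
    log_output = 1.0 * demand + 0.0 * share + rng.normal(0, 0.02, zone.size)
    return zone, year, demand, share, log_output


def twfe_coef(y, regressors, zone, year):
    """OLS with zone and year dummies; returns the coefficients on `regressors`."""
    zone_d = (zone[:, None] == np.unique(zone)[None, :]).astype(float)
    # Drop the first year dummy so the dummies are not collinear.
    year_d = (year[:, None] == np.unique(year)[None, 1:]).astype(float)
    X = np.column_stack([regressors, zone_d, year_d])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[: regressors.shape[1]]


zone, year, demand, share, log_output = simulate_panel()

# Zone and year fixed effects only.
naive = twfe_coef(log_output, share[:, None], zone, year)[0]
# Same regression with the demand proxy added as a control.
controlled = twfe_coef(log_output, np.column_stack([share, demand]), zone, year)[0]

print("share coefficient, fixed effects only:", round(naive, 3))
print("share coefficient, with demand proxy: ", round(controlled, 3))
```

In this setup the regression with only zone and year dummies attributes demand-driven growth to the SC/ST share, while adding the demand proxy brings the coefficient back towards the true value of zero. The direction and size of any such bias in the real data are unknown; the sketch only shows that the bias is possible.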
It would be nice if they could overcome these limitations and produce a better version of the analysis, so that we can all appreciate the results better.