
Data analytics: How scientific enquiry process impacts quality of policy research

By IMPRI Team 

Given the multidimensionality of policy and impact research, tech-driven policy prescriptions are playing a dominant role in the 21st century. As such, data analytics has become integral to this space. The IMPRI Generation Alpha Data Centre (GenAlphaDC) at IMPRI Impact and Policy Research Institute, New Delhi, conducted a #WebPolicyTalk 6-Week Immersive Online Hands-on Certificate Training Course on Data Analytics for Policy Research, held on six consecutive Saturdays from October 15th to November 19th, 2022. Datasets were also provided to participants for hands-on analysis and learning.
Participants were required to make a submission for evaluation at the end of the course, to obtain the certificate. This course comprised hands-on data learning sessions and various expert sessions on data discourses. The course especially catered to data and policy enthusiasts – including students, professionals, researchers, and other individuals looking for a comprehensive foundation for data-based policy research. The training programme was conducted by an expert group of academicians which included Prof Nilanjan Banik, Professor and Program Director (BA, Economics and Finance) at Mahindra University, Hyderabad; Prof Utpal K. De, Professor, Department of Economics at North-Eastern Hill University (NEHU), Shillong and a Visiting Professor at IMPRI.
Others included Prof Vibhuti Patel, Visiting Professor, IMPRI and a Former Professor, Tata Institute of Social Sciences (TISS), Mumbai; Dr Soumyadip Chattopadhyay, Associate Professor, Economics at Visva-Bharati, Santiniketan and a Visiting Senior Fellow, IMPRI; Prof Nalin Bharti, Professor, Department of Humanities and Social Sciences, Indian Institute of Technology (IIT), Patna and a Visiting Senior Fellow, IMPRI; Prof Gummadi Sridevi, Professor, School of Economics, University of Hyderabad and Visiting Senior Fellow at IMPRI; Dr Amar Jesani, Independent Researcher and Teacher (Bioethics and Public Health) and Editor of Indian Journal of Medical Ethics; Dr Ismail Haque, Fellow, Indian Council for Research on International Economic Relations (ICRIER) and a Visiting Fellow at IMPRI; and Dr Arjun Kumar, Director, IMPRI.

Day 1 | October 15, 2022

The first day started with a brief introduction of the panellists. After the introduction, Prof Nilanjan Banik commenced the training programme. He introduced distribution functions and the normal distribution, and explained how to interpret normal distribution tables. He also explained the underlying concepts of class intervals and probability density functions, among others. Prof Banik used a data set of property prices to provide a hands-on example. This hands-on session was followed by an insightful session on “Research Ethics in Primary Data Collection and Analysis”.
The speaker for the session was Dr Amar Jesani, who delved into the importance of ethics in research, including the protection of participants, the relevance of Research Ethics Frameworks and Research Integrity (prevention of misconduct in research). He mentioned eight critical benchmarks of ethical research: 1) Social Value; 2) Scientific Validity; 3) Favourable Risk-Benefit Ratio; 4) Fair Selection of Study Population; 5) Informed Consent; 6) Respect for Participants and Communities; 7) Independent Review; 8) Collaborative Partnerships. He then went on to explain each of these benchmarks individually.
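As a small illustration of the normal distribution table reading covered in Prof Banik's Day 1 session, the sketch below reproduces a table lookup with Python's standard library. The property price figures (a mean of 250 and a standard deviation of 40, in thousands) are invented for illustration and are not taken from the course data set.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
Z = NormalDist(mu=0.0, sigma=1.0)


def table_lookup(z: float) -> float:
    """Area under the standard normal curve to the left of z,
    rounded to four decimals as in a printed table."""
    return round(Z.cdf(z), 4)


def to_z(x: float, mean: float, sd: float) -> float:
    """Convert a raw value to a z-score."""
    return (x - mean) / sd


# Hypothetical property prices (assumed figures, in thousands).
z = to_z(310.0, 250.0, 40.0)      # z-score of a price of 310
share_below = table_lookup(z)     # share of prices expected below 310

print(f"z = {z}, area to the left = {share_below}")
```

Under these assumed figures, a price of 310 lies 1.5 standard deviations above the mean, and the lookup gives the share of properties expected to be priced below it.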

Day 2 | October 22, 2022

The second day began with an introduction to Sampling and Analysis by Prof Utpal De. In order to understand these concepts better, he talked about the various steps required in conducting a sample survey. He also discussed the various types of sampling methods: Simple random sampling, Systematic Sampling, Stratified Random Sampling, Two-Stage Sampling, Cluster Sampling, Multi-Stage Sampling and the like. This was followed by a discussion on the Statistical Systems in India and an Introduction to Various Official and Other Databases, India’s Macroeconomic and Financial Data by Dr Arjun Kumar and Ms Anshula Mehta, Senior Assistant Director at IMPRI. They talked about various sources of official data and what sources one can use for research work. These included the Government System of Scrutiny, Annual Report of Ministry, Program data and dashboard, Press Information Bureau (PIB), Comptroller and Auditor General (CAG) Report, Parliamentary Committee Report and NITI Aayog. The issues and challenges with respect to the official data available were also discussed.
Problems of credibility, reliability and timeliness were discussed. Discrepancies in macroeconomic data, with respect to Gross Domestic Product (GDP), Banking and National Accounts pose another such challenge. As for Unemployment, Poverty and Inequality, methodological and estimation issues have been observed. Additionally, problems with the public availability of data have also been seen. This enriching discussion was followed by a session on Introduction to Geographic Information System (GIS) mapping and Visualisation of Geocoded data by Dr Ismail Haque. A GIS is a system designed to capture, store, manipulate, analyse, manage and present all types of geographically referenced or spatial data. It helps in mapping quantities, densities and trajectories. He explained the various types of GIS Data, predominantly the Vector Data Model and Raster Data Model. Dr Haque gave the participants a walkthrough of the QGIS software interface. He explained how to show the rental housing distribution on a map of Bangalore using various features of the software.
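Three of the sampling methods listed in Prof De's session can be sketched in a few lines of Python. This is only an illustration under assumptions: the frame of 100 numbered households and the urban and rural strata are hypothetical, and the function names are invented for this example.

```python
import random


def simple_random_sample(units, n, rng):
    """Draw n units without replacement, each equally likely."""
    return rng.sample(units, n)


def systematic_sample(units, n, rng):
    """Pick a random start, then take every k-th unit, with k = N // n."""
    k = len(units) // n
    start = rng.randrange(k)
    return [units[start + i * k] for i in range(n)]


def stratified_sample(strata, fraction, rng):
    """Draw the same fraction from every stratum (proportional allocation)."""
    sample = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, size))
    return sample


rng = random.Random(42)                      # fixed seed for repeatability
households = list(range(1, 101))             # hypothetical sampling frame
strata = {"urban": households[:60], "rural": households[60:]}

srs = simple_random_sample(households, 10, rng)
sys_s = systematic_sample(households, 10, rng)
strat = stratified_sample(strata, 0.1, rng)

print(srs, sys_s, strat, sep="\n")
```

The stratified draw keeps the 60:40 urban to rural split of the frame in the sample, which a simple random sample of the same size does not guarantee.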

Day 3 | October 29, 2022

The third day started with Prof Banik delving into the standard normal distribution (Z distribution), t-tests, and the differences between them, again with the use of data sets. He showed how to find the variance, the mean, the P value and the test statistic, and explained what the different components of the test output meant, such as the degrees of freedom and 2-tail and 1-tail tests, among others. After a comprehensive analysis of the data set, he gave the trainees some homework. This was followed by an explanation of hypothesis testing, which is an integral part of data analysis, and of how the null and alternative hypotheses differ between two-tailed and one-tailed tests. The P value, he explained, is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true. After explaining every new concept, Prof Banik ensured that the session was open for questions in order to clear doubts.
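The difference between one-tailed and two-tailed P values can be seen in a short sketch. It uses a z-test with a known population standard deviation, because Python's standard library has a normal distribution but no t distribution; with a small sample and an unknown standard deviation, the t distribution would be used instead. The sample values, the hypothesised mean of 100 and the standard deviation of 6 are all assumed for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean


def z_test(sample, mu0, sigma):
    """One-sample z-test with known population standard deviation sigma.

    Returns the z statistic, the two-tailed P value (H1: mu != mu0)
    and the upper-tailed P value (H1: mu > mu0).
    """
    n = len(sample)
    z = (mean(sample) - mu0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    p_upper = 1.0 - std_normal.cdf(z)
    p_two = 2.0 * (1.0 - std_normal.cdf(abs(z)))
    return z, p_two, p_upper


# Hypothetical sample (assumed values); H0: population mean is 100.
sample = [97, 99, 101, 103, 103, 103, 105, 107, 109]
z, p_two, p_upper = z_test(sample, mu0=100, sigma=6)

print(f"z = {z:.2f}, two-tailed p = {p_two:.4f}, upper-tailed p = {p_upper:.4f}")
```

For a positive test statistic the two-tailed P value is twice the upper-tailed one, so the same data can fall on different sides of a 5% or 10% cut-off depending on how the alternative hypothesis is framed.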
Prof Banik's session was followed by another session titled ‘Time Series Analysis: A Primer’, conducted by Dr Soumyadip Chattopadhyay, who demonstrated how to run a regression analysis on time series data using a particular data set. Here he focused mainly on univariate time series models. He briefly explained the concept of regression analysis and then moved on to the stationarity of a time series. Both theoretically and practically, he showed how stationarity influences a regression analysis, and explained the concepts of weakly stationary and non-stationary processes. After this, he explained the different tests for stationarity: the graphical approach, the Autocorrelation Function and Correlogram, the Unit Root Test and the Dickey-Fuller Test. He then explained what a Spurious Regression Problem is (suspected when the R² value is higher than the Durbin-Watson d statistic), and how such a regression does not have practical policy relevance. He showed the different ways to address the Spurious Regression Problem, such as Cointegration and the Engle-Granger Theorem. Using EViews, Dr Chattopadhyay then showed the trainees how to run the above-mentioned tests, explaining the various components of their output.
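The idea behind the Dickey-Fuller test can be sketched without any statistical package. The code below is a simplified illustration, not the EViews procedure used in the session: it regresses the first difference of a series on a constant and the lagged level, and reports the t-statistic on the lagged level. The two simulated series and the approximate large-sample 5% critical value of -2.86 (constant, no trend) are assumptions of this example.

```python
import random
from math import sqrt


def dickey_fuller_stat(y):
    """t-statistic on gamma in  dy_t = alpha + gamma * y_{t-1} + e_t."""
    x = y[:-1]                                    # lagged levels
    dy = [b - a for a, b in zip(y[:-1], y[1:])]   # first differences
    n = len(x)
    x_bar = sum(x) / n
    dy_bar = sum(dy) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (di - dy_bar) for xi, di in zip(x, dy))
    gamma = sxy / sxx
    alpha = dy_bar - gamma * x_bar
    rss = sum((di - alpha - gamma * xi) ** 2 for xi, di in zip(x, dy))
    se_gamma = sqrt(rss / (n - 2) / sxx)
    return gamma / se_gamma


rng = random.Random(0)
shocks = [rng.gauss(0.0, 1.0) for _ in range(300)]

white_noise = shocks          # stationary by construction
random_walk = []              # has a unit root by construction
level = 0.0
for e in shocks:
    level += e
    random_walk.append(level)

CRITICAL_5PCT = -2.86         # approximate large-sample value (assumed here)
stat_wn = dickey_fuller_stat(white_noise)
stat_rw = dickey_fuller_stat(random_walk)

print(f"white noise: {stat_wn:.2f}, random walk: {stat_rw:.2f}")
```

A statistic well below the critical value, as for the white noise series, rejects the unit root hypothesis. The statistic does not follow the usual t distribution, which is why special Dickey-Fuller critical values are needed.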

Day 4 | November 5, 2022

The fourth day commenced with Prof Banik discussing the homework given to the trainees in the previous session. He then discussed how to frame the objectives and the claims before testing a hypothesis, and the two ways to capture causality. He explained what it meant to standardise the data and how to do it using Excel. Carefully explaining the differences between correlation and regression, he noted the importance of distinguishing between the two during data analysis. The major difference, he showed, is in the concept of linearity. Using the data analysis tools in Excel, he explained the various components of the ANOVA table, the Residual Output and the R square, among others, based on a data set of property prices. The objective was to check whether variables such as “Square Feet”, “Year Built” and “Lot Size” influence the prices of the properties, that is, which variables are statistically significant. Following Prof Banik, a new session commenced under Prof Vibhuti Patel, titled “Feminist Principles in Scientific Inquiry”.
The session primarily addressed the need to be gender-responsive while conducting a scientific enquiry, since the process itself impacts the result of the research. She started by outlining the broader question of feminist principles in research: the need to reduce interviewer bias, increase the response rate, create more interactive and empowering experiences, facilitate trust while conducting interviews on sensitive issues such as domestic violence, caste-based exclusion and discrimination, among others, and the need to ensure complete knowledge. Feminist perspectives have immense transformative potential in terms of driving social change, she mentioned. They can validate the experiences of women by mainstreaming the gender perspective. However, she mentioned that there needs to be the use of “conscious subjectivity”, which inherently takes into consideration the privileged ideal position of the researcher. As such, there needs to be a sense of solidarity towards the subject, reducing the distance between the researcher and the researched. She then talked about the relevance of the Feminist Standpoint Theory in scientific inquiry: that one’s social location limits one’s base of knowledge.
Knowledge about women generated by “experts” cannot be produced in isolation, that is, without the active participation of women themselves. Examples of such complete research reports include the Shramshakti Report and the Time Use Survey, among others, she noted. The next part of her presentation was about power relations: how dominant ideology subjugates the knowledge of the marginalised. She noted that one of the leading concerns of feminist perspectives is to bring out the knowledge of the marginalised, which often gets encapsulated within the elitist androcentric intellectual legacy. Hence, within feminist perspectives, there is a need to challenge existing power structures in order to empower the subaltern. Traditional research methods from the 1950s to the 1970s were gender-blind and were guided by the Role Conflict Theory, often giving in to patriarchal explanations. She traced the roots of feminist perspectives in economic research and surveys in India, elaborating on how various forms of paid and unpaid work came to be recognised over the years.
She reiterated the famous 1984 quote, “Women constitute ½ of the world’s population, do ⅔ of the world’s work and get 1/10 of the world’s income and own 1/100 of the world’s wealth.” Prof Patel emphasised the importance of qualitative techniques when dealing with gender issues. Lastly, she explained Grounded Theory, which refers to systematic inductive methods that use particular facts to form general rules and principles. These principles are used to conduct qualitative research that develops theory and explains the empirical phenomena studied. Grounded Theory has become a game changer in conducting feminist surveys in India. She concluded by briefly summing up the significance of feminist principles and perspectives in research and scientific inquiry and noted that we still have a long way to go in this regard. With this, the sessions for day 4 ended.
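Returning to Prof Banik's Day 4 exercise on standardisation, correlation and regression, the sketch below shows how the pieces fit together for a single explanatory variable. The five square-footage and price values are invented for illustration and are not the course data set. ESS is used here for the explained sum of squares and RSS for the residual sum of squares.

```python
from math import sqrt


def standardise(values):
    """Subtract the mean and divide by the sample standard deviation."""
    n = len(values)
    m = sum(values) / n
    sd = sqrt(sum((v - m) ** 2 for v in values) / (n - 1))
    return [(v - m) / sd for v in values]


def simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x with its ANOVA sums."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)                 # total
    ess = sum((fi - y_bar) ** 2 for fi in fitted)            # explained
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual
    return {"slope": slope, "intercept": intercept,
            "tss": tss, "ess": ess, "rss": rss, "r2": ess / tss}


# Hypothetical data (assumed values): square feet and price in thousands.
square_feet = [1000, 1200, 1500, 1800, 2000]
price = [200, 230, 290, 330, 380]

fit = simple_ols(square_feet, price)
std_fit = simple_ols(standardise(square_feet), standardise(price))

print(f"slope = {fit['slope']:.4f}, R squared = {fit['r2']:.4f}")
print(f"slope on standardised data = {std_fit['slope']:.4f}")
```

With one explanatory variable, the slope estimated on standardised data equals the correlation coefficient, and its square equals the R square of the original regression. The regression slope on the raw data, by contrast, carries units (price per square foot).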

Day 5 | November 12, 2022

The fifth day began with Prof Nilanjan Banik, who briefed the audience on the materials to be discussed and continued from where he had left off in the previous session. He explained the concept of errors and why it is necessary to minimize them when estimating values from a set of variables. Measures based on deviations, such as moments and skewness, can be calculated by various methods. He then explained the error term in a linear regression function, covering the factors involved, such as stochastic variables. He explained the methodology used in the calculation and how factors such as the beta coefficients and collinearity come into play. To verify certain conditions, the Jarque-Bera statistic was briefly explained, along with the general techniques that are used. Skewness was explained next with a few examples. A data set of family income and family consumption was then used to work through the whole process in practice.
The conditional mean was calculated from the data set and plotted. It is important to minimize the sum of squared errors, which can be calculated with the help of standard formulas. Although these results are mathematical, they rest on certain assumptions, which were discussed. The interpretation of the different results was also discussed. Next, the data set shared previously was used in an Excel sheet to calculate the quantities discussed theoretically so far, such as the t-test, the error from the regression equation, ESS, RSS and TSS, for practical understanding. Prof Banik interpreted these results to clarify their significance. The next session was led by Prof Vibhuti Patel, who opened the floor for questions on her earlier session on feminism in data analytics. She started by talking briefly about the Shramshakti Report. She then discussed the time-use survey introduced by the Government of India and the NSSO.
She further emphasized the involvement of feminist ideology in various fields such as sociology and psychology, and in power structures in general. Predetermined biases and backgrounds can be influential in various decision-making scenarios, and institutions should consider these aspects when looking into data structures. She also talked about her experience in the field of feminism and data analytics, participating in and researching various events and organizations. The session was then taken over by Dr Soumyadip Chattopadhyay. He began by explaining the concept of forecasting as a continuation of time series analysis, and ARIMA, the method used in the process. The AR model, its properties and the results derived from it were explained using the various assumptions and formulations involved in ARIMA. Similarly, the MA model, its properties and the results derived from it were explained, keeping in mind the relevant assumptions and specific formulations.
Further, the process and results of the ARMA model were explained. The PACF used with AR and MA processes was discussed, along with the reasons it is needed. He then emphasized that it is important to identify the type of regression function, based on its characteristics, before selecting a particular forecasting method. Next, the Box-Jenkins method for forecasting was explained, which uses various models in its process. This was followed by the evaluation of forecasts, covering the assumptions made before calculation and the conditions for selecting a particular model. Finally, the theory was put into practice with the help of EViews, importing the data set from Excel.
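The simplest member of the ARIMA family, the AR(1) model, gives a feel for how such forecasts behave. The sketch below is an illustration under assumptions rather than the EViews workflow from the session: it simulates a series with an assumed constant of 2 and an assumed AR coefficient of 0.6, estimates both by least squares, and produces multi-step forecasts.

```python
import random


def fit_ar1(y):
    """Least-squares fit of y_t = c + phi * y_{t-1} + e_t; returns (c, phi)."""
    lagged, current = y[:-1], y[1:]
    n = len(lagged)
    l_bar, c_bar = sum(lagged) / n, sum(current) / n
    sxy = sum((a - l_bar) * (b - c_bar) for a, b in zip(lagged, current))
    sxx = sum((a - l_bar) ** 2 for a in lagged)
    phi = sxy / sxx
    c = c_bar - phi * l_bar
    return c, phi


def forecast_ar1(last_value, c, phi, steps):
    """Iterate the fitted equation forward, setting future shocks to zero."""
    forecasts = []
    value = last_value
    for _ in range(steps):
        value = c + phi * value
        forecasts.append(value)
    return forecasts


# Simulated series with assumed parameters c = 2.0 and phi = 0.6.
rng = random.Random(1)
series = [5.0]
for _ in range(499):
    series.append(2.0 + 0.6 * series[-1] + rng.gauss(0.0, 1.0))

c_hat, phi_hat = fit_ar1(series)
forecasts = forecast_ar1(series[-1], c_hat, phi_hat, steps=100)
long_run_mean = c_hat / (1.0 - phi_hat)

print(f"c = {c_hat:.3f}, phi = {phi_hat:.3f}, long-run mean = {long_run_mean:.3f}")
print("first three forecasts:", [round(f, 3) for f in forecasts[:3]])
```

For a stationary AR(1) process the forecasts decay geometrically towards the long-run mean c / (1 - phi), which is why forecasts from such models become less informative as the horizon grows.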

Day 6 | November 19, 2022

Day 6 began with a session led by Prof Nilanjan Banik. He started by briefly recapping the materials of the previous session and continued with the assumptions that are taken into account while finding errors. He returned to the data set from the previous session, starting with the F-test and then using Excel to derive the results that had been discussed theoretically in earlier sessions. He then worked on a new data set to examine multicollinearity with the help of EViews, using regression and other statistical functions. He proceeded to calculate and explain autocorrelation using a data set. Continuing with the same data set, he then explained the concept of heteroskedasticity.
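One common diagnostic for first-order autocorrelation in regression residuals is the Durbin-Watson statistic, which was mentioned on Day 3 in connection with spurious regressions. Whether it was the diagnostic used in this session is an assumption; the sketch below only illustrates how the statistic behaves on two invented residual patterns.

```python
def durbin_watson(residuals):
    """Durbin-Watson d: sum of squared successive differences of the
    residuals divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Invented residual patterns for illustration.
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]    # neighbours tend to agree
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # neighbours tend to differ

d_persistent = durbin_watson(persistent)
d_alternating = durbin_watson(alternating)

print(f"persistent: d = {d_persistent:.3f}, alternating: d = {d_alternating:.3f}")
```

The statistic is roughly 2(1 - r), where r is the first-order autocorrelation of the residuals, so values near 2 suggest no autocorrelation, values well below 2 suggest positive autocorrelation and values well above 2 suggest negative autocorrelation.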
The next session was led by Prof Nalin Bharti. He began with the different types of tariff data. To ease the explanation, he took certain data sets from Government of India websites on trade and statistics, comparing different commodities exported and imported. He then moved on to data analysis of imports and exports using the SMART Model. This started with an explanation of the WITS Global Tariff Cuts and Trade Simulator and the modules under it, with each module explained individually and its applicability shown through examples. The tariffs were also explained with the help of an interactive video, which was a simulation tutorial. The data set from the simulation was then imported into Excel for further analysis.
The session was then taken over by Prof Gummadi Sridevi, who presented on the topic “Ethics in Research”. She started by emphasizing that ethics in data collection is important for mitigating any bias that may arise and for being inclusive of all the groups concerned in the field of research. Primary data is more an observational phenomenon than an entirely subjective one, which shows the importance of care during data collection in order to exclude any unwanted or unethical process. She then discussed the various field research methods. She proceeded to talk about the development of ethics in data and research, the involvement of various institutes and organizations, and her personal experiences in the field, combined with examples for better understanding.
Various experiments related to the study of ethics in research were also discussed to present their results and convey their significance for modern research and data collection, with a focus on the proper communication of information. The context of research is also a grey area when it comes to ethics: the roles of the concerned parties and the purpose of the research need to be defined in order to remove any discrepancy and bias. There should be clarity about the personnel and institution involved in a particular piece of research, the responsibilities of the researcher, and the proof of data sets. The concept of ethics was illustrated with an example of a study conducted in a distant village in the country.
Acknowledgement: Aaswash Mahanta, Soham Biswas and Tripta Behara, research interns at IMPRI


