The United Nations Development Programme's Vulnerability Projects: Roma and Ethnic Data

Susanne Milcher and Andrey Ivanov1


Romani ethnicity is a high risk factor in Central and Eastern Europe. Although Roma in Western Europe are also faced with serious problems, the scope and depth of the problem are much greater in the former Communist countries – as was demonstrated by the unrest in Eastern Slovakia's Romani communities in early 2004.2

Although official poverty data disaggregated by ethnic status is limited, survey evidence for Bulgaria, the Czech Republic, Hungary, Romania and Slovakia, confirms that poverty rates for Roma exceed by far those of the overall population. In Hungary, Roma are approximately eight times more likely to experience long-term unemployment than the general population. Unemployment among the Roma substantially exceeds average non-Roma unemployment rates. In Slovakia, while Roma comprise 5 percent of those unemployed for up to six months, they represent as much as 52 percent of those unemployed for more than four years. Romani ethnicity in these countries brings the risk of permanent labour market exclusion.3

Virtually all basic social indicators are worse for Roma compared with other ethnic groups in Central and Eastern Europe. Low levels of education, lack of access to health care, poor housing conditions, high unemployment and discrimination contribute to their low social status. In addition, Roma live predominantly in disadvantaged regions, where they face a deteriorating socio-economic situation – such as weak employment opportunities – disproportionately. In the case of the new EU member states, the most economically depressed regions are likely to turn into "Roma-dominated regions of the EU" when the current national borders lose their significance. In the Balkan countries, Roma often make up a large part of refugees or internally displaced persons (IDPs), and this makes them more vulnerable in terms of income, access to health and education. Targeted social inclusion policies and reforms are required, and monitoring of vulnerable groups is a major prerequisite for any future policy measures. In order to identify vulnerable groups at risk of poverty and social exclusion, quantitative data disaggregated by ethnicity and other socio-economic characteristics is necessary. However, monitoring the socio-economic situation of the population by ethnicity is a challenging task. First, in some cases, such as in France, ethnic monitoring is prohibited by the Criminal Code. In many countries, however, data protection laws are largely misinterpreted to prohibit any kind of ethnic data collection, while this legislation only requires certain guarantees regarding the processing of ethnic data. Second, members of ethnic minorities might not be willing to self-identify as such because of fear of discriminatory practices.

Addressing a number of methodological problems, as well as a data deficit, are the main challenges confronting the United Nations Development Programme's work in this area.4 This article outlines the UNDP's previous experience with collecting ethnic data, the challenges encountered and the solutions to them. Subsequently, it presents the UNDP's planned initiatives in this area, which are also within the framework of the Decade of Roma Inclusion.

UNDP's Experience with Ethnic Data Collection

Household surveys and censuses often significantly underestimate the Romani population. In censuses, Roma often opt not to self-identify, for fear of discrimination. National representative survey samples are usually based on census data. Roma who did not identify as Roma in the census are therefore likely to be undersampled.

Data on household incomes and expenditures disaggregated by ethnicity is scarce. For many reasons, statistical institutes do not monitor household budgets by ethnicity. In the Roma context, this reflects both political sensitivity and resistance from Romani organisations. The latter have (not wholly unreasonable) concerns that ethnically disaggregated data could be used for discriminatory purposes (in access to jobs or active labour market policies for the unemployed).

And here both researchers and policy-makers face a peculiar vicious circle: Data is necessary but not available. When available, it is not reliable (different estimations of Roma can be equally acceptable and justified using different sets of arguments). As a result, the opportunity for data misinterpretation is disturbingly broad: Depending on whether higher or lower estimates "work" better in the particular political context, different actors can argue for or against some current political issue using data that does not accurately describe the situation of Roma.

Roma in Central and Eastern Europe

Filling these data holes (at least in part) was one of the objectives of the regional UNDP/ILO large scale survey on Roma in five Central and Eastern European (CEE) countries conducted in 2001. The survey looked at the situation of Roma from a "human development" perspective. With the ultimate goal of expanding people's choices, human development looks at areas of health, education and living standards. In terms of living standards, Romani respondents were asked to assess their household incomes, main income sources, total expenditures, and expenditures by main product and service groups. The results do not just show that Roma are among the poorest of the poor in Central and Eastern Europe (this is an evident fact). What is more important, they outline how much worse the situation of Roma is, and what the specific characteristics of their status are (for example, what are the income sources or the causes of unemployment). Answering these specific and concrete questions in quantitative figures is a necessary precondition, both for understanding the underlying causes and addressing them adequately.

The survey data collected from face-to-face interviews with 5,034 Roma respondents in Bulgaria, the Czech Republic, Hungary, Romania and Slovakia was analysed in the UNDP Regional Human Development report "Avoiding the Dependency Trap," which was published in 2002.5 The results from each country are comparable because they are based on a common questionnaire, translated into the respective local languages, and on an identical sampling design methodology. The sample size in each country was close to 1,000, making the survey fairly representative of the Romani population in each country.


1. Deciding on the target population (Roma, IDPs, refugees)
2. Sampling frame. This is the list of units from which the sample is selected, such as population lists, files or register forms. This frame is necessary so that any part of the population has a chance of being included in the sample. The major problem here is related to noncoverage: Where ethnicity is not included in the census question, population lists would not be available for Roma. Another problem could arise from the clusters of elements: In one dwelling there can be more than one household.
3. Sampling design.
* Simple random sampling: This is a method of selecting households from the sampling frame with random numbers. It requires a complete list of the total population. Each household of the population has an equal chance of being chosen.
* Stratified random sampling: According to this method, the target population is divided into non-overlapping groups (strata) that differ in characteristics, such as gender, age, ethnicity or geographical location. Within each stratum, samples are drawn randomly. The advantage of this method is that it also represents subgroups of the total population, such as minorities.
Stratified random sampling may have more statistical precision than simple random sampling when the groups are homogeneous in terms of the targeted variable (for example, income, education or health status), because variability is expected to be lower in homogeneous groups than in the overall population. In the case of Roma, using stratified random sampling is helpful because Roma are very homogeneous in terms of their socio-economic situation.
Identification of regions/villages with a high percentage of Roma, IDPs or refugees might be difficult due to the lack of quantitative data and because ethnicity is not reported in the census. In addition, Roma may also belong to the group of refugees, a situation that makes the sampling of three separate groups almost impossible.
4. Questionnaire and fieldwork. The problems here are related primarily to non-sampling errors that could occur due to mistakes of the interviewer or unwillingness of Roma to respond. Therefore, participation of people from the communities in the fieldwork is particularly important.
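
The difference between the two sampling designs listed under step 3 can be illustrated with a minimal Python sketch. The household frame, the group labels and the per-stratum sample sizes are invented for illustration and are not taken from any UNDP survey.

```python
import random
from collections import defaultdict


def simple_random_sample(frame, n, seed=0):
    """Draw n units from the frame; every unit has an equal chance."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


def stratified_sample(frame, stratum_of, n_per_stratum, seed=0):
    """Split the frame into non-overlapping strata, then draw at random within each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[stratum_of(unit)].append(unit)
    sample = {}
    for name, units in strata.items():
        # Never ask for more units than the stratum contains.
        k = min(n_per_stratum.get(name, 0), len(units))
        sample[name] = rng.sample(units, k)
    return sample


# Hypothetical frame: 1,000 households, 5 percent of them in a minority stratum.
frame = [
    {"id": i, "group": "minority" if i < 50 else "majority"}
    for i in range(1000)
]

srs = simple_random_sample(frame, 100)
strat = stratified_sample(
    frame,
    stratum_of=lambda unit: unit["group"],
    n_per_stratum={"minority": 30, "majority": 70},
)

# Under simple random sampling the number of minority households varies by draw;
# under stratification it is fixed in advance.
print(sum(1 for u in srs if u["group"] == "minority"), len(strat["minority"]))
```

Because the minority stratum is sampled at a higher rate than its population share in this sketch, estimates for the whole population would need to weight each stratum back to its true share.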

The data set contains over 400 variables, which are mostly qualitative. Overall, 100 questions were asked, taking into account equally the individual and household level. Half of the questions in the survey were "individually-oriented" and the rest were "household-oriented". Individuals, not households, were interviewed, but some of the questions concerned the respondent's household. Data based on the household rather than on the individual as the unit of observation is particularly important in order to calculate poverty rates and to perform quantitative poverty analysis.

The UNDP survey used stratified random sampling for Roma older than 18. Sampling was based on data provided by the last census in each country. The total number of Roma in the census data is most probably inaccurate due to (i) the time lapse since the last census and (ii) under-representation of Roma because of deficient self-identification. However, although in all countries the numbers of people identifying themselves as Romani are substantially below the actual Romani population numbers, it was assumed that the census results adequately reflect Romani population structures in terms of rural/urban, age and sex distributions. The quotas for neighbourhoods and villages populated mainly by Roma were identified on the basis of census data about the territorial distribution of the population. Households were picked randomly within each sample cluster. In the case of Roma, complete sampling frames, such as lists, registered addresses or files, are usually not available. Therefore, sampling households follows a systematic technique, such as picking each third house on the left side of the street within each sample cluster.
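
The systematic technique mentioned above (picking each third house) can be sketched in a few lines of Python. The list of houses, the random starting point and the function name are assumptions made for illustration; the article does not say how the first house in a cluster was chosen.

```python
import random


def systematic_sample(units, step, seed=None):
    """Pick every `step`-th unit, starting from a random offset below `step`."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # 0, 1, ..., step - 1
    return units[start::step]


# Hypothetical cluster: 30 houses along one side of a street, in walking order.
houses = [f"house_{i}" for i in range(1, 31)]
picked = systematic_sample(houses, step=3, seed=1)
print(len(picked))
```

No list of addresses is needed in advance: the interviewer only has to be able to walk the cluster in a fixed order.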

The exact profile of the respondent to be interviewed was determined for each sampling cluster. Field operators identified the individuals to be interviewed, corresponding to the profile of respondents for the cluster, with the assistance of local government administrators and social assistance services.

Regarding ethnic affiliation, the research team followed the Framework Convention for the Protection of National Minorities, which combines subjective self-identification with culturally-based objective criteria.6 Ethnicity was identified through a number of different questions, including self-identification ("Do you feel Romani?"), interviewer identification, language and ethnicity of the majority of children in school. Only 9 percent of the respondents identified by field operators, local administration and Romani NGOs as being Romani did not consider themselves to be Roma. The responses of those who did not identify themselves as Roma implicitly suggest that most of them are of Romani ethnic background but for various reasons prefer not to reveal it. These respondents, however, share socioeconomic characteristics and cultural patterns with their Romani neighbours. Thus, 13 percent of the respondents who stated non-Romani affiliation answered that the ethnic majority in the school which their children attended was Romani. The same answer was given by 19 percent of the respondents in the overall sample. Further, 32 percent of the respondents who declared non-Romani affiliation use the Romani language at home. In the overall sample 54 percent of the respondents stated they use Romani language at home (see Table 1, below).

Table 1
Identification of Roma
(Percentage of overall sample)

Self-identification                          91
Language                                     54
Romani ethnicity declared in last census     48
Romani ethnic majority in school             19

Source: UNDP/ILO survey 2001

The UNDP dataset represents a valuable input for the analysis of Roma. The scope of the data allows for a comprehensive analysis of poverty among Roma. However, several shortcomings are visible. The survey cannot claim complete statistical representativeness as the question "who is Romani?" cannot be answered precisely. Due to such conceptual deficits, the size of the Romani population in each country cannot be set at a precise figure either. Also, the questionnaire was not designed to capture comprehensive household profiles on expenditures, education, health and employment. Even a perfectly designed sample is likely to over-represent the worst-off segments of the Romani population, since they are recognisably Romani and most unlikely to be integrated into majority communities. Furthermore, the missing sample of the majority population as a control group is also a shortcoming. Some of these shortcomings were taken into account by the UNDP when designing the survey of Roma in Montenegro.

Household Survey of Roma, Refugees and Internally Displaced Persons in Montenegro

The Household Survey of Roma, Refugees and Internally Displaced Persons, undertaken in 2003 on the initiative of the UNDP Montenegro, is a rich comparative analysis of the situation of vulnerable groups in Montenegro.7 The sample included four sub-samples: Roma, refugees, IDPs, and the majority population as a control group. For the data collection among all four sub-samples, a uniform questionnaire was used with the same basic objective: to better understand poverty and vulnerability among marginalised populations in Montenegro and to help develop baselines for regular monitoring. In addition to a household roster, housing conditions, durable assets, food and non-food consumption, employment and personal and individual income, the questionnaire also included questions on citizenship status, real estate in country of origin, plans for repatriation, family planning, etc. In total, the questions were to a great extent based on households, which provided the opportunity to calculate the poverty head count for Roma, IDPs and refugees separately, as well as other socio-economic indicators for the three groups.
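
A poverty head count by group, of the kind described above, is the share of people living in households whose consumption falls below a poverty line. The following Python sketch shows the calculation under stated assumptions: the household records, the poverty line of 100 and the use of simple per capita consumption (rather than an equivalence scale) are all hypothetical and are not taken from the Montenegro survey.

```python
from collections import defaultdict


def poverty_headcount(households, poverty_line):
    """Return, per group, the share of people in households below the line."""
    poor = defaultdict(int)
    total = defaultdict(int)
    for h in households:
        per_capita = h["consumption"] / h["size"]
        total[h["group"]] += h["size"]
        if per_capita < poverty_line:
            # Every member of a poor household is counted as poor.
            poor[h["group"]] += h["size"]
    return {group: poor[group] / total[group] for group in total}


# Hypothetical household records (monthly consumption in arbitrary units).
households = [
    {"group": "Roma", "size": 5, "consumption": 400},      # 80 per capita
    {"group": "Roma", "size": 3, "consumption": 450},      # 150 per capita
    {"group": "majority", "size": 2, "consumption": 500},  # 250 per capita
    {"group": "majority", "size": 4, "consumption": 360},  # 90 per capita
]

rates = poverty_headcount(households, poverty_line=100)
print(rates["Roma"])  # 5 of 8 people are below the line -> 0.625
```

The calculation only works because consumption is recorded for the household as a whole, which is why household-based questions matter for poverty analysis.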

The samples were designed on the basis of data about refugees and IDPs received from the Commissariat for Displaced Persons and UNHCR. Once the territorial distribution had been established and the municipalities identified, households were randomly selected. The survey was conducted by the Institute for Strategic Studies and Prognoses (ISSP). In addition, based on the UNDP recommendation, an informal network of Romani non-governmental organisations, "Romski krug", was engaged to complement expert data. Based on data received from this network, the sample for the Romani population was created.

The major problem that arises with this type of survey is that Roma can fall into more than one category (refugees, IDPs). The overlap of these three groups makes stratified sampling more difficult and comparison between the three groups may lead to biased results. Possible solutions to overcome this challenge will be discussed below. With concern for the cultural sensitivity of the Roma, Romani surveyors were given special training sessions, in addition to the regular training delivered for ISSP interviewers. For data collection, direct face-to-face interviews were conducted. The final form of the questionnaire resulted from previous ISSP experience, cooperation with the World Bank experts, comments of the UNDP office in Podgorica, as well as other UN Programmes (like UNHCR, UNICEF, etc.), and the Romani NGO network.

UNDP's Contribution to the Decade of Roma Inclusion

The Decade of Roma Inclusion corresponds to the Millennium Development Goals (MDGs) for the most vulnerable group in Europe – the Roma.8 The UNDP has consistently called for MDG disaggregation, so that the concerns of those most in need are reflected. Without data, however, MDGs – as well as implementation of sustainable policies for improvement of the situation of vulnerable groups – are empty slogans. Only based on quantitative data can the actors involved (governments, donors, implementing partners) outline priorities and measure progress. Disaggregated quantitative data is a precondition for relevant national-level policies for sustainable inclusion of vulnerable groups, and Roma in particular. This is the reason why the UNDP sees the elaboration of consistent and comparable quantitative socio-economic data disaggregated for major vulnerable groups as a precondition for sustainable improvement of these populations' situation and for the success of initiatives such as the Decade of Roma Inclusion.

South-Eastern Europe (SEE) Vulnerability Survey 9

Based on the UNDP's previous experience with ethnic data collection, as outlined above, its main contribution to the Decade of Roma Inclusion is a baseline household survey, representative for Roma, IDPs and refugees. In this particular case, however, the problems faced in the first survey on Roma in CEE have been given particular attention and the methodology has been substantially improved. First, the survey will have a majority population sample as a control group in all the countries. It will not just provide comparative data for vulnerable groups and the majority but will provide an opportunity for a more sophisticated analysis such as correlation analysis using various data sources (like the Household Budget Survey). Second, the questionnaire design allows maximum comparability with the existing datasets on vulnerable groups, in particular the UNDP dataset on Roma in Central and Eastern Europe. However, more emphasis will be given to household questions on consumption, living standards, employment, education and health, in order to calculate household based poverty, deprivation and unemployment rates for the different groups.

The sample design will be similar to the one used for designing the survey in Central and Eastern Europe. However, certain lessons will be taken into account to ensure even better data reliability and quality. The samples should be representative for sex, age and rural/urban distribution of Roma populations, major refugee groups and IDP communities living in the respective countries. The inclusion of the majority population in the survey will allow comparison of the living conditions of the Roma with those of the majority population. Given these requirements, an increased sample is necessary to ensure representativeness. On average (with minor variations from country to country), the samples will include 700 households (Roma, refugees, IDPs and majority), which would not be sufficient for claiming complete statistical relevance (as in a microcensus for example) but would be sufficient for sociological representativeness. Given the fact that the status of all members of the household (demographic, educational, employment, etc.) will be recorded, the total number of individuals covered by the survey will range between 2,000 and 5,000, depending on the specific group. Since Roma in many countries are marginalised and vulnerable according to various criteria (for example, being Roma and refugees at the same time), some of the Roma households will appear in two samples, that for the "Roma" subgroup and that for the "refugees" subgroup. Sampling will be based on the available official data, from national statistical offices or other official institutions, adjusted by experts' estimations of population distribution and by information about unregistered migration provided by international humanitarian organisations and local NGOs dealing with ethnic or vulnerability issues. Households will be randomly chosen and all household members that are present can respond to the household questions, so the survey is one step closer to a "census-type" exercise. The sample design and the fieldwork will be conducted with the active participation of NGOs dealing with issues of vulnerability and experts from international organisations (such as the World Bank and OSI).
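
One rough way to gauge what a sample of about 700 households can support is the textbook margin of error for an estimated proportion. The Python sketch below rests on assumptions the article does not state: that the 700 households form a single simple random sample, that a 95 percent confidence level is used (z = 1.96), and that clustering can be summarised by a hypothetical design-effect factor `deff`.

```python
import math


def margin_of_error(n, p=0.5, z=1.96, deff=1.0):
    """Half-width of the confidence interval for a proportion p estimated from n units.

    deff is a design effect: 1.0 for simple random sampling, larger for
    clustered designs. The value to use here is an assumption, not survey data.
    """
    return z * math.sqrt(deff * p * (1 - p) / n)


moe_700 = margin_of_error(700)
print(f"{moe_700:.3f}")  # 0.037, i.e. about 3.7 percentage points

# A clustered design widens the interval; deff=2.0 is purely illustrative.
moe_clustered = margin_of_error(700, deff=2.0)
```

Estimates for a subgroup within the sample rest on fewer households, so their margins of error are correspondingly wider.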

The major challenge in this survey is the overlap of populations in the three groups. Roma can be in more than one category and will represent not only Roma but also refugees and/or IDPs. Therefore, taking three separate samples will be almost impossible. One solution could be post-stratification. This method is a sensible alternative when it is not known to what stratum the individual population elements belong. First, two separate samples will be drawn. The first sample is representative of the Romani population and the second sample represents the majority population. After the samples have been drawn, in the phase of data analysis, the households will be classified according to the strata (IDPs, refugees) to which they belong. The data will reveal into which category the households fall and in this way create a virtual sample of refugees and IDPs. Post-stratification is especially useful where some responses are missing. By dividing the households into strata after the samples have been drawn, correlations between non-responses and the target variable (e.g. the educational attainment of Roma refugees) within one stratum can be taken into account and the population estimator will be more precise. By contrast, under simple random sampling, a high percentage of non-responses can lead to a large loss of precision and to biased results. In addition to new sampling techniques, emphasis should be placed on improving the fieldwork.
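
The idea behind post-stratification can be sketched as follows: respondents are assigned to strata only after data collection, and each stratum mean is then weighted by that stratum's share of the population. The Python sketch below is a generic textbook version rather than the UNDP's procedure. It assumes that the population shares of the strata are known from an outside source, and the records and shares are invented.

```python
from collections import defaultdict


def post_stratified_mean(records, population_shares):
    """Weight each stratum's respondent mean by its (assumed known) population share.

    records: iterable of (stratum, value) pairs; value is None for a non-response.
    population_shares: dict mapping stratum -> share of the population (sums to 1).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for stratum, value in records:
        if value is None:
            continue  # non-respondents do not contribute to the stratum mean
        sums[stratum] += value
        counts[stratum] += 1
    estimate = 0.0
    for stratum, share in population_shares.items():
        if counts[stratum] == 0:
            raise ValueError(f"no respondents in stratum {stratum!r}")
        estimate += share * sums[stratum] / counts[stratum]
    return estimate


# Hypothetical data: years of schooling, with more non-response among refugees.
records = [
    ("refugee", 4), ("refugee", 6), ("refugee", None), ("refugee", None),
    ("non_refugee", 10), ("non_refugee", 10), ("non_refugee", 10), ("non_refugee", 10),
]
shares = {"refugee": 0.2, "non_refugee": 0.8}  # assumed known population shares

estimate = post_stratified_mean(records, shares)
print(estimate)
```

The weighting corrects for strata that are over- or under-represented among respondents, but only to the extent that respondents and non-respondents within a stratum are similar, and only if each stratum contains enough respondents.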

One of the major prerequisites for relevant data, as the experience with the Montenegro survey has proven, is participation and involvement of the communities surveyed in the process. The issue is particularly relevant for Roma who often feel isolated from the state – and any structures perceived as "alien" to the community. Due to high levels of distrust, without explicit efforts in this area, figures obtained during the survey may not correspond to reality. This is the reason why Roma participation in the survey will be mainstreamed and consistently sought. After the sample model is ready and the sampling clusters are identified, young Romani individuals will be identified from each cluster with the assistance of Romani NGOs. They will be trained in the basics of sociological data collection, interviewing techniques and the contents and context of individual questions. The questionnaire will also be translated into Romani language. When the fieldwork per se takes place, each interviewer will be accompanied by an "assistant interviewer" from the surveyed community.

The role envisaged for the "assistant interviewers" is much broader than community penetration. These people could constitute the core of the future Roma data collectors, who could actively cooperate with the national statistical institutes and other bodies interested in collecting adequate data on the socioeconomic status of marginalised groups. This is a long-term investment that goes far beyond the validity of the results of this particular survey.

Experts' Group on Data and Measurements

As a second initiative within the framework of the Decade of Roma Inclusion, the UNDP coordinates the experts' group on data and measurement. Apart from consulting and methodologically supporting data collection in individual countries involved in the Decade of Roma Inclusion, the experts' group will work on scaling up the experience generated within the survey on vulnerable groups, elaborating reliable methodologies applicable to specific countries' contexts. They are expected to suggest specific (feasible) ways of overcoming existing barriers in the area of ethnically disaggregated data collection (in all areas – capacity, legislation and political commitment). The objective of this initiative is to develop capacity for collection of disaggregated data at country level. By 2006–2007, the whole responsibility for the data collection should be transferred to the relevant bodies in the individual countries. Ideally, the group should consist of one member from the national statistical office, one member from the respective government body dealing with minority issues and members of Romani NGOs.


Ethnic data collection, particularly sampling ethnic minorities, is far from being an easy undertaking. The UNDP has done tremendous work in this area, has learned from previous experience and is currently working on finding new methods to overcome these challenges.

Some major lessons learned are: First, with regard to sampling design, it is necessary to apply Household Budget Survey and Labour Force Survey type of methodologies. Second, sampling cannot be based on the official numbers of Roma registered by the censuses alone. Census data, however, gives a good idea of the structure and territorial distribution of Roma. It is possible to complement this data with expert estimates (taking into consideration particularly data from Romani NGOs) of the ethnic background of the population in certain areas or settlements. This could be sufficient for constructing adequate and representative samples.

Another crucial element in surveys is fieldwork. Even perfect samples would not do much unless there is sufficient work with communities and there is trust on the side of the respondents. Mistrust among Roma can be avoided if Roma themselves are involved in the fieldwork. The UNDP is actively cooperating with several Romani and non-Romani organisations active in the area of Roma development on a range of issues, starting with the design of the questionnaire and extending to the establishment of a network of contacts with Romani organisations that could be involved in the fieldwork of the survey, and later on, in the data analysis.

However, several challenges remain: First, if census data does not reveal ethnicity because of legal restrictions or unwillingness to self-identify, a precise sampling frame cannot be made and selection of the sample will be inaccurate. Therefore, wrong estimations can be made if one cluster of Roma is not represented in the sample. Second, even if legal constraints are absent, how can abuse of the collected data be prevented? Third, how can subgroups that overlap (IDPs, refugees) be sampled? Post-stratification seems to be a sensible solution, but the stratum that will be revealed from the data analysis might be small and is then not necessarily representative. The accuracy of this method also depends on the willingness of individuals to give correct answers to the questions that will categorise the strata. Also, self-identification and external identification might diverge considerably. To conclude, the UNDP has not yet overcome all challenges in terms of sampling vulnerable groups, but it is pushing governments and other agencies in Central and Eastern Europe to elaborate more on these issues.


  1. Susanne Milcher is Research Associate at UNDP Regional Centre in Bratislava. She has focused on issues of Romani education, sampling methodologies and poverty analysis. Andrey Ivanov is Human Development Adviser at the UNDP Regional Centre in Bratislava. His area of specialisation is poverty measurements and vulnerability analysis.
  2. For details of unrest among Roma in Slovakia in early 2004, please see:
  3. UNDP, ‘The Roma in Central and Eastern Europe: Avoiding the Dependency Trap’, Regional Human Development Report, UNDP Bratislava 2002.
  4. UNDP refers to the UNDP Regional Centre in Bratislava, if not otherwise stated.
  5. Complete data sets and the regional and national Roma reports are available at
  6. Council of Europe, ‘Framework Convention for the Protection of National Minorities’, Explanatory Report, 1995.
  7. UNDP Montenegro,, 2004.
  8. World Bank, ‘The Decade of Roma Inclusion’,, 2004.
  9. The survey covers Albania, Bosnia and Herzegovina, Bulgaria, Croatia, Macedonia, Serbia and Montenegro (with Serbia, Montenegro and Kosovo treated as separate entities) and Romania.

