An Introduction to Internet Mailinglist Research

H. Gössmann, A. Mrugalla (Hrsg.), 11. Deutschsprachiger Japanologentag in Trier 1999, Band 2: Sprache, Literatur, Kunst, Populärkultur/Medien, Informationstechnik,
Münster/Hamburg/London: LIT Verlag, 2001: 653-665


This paper is intended as a practice-oriented introduction to Internet mailinglists as research material for German researchers in Japanese Studies who have little previous experience with online material.
After some general remarks about mailinglists, I discuss the new opportunities and the limitations that arise when mailinglists are used as research sources, and present some current approaches to mailinglist research, with a focus on examples from social and cultural studies. This is followed by a short analysis of what I think should be pursued further and a brief introduction to my own project, which tries to fill a gap I perceive between the two kinds of studies conducted so far. I end with some recommendations of resources that can be useful for a Japan-related mailinglist project.
1. Some general considerations on mailinglists as research material

1.1 What are Internet mailinglists and for whom are they useful?

Mailinglists as regarded in this paper are Internet-based electronic discussion groups (as opposed to one-directional distribution lists). On the server side they are administered by a list program (e.g. listserv, listproc, majordomo, mailbase, lyris, mailman); for the participants they are accessible via simple electronic mail.
In many respects mailinglists resemble (Usenet) newsgroups, bulletin board systems (BBSs), forums in online services or mailbox-nets, but in practice the different systems are often used by different people and for different purposes (cf. Döring 1999:35f).
With list archives being made accessible via the WWW and webboards offering similar functions, list and web use have recently begun to merge in some areas.
Although the principle of electronic conferencing dates back more than 20 years to the early days of computer mediated communication (CMC), the worldwide spread of the Internet has brought more diverse use and international participation, which opens up new potential for information and communication as well as for research.

Because of the low bandwidth needed, mailinglists are especially useful for people in less connected areas. They are an important means of communication for distributed interest groups, for pioneers or people with special needs or interests who do not find like-minded people or support locally. They can be particularly useful for academic and learning activities.

1.2 What is potentially interesting about mailinglists for researchers in the social sciences, cultural studies and humanities?

Mailinglists vary greatly in focus and style. Lists can resemble notice boards, news tickers, talkshows or self-help groups. Features like the quality of information contributed and the style of interaction, discussion or moderation differ considerably from list to list. Accordingly, lists can be examined in a variety of ways. Here I would like to distinguish roughly between three types of motivation for the study of mailinglists:

1.2.1 A vast amount of new material for old questions

Because the topics discussed on mailinglists cover such a broad range, many researchers will find them interesting as easily available material for their existing research questions.
Topics studied might include the content of the discussions itself (be it child rearing, stock prices, or urban legends), aspects of human communication, social structures and interaction patterns, language development, group psychology etc.

1.2.2 Interest in the Internet as such - and mailinglists as one representation

Topics of interest here include Internet culture, virtual communities, cyber-democracy (new chances for equality, participation and democracy), new forms of learning and research through global networking, or legal and privacy issues in networked media.

1.2.3 A special interest in asynchronous CMC, namely mailinglists or newsgroups

Examples here are topics and strategies of discussion in specific groups, social networks as seen through communication relations, the role of "lurkers", levels of participation, learning and diffusion processes (intra- and inter-list), distinct mailinglist cultures (social relations, levels of speech, rules,...), or the embeddedness of list cultures in offline academic cultures.

1.3 New opportunities for research

Compared to the study of "offline" material, the social researcher finds a number of new opportunities when dealing with mailinglists.

1.3.1 Characteristics of the material

Written group discussions whose origin and course are not influenced by the observer and which are directly available in electronic form are a new type of material for researchers who were previously confined to secondary material like field notes, interviews or questionnaires, or who had to record and transcribe statements or discussions (with the recording mostly being obvious to the participants). On mailinglists we find, in many cases, relatively authentic records of contemporary human communication. The spectrum of topics is broad, participants come from all over the world, and huge amounts of data are available (cf. Rafaeli/Sudweeks 1998:174 for group CMC characteristics).

1.3.2 Form of observation/data collection

It takes little effort to capture the communication of entire lists, e.g. by retrieving their archives. If the researcher is subscribed to a list but does not contribute actively, quasi-participant observation with hardly any "visibility" is possible. Usually a very short subscription period of a few seconds is enough to retrieve archives from a listserver, and this can also be done at a later point in time.

1.3.3 Further computer supported processing

Throughout most stages of a mailinglist research project computers can be of great help.
As stated above, this starts with convenient (automatic) data gathering, followed by steps of data preparation, e.g. importing, formatting, cutting and rearranging the material. To some extent automatic classification can be achieved by making use of standardized mail formats (e.g. header fields like author, date, subject; citations; signatures). As with any electronic material, indexing and keyword/string search support the handling of larger corpora. Content analysis can be supported through automatic markup of findings and functions for memos or annotations. For quantitative analysis, data can be passed directly to statistics software. Graph layout software can help to visualize communication structures or semantic relations.
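The classification step based on standardized mail formats can be illustrated with a minimal sketch. It assumes Python and its standard `email` package; the sample message, the addresses and the function name `split_message` are hypothetical, and the signature delimiter ("-- " on a line of its own) is a widespread convention that not every posting follows.

```python
import re
from email import message_from_string
from email.utils import parseaddr

# Hypothetical sample posting, invented for illustration only.
RAW = """\
From: Hanako Example <hanako@example.org>
Date: Mon, 12 Oct 1998 09:30:00 +0900
Subject: Re: Internet at school
Message-ID: <2@example.org>
In-Reply-To: <1@example.org>

> Do your pupils have their own mail accounts?
Not yet, we share one account per class.

-- 
Hanako Example, Example School
"""

def split_message(raw):
    """Separate header fields, quoted lines, own text and signature."""
    msg = message_from_string(raw)
    body = msg.get_payload()
    # Split at the conventional signature delimiter ("--" plus optional space).
    parts = re.split(r"(?m)^-- ?$", body, maxsplit=1)
    text = parts[0]
    signature = parts[1].strip() if len(parts) > 1 else ""
    lines = text.splitlines()
    return {
        "author": parseaddr(msg["From"])[1],
        "date": msg["Date"],
        "subject": msg["Subject"],
        "is_reply": msg["In-Reply-To"] is not None,
        # Lines starting with ">" are treated as citations of earlier mails.
        "quoted": [l for l in lines if l.startswith(">")],
        "own": [l for l in lines if l.strip() and not l.startswith(">")],
        "signature": signature,
    }

record = split_message(RAW)
```

Records of this kind could then be passed on to indexing, search or statistics software. Real archives will need additional handling for character encodings and MIME parts, which this sketch leaves out.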

1.4 Limitations

When dealing with mailinglists it is also important to recognize what we do not know about the observed people and their communication.
Researchers who are interested in representative studies face the problem that the Internet user population is still not representative of the population at large, so generalisability can only be achieved with respect to certain user groups (unless an additional "offline" effort is made).
What we see on a list may not be the whole picture needed for a sufficient understanding of what is going on. For example, there may well be additional private communication in the background of a list that influences list discussions but cannot be observed by a list member. This is often the case with answers to questions that are sent via personal mail, so we cannot judge whether an information need has been satisfied by list members unless a clarifying statement appears on the list.
In most cases the majority of subscribed members remain passive (English: "lurker", Japanese: "ROM" ("read only member")), so apart from their mail addresses we often know nothing about the largest part of such a "group". Do they read messages at all, do they just log or archive them, or has the mail account been abandoned?
When interpreting behaviour "in real life" we are used to taking into account additional visible information, such as non-verbal indications of moods or intentions, or the gender, age, physical condition or status of the people discussing. In a text-based online context, however, such social cues about the participants are either absent or cannot be relied upon. Arbitrary construction of virtual persons is easily possible.
We cannot even assume that one mail address means one person. Behind one e-mail address there may well be several people, other lists or software agents. For example, "Tanaka Tomoyuki" is a well-known figure in newsgroups like soc.culture.japan, but there has been a lot of speculation about his actual identity.
If a poster really intends to hide his or her identity, clues about the origin of an e-mail can be removed almost completely by using an anonymous remailer.
In the case of a research project that includes automatic counting of postings, threads, or communication relations, spam (i.e. unsolicited commercial e-mail), off-topic postings and other "non-contributions" may distort the picture.

1.5 Increased need for privacy considerations

On many mailinglists subscription and logging of all communication are possible for anyone, but participants may not be aware of this fact. For example, a study about potentially embarrassing communication in newsgroups showed a surprisingly low level of risk perception (Witmer 1998:140).
In addition, personal information gained through mailinglist observation can be combined with other observations of online behaviour, because Internet users leave traces and personal information in all sorts of places: in e-mail, on news, lists, websites etc. By putting together several information sources, user profiles of great detail can be generated, and the possibilities for misuse have increased enormously compared to "offline" research, so the legitimacy of online research activities has to be questioned in every case. As with other types of observation, there is also the danger of destroying one's subject by "tearing it into the light" through research (Smith 1999:211f).

Special laws and codes for the privacy protection of online users are only beginning to emerge. Not being a legal expert, on the legal side I would only like to mention the general German federal law on privacy (Bundesdatenschutzgesetz) of 1990 (in particular § 28) and the newly created special law on rights and duties of online service providers (Gesetz über den Datenschutz bei Telediensten (TDDSG), 1997).
As for voluntary self-restrictions on the German market research side, to my knowledge the 1995 ADM code for interviews and group discussions (Arbeitskreis Deutscher Markt- und Sozialforschungsinstitute: "Richtlinie für die Aufzeichnung und Beobachtung von Gruppendiskussionen und qualitativen Einzelinterviews") has not yet been updated with respect to online requirements.
As an academic example I would like to mention ProjectH's comparatively comprehensive ethics statements: the authors state that the general academic guidelines of the respective institutions apply; they restrict themselves to the use of public lists, anonymise as a general rule, identify writers only with their consent, and do not quote more than one or two sentences without approval (Rafaeli et al. 1998:267f). Cf. also (King 1996).

1.6 Interim summary

In the paragraphs above I mentioned several opportunities for new kinds of research, but also systematic limitations, as well as the need for self-imposed restrictions for the sake of privacy protection.
Depending on one's main research interest these limitations may be serious, but on the other hand even visible communication situations always involve hidden but important information. Advantages and disadvantages (cf. Döring 1999:206-208) have to be weighed according to the particular research design.
Even if we can only see parts of a picture and concentrate on the part of communication that constitutes the shared goods of a list (be it more the information pool or a social value produced), the enormous variety of mailinglist communication still makes it worthwhile to take a closer look, e.g. at the different types of use and motivation for participation.

2. Examples of Mailinglist (and Newsgroup) Studies

In the following I would like to briefly introduce some recent characteristic examples of mailinglist studies; newsgroup studies are included because a similar methodology can often be applied. In particular I focus on features of the list material studied, the disciplinary context of the approach, the methods used, and selected findings. Although a distinction between quantitative and qualitative approaches is used in order to characterize the main focus, I (with most of the authors mentioned) do not regard these approaches as mutually exclusive.

2.1 Mostly quantitative studies with little reference to content

2.1.1 M.A. Smith, UCLA Center for the Study of Online Community (now Microsoft): "Netscan"

Over a period of 10 weeks in 1996-97 Marc Smith conducted an automated quantitative analysis of Usenet group activities (postings/time, postings/group, postings/poster, thread length, inter-group crossposting etc.) with the aim of generating some "base-line measures of online activity" that could eventually help to generate a "typology of cyberspaces" and to gain a basic understanding of "the social structure of cyberspace". His sample, the UCLA newsfeed, contained about 14,000 of an estimated total of 79,000 existing newsgroups, a fifth of them showing no activity at all (at his site).
Smith summarizes some of the results as follows: "Within the sample I examine, the average newsgroup has about 100 messages a week, contributed by fewer than 50 different people. Nearly all of the messages in a majority of newsgroups are crossposted to other newsgroups, and each group is connected to 50 other groups on average. On an average day, 18,000 people contribute 67,000 messages to the Usenet. In an average hour, 3,500 messages are written by 1,200 people." (Smith 1999:209). Only about 20% of all messages receive a reply (Smith 1999:210), and "only 60 percent of all newsgroups have poster-to-post ratios that indicate conditions that allow for interaction" (Smith 1999:205f). Netscan can be found at
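To give an impression of how base-line measures of this kind can be computed, here is a minimal Python sketch. It is not Netscan's implementation; the record layout, the group names and the function name `baseline_measures` are assumptions made for illustration, and the data are invented.

```python
from collections import defaultdict

# Hypothetical toy records: (group, poster, message id, id of the message replied to)
POSTS = [
    ("group.a", "a@example.org", "m1", None),
    ("group.a", "b@example.org", "m2", "m1"),
    ("group.a", "a@example.org", "m3", "m2"),
    ("group.a", "c@example.org", "m4", None),
    ("group.b", "d@example.org", "m5", None),
]

def baseline_measures(posts):
    """Per group: message count, distinct posters, poster-to-post ratio, share of replied messages."""
    messages = defaultdict(list)
    posters = defaultdict(set)
    replied = set()  # ids of messages that received at least one reply
    for group, poster, msg_id, parent in posts:
        messages[group].append(msg_id)
        posters[group].add(poster)
        if parent is not None:
            replied.add(parent)
    result = {}
    for group, ids in messages.items():
        result[group] = {
            "messages": len(ids),
            "posters": len(posters[group]),
            "poster_to_post": len(posters[group]) / len(ids),
            "share_replied": sum(i in replied for i in ids) / len(ids),
        }
    return result

measures = baseline_measures(POSTS)
```

A poster-to-post ratio of 1.0, as in the second toy group, means that nobody posted more than once, which is the kind of condition under which interaction is unlikely.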

2.1.2 Ch. Stegbauer (sociology), A. Rausch (mathematics), University of Frankfurt, 1999: studies on inequality in virtual communities

In a series of studies in the context of social network analysis the authors conducted automated quantitative analyses of participation in mailinglists, looking at the frequency of postings, social networks on lists as seen through common threads, lurking behaviour, and inter-list connections.
Results include: Through a formal block analysis of postings on one list over 14 months they found unequal participation in mailinglist discourse rather than the often claimed equality of cyberspace communication. Positions and roles (measured in terms of frequency of posting as well as communication relations) emerged on the list as in real life (Stegbauer/Rausch 1999a).
Using the archives of seven lists over two years (1996-98), the authors also studied the role of lurkers and found that only 30% of all new subscribers became active ("delurked") within one year. If people delurked, they did so relatively soon after subscription. There were fewer lurkers in high-volume lists, which may indicate that the primary motivation for lurking is not to "free-ride", i.e. to get a maximum of information for free.
People who lurked on one list were sometimes active on others, so lurkers may have an important function in connecting discussion spaces. On a more fundamental level - given the large numbers of subscribers on many lists - it can be said that the existence of lurkers is one condition for the possibility of list communication, because if everybody "talked" at once, message overload would destroy communication (Stegbauer 1999).
Another study examined the hypothesis that mailing lists lead to more interdisciplinary contacts. Comparing the membership lists of 1300 academic lists of the UK Mailbase system, the authors found less participation across disciplines than expected (Stegbauer/Rausch 1999b).
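The idea of reading communication relations from common threads can be sketched as follows. This Python fragment only derives ties between authors who posted in the same thread; it does not reproduce the authors' block analysis, and the thread data and the function name `co_thread_ties` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical threads: thread id -> authors of its postings, in order
THREADS = {
    "thread-1": ["a", "b", "a", "c"],
    "thread-2": ["a", "b"],
    "thread-3": ["d"],
}

def co_thread_ties(threads):
    """Count for each pair of authors the number of threads both posted in."""
    ties = Counter()
    for authors in threads.values():
        # Each author counts once per thread; sorting keeps pair keys stable.
        for pair in combinations(sorted(set(authors)), 2):
            ties[pair] += 1
    return ties

ties = co_thread_ties(THREADS)
```

The resulting weighted pairs form an undirected network that could be handed to network analysis or graph layout software; an author who only posts unanswered messages, like "d" here, has no ties at all.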
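A share of "delurked" subscribers like the one reported above can be computed from subscription dates and first posting dates. The following Python sketch assumes that both are known for each subscriber; the data and the function name `delurk_share` are hypothetical and do not stem from the study.

```python
from datetime import date, timedelta

# Hypothetical data: address -> (subscription date, date of first posting or None)
SUBSCRIBERS = {
    "a@example.org": (date(1996, 3, 1), date(1996, 3, 20)),
    "b@example.org": (date(1996, 5, 10), None),
    "c@example.org": (date(1997, 1, 15), date(1998, 6, 1)),
    "d@example.org": (date(1997, 2, 1), None),
}

def delurk_share(subscribers, window=timedelta(days=365)):
    """Share of subscribers whose first posting falls within `window` after subscription."""
    active = sum(
        1
        for joined, first_post in subscribers.values()
        if first_post is not None and first_post - joined <= window
    )
    return active / len(subscribers)

share = delurk_share(SUBSCRIBERS)
```

In this toy example only one of four subscribers posts within a year, so the share is 0.25; the third subscriber does post, but too late to be counted.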

2.2 Quantitative studies with focus on discussion content and social relations

2.2.1 Sh. Rafaeli, F. Sudweeks et al.: "ProjectH" and follow-ups

"ProjectH" is the name of an international collaborative project that carried out a quantitative content analysis of over 3000 e-mails from 30 online discussion groups (news, lists and CompuServe Special Interest Groups) between 1992 and 1994. Over 100 researchers used a common codebook of 46 variables, but also pursued a variety of different research questions and methods operating on the units message, thread and list. Other projects emerged from this venture; some of them are documented in (Sudweeks, McLaughlin, Rafaeli 1998).
As for selected results, the authors found a lot of humour (in more than 20% of messages), self-disclosure, and a preference for agreement (Rafaeli/Sudweeks 1998:185f).
Interactive messages contained more statements of opinion, especially more statements of agreement, than one-way or reactive messages (Rafaeli/Sudweeks 1998:186).
Messages that were likely to be referenced (i.e. to support longer lasting threads) had characteristics like medium length, an appropriate subject line, a statement of fact, no abrupt change of topic, or a reference to another message (Berthold et al. 1998:211).
An additional questionnaire survey showed that the perceived risk of potentially embarrassing CMC in certain newsgroups was remarkably low (Witmer 1998:140f).
Women used more graphic accents than men (Witmer, Katzman 1998:9).

2.2.2 U. Matzat (sociology), University of Groningen: the role of Internet Discussion Groups (IDGs) for academic communication

Uwe Matzat studies academic communication using offline questionnaires as well as electronic ones posted to a selection of mailing lists. Research questions include the role of IDGs for researchers' social networks and the effects of offline networks on online communication.
He tests existing hypotheses relating to potential contact and information benefits of IDGs with data from a random sample of English and Dutch university researchers in the humanities, the social and the natural sciences.
Preliminary results show "a few information effects and, more often, contact benefits of IDGs: Researchers build up weak contacts that make their research more visible and that make them more aware of other researchers' work (useful for the reception of new research papers)." Nevertheless he finds no evidence for equalisation: "IDGs do not reduce inequalities in the distribution of access opportunities to informal communication channels." (personal communication 1999.11.18)

2.3 Single-list case studies

2.3.1 H. Buck (linguistics), University of Saarbrücken: linguistic characteristics of e-mail contributions to an academic list

In a quantitative study that also contains a detailed description of the list studied, examples and interpretations, Harald Buck tested several existing hypotheses about the characteristics of e-mail against a selection of postings from a German-language, research-oriented mailinglist, namely: (1) e-mail is a new text category of its own; (2) e-mails contain comparatively many violations of the norms for written text; (3) e-mail lies in between written and oral communication; (4) e-mail authors make use of discourse-supporting means. Out of the 735 mails from a 10-month discussion period Buck selected a representative sample of 231 mails for his analysis.
In contrast to common judgements about electronic communication, error rates remained within normal ranges and depended rather on author and situation. Only some closeness to oral communication could be found, whereas several features of traditional letters (e.g. a three-part structure with greeting, main text and closing greeting) were found to be preserved. Language proved to be slightly informal, but polite. There was a high degree of dialogue-supporting functions (quoting in over 57% of postings); emoticons were used to compensate for channel reduction. In summary the author advises against rash generalisations about e-mail and electronic communication.

2.3.2 J. Hofmann (anthropology), WZ Berlin, Projektgruppe Kulturraum Internet: "`Let A Thousand Proposals Bloom' - Mailinglisten als Forschungsquelle", 1998

Jeanette Hofmann observed 6 months of list discussion on a technical (IETF) mailinglist and contrasted the form and content of the results of her "lurking" observation with those of an interview carried out at a later stage with one of the main list debaters.
Her own record of an important longer debate on the list is constructed as a play in seven acts, where she identifies the main actors, actor types, topics and open questions.
Insights gained from this observation include a sense of how mailinglists reflect Internet technology development. The author notes an extremely open and cooperative culture of discourse on the list, as well as collective striving for solutions and common interpretations. She attributes this cooperative behaviour to the characteristic selection of participants on the list: Most of them are pioneers and experts working at technical frontiers and are interested in cultivating this new land.
As for the two different "windows" through which the ethnographer looked at the events, she found the main differences not in the faithfulness of the resulting picture, but in the selection and order of events as well as in the presentation style. On the list the "techies" discussed without any recognisable care for possible observers, and many voices and interpretations could be heard in parallel. The interview, in contrast, remained restricted to selected "important" topics; in hindsight events were synthesized, interpreted and explained, reasons analysed and connections drawn. The list discussions focused more on the "how", the interview on the "why" aspects. In summary, both sources complemented each other.

3. Some methodological proposals

3.1 Desirable future research

Looking at current Internet group communication studies, my impression is that there is a gap between two clusters of common research designs: on one side many small case studies with in-depth analyses (often of experimental communication settings like classrooms), and on the other side some big formal (structural) computer-powered studies of lists and groups with little reference to the content discussed. What I find missing are medium- to large-scale thorough content analyses combined with computer-supported cross-sections and investigations into quantifiable list features.
I would regard ProjectH's larger-scale content analysis as pointing in this direction; in this case it could be achieved through hand coding by a large number of cooperating researchers. Helpful for projects with less manpower are approaches like those of (Fujitani/Akahori 1997, 1999), who use computers for keyword extraction and summaries.

In order to cope with larger amounts of data within a single-person project and without access to expensive dedicated text mining machines, my suggestion would be to give more thought to tools and methods for text extraction and analysis.
Unfortunately, current software for text analysis still lacks standards and interoperability (Alexa/Züll 1999:134), so it can be hard to find the right combination of tools for a specialized project. Multilingual support cannot be taken for granted either. Unicode still needs some time to find its way into common applications, so when dealing, for example, with the 2-byte character sets of East Asian languages, yet other tools are sometimes needed, and those available are often not easy to use for the non-professional computer user.

As one conclusion from this situation I see the need for more interdisciplinary cooperation between social scientists interested in a certain content and tool specialists, e.g. from computer science, who would help to operationalise the research questions. Such cooperation would not only involve the production of new tools for special questions; social scientists could also learn from general paradigms and methodology in mathematics or information science.

3.2 A computer supported qualitative content analysis of fairly large list archives

My own project is a computer supported qualitative content analysis of two German and two Japanese mailinglists. The material consists of five years of list archives (1994-1998), comprising more than 5000 e-mails or about 30 MB of data. The list participants are mainly school teachers who discuss the merits and problems of Internet use at school.
My questions with respect to content mainly come from the field of educational technology: What are the teachers' views on Internet literacy, new roles in school education, and the challenges and chances that learning in a globally networked context brings about? What obstacles for Internet use at school do they observe?
Concerning formal aspects of communication I look at topic careers, communication patterns and cultural differences.
On the methodological side my aim is to find ways for efficient extraction, coding and analysis of relevant passages from an amount of data that is too large to work through entirely by hand (for the first steps of exploring the field I use metaphors from archaeology or cartography). Because in the near future a lot more research material will be available in electronic form and information overload is a serious problem, I hope these methods will be useful not only for extracting relevant information from mailing lists. One hypothesis is that social scientists would benefit substantially from using comparatively simple and flexible software tools that meet actual needs (my basis here is Linux, Emacs, Perl, Tk etc.).
With respect to research design, I as the "domain expert" cooperate with a "tool expert" in order to experiment with different approaches to exploring my material. Some quantitative cross-sections are meant to help find relevant passages, which are then hand-coded. The terminology that emerges is in turn prepared for further processing. Altogether, a grounded-theory-like approach is used for the generation of hypotheses and possibly theory elements.

3.3 Resources for Japan related mailinglist research

Finally, without going into detail, I would like to introduce some sources and tools that could be useful for the pursuit of Japan related mailinglist studies.
As for research material, huge directories of Internet mailinglists are available online. Usenet newsgroups can be found under the fj.* hierarchy. There is also a variety of discussion forums in online services like Niftyserve.
On the tool side, electronic dictionaries, word-separating and stemming software as well as indexing software can be useful for certain forms of searches. A comprehensive list has been compiled by Baba Hajime. It is also possible to write one's own scripts, e.g. in Perl, using electronic dictionaries or word lists.
In the area of educational technology in Japan there are a number of efforts to extract content from mailing lists and other types of electronic communication. For example, at the Akahori lab at Tokyo Institute of Technology, S. Fujitani, M. Ishihara and K. Akahori have developed systems for the extraction of topics (via keywords and key sentences) from educational mailinglists as a service to newcomers.
At the Yano lab of Tokushima University, Y. Yano, H. Ogata, T. Fukui, N. Furugori et al. also deal with electronic group communication.
For a general collection of Japanese-capable software cf. the Monash "Nihongo" archive, which has a mirror in Duisburg.
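As a minimal sketch of such a script (here in Python rather than Perl, with an invented word list and invented sample messages), terms from a word list can be counted by plain substring search. This works without word separation, which is convenient for Japanese text, but it is only a rough approximation: short terms will also match inside longer words.

```python
from collections import Counter

# Hypothetical word list and sample messages, invented for illustration.
WORD_LIST = ["インターネット", "学校", "授業"]
MESSAGES = [
    "学校でインターネットを使う。",
    "インターネットの授業は楽しい。",
]

def count_terms(messages, terms):
    """Count substring occurrences of each term across all messages."""
    totals = Counter()
    for text in messages:
        for term in terms:
            totals[term] += text.count(term)
    return totals

counts = count_terms(MESSAGES, WORD_LIST)
```

Counts of this kind can serve as a first cross-section for locating passages worth reading closely; a word-separating tool or an electronic dictionary would be needed for anything more precise.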


