This paper is intended as a practice-oriented introduction to
Internet mailinglists as research material for German researchers in
Japanese Studies who have little previous experience with online
After some general remarks about mailinglists, I discuss new opportunities as well as limitations of mailinglists as research sources and present some current approaches to mailinglist research, with a focus on examples from social and cultural studies. This is followed by a short analysis of what I think should be pursued further and a brief introduction to my own project, which tries to fill a gap I perceive between the two kinds of studies carried out so far. I end with some recommendations of resources that can be useful for a Japan-related mailinglist project.
Currently the online counterpart of this paper is available from http://www.gmd.de/People/Irene.Langner/docs/publ.html. If the page moves, it should be retrievable with the meta-keys "mailinglist research" and "Japanologentag 1999".
Mailinglists as regarded in this paper are Internet-based electronic
discussion groups (as opposed to one-directional distribution lists).
On the server side they are administered by a list program (e.g.
listserv, listproc, majordomo, mailbase, lyris, mailman); for the
participants they are accessible via simple electronic mail.
In many respects mailinglists resemble (Usenet) newsgroups, bulletin board systems (BBSs), forums in online services or mailbox-nets, but in practice the different systems are often used by different people and for different purposes (cf. Döring 1999:35f).
With list archives made accessible via the WWW and webboards having similar functions, a merging of list and web use has recently become observable in some areas.
Although the principle of electronic conferencing dates back more than 20 years to the early days of computer mediated communication (CMC), the worldwide spread of the Internet now allows more diverse use and international participation, which opens up new potential for information and communication as well as for research.
Because of the low bandwidth needed, mailinglists are especially useful for people in less connected areas. They are an important means of communication for distributed interest groups, for pioneers or people with special needs or interests who do not find like-minded people or support locally. They can be particularly useful for academic and learning activities.
Mailinglists vary greatly in focus and style. Lists can resemble notice boards, news tickers, talkshows or self-help groups. Features like the quality of information contributed, the style of interaction, discussion or moderation differ considerably from list to list. Accordingly lists can be examined in a variety of ways. Here I would like to distinguish roughly between three types of motivations for the study of mailinglists:
When dealing with mailinglists it is also important to recognize what
we do not know about the observed people and their communication.
Researchers who are interested in representative studies face the problem that the Internet user population is still not representative of the population at large, so generalisability can only be reached with respect to certain user groups (or an additional "offline" effort has to be made).
What we see on a list may not be the whole picture necessary for a sufficient understanding of what is going on. For example, there may well be additional private communication in the background of a list that also influences list discussions but cannot be observed by a list member. This is often the case with answers to questions sent via personal mail, so we cannot judge whether an information need has been satisfied by list members unless a clarifying statement appears on the list.
In most cases the majority of subscribed members remain passive (English: "lurker", Japanese: "ROM" ("read only member")), so apart from mail addresses we often know nothing about the largest part of such a "group". Do they read messages at all, do they just log or archive them, or has the mail account been abandoned?
When interpreting behaviour "in real life" we are used to taking into account additional visible information, such as non-verbal indications of moods or intentions, or the gender, age, physical condition or status of the people discussing. In a text-based online context, however, such social cues about the participants are either absent or cannot be relied upon. Arbitrary construction of virtual persons is easily possible.
We cannot even assume that one mail address means one person. Behind one e-mail address there may well be several people, other lists or software agents. E.g. "Tanaka Tomoyuki" is a well-known figure in newsgroups like soc.culture.japan, but there has been much speculation about his actual identity.
If a poster really intends to hide his identity, clues about the origin of an e-mail can almost completely be removed by using an anonymous remailer.
In the case of a research project that includes automatic counting of postings, threads, or communication relations, spam (i.e. unsolicited commercial e-mail), off-topic postings and other "non-contributions" may distort the picture.
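The counting problem described above can be illustrated with a minimal sketch in Python. It assumes that the postings are available as raw message strings with standard `From:` and `Subject:` headers; the sample messages and the spam test `looks_like_spam` are hypothetical and stand in for whatever filter a project actually uses. Note also that, as said above, one address does not necessarily mean one person, so the result counts addresses, not people.

```python
from collections import Counter
from email import message_from_string
from email.utils import parseaddr


def count_postings(raw_messages, is_noise=lambda msg: False):
    """Count postings per sender address, skipping messages flagged as noise."""
    counts = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        if is_noise(msg):
            continue
        # parseaddr splits 'Name <address>' into (name, address).
        _, address = parseaddr(msg.get("From", ""))
        if address:
            counts[address.lower()] += 1
    return counts


def looks_like_spam(msg):
    """Hypothetical filter: a subject containing 'buy now' counts as spam."""
    return "buy now" in (msg.get("Subject") or "").lower()


# Invented sample postings, for illustration only.
SAMPLE = [
    "From: Alice <alice@example.org>\nSubject: Internet at school\n\nHello list",
    "From: bob@example.org\nSubject: Re: Internet at school\n\nI agree.",
    "From: Alice <ALICE@example.org>\nSubject: Re: Internet at school\n\nThanks.",
    "From: ads@example.com\nSubject: BUY NOW cheap modems\n\nSpecial offer",
]

if __name__ == "__main__":
    print(count_postings(SAMPLE))
    print(count_postings(SAMPLE, is_noise=looks_like_spam))
```

Comparing the two counts shows how much a single unfiltered "non-contribution" can shift the picture on a small list.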
On many mailinglists subscription and logging of all communication
is possible for anyone, but participants may not be aware of this fact.
E.g. a study about potentially embarrassing communication in
newsgroups showed a surprisingly low level of risk perception (Witmer
In addition, personal information gained through mailinglist observation can be combined with other observations of online behaviour, because Internet users leave traces and personal information in all sorts of places: in e-mail, on news, lists, websites etc. By putting together several information sources, user profiles of great detail can be generated, and the possibilities for misuse have increased enormously compared to "offline" research, so the legitimacy of online research activities has to be questioned in every case. As with other types of observation there is also the danger of destroying one's subject by "tearing it into the light" through research (Smith 1999:211f).
In the paragraphs above I mentioned several opportunities for new kinds
of research, but also systematic limitations, as well as the need for
self-imposed restrictions for the sake of privacy protection.
Depending on one's main research interest these limitations may be serious, but on the other hand hidden but important information is always involved in visible communication situations as well. Advantages and disadvantages (cf. Döring 1999:206-208) have to be weighed according to the particular research design.
Even if we can only see parts of a picture and concentrate on the part of communication that constitutes the shared goods of a list (be it more the information pool or a social value produced), the enormous variety of mailinglist communication still makes it worthwhile to take a closer look, e.g. at the different types of use and motivation for participation.
In the following I would like to briefly introduce some recent characteristic examples of mailinglist studies, including newsgroup studies, because often a similar methodology can be applied. In particular I focus on features of the list material studied, discipline context of approach, methods used and selected findings. Although a distinction between quantitative and qualitative approaches is used in order to characterize the main focus, I (with most of the authors mentioned) do not regard these approaches as mutually exclusive.
In a series of studies in the context of social network analysis the
authors conducted automated quantitative analyses of participation in
mailinglists, looking at the frequency of postings, social networks on
lists as seen through common threads, lurking behaviour, and
Results include: Through a formal block analysis of postings on one list over 14 months they found unequal participation in mailinglist discourses rather than the often claimed equality of cyberspace communications. Positions and roles (measured in terms of frequency of posting as well as communication relations) emerged on the list as in real life (Stegbauer/Rausch 1999a).
Using seven list archives covering two years (1996-98), the authors also studied the role of lurkers and found that only 30% of all new subscribers became active ("delurked") within one year. If people delurked, they did so relatively soon after subscription. There were fewer lurkers in high-volume lists, which may indicate that the primary motivation for lurking is not to "free-ride", i.e. to get a maximum of information for free.
People who lurked on one list were sometimes active on others, so lurkers may have an important function in connecting discussion spaces. On a more fundamental level, given the large numbers of subscribers on many lists, it can be said that the existence of lurkers is one condition for the possibility of list communication, because if everybody "talked" at once, message overload would lead to the destruction of communication (Stegbauer 1999).
Another study examined the hypothesis that mailing lists lead to more interdisciplinary contacts. Comparing the membership lists of 1300 academic lists of the UK Mailbase system, the authors found less participation across disciplines than expected (Stegbauer/Rausch 1999b).
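To make the notion of a "delurking" rate more concrete, the following is a minimal sketch in Python. It is not the procedure used in the studies cited; it merely assumes that for each address a subscription date is known and that the date of the first posting can be taken from the archive. The function name, the 365-day window and all sample data are illustrative assumptions.

```python
from datetime import date, timedelta


def delurk_rate(subscribed, first_post, window_days=365):
    """Share of subscribers whose first posting falls within the window.

    subscribed: dict mapping address -> subscription date
    first_post: dict mapping address -> date of first posting (if any)
    """
    if not subscribed:
        return 0.0
    window = timedelta(days=window_days)
    active = 0
    for address, joined in subscribed.items():
        posted = first_post.get(address)
        if posted is not None and joined <= posted <= joined + window:
            active += 1
    return active / len(subscribed)


# Invented sample data, for illustration only.
SUBSCRIBED = {
    "a@example.org": date(1996, 1, 10),
    "b@example.org": date(1996, 2, 1),
    "c@example.org": date(1996, 3, 5),
    "d@example.org": date(1996, 4, 20),
}
FIRST_POST = {
    "a@example.org": date(1996, 1, 15),  # delurked within days
    "c@example.org": date(1997, 9, 1),   # first posting after more than a year
}

if __name__ == "__main__":
    print(delurk_rate(SUBSCRIBED, FIRST_POST))
```

A measure of this kind only captures visible activity on the list; it says nothing about whether the remaining subscribers actually read the messages.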
Harald Buck, in a quantitative study that also contains a detailed
description of the studied list, examples and interpretations, tested
several existing hypotheses about characteristics of e-mail against a
selection of postings from a German-language, research-oriented
mailinglist, namely:
- E-mail is a new text category of its own.
- E-mails contain comparatively many violations of norms for written text.
- E-mail lies in between written and oral communication.
- E-mail authors make use of discourse-supporting means.
Out of the 735 mails from a 10-month discussion period, Buck selected a
representative sample of 231 mails for his analysis.
In contrast to common judgements about electronic communication, the error rates remained within common ranges and depended rather on author and situation. Only some nearness to oral communication could be found, whereas several features of traditional letters (e.g. a three-part structure with greeting, main text and another greeting) were found to be preserved. Language proved to be slightly informal, but polite. There was a high degree of dialogue-supporting functions (quoting in over 57% of postings); emoticons were used as compensation for channel reduction. In summary the author suggests refraining from rash generalisations about e-mail and electronic communication.
Jeanette Hofmann observed 6 months of list discussion on a technical
(IETF) mailinglist and contrasted form and content of the results of
her "lurking" observation with those of an interview carried out
with one of the main list debaters at a later stage.
Her own record of an important longer debate on the list is constructed as a play in seven acts, where she identifies the main actors, actor types, topics and open questions.
Insights gained from this observation include a sense of how mailinglists reflect Internet technology development. The author notes an extremely open and cooperative culture of discourse on the list, as well as collective striving for solutions and common interpretations. She attributes this cooperative behaviour to the characteristic selection of participants on the list: Most of them are pioneers and experts working at technical frontiers and are interested in cultivating this new land.
As for the two different "windows" through which the ethnographer looked at the events, she found the main differences not in the faithfulness of the resulting picture, but in the selection and order of events as well as in the presentation style. On the list the "techies" discussed without any recognisable care for possible observers, and many voices and interpretations could be heard in parallel. The interview, by contrast, remained restricted to selected "important" topics: in hindsight, events were synthesized, interpreted and explained, reasons analysed and connections drawn. The list discussions focused more on the "how", the interview on the "why" aspects. In summary, both sources complemented each other.
Looking at current Internet group communication studies, my impression
is that there is a gap between two clusters of common research
designs: on one side many small case studies with in-depth analyses
(often of experimental communication settings like in classrooms), and
on the other side some big formal (structural) computer-powered
studies of lists and groups with little reference to the
content discussed. What I find missing are medium- to large-scale
thorough content analyses combined with computer-supported
cross-sections and investigations into quantifiable list features.
Project H's larger-scale content analysis, which was achieved through hand coding by many cooperating researchers, points in this direction. Helpful for projects with less manpower are approaches like those of Fujitani/Akahori (1997, 1999), who use computers for keyword extraction and summaries.
In order to cope with larger amounts of data within a single-person
project and without access to expensive dedicated text mining
machines, my suggestion would be to give more consideration to
tools and methods for text extraction and analysis.
Unfortunately current software for text analysis still lacks standards and interoperability (Alexa/Züll 1999:134), so it can be hard to find the right combination of tools for a specialized project. Multilingual support cannot be taken for granted either. Unicode still needs some time to find its way into common applications, so that, for example, dealing with the double-byte character sets of East Asian languages sometimes requires yet other tools, and those available are often not easy to use for the non-professional computer user.
One conclusion I draw from this situation is the need for more interdisciplinary cooperation between social scientists interested in a certain content and tool specialists, e.g. from computer science, who would help to operationalise the research questions. Such cooperation would not only involve the production of new tools for special questions; social scientists could also learn from general paradigms and methodology in mathematics or information science.
My own project is a computer-supported qualitative content analysis of
two German and two Japanese mailinglists. The material consists of
five years of list archives (1994-1998), comprising more than 5000
e-mails, or about 30 MB of data. The list participants are mainly
school teachers who discuss the merits and problems of Internet use at
My questions with respect to content mainly come from the field of educational technology: What are the teachers' views on Internet literacy, new roles in school education, and the challenges and opportunities that learning in a globally networked context brings about? What obstacles to Internet use at school do they observe?
Concerning formal aspects of communication I look at topic careers, communication patterns and cultural differences.
On the methodological side my aim is to find ways for efficient extraction, coding and analysis of relevant passages from an amount of data that is too big to work through entirely by hand (for the first steps of exploring the field I use metaphors from archaeology or cartography). Because in the near future a lot more research material will be available in electronic form and information overload is a serious problem, I hope these methods will be useful not only for extracting relevant information from mailing lists. One hypothesis is that using comparatively simple and flexible software tools that meet actual needs should bring a substantial improvement for social scientists (my basis here is Linux, Emacs, Perl, Tk etc.).
With respect to research design, I as the "domain expert" cooperate with a "tool expert" in order to experiment with different approaches to exploring my material. Some quantitative cross-sections are meant to help find relevant passages, which are then hand-coded. The terminology that emerges is in turn prepared for further processing. Altogether a grounded-theory-like approach is used for the generation of hypotheses and possibly theory elements.
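How a simple quantitative cross-section can point to passages for hand coding may be illustrated by the following minimal sketch. It is written in Python for illustration only and is not the tool set of the project described here; the function names, the keyword list and the sample messages are assumptions. Messages are taken to be (identifier, body) pairs, and a "passage" is simply a paragraph separated by blank lines.

```python
import re
from collections import Counter


def find_passages(messages, keywords):
    """Return (message_id, paragraph) pairs for paragraphs mentioning a keyword."""
    if not keywords:
        return []
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    hits = []
    for msg_id, body in messages:
        # Paragraphs are separated by one or more blank lines.
        for paragraph in re.split(r"\n\s*\n", body):
            if pattern.search(paragraph):
                hits.append((msg_id, paragraph.strip()))
    return hits


def keyword_cross_section(messages, keywords):
    """Count in how many messages each keyword occurs at least once."""
    counts = Counter()
    for _, body in messages:
        lowered = body.lower()
        for keyword in keywords:
            if keyword.lower() in lowered:
                counts[keyword] += 1
    return counts


# Invented sample data, for illustration only.
MESSAGES = [
    ("msg-1", "Our school got a new modem.\n\n"
              "Internet literacy should be part of teacher training."),
    ("msg-2", "The filter software blocks too much.\n\n"
              "Does anyone share this problem?"),
    ("msg-3", "Literacy is more than typing.\n\nSee you at the workshop."),
]
KEYWORDS = ["literacy", "filter"]

if __name__ == "__main__":
    print(keyword_cross_section(MESSAGES, KEYWORDS))
    for msg_id, paragraph in find_passages(MESSAGES, KEYWORDS):
        print(msg_id, "|", paragraph)
```

Plain substring matching of this kind will miss synonyms and inflected forms, which is one reason why the emerging terminology has to be fed back into later passes.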
Finally, without going into detail, I would like to introduce some
sources and tools that could be useful for the pursuit of
Japan-related mailinglist studies.
As for research material there are huge lists of Internet mailinglists available, e.g. at http://mlnews.com/jp. Usenet newsgroups can be found under the fj.* hierarchy. There is also a variety of discussion forums in online services like Niftyserve.
On the tool side, electronic dictionaries, word-separating and stemming software as well as indexing software can be useful for certain forms of searches. A comprehensive list has been compiled by Baba Hajime at http://www.kusastro.kyoto-u.ac.jp/%7Ebaba/wais/other-system.html. It is also possible to write one's own scripts, e.g. in Perl, using electronic dictionaries or word lists.
In the area of educational technology in Japan there are a number of efforts at content extraction from mailing lists and other types of electronic communication. E.g. at the Akahori lab at Tokyo Institute of Technology (http://www2.ak.cradle.titech.ac.jp/) S. Fujitani, M. Ishihara and K. Akahori have developed systems for the extraction of topics (via keywords and key sentences) from educational mailinglists as a service to newcomers.
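As an illustration of what a small script based on a word list could look like, the following minimal sketch separates unsegmented Japanese text by greedy longest-match lookup. It is written in Python rather than Perl, the four-word lexicon is invented, and the function name `segment` is an assumption; dedicated word-separating software will generally give better results than this simple heuristic.

```python
def segment(text, lexicon, max_len=None):
    """Greedy longest-match segmentation of unsegmented text.

    At each position the longest lexicon entry that matches is taken;
    if none matches, a single character is emitted as its own token.
    """
    if max_len is None:
        max_len = max((len(word) for word in lexicon), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink down to one character.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Tiny invented word list, for illustration only.
LEXICON = {"メーリングリスト", "研究", "学校", "インターネット"}

if __name__ == "__main__":
    print(segment("学校のインターネット", LEXICON))
```

The resulting tokens can then be counted or indexed like space-separated words in German or English text.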
At Yano lab of Tokushima University (http://www-yano.is.tokushima-u.ac.jp) Y. Yano, H. Ogata, T. Fukui, N. Furugori et al. also deal with electronic group communications.
For a general collection of Japanese-capable software cf. the Monash "Nihongo" archive, which has a mirror in Duisburg: ftp://ftp.uni-duisburg.de/pub/mirror/nihongo/monash/.
Alexa, Melina, Züll, Cornelia (1999): A Review of Software for Text Analysis. Mannheim: ZUMA. ftp://ftp.zuma-mannheim.de/pub/zuma/zuma-nachrichten_spezial/znspezial5.pdf
Berthold, Michael et al. (1998): It Makes Sense: Using an Autoassociative Neural Network to Explore Typicality in Computer Mediated Discussions. In: Fay Sudweeks, Margaret McLaughlin, Sheizaf Rafaeli (Eds): Network & Netplay. Virtual Groups on the Internet. Menlo Park: AAAI Press/MIT Press, pp 191-219.
Buck, Harald (1999): Kommunikation in elektronischen Diskussionsgruppen. NETWORX Arbeiten im Netz zum Thema Sprache und Internet Nr. 11. http://www.websprache.uni-hannover.de/networx/docs/networx-11.htm
Döring, Nicola (1999): Sozialpsychologie des Internet. Die Bedeutung des Internet für Kommunikationsprozesse, Identitäten, soziale Beziehungen und Gruppen. Göttingen: Hogrefe.
Fujitani, Satoru, Akahori, Kanji (1997): Summarized Keyword Sampling System for Mailing-list Review [in Japanese]. In: JCET 5, Sep. 11-13, 1997, Tokyo: pp 629-630.
Fujitani, Satoru, Akahori, Kanji (1999): A Summary Sentence Extraction Method for Web-based Mailing List Review Application and Its Effectiveness Study. In: Geoff Cumming, Toshio Okamoto, Louis Gomez (Eds): Advanced Research in Computers and Communications in Education, Vol. 1. Amsterdam etc: IOS Press, pp 327-334.
Hofmann, Jeanette (1998): "Let A Thousand Proposals Bloom" - Mailinglisten als Forschungsquelle. In: Bernad Batinic et al. (Eds): Online Research. Göttingen: Hogrefe.
Ishihara, Masayoshi, Akahori, Kanji (1998): Development of a System to Generate Digests of Internet Articles for Supporting Discussions [in Japanese]. In: Nihon Kyouiku Kougaku Zasshi Vol. 22 No. 1, 1998, pp 1-12.
King, Storm (1996): Researching Internet Communities: Proposed Ethical Guidelines for the Reporting of Results. In: The Information Society 12(2), pp 119-28.
Ogata, H., Yano, Y. (1999): Combining Social Networks and Collaborative Learning in Distributed Organisations. In: Betty Collis, Ron Oliver (Eds): Proceedings of Ed-Media 1999, Vol. 1. Charlottesville: AACE, pp 119-125.
Rafaeli, Sheizaf et al. (1998): ProjectH: A Collaborative Quantitative
Study Of Computer-Mediated Communication. In: Fay Sudweeks, Margaret
McLaughlin, Sheizaf Rafaeli (Eds): Network & Netplay. Virtual Groups
on the Internet. Menlo Park: AAAI Press/MIT Press, pp 265-81.
Rafaeli, Sheizaf, Sudweeks, Fay (1998): Interactivity on the Nets. In: Fay Sudweeks, Margaret McLaughlin, Sheizaf Rafaeli (Eds): Network & Netplay. Virtual Groups on the Internet. Menlo Park: AAAI Press/MIT Press, pp 173-189.
Smith, Marc (1999): Invisible Crowds in Cyberspace: Measuring and Mapping the Social Structure of USENET. In: Marc Smith, Peter Kollock (Eds): Communities in Cyberspace. London: Routledge Press, pp 195-219.
Stegbauer, Christian, Rausch, Alexander (1999a): Ungleichheit in virtuellen Gemeinschaften. In: Soziale Welt '99. Zeitschrift für sozialwissenschaftliche Forschung und Praxis, Heft 1, pp 93-110.
Stegbauer, Christian (1999): Die Rolle der Lurker in Mailinglisten.
Vortrag auf ISKO'99 Hamburg, 23.-25.09.1999
Stegbauer, Christian, Rausch, Alexander (1999b): Fragmentierung oder
Integration - Untersuchung zur thematischen Überschneidung von Mailinglists.
Vortrag auf German Online Research (GOR '99) Nürnberg, 28.-29.10.1999
Sudweeks, F., McLaughlin, M., Rafaeli, S. (Eds) (1998): Network and Netplay. Virtual Groups on the Internet. Menlo Park, Cambridge, London: AAAI Press. cf. http://www.it.murdoch.edu.au/~sudweeks/projecth/netplay.html.
Witmer, Diane, Katzman, Sandra (1998): Smile When You Say That: Graphic Accents as Gender Markers in Computer-Mediated Communication. In: Fay Sudweeks, Margaret McLaughlin, Sheizaf Rafaeli (Eds): Network & Netplay. Virtual Groups on the Internet. Menlo Park: AAAI Press/MIT Press, pp 3-11.
Witmer, Diane (1998): Practicing Safe Computing: Why People Engage in Risky Computer-Mediated Communication. In: Fay Sudweeks, Margaret McLaughlin, Sheizaf Rafaeli (Eds): Network & Netplay. Virtual Groups on the Internet. Menlo Park: AAAI Press/MIT Press, pp 127-146.