LIBRES: Library and Information Science Research
Electronic Journal ISSN 1058-6768
1999 Volume 9 Issue 2; September 30.
Biannual LIBRE9N2
Further investigations into the first-citation process:
the case of population genetics

National Institute of Science,
Technology and Development Studies
Dr. K.S. Krishnan Marg, New Delhi
110012, India
Email: bmg@csnistad.ren.nic.in
and
KHBO, Zeedijk 101, 8400 Oostende, Belgium, and
UIA, IBW, Universiteitsplein 1, 2610 Wilrijk, Belgium
Email: ronald.rousseau@kh.khbo.be
In this article the first-citation process is investigated. Earlier studies
led to two double exponential models for this process. The first model results
in a concave function, the other in a function with an inflection point (an
S-shaped function). Real data using a year as the time unit were best described
by the first model; data using two weeks as the unit were best described by the
second. In this note we show that, for a group of nine related data sets in the
field of population genetics using one year as the unit, the first observation
is confirmed: the concave model adequately describes such data.
Introduction
Science is cumulative in nature:
each new research article is built on the foundation of previous articles
(and/or books). An author acknowledges this by referring to these articles and
books in a reference list. The study of this 'scholarly bricklaying', as Price
(1963) calls it, is known as citation analysis. Citation analysis studies
different aspects of the 'Citation Culture' (Wouters, 1999): motivations for
citing, the citation network as a mathematical graph, statistical aspects of
citations and references, mappings of the citation network, etc. For reviews on
citation analysis and a theory of citation we refer the reader to (Egghe and
Rousseau, 1990; Liu, 1993; Leydesdorff, 1998; Wouters, 1999).
In previous studies (Rousseau, 1993, 1994) we investigated the first-citation
process. By 'first-citation process' we mean the abstract process that shifts
a published article from the 'uncited' to the 'cited' group. As described in
(Rousseau, 1994), the publication of an article (or group of articles) can be
considered as the introduction of a stimulus in an abstract 'information space'
(perhaps Popper's World III). Citations, as symbols of 'use', can then be
interpreted as responses to this stimulus. In particular, the first citation is
the first sign of response: a token that the article has not gone unnoticed.
One can also say that articles that have been cited (at least once) have passed
an initial filter. Perhaps this filter separates the totally unused
(unnoticed?) articles from the others, which presumably have more (scientific)
potential.
A statistical study of the cumulative first-citation distribution of a group
of articles led to two double exponential models: one resulting in a concave
function, the other in a function with an inflection point (an S-shaped
function). Real data using a year as the time unit were best described by the
first model; data using two weeks as the unit were best described by the
second. These observations, though, were based on only two data sets. In this
note we show that for a group of nine related data sets, using one year as the
unit, the first observation is confirmed: the first model adequately describes
such data. This model implies that the rate of conversion from the set of
articles that have not been cited to the set of those that have is a decreasing
function of time.
The first-citation process as described here has not received much attention
in the scientific literature. Yet, Moed and Van Raan (1986) and Schubert and
Glänzel (1986) did consider the time between publication and first citation as
a journal indicator of immediacy. Moreover, Glänzel (1992) found that the mean
of the first response determined to a large extent the complete citation
distribution. In his study Glänzel used stopping times (a special kind of
random variable (Egghe, 1984)), which leads to a considerably more
sophisticated approach than the simple statistical procedure used here and in
our earlier study (Rousseau, 1994).
Source articles were taken from the "Bibliography of Theoretical Population
Genetics" (Felsenstein, 1981). This bibliography was selected as the source of
our investigation because it comprehensively covers the publications in the
field of theoretical population genetics from 1870 to 1980. However, the only
source for collecting citation data is ISI's Science Citation Index, which is
only available from 1955 (leaving aside the retrospective SCI covering
1945-1954). As a result the period of study is restricted to 1955-1980.
Moreover, as citation data were collected manually, only a selected set of data
was studied. Concretely, we only investigated the articles (from the
bibliography) published in 1955, 1958, 1961, …, 1979, that is, at three-year
intervals. The resulting nine related data sets amply suffice for the purpose
of this study. We refer the reader to e.g. (Gupta, 1997; Gupta et al., 1998;
Kretschmer & Gupta, 1998), where this bibliography has been studied from other
points of view. The complete citation data on which our investigation is based
are presented in the appendix.
Before explaining the model we would like to point out another aspect of the
data. A certain percentage of articles will always remain uncited. Yet, it
seems that for earlier articles a smaller percentage remains uncited. This is
clearly shown in Table 1. For the years 1955, 1958 and 1961, 15 to 23% of the
articles remain uncited over a period of approximately 18 years, while for
later years this percentage is considerably higher. We are unable to suggest an
explanation for this. Note also that the term 'uncited' really means 'uncited
by journals covered by the ISI database'.
Table 1
Percentages of uncited articles (each considered over a period of approximately
18 years)

year   number of   number of        number of          percentage of
       articles    cited articles   uncited articles   uncited articles
1955      75           64               11                 14.7
1958      77           64               13                 16.9
1961     122           94               28                 23.0
1964     167          115               52                 31.1
1967     301          202               99                 32.9
1970     324          203              121                 37.3
1973     418          282              136                 32.5
1976     465          307              158                 34.0
1979     483          338              145                 30.0
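The last column of Table 1 follows directly from the two count columns. The short Python sketch below recomputes it as a consistency check; the dictionary layout and the function name are illustrative choices, not part of the original study.

```python
# Counts from Table 1: year -> (number of articles, number of cited articles).
TABLE_1 = {
    1955: (75, 64), 1958: (77, 64), 1961: (122, 94),
    1964: (167, 115), 1967: (301, 202), 1970: (324, 203),
    1973: (418, 282), 1976: (465, 307), 1979: (483, 338),
}


def uncited_percentage(n_articles, n_cited):
    """Share of articles never cited, as a percentage rounded to one decimal."""
    return round(100.0 * (n_articles - n_cited) / n_articles, 1)


# Recompute the 'percentage of uncited articles' column for every cohort.
uncited = {year: uncited_percentage(n, c) for year, (n, c) in TABLE_1.items()}

for year in sorted(uncited):
    print(year, uncited[year])
```

Run as is, this reproduces the percentages printed in the table, with the three earliest cohorts at or below 23% and all later cohorts at 30% or more.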
Consider a fixed group of N articles. Let C(t) be the cumulative number of
articles cited at least once (in journals covered by ISI) over a period of
length t years. We assume that the change in C(t) is proportional to the number
of uncited articles, with a time-dependent proportionality factor
q(t) = A e^{-at}, a ≥ 0, A > 0. This factor can be interpreted as a conversion
factor, and it is assumed to be exponentially decreasing. It describes the rate
at which articles shift from the uncited group to the cited one. Putting
R(t) = C(t)/N leads to the following differential equation:

    dR/dt = A e^{-at} (1 - R(t))        (1)

The solution of this differential equation is the function:

    R(t) = 1 - k b^{1 - e^{-at}}        (2)

where k = 1 - R(0), and b = e^{-A/a}. For the proof we refer to (Rousseau,
1993, 1994). Note further that

    lim_{t \to \infty} R(t) = 1 - kb.

This means that in this model not all articles need ever be cited, which is a
realistic assumption. Indeed, the model predicts that 100 kb % of all published
articles will remain uncited. In (Rousseau, 1993, 1994) we have shown that
citations to Russian language library science periodicals, as published by
Motylev (1981), fitted this equation quite well.
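As a sanity check on the model, the sketch below evaluates the closed form R(t) = 1 - k b^{1 - e^{-at}} and verifies numerically that it satisfies dR/dt = A e^{-at} (1 - R(t)) with A = -a ln(b). The parameter values are arbitrary illustrative assumptions, not fitted values, and the function names are our own.

```python
import math


def first_citation_fraction(t, k, b, a):
    """R(t) = 1 - k * b**(1 - exp(-a*t)): fraction of articles cited by time t."""
    return 1.0 - k * b ** (1.0 - math.exp(-a * t))


def conversion_factor(t, b, a):
    """q(t) = A * exp(-a*t), with A recovered from b = exp(-A/a)."""
    big_a = -a * math.log(b)
    return big_a * math.exp(-a * t)


# Illustrative (assumed) parameter values, not taken from any fit.
k, b, a = 0.9, 0.35, 0.3

# Compare a central finite difference of R with q(t) * (1 - R(t)).
h = 1e-6
max_gap = 0.0
for t in [0.0, 0.5, 1.0, 2.0, 5.0, 10.0]:
    lhs = (first_citation_fraction(t + h, k, b, a)
           - first_citation_fraction(t - h, k, b, a)) / (2 * h)
    rhs = conversion_factor(t, b, a) * (1.0 - first_citation_fraction(t, k, b, a))
    max_gap = max(max_gap, abs(lhs - rhs))

# Value approached by R(t) for large t; the share k*b stays uncited.
limit = 1.0 - k * b
print(max_gap, limit)
```

The gap between the two sides stays at the level of finite-difference error, and R(0) equals 1 - k, as the definition of k requires.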
The conversion factor
Before we turn to the results of the fitting exercise we would like to have
another look at the data to see if they indeed suggest an exponentially
decreasing conversion factor from the group of uncited articles to the group of
cited ones. Table 2 gives the cumulative percentages of cited articles in the
first five years after publication. It is clear that this percentage increases
more and more slowly. After 5 years more than 80% of all citations have
occurred. Because of this characteristic of the data we were able to fit the
double exponential model.
Table 2.
Cumulative percentages of cited articles (CPCA) in the first five years after
publication

CPCA in         1955   1958   1961   1964   1967   1970   1973   1976   1979
first year      16.0    9.1    6.6    3.0   12.0    5.6    9.1   13.1    6.0
first 2 years   38.7   28.6   32.8   24.0   33.6   22.5   26.3   35.9   45.8
first 3 years   54.7   41.6   47.5   40.1   44.9   35.5   42.8   49.2   56.9
first 4 years   60.0   50.6   58.2   50.3   50.5   41.4   48.8   55.1   60.9
first 5 years   65.3   59.7   61.5   55.1   53.8   45.1   53.8   60.0   64.0
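The entries of Table 2 can be derived from the yearly first-citation counts in the appendix. The sketch below does this for the 1955 cohort and also lists the year-on-year gains, which shrink over time; the variable and function names are illustrative only.

```python
from itertools import accumulate

# First-citation counts for the 1955 cohort (appendix), publication year first.
FIRST_CITED_1955 = [12, 17, 12, 4, 4, 2, 4, 1, 0, 2, 1, 0, 1, 1, 1, 1, 1]
N_1955 = 75  # number of source articles published in 1955


def cumulative_percentages(counts, n_articles, years=5):
    """Cumulative percentage of cited articles after 1, 2, ..., `years` years."""
    running = list(accumulate(counts))
    return [round(100.0 * c / n_articles, 1) for c in running[:years]]


cpca_1955 = cumulative_percentages(FIRST_CITED_1955, N_1955)

# Gain in the cumulative percentage from one year to the next.
gains = [round(later - earlier, 1)
         for earlier, later in zip(cpca_1955, cpca_1955[1:])]
print(cpca_1955, gains)
```

For 1955 the gains never increase from one year to the next, which is the slowing growth that motivates a decreasing conversion factor.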
In order to fit our double exponential model we have to find values for the
parameters k, b and a. Moreover, A = -a ln(b). Table 3 presents the results of
a nonlinear least squares fit based on Marquardt's algorithm (Marquardt, 1963).
Fig. 1 illustrates the case of the 1973 data.
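To give a feel for the fitting step, here is a minimal sketch that estimates k, b and a for the 1973 cohort using only the Python standard library. It does not implement Marquardt's algorithm. Instead it uses a simpler stand-in: for each trial value of a on a grid, ln(1 - R) = ln k + (1 - e^{-at}) ln b is linear in x = 1 - e^{-at}, so k and b follow from ordinary least squares, and the a with the smallest squared error in R is kept. The grid, the function names and the convention that t = 0 is the publication year are all assumptions of this sketch, so its estimates need not coincide with the published ones.

```python
import math

# Appendix counts for the 1973 cohort: first citations in 1973, 1974, ..., 1987.
FIRST_CITED_1973 = [38, 72, 69, 25, 21, 16, 8, 9, 4, 6, 4, 5, 1, 2, 2]
N_1973 = 418


def model(t, k, b, a):
    """Double exponential model R(t) = 1 - k * b**(1 - exp(-a*t))."""
    return 1.0 - k * b ** (1.0 - math.exp(-a * t))


def linear_fit_for_fixed_a(ts, rs, a):
    """Least squares line through (x, ln(1 - R)), x = 1 - exp(-a*t); returns k, b."""
    xs = [1.0 - math.exp(-a * t) for t in ts]
    ys = [math.log(1.0 - r) for r in rs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # estimate of ln b
    intercept = mean_y - slope * mean_x  # estimate of ln k
    return math.exp(intercept), math.exp(slope)


def profile_fit(ts, rs, a_grid):
    """Grid search over a; keeps the (sse, k, b, a) with the smallest error in R."""
    best = None
    for a in a_grid:
        k, b = linear_fit_for_fixed_a(ts, rs, a)
        sse = sum((r - model(t, k, b, a)) ** 2 for t, r in zip(ts, rs))
        if best is None or sse < best[0]:
            best = (sse, k, b, a)
    return best


# Cumulative fraction of cited articles, with t = 0 for the publication year.
cumulative, running = [], 0
for count in FIRST_CITED_1973:
    running += count
    cumulative.append(running / N_1973)
ts = list(range(len(cumulative)))

a_grid = [i / 100 for i in range(5, 101)]  # trial values 0.05, 0.06, ..., 1.00
sse_best, k_hat, b_hat, a_hat = profile_fit(ts, cumulative, a_grid)
print(k_hat, b_hat, a_hat, 1.0 - k_hat * b_hat)
```

A full nonlinear least squares routine minimises the error in R over all three parameters at once; the grid approach trades that precision for transparency.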
Table 3.
Parameter values of best fitting curves

data set   k       b       a       A      R²      1 – kb
1955       0.828   0.200   0.200   0.32   0.992   0.83
1958       0.917   0.147   0.141   0.27   0.998   0.87
1961       0.930   0.241   0.253   0.36   0.998   0.78
1964       0.981   0.321   0.290   0.33   0.998   0.69
1967       0.872   0.389   0.310   0.29   0.997   0.66
1970       0.930   0.401   0.216   0.20   0.991   0.63
1973       0.913   0.352   0.265   0.28   0.997   0.68
1976       0.869   0.393   0.408   0.38   0.999   0.66
1979       0.934   0.332   0.600   0.66   0.994   0.69
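The A column of Table 3 is not an independent parameter: since b = e^{-A/a}, it follows that A = -a ln(b). The sketch below recomputes A from the printed a and b values as a check on the sign and the rounding; the dictionary layout is an illustrative choice.

```python
import math

# (b, a, A) as printed in Table 3, keyed by data set.
TABLE_3 = {
    1955: (0.200, 0.200, 0.32), 1958: (0.147, 0.141, 0.27),
    1961: (0.241, 0.253, 0.36), 1964: (0.321, 0.290, 0.33),
    1967: (0.389, 0.310, 0.29), 1970: (0.401, 0.216, 0.20),
    1973: (0.352, 0.265, 0.28), 1976: (0.393, 0.408, 0.38),
    1979: (0.332, 0.600, 0.66),
}

# A = -a * ln(b), rounded to two decimals like the printed column.
recomputed_A = {year: round(-a * math.log(b), 2)
                for year, (b, a, _printed) in TABLE_3.items()}

for year, (_b, _a, printed) in sorted(TABLE_3.items()):
    print(year, printed, recomputed_A[year])
```

Because 0 < b < 1, ln(b) is negative, so the minus sign is what makes A positive, as the model requires.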
Judging by the high R² values we may assume that equation (2) yields an
adequate representation of the first-citation process, at least if data are
collected on a yearly basis. This assumption is confirmed by a graphical
analysis of the best fitting curves to the data (cf. Fig. 1).
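The goodness of fit can be checked by hand for a single data set. The sketch below evaluates the model with the 1973 parameters from Table 3 against the cumulative first-citation fractions from the appendix and computes the coefficient of determination. It assumes that t = 0 corresponds to the publication year, which is our reading of the data rather than something stated explicitly; the function names are illustrative.

```python
import math

# Appendix counts for the 1973 cohort: first citations in 1973, 1974, ..., 1987.
FIRST_CITED_1973 = [38, 72, 69, 25, 21, 16, 8, 9, 4, 6, 4, 5, 1, 2, 2]
N_1973 = 418
K, B, A_RATE = 0.913, 0.352, 0.265  # k, b and a for 1973 as printed in Table 3


def model(t, k, b, a):
    """Double exponential model R(t) = 1 - k * b**(1 - exp(-a*t))."""
    return 1.0 - k * b ** (1.0 - math.exp(-a * t))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Observed cumulative fraction of cited articles per year.
observed, running = [], 0
for count in FIRST_CITED_1973:
    running += count
    observed.append(running / N_1973)

# Assumption: t = 0 is the publication year itself.
predicted = [model(t, K, B, A_RATE) for t in range(len(observed))]
r2 = r_squared(observed, predicted)
print(round(r2, 3))
```

Under this assumption the value should land close to the R² reported for 1973 in Table 3, with the largest deviations in the second and third year.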
These results predict that from 1964 on, about 30% of all articles presented
in Felsenstein's bibliography will never be cited in journals covered by ISI.
Before that time this percentage is lower (between roughly 13 and 22%, judging
by the 1 – kb column of Table 3). This is in accordance with the data (see
Table 1).
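The agreement between prediction and data can be made explicit by comparing the predicted uncited share, 100 kb %, with the observed share from Table 1. The following sketch does so for all nine data sets; the tuple layout is an illustrative choice and the numbers are copied from Tables 1 and 3.

```python
# year -> (k, b, observed percentage of uncited articles), from Tables 3 and 1.
DATA = {
    1955: (0.828, 0.200, 14.7), 1958: (0.917, 0.147, 16.9),
    1961: (0.930, 0.241, 23.0), 1964: (0.981, 0.321, 31.1),
    1967: (0.872, 0.389, 32.9), 1970: (0.930, 0.401, 37.3),
    1973: (0.913, 0.352, 32.5), 1976: (0.869, 0.393, 34.0),
    1979: (0.934, 0.332, 30.0),
}

# The model predicts that a share k*b of the articles is never cited.
predicted = {year: 100.0 * k * b for year, (k, b, _obs) in DATA.items()}
gaps = {year: predicted[year] - obs for year, (_k, _b, obs) in DATA.items()}

for year in sorted(DATA):
    print(year, round(predicted[year], 1), DATA[year][2], round(gaps[year], 1))
```

The predicted and observed shares differ by a few percentage points at most, with the largest gaps for the two earliest, smallest cohorts.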
Fig. 1. Best fitting curve and asymptote indicating the limiting value (0.68)
for the 1973 first-citation data
We have shown that, for data in the field of population genetics, the first
double exponential model as studied in (Rousseau, 1993, 1994) adequately
describes first-citation data collected on a yearly basis.
Most articles, except some publications in e-journals, suffer from a
publication delay between the acceptance (after peer review) of the manuscript
and its actual publication. This clearly has an influence on the time between
publication and first citation (it is the publication delay of the citing
article that is important here). Hence, it would be interesting to study this
influence, e.g. by comparing first citations for articles in e-journals and for
other articles. Note that the mechanism that is at work here (considered from a
model-theoretic point of view) is the convolution of two distributions. This
mechanism has been explained in (Rousseau, 1998) and studied in a citation
context in (Egghe and Rousseau, 2000).
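To illustrate what a convolution of two distributions means here, consider a discrete toy example: one distribution for the delay until a citing manuscript is accepted, and one for the publication delay of that citing manuscript. The observed delay to first citation is then distributed as their convolution. All probabilities below are hypothetical and chosen purely for illustration; they are not estimates from any data set.

```python
def convolve(p, q):
    """Distribution of the sum of two independent delays with distributions p and q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out


def mean(dist):
    """Mean delay (in years) of a distribution given as a list of probabilities."""
    return sum(i * prob for i, prob in enumerate(dist))


# Hypothetical: probability that the citing manuscript is accepted 0, 1 or 2
# years after the cited article appeared.
acceptance_delay = [0.2, 0.5, 0.3]
# Hypothetical: probability that an accepted manuscript is published 0 or 1
# years after acceptance.
publication_delay = [0.6, 0.4]

observed_delay = convolve(acceptance_delay, publication_delay)
print(observed_delay, mean(observed_delay))
```

In this toy case the mean observed delay is the sum of the two mean delays, which shows how a publication delay shifts the whole first-citation curve to the right.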
Acknowledgement
The
authors thank an anonymous referee for a number of pertinent observations
leading to a more readable article.
References
Egghe, L. (1984). Stopping time techniques for analysts and probabilists.
Cambridge (UK): Cambridge University Press.
Egghe, L. and Rousseau, R. (1990). Introduction to Informetrics. Quantitative
methods in library, documentation and information science. Amsterdam:
Elsevier.
Egghe, L. and Rousseau, R. (2000). The influence of publication delays on the
observed ageing distribution of scientific literature. Journal of the American
Society for Information Science (to appear).
Felsenstein, J. (1981). Bibliography of Theoretical Population Genetics.
Stroudsburg (PA): Dowden, Hutchinson & Ross.
Glänzel, W. (1992). On some stopping times of citation processes. From theory
to indicators. Information Processing & Management, 28, 53-60.
Gupta, B.M. (1997). Analysis of distribution of the age of citations in
theoretical population genetics. Scientometrics, 40(1), 139-162.
Gupta, B.M., Kumar, S. & Rousseau, R. (1998). Applicability of selected
probability distributions to the number of authors per article in theoretical
population genetics. Scientometrics, 42(3), 325-334.
Kretschmer, H. and Gupta, B.M. (1998). Collaboration patterns in theoretical
population genetics. Scientometrics, 43(3), 455-462.
Leydesdorff, L. (1998). Theories of citation? Scientometrics, 43, 5-25.
Liu, M. (1993). The complexities of citation practice: a review of citation
studies. Journal of Documentation, 49, 370-408.
Marquardt, D.W. (1963). An algorithm for least squares estimation of nonlinear
parameters. Journal of the Society for Industrial and Applied Mathematics,
11(2), 431-441.
Moed, H. and Van Raan, A. (1986). Cross-field impact and impact delay of
physics departments. Czechoslovak Journal of Physics B, 36, 97-100.
Motylev, V.M. (1981). Study into the stochastic process of change in the
literature citation pattern and possible approaches to literature obsolescence
estimation. International Forum on Information and Documentation, 6, 3-12.
Price, D. J. de Solla (1963). Little Science, Big Science. New York: Columbia
University Press.
Rousseau, R. (1993). Double exponential models for first-citation processes.
Report University of Antwerp (UIA). (Copies available from the author.)
Rousseau, R. (1994). Double exponential models for first-citation processes.
Scientometrics, 30(1), 213-227.
Rousseau, R. (1998). Convolutions and their applications in information
science. The Canadian Journal of Information and Library Science/Revue
canadienne des sciences de l'information et de bibliothéconomie, 23(3), 29-47.
Appendix
Each table presents, for a different publication year, the total number of
source articles and, for every year starting with the publication year, the
number of articles that were cited for the first time during that year. The
difference between the number of source articles and the cumulative number of
articles that are cited at least once is the number of uncited articles, as
presented in Table 1.
Table 4 1955: 75 source articles

year   number of articles          year   number of articles
       cited for the first time           cited for the first time
1955   12                          1964   2
1956   17                          1965   1
1957   12                          1966   0
1958    4                          1967   1
1959    4                          1968   1
1960    2                          1969   1
1961    4                          1970   1
1962    1                          1971   1
1963    0

Table 5 1958: 77 source articles

year   number of articles          year   number of articles
       cited for the first time           cited for the first time
1958    7                          1967   1
1959   15                          1968   0
1960   10                          1969   1
1961    7                          1970   1
1962    7                          1971   1
1963    5                          1972   0
1964    3                          1973   0
1965    3                          1974   1
1966    2

Table 6 1961: 122 source articles

year   number of articles          year   number of articles
       cited for the first time           cited for the first time
1961    8                          1968   1
1962   32                          1969   3
1963   18                          1970   2
1964   13                          1971   2
1965    4                          1972   1
1966    5                          1973   0
1967    4                          1974   1

Table 7 1964: 167 source articles

year   number of articles          year   number of articles
       cited for the first time           cited for the first time
1964    5                          1973   1
1965   35                          1974   1
1966   27                          1975   1
1967   17                          1976   0
1968    8                          1977   0
1969    7                          1978   0
1970    6                          1979   1
1971    2                          1980   1
1972    2                          1981   1

Table 8 1967: 301 source articles

year   number of articles          year   number of articles
       cited for the first time           cited for the first time
1967   36                          1976   2
1968   65                          1977   2
1969   34                          1978   2
1970   17                          1979   1
1971   10                          1980   2
1972   12                          1981   1
1973   10                          1982   1
1974    3                          1983   1
1975    2                          1984   1

Table 9 1970: 324 source articles

year   number of articles          year   number of articles
       cited for the first time           cited for the first time
1970   18                          1978   4
1971   55                          1979   6
1972   42                          1980   6
1973   19                          1981   5
1974   12                          1982   5
1975   12                          1983   4
1976    8                          1984   3
1977    4

Table 10 1973: 418 source articles

year   number of articles          year   number of articles
       cited for the first time           cited for the first time
1973   38                          1981   4
1974   72                          1982   6
1975   69                          1983   4
1976   25                          1984   5
1977   21                          1985   1
1978   16                          1986   2
1979    8                          1987   2
1980    9

Table 11 1976: 465 source articles

year   number of articles          year   number of articles
       cited for the first time           cited for the first time
1976    61                         1983   5
1977   106                         1984   4
1978    62                         1985   3
1979    27                         1986   3
1980    23                         1987   2
1981     4                         1988   1
1982     6

Table 12 1979: 483 source articles

year   number of articles          year   number of articles
       cited for the first time           cited for the first time
1979    29                         1986   4
1980   192                         1987   6
1981    54                         1988   1
1982    19                         1989   1
1983    15                         1990   2
1984     7                         1991   1
1985     6                         1992   1