Deep Citations: Current Issues




state of the art & issues



Professor Patrick Dunleavy from LSE wrote an incisive piece on the problems with citations, which I quote for this section:


For books this includes:


  • author last name; plus at least one first name initial (or full first name);
  • the same details for all other authors
  • full title (including sub-titles, although they are often left off)
  • place of publication; the name of the publisher; and the year date of publication


For journal articles the standard core now includes:


  • author(s) last name, and initial(s) or first name, as for books.
  • full title (including sub-titles)
  • the journal name. This should be utterly straightforward. But instead journals often abbreviate their own names and the names of other journals in various completely unpredictable and unnecessary ways (e.g. writing ‘Jnl’ instead of ‘journal’ or using acronyms). Different sets of academic editors, professional associations and commercial publishers seem to take a perverse pride in recording the source details of publications in different ways. These are ultra-legacy elements, decipherable only by the cognoscenti and stemming from the days of letter press printing half a century ago, when saving characters also saved money. They have no current rationale at all.
  • the year of publication, sometimes supplemented by a month date or a season name (e.g. Summer)
  • the volume of publication. Actually this should always be just an esoteric re-coding of the year, with the number depending on when volume 1 first happened. However, some journals also have more than one volume per calendar year, to make things difficult. An extra spice of incomprehensibility is added by some journals who record their volume numbers in Roman numerals (like LXIII), a master stroke of one-upmanship.
  • the page numbers range — i.e. the beginning page number to end page number.
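To make the point about volume numbers concrete, here is a minimal sketch of the "re-coding" Dunleavy describes. It assumes exactly one volume per calendar year, which he notes is not always true. The function names and the 1950 start year are hypothetical, chosen only for illustration.

```python
# Values of the individual Roman numeral symbols.
ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'LXIII' to an integer."""
    total = 0
    previous = 0
    # Walk right to left: a smaller symbol before a larger one is subtracted.
    for char in reversed(numeral.upper()):
        value = ROMAN_VALUES[char]
        if value < previous:
            total -= value
        else:
            total += value
            previous = value
    return total


def volume_to_year(volume: str, first_volume_year: int) -> int:
    """Recover the publication year from a volume number.

    Assumption: the journal publishes exactly one volume per calendar year,
    starting with volume 1 in `first_volume_year`.
    """
    number = int(volume) if volume.isdigit() else roman_to_int(volume)
    return first_volume_year + number - 1


# Hypothetical journal whose volume 1 appeared in 1950.
year = volume_to_year("LXIII", 1950)  # 1950 + 63 - 1 = 2012
```

Under this assumption the volume carries no information beyond the year, and the Roman numeral only adds a decoding step for the reader.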


the problem with current citations


I continue to quote Professor Patrick Dunleavy from LSE on why these core details no longer work for digital documents, where he emphasises addressability:


  • Author names in many British and European sources (book publishers and journals alike) often still include just a single initial. Academics and professionals from these smaller nations have been remarkably slow to appreciate the globalization of knowledge, and hence the need for much more distinctive author names. They (and their journals) are still reluctant to go beyond a single initial (J.) to distinguish John Smith from Joan Smith. By contrast, American publishers and journals (more accustomed to a country with 300 million people in it) tend to give the first name in full, and sometimes a second initial as well. Clearly, in the era of global search engines the US practice needs to become universal, but there is still a long way to go.
  • Number of authors. In all the STEM sciences the number of author names for journal articles has tended to increase sharply in the last ten years. In some disciplines the proliferation of author names is beginning to cause some reductions to be made in which author names are included in references, although the process is proceeding in an erratic and non-standardized way. Where once journals or book publishers might have tried to list all authors, increasingly some sources will only list the first ten or even just the first five authors in their reference lists. People who want a full author list hence need to actually go to the article itself, where the first page will still show everyone involved. With many physics papers listing 50+ authors, and some several hundred, this change has become inevitable.
  • Volume and issue numbers no longer make sense for journals that have moved to continuous publication. Even for journals that retain volume and issue numbers almost all articles are now being published online before (often months or years before) they get a print volume and issue number. The gap between online publication and print issue publication can be substantial. In the social sciences and humanities other authors may be very reluctant to cite such online pieces, because online papers are often harder to find on dated publisher websites, and they know that any interim reference they make will become obsolescent. Most libraries, electronic depositories and publishers insist on rewriting the citation of an early online article, so as to use instead the print on paper volume and issue version of the reference, even when this effectively falsifies the timing of the work. For instance, one of my recent papers spent nearly two years in this limbo of ‘early online’ status, and at the end of it had changed from an autumn 2011 piece to a summer 2013 one. In other words the current core convention deliberately introduces inaccuracy and falsified details into academic referencing, the opposite of their supposed purpose.
  • Page numbers are also irrelevant for many sources now. In ‘early online’ articles the page numbers all start at 1, until the article gets incorporated into a specific volume and issue number, where suddenly the page numbers are changed completely — thereby invalidating any previous page-specific citations. Similarly many ebooks now often do not have page numbers, since they re-size automatically to fit the screen size of the device that readers are using, and to adjust to readers’ preferences for font sizes. Hence pagination in the digital age makes no sense at all.
  • Place of publication for books is also a mostly pointless piece of information. Many big international publishers issue the same books in two or more places at the same time — for instance, in the USA and in the UK or Europe — yet these identical books will be referenced as if they were different. For many smaller publishers it is sometimes a bit of a job to find out where the place of publication actually is — even by searching their websites. This is often the last piece of information that I have to include in my reference lists, precisely because it actually matters very little in a digital era. Every publisher of any importance worldwide is now on the internet and the web and almost all will be accessed via Google Books or Amazon — home even to self-publishers nowadays.


Further points on the limitations of citations are made by Hadas Shema, information specialist at the Israeli Inter-University Center for E-Learning, in a blog post on Scientific American:


  • Secondary sources – Goodbye, citations. Once your article has been covered in a review or two, your findings will often be credited to the review article rather than your own. I’m citing only two MacRoberts & MacRoberts’ articles, one of them a review, because A. those are the ones I’ve read and B. I’m too lazy to read and cite all the research they refer to. That’s okay for informal scientific literature. However, if this was a peer-reviewed article, all the authors and articles not individually cited would have lost a citation. There’s a reason review articles are cited so often.
  • No informal citations. Those important conversations you had with your dissertation advisor or in a conference over lunch are forever gone, even though you might have gotten some of your best ideas from them. The paper you’ve been impressed with but couldn’t find a place to cite suffers the same fate. To quote MacRoberts and MacRoberts (1996) again: “If one wants to know what influence has gone into a particular bit of research, there is only one way to proceed: head for the lab bench, stick close to the scientist as he works and interacts with colleagues, examine his lab notebooks, pay close attention to what he reads, and consider carefully his cultural milieu.” They’re right, but their suggestion is hardly practical. That is why, in the last few years, bibliometricians have been trying to come up with metrics of academic social media cites. As the Altmetrics manifesto (2010) says “…that dog-eared (but uncited) article that used to live on a shelf now lives in Mendeley, CiteULike, or Zotero–where we can see and count it.” Unfortunately, Altmetrics indices are still far from accurate (not that citation indices are, but we’re stuck with them). If we’re to add new metrics to the mix, they better be good.
  • Limited databases. I mentioned it before in this blog, but it’s worth repeating: citation databases are painfully limited to a fraction of scientific publications, most of the covered ones being peer-reviewed journals. I have six Google Scholar citations for my blogs characterization article, but only two in Scopus. That’s one of the reasons your GS indices are usually higher than your Web of Science and Scopus ones. My dissertation advisor, Prof. Mike Thelwall, has an h-index of 47 in GS, 31 in Scopus, and 25 in WoS. All are correct, all are wrong. It depends on the coverage and the speed of update.
  • The Matthew Effect – or “the rich get richer.” People tend to cite already well-cited material by well-known researchers, either because that’s what they’ve read, because they’re appealing to the authority of the better known, or both.
  • Multiple motives – there are multiple motives for citations, many of which have less in common with “giving credit where credit is due” than we would like to think.


I'm afraid I have a few additions to Dunleavy's and Shema's lists:


  • Link rot, where URL references are removed over time.
  • Link change, where URL references are changed over time, making it difficult to ascertain the original citation's validity and usefulness.
  • All citations are treated as being the same, whether they are linked to support the text, show objections or simply add further information.
  • Citations in documents are free-floating, walled off in (little bracketed worlds), not tagged onto the text they are supposed to refer to, making alternative views and analysis of citations within documents poorer.
  • Multimedia citations are even more detached from the media they refer to: they are simply added as captions, not as tags on the media itself.
  • Citations address documents, not people. It should be possible to refer directly to the people the conversation is with, not only to their work.
  • & more. Please don't hesitate, send me your citation issues.
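Several of the issues above (untyped citations, citations not attached to the text they refer to, citations that address documents rather than people, and links that change over time) can be pictured as missing fields on a citation record. The following is only a sketch of what such a record might look like, not a proposed standard: the class name, field names, example URL and date are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Relation(Enum):
    """Why the citation is there (hypothetical vocabulary)."""
    SUPPORTS = "supports"
    OBJECTS = "objects"
    FURTHER_INFO = "further information"


@dataclass
class DeepCitation:
    source_url: str                 # where the cited work lives
    retrieved: str                  # ISO date the source was last seen, to detect link change
    relation: Relation              # support, objection, or further information
    anchor_start: int               # character offsets of the citing passage
    anchor_end: int
    person: Optional[str] = None    # the person addressed, not only the document

    def anchored_text(self, document: str) -> str:
        """Return the passage in the citing document that this citation is tagged onto."""
        return document[self.anchor_start:self.anchor_end]


text = "Page numbers are irrelevant for many sources now."
passage = "Page numbers are irrelevant"
start = text.index(passage)

citation = DeepCitation(
    source_url="https://example.org/cited-article",  # placeholder URL
    retrieved="2015-01-01",                          # placeholder date
    relation=Relation.SUPPORTS,
    anchor_start=start,
    anchor_end=start + len(passage),
    person="Patrick Dunleavy",
)
```

Because the citation carries its own anchor and relation, a reader or tool could in principle show only the objections, or highlight exactly which words a source is meant to support.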



future: citation analysis


A further benefit of having citations refer to specific text, and carry properties more relevant to digital information, is that we can develop systems for deeper link analysis, but that is for the future.
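As a rough indication of what such analysis could look like, here is a minimal sketch that assumes each citation already records who is cited and how the citation relates to the citing text. The records, the relation labels and the function names are all invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical citation records: (cited person, relation to the citing text).
citations = [
    ("Author A", "supports"),
    ("Author A", "supports"),
    ("Author B", "supports"),
    ("Author B", "further information"),
    ("Author C", "objects"),
]


def relation_counts(records):
    """Count citations by relation type instead of treating them all alike."""
    return Counter(relation for _, relation in records)


def relations_by_person(records):
    """Group relation counts by the person cited."""
    grouped = defaultdict(Counter)
    for person, relation in records:
        grouped[person][relation] += 1
    return {person: dict(counts) for person, counts in grouped.items()}


totals = relation_counts(citations)
per_person = relations_by_person(citations)
```

A plain citation count would give Author A and Author B two citations each. The typed view shows that one of Author B's is only a pointer to further information, a distinction conventional citation indices cannot make.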



The essence of my hypothesis is that the modern human mind evolved from the primate mind through a series of major adaptations, each of which led to the emergence of a new representational system.


Each successive representational system has remained intact within our current mental architecture, so that the modern mind is a mosaic structure of cognitive vestiges from earlier stages of human emergence. . . .


The key word here is representation.


Humans did not simply evolve a larger brain, an expanded memory, a lexicon, or a special speech apparatus; we evolved new systems for representing reality.


Merlin Donald