Bringing the South Asian Heritage to the Internet: The Story Behind the Project Heritage GLAM

Imagine finding one-of-a-kind treasures on the closed shelves of libraries: the first Punjabi encyclopedia and lexicographic work, Guru Shabad Ratnakar Mahan Kosh by Bhai Kahn Singh; biographies of Bhai Gurdas and Bhagat Singh by Harinder Singh Roop and Triloki Singh; and collections by Nobel laureate Rabindra Nath Tagore in Punjabi, including editions in the Shahmukhi script. What would happen if these works were opened to everyone on digital platforms? What if anyone could freely read, download, and share them? A shift from shelves to digital is needed to protect important works from being lost to time.

GLAM (galleries, libraries, archives, and museums) institutions have been the trusted keepers of our cultural and historical knowledge for centuries. They hold that knowledge both as our inheritance from the past and as our legacy to the future, and they form a critical part of the intellectual capital of the information society.

Open Heritage Foundation, a nonprofit organization in India, has been partnering with local GLAM institutions to digitize notable works of historical and cultural value. The digitized works are published on open platforms such as Wikimedia Commons, Wikisource, and the Internet Archive.

In the post-pandemic world, it has become more critical than ever to bring the rich heritage of cultural institutions online so that researchers and readers can access it. These materials serve educational, research, and entertainment purposes alike, and they are critical to bridging the digital divide.

This article tells the story of how we discovered rare Punjabi cultural works in cultural institutions. It also describes how we brought those works to large communities of users in digital form, using free licenses and tools.

Democratizing the Knowledge Economy

According to research by the Wikipedia Cultural Diversity Observatory (WCDO), the content searched most online and the most-read Wikipedia articles in a given region are about that region's local cultural context.

In South Asia, a major obstacle to research and to building a well-cited encyclopedia in local languages is the lack of online access to local publications. Without such access, those publications cannot serve as sources for well-researched articles.

To make these non-digital sources of local cultural heritage and indigenous knowledge available, we partnered with an old government municipal library. Together we started a project called Heritage GLAM to digitize rare works dating back to the sixteenth century.

Aisles of Municipal Library Patiala by Wikilover, CC BY-SA 4.0

Designing Partnership Frameworks with Cultural and Knowledge Institutions

We started by forming a GLAM-Wiki partnership with RVJD Library, one of the oldest libraries in Punjab, built in the 19th century during the British colonial era. The library holds thousands of rare manuscripts, classic first editions, and collections of historical works. These are in Punjabi (including the Shahmukhi script), English, Hindi, Sanskrit, and Urdu.

DIY Scanner by Wikilover90, CC BY-SA 4.0 via Wikimedia Commons

Underlying Challenges

As our work with the library progressed, we uncovered various copyright-related challenges that complicate bringing works to the internet. India's copyright is governed by the Indian Copyright Act of 1957, the first post-independence copyright legislation in the country. Under that Act, a work generally enters the public domain sixty years after its author's death, counted from the beginning of the following calendar year. For the Heritage GLAM project we had to carry out copyright analysis, and it taught us about fundamental challenges in the current copyright system.
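
To make the rule concrete, here is a minimal Python sketch of the term calculation. It assumes only the general rule for a published work whose author is known (Section 22 of the Act). It ignores the separate rules for anonymous, posthumous, and government works, photographs, and translations. The function names are our own illustration, not part of any existing tool.

```python
from datetime import date
from typing import Optional

# Assumed general rule (Section 22, Indian Copyright Act, 1957): copyright
# lasts until sixty years from the beginning of the calendar year that
# follows the year of the author's death.
TERM_YEARS = 60


def public_domain_from(death_year: int) -> int:
    """First calendar year (from 1 January) in which the work is free.

    The sixty years start on 1 January of death_year + 1, so protection
    ends on 31 December of death_year + 60.
    """
    return death_year + 1 + TERM_YEARS


def is_public_domain(death_year: int, on: Optional[date] = None) -> bool:
    """True if works by an author who died in death_year are public
    domain in India on the given date (default: today)."""
    on = on or date.today()
    return on.year >= public_domain_from(death_year)
```

For example, the works of an author who died in 1959 are free in India from 1 January 2020. The works of an author who died in 1960 remain protected until the end of 2020.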

Many old works lack important bibliographic information, such as the date of publication, the publishing house, and the author of the original work. For many of them, the only clues are the printed price, the font style, the quality of the binding, and the type of paper used.

Although such works are probably old enough to be in the public domain, the lack of verifiable copyright information keeps them off the free platforms that could bring them to readers worldwide.
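
When a work carries no author name or date, we can still reason about it conservatively. The sketch below is a hypothetical helper, not a legal test. It makes two assumptions:

- A work with no identifiable author is treated as anonymous, so its sixty-year term is counted from publication, as for anonymous works under Section 23 of the Act.
- The publication year is known only as a range estimated from physical clues.

A work counts as public domain only if even the latest plausible year has expired.

```python
from dataclasses import dataclass
from typing import Optional

TERM_YEARS = 60  # assumed Indian term, counted from 1 January of the next year


@dataclass
class YearEstimate:
    """Publication-year range inferred from physical clues such as the
    printed price, font style, binding, and paper (bounds inclusive)."""
    earliest: int
    latest: int

    def __post_init__(self) -> None:
        if self.earliest > self.latest:
            raise ValueError("earliest must not be after latest")


def copyright_status(published: YearEstimate, current_year: int,
                     author_death_year: Optional[int] = None) -> str:
    """Return 'public domain', 'in copyright', or 'uncertain'.

    With a known author, the term runs from the author's death. Without
    one, the work is ASSUMED to be anonymous and the term runs from
    publication, so only the bounds of the estimate can be tested.
    """
    if author_death_year is not None:
        free_from = author_death_year + 1 + TERM_YEARS
        return "public domain" if current_year >= free_from else "in copyright"
    if current_year >= published.latest + 1 + TERM_YEARS:
        return "public domain"  # even the latest plausible year has expired
    if current_year < published.earliest + 1 + TERM_YEARS:
        return "in copyright"   # even the earliest plausible year is protected
    return "uncertain"          # depends on the unknown exact year
```

For example, a book whose price and typeface suggest printing between 1955 and 1965 comes out as "uncertain" in 2020. That flags it for further research rather than for upload.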

Our exploration of medieval and modern works also revealed the use of traditional Indian (desi) calendars, which differ from the Gregorian calendar in use today. Desi calendars counted years and time differently. Some of them followed the tropical year (365 days, 5 hours, 48 minutes, 45 seconds) instead of the sidereal year.
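
The year lengths involved can be compared with a little arithmetic. This sketch uses three values:

- the tropical-year value quoted above;
- the mean Gregorian year (97 leap days every 400 years);
- an approximate astronomical value for the sidereal year, which is our own assumption and not from the source.

The arithmetic shows that the Gregorian calendar stays within about 27 seconds a year of the tropical year. A calendar tied to the sidereal year, by contrast, moves against the seasons by roughly 20 minutes a year, or about one day every 70 years.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def to_days(days: int, hours: int = 0, minutes: int = 0, seconds: int = 0) -> float:
    """Convert a duration given in d/h/min/s into fractional days."""
    return days + (hours * 3600 + minutes * 60 + seconds) / SECONDS_PER_DAY


TROPICAL_YEAR = to_days(365, 5, 48, 45)  # value quoted above: 365.2421875 days
SIDEREAL_YEAR = 365.256363               # approximate astronomical value (assumed)
GREGORIAN_YEAR = 365 + 97 / 400          # 97 leap days per 400 years: 365.2425 days


def drift_seconds_per_year(year_a: float, year_b: float) -> float:
    """Seconds by which a calendar built on year_a moves against year_b each year."""
    return (year_a - year_b) * SECONDS_PER_DAY


def years_until_one_day_drift(year_a: float, year_b: float) -> float:
    """Years needed for the two year lengths to drift apart by one day."""
    return 1 / abs(year_a - year_b)
```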

Many works state their publication dates in traditional eras rather than Gregorian years. These include the Bikrami (Vikrami) Samvat, which began long before the birth of Guru Nanak; the Nanakshahi calendar, which counts years from his birth; the Panchanga calendar; and the Arabic Hijri calendar.

The Indian national calendar, also called the Shalivahana Shaka calendar, has been used officially alongside the Gregorian calendar since 1 Chaitra 1879 of the Saka era, or 22 March 1957. It was recommended for uniform adoption across India by the Calendar Reform Committee, working under the Council of Scientific and Industrial Research. The committee reached its recommendation after a detailed scientific study of thirty different calendars then in use in different parts of the country. The task was made more complicated by the religious and local cultural sentiments attached to each calendar's origins.

Our explorations gave us an in-depth understanding of how these dating systems affect the copyright status of such works, given that India's current copyright law dates only from 1957. To determine each work's copyright status correctly, we converted the desi calendar dates to the Gregorian calendar and calculated its publication date.
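
The conversion can be sketched as a small lookup of year offsets. This is an approximation under stated assumptions, not a full calendar converter:

- A traditional year rarely starts on 1 January, so each function returns the two Gregorian years that a dated year can overlap.
- The Bikrami, Shaka, and Nanakshahi offsets assume a new year in March or April.
- The Hijri case uses a common linear approximation of the lunar year, which can be off near year boundaries.

The function names are hypothetical.

```python
import math
from typing import Callable, Dict, Tuple

YearRange = Tuple[int, int]  # (earliest, latest) Gregorian year, inclusive


def bikrami_to_ce(year: int) -> YearRange:
    # ASSUMED spring new year: a Bikrami year spans two Gregorian years.
    return (year - 57, year - 56)


def shaka_to_ce(year: int) -> YearRange:
    # 1 Chaitra 1879 Saka = 22 March 1957, so +78 (or +79 for Jan-Mar dates).
    return (year + 78, year + 79)


def nanakshahi_to_ce(year: int) -> YearRange:
    # Year 1 counted from Guru Nanak's birth in 1469 CE; ASSUMED March new year.
    return (year + 1468, year + 1469)


def hijri_to_ce(year: int) -> YearRange:
    # A lunar Hijri year lasts about 0.970229 Gregorian years; this linear
    # approximation gives the year's start as a fractional CE year.
    start = 0.970229 * year + 621.5643
    end = start + 0.970229
    return (math.floor(start), math.floor(end))


CONVERTERS: Dict[str, Callable[[int], YearRange]] = {
    "bikrami": bikrami_to_ce,
    "shaka": shaka_to_ce,
    "nanakshahi": nanakshahi_to_ce,
    "hijri": hijri_to_ce,
}


def to_gregorian(calendar: str, year: int) -> YearRange:
    """Return the range of Gregorian years a dated year can fall in."""
    converter = CONVERTERS.get(calendar.lower())
    if converter is None:
        raise ValueError(f"unsupported calendar: {calendar}")
    return converter(year)
```

For example, `to_gregorian("shaka", 1879)` returns `(1957, 1958)`, consistent with 1 Chaitra 1879 falling on 22 March 1957. For copyright decisions, the later year of the range is the conservative choice.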

Publishing Works on Free Knowledge Platforms

When one thinks about the best ways to widen the reach of such heritage resources, trusted open platforms such as the Internet Archive and Wikisource come to mind. These platforms are open and free for individuals and organizations to share and access works.

The Internet Archive is an open-access digital library that provides free public access to collections of digitized materials. One open library is immensely valuable for accessibility, but two are better than one. Wikisource is another excellent free digital library, giving readers open access to freely licensed or public domain books. They can read, download, and use these books for any purpose, in several formats. Wikisource is available in seventy-one languages and is one of thirteen collaborative knowledge projects operated by the Wikimedia Foundation.

An image from the manuscript Prem Hind, public domain, via Wikimedia Commons

Wikisource is one of the more user-friendly Wikimedia projects: it gives users access to text transcribed from the original source image file. As a result, a high school student or research scholar searching online for sources on a subject can find these works directly through internet search engines, which index the text. Wikisource is also an important platform for community engagement, where volunteers transcribe, validate, and transclude books from their scanned images. Optical character recognition (OCR) has greatly strengthened support for this transcription work. Tools such as IndicOCR, a new tool, make it easy to transcribe text in any Indic language on Wikisource. IndicOCR replaced an older Linux-based tool that could not be used on many devices.

Our digitization work with our GLAM partners has made rare works and manuscripts available online on the Internet Archive and Wikisource. They range from the Puratan Janam Sakhi, to the Mahan Kosh, the famous lexicographic work of Punjabi Sikhism, to Chambe Diyan Kaliyan, a collection of short stories by Leo Tolstoy in Punjabi.

The Synergy of Campaigns and Community

How can we connect communities and knowledge users with cultural heritage? One excellent way is to start a campaign! What better way is there to promote Open Educational Resources than by engaging communities? That is how our journey with the Wikisource Proofreading Contest and the #1Lib1Ref campaign began.

A special feature of Wikisource is that it pairs the source image of each page with searchable text that editors create and vet. This happens in several steps:

1. The image file is run through an OCR (optical character recognition) engine.
2. Editors proofread the output to remove errors left by the machine.
3. Editors check and validate the pages of the index.
4. The text is transcluded.

The finished text is then available in downloadable formats.
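
The page-by-page workflow can be modelled as a small state machine. This sketch borrows the five page status levels of Wikisource's ProofreadPage extension: without text, not proofread, problematic, proofread, and validated. The class and method names are hypothetical. The rule that a different editor must validate reflects usual Wikisource practice. "Transclusion" is reduced here to joining page texts in order, whereas on Wikisource it is done with the `<pages>` tag.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class Quality(IntEnum):
    # Page status levels as used by the ProofreadPage extension
    WITHOUT_TEXT = 0
    NOT_PROOFREAD = 1
    PROBLEMATIC = 2  # not used by the transitions in this sketch
    PROOFREAD = 3
    VALIDATED = 4


@dataclass
class Page:
    number: int
    text: str = ""
    quality: Quality = Quality.WITHOUT_TEXT
    proofreader: Optional[str] = None

    def load_ocr(self, ocr_text: str) -> None:
        """Step 1: store raw OCR output, not yet checked by a person."""
        self.text = ocr_text
        self.quality = Quality.NOT_PROOFREAD

    def proofread(self, editor: str, corrected_text: str) -> None:
        """Step 2: an editor fixes the OCR errors against the scan."""
        if self.quality < Quality.NOT_PROOFREAD:
            raise ValueError("page has no text to proofread")
        self.text = corrected_text
        self.quality = Quality.PROOFREAD
        self.proofreader = editor

    def validate(self, editor: str) -> None:
        """Step 3: a second editor confirms the proofread text."""
        if self.quality != Quality.PROOFREAD:
            raise ValueError("only proofread pages can be validated")
        if editor == self.proofreader:
            raise ValueError("validation needs a different editor")
        self.quality = Quality.VALIDATED


def transclude(pages: List[Page]) -> str:
    """Step 4: join validated pages, in page order, into one text."""
    if any(p.quality != Quality.VALIDATED for p in pages):
        raise ValueError("all pages must be validated first")
    return "\n".join(p.text for p in sorted(pages, key=lambda p: p.number))
```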

Through the Wikisource Proofreading Contest and numerous outreach programs such as edit-a-thons and training workshops, we have trained interested volunteers to edit Wikisource and proofread the digitized works. In the past year alone, more than fifty editors have vetted and validated 21,000 pages on Punjabi Wikisource.

#1Lib1Ref (1 Library 1 Reference) is an established global Wikimedia campaign, organized every year by The Wikipedia Library around February and May. Punjabi Wikisource's participation in this campaign has let us draw on these digitized resources to add useful citations to Wikipedia articles.
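
Turning a digitized book into a citation is mostly a matter of filling in a template. The sketch below builds wikitext for the `{{cite book}}` template using parameter names from English Wikipedia's Citation Style 1. The function is our own illustration. Punjabi Wikipedia's citation templates may use different parameter names, so check them before use.

```python
from typing import List, Optional, Tuple


def cite_book(title: str, author: Optional[str] = None,
              year: Optional[str] = None, publisher: Optional[str] = None,
              location: Optional[str] = None, language: Optional[str] = None,
              page: Optional[str] = None, url: Optional[str] = None) -> str:
    """Build {{cite book}} wikitext, skipping empty fields."""
    fields: List[Tuple[str, Optional[str]]] = [
        ("author", author), ("title", title), ("year", year),
        ("publisher", publisher), ("location", location),
        ("language", language), ("page", page), ("url", url),
    ]
    parts = []
    for name, value in fields:
        if value:
            # A bare "|" would start a new template parameter, so escape it
            # with the {{!}} magic word.
            escaped = value.replace("|", "{{!}}")
            parts.append("|" + name + "=" + escaped)
    return "{{cite book " + " ".join(parts) + "}}"
```

For example, `cite_book("Mahan Kosh", author="Bhai Kahn Singh", language="pa")` returns `{{cite book |author=Bhai Kahn Singh |title=Mahan Kosh |language=pa}}`.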

Bibliographic Metadata via Linked Open Data

Creating Dataset Schema in Open Refine 6 by Wikilover90, CC BY-SA 4.0 via Wikimedia Commons

Structured metadata makes bibliographic databases more accessible and visible on the Internet. Wikidata, another Wikimedia Foundation project, gives the public a free and open knowledge base of structured data. That data is available under a free license and can be interlinked with other open datasets.

Our Wikidata projects, WikiProject Punjabi Authors and WikiProject Punjabi Books, are one such initiative to document bibliographic metadata. The metadata covers the title of the work, the author, the author's birthplace, the language of the work, the publisher, painter, and illustrator, the publication date and place, and the work's copyright status.
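
The fields above map onto Wikidata properties, which makes the records queryable. The sketch below states the mapping as we understand it:

- title: P1476
- author: P50
- language of work: P407
- publisher: P123
- illustrator: P110
- publication date: P577
- place of publication: P291
- copyright status: P6216
- author's place of birth: P19

It then builds a SPARQL query for the Wikidata Query Service. The function name is hypothetical. The painter field is left out because how it is modelled depends on the work. The language item ID is a parameter that you should look up on Wikidata rather than take from here.

```python
import re
from typing import Dict

# Assumed mapping of the metadata fields above to Wikidata properties.
BOOK_PROPERTIES: Dict[str, str] = {
    "title": "P1476",
    "author": "P50",
    "language": "P407",
    "publisher": "P123",
    "illustrator": "P110",
    "publication_date": "P577",
    "place_of_publication": "P291",
    "copyright_status": "P6216",
}
# The author's birthplace belongs on the author's own item, not the book's.
AUTHOR_BIRTHPLACE = "P19"

_QID = re.compile(r"Q[1-9][0-9]*")


def books_in_language(language_qid: str, limit: int = 100) -> str:
    """Return a SPARQL query listing works in the given language, with
    author, author's birthplace, date, and copyright status where present."""
    if not _QID.fullmatch(language_qid):
        raise ValueError(f"not a Wikidata item ID: {language_qid!r}")
    p = BOOK_PROPERTIES
    return f"""SELECT ?book ?title ?authorLabel ?birthplaceLabel ?date ?statusLabel WHERE {{
  ?book wdt:{p['language']} wd:{language_qid} ;
        wdt:{p['title']} ?title .
  OPTIONAL {{ ?book wdt:{p['author']} ?author .
             OPTIONAL {{ ?author wdt:{AUTHOR_BIRTHPLACE} ?birthplace . }} }}
  OPTIONAL {{ ?book wdt:{p['publication_date']} ?date . }}
  OPTIONAL {{ ?book wdt:{p['copyright_status']} ?status . }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "pa,en". }}
}}
LIMIT {int(limit)}"""
```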

Open Access Initiatives with Contemporary Artists and Authors

While older works have important historical, cultural, and educational value, it is also important to encourage making modern works freely available on open platforms, so that they can reach a wider audience.

Bringing contemporary artwork and related works to these open platforms requires advocacy with cultural institutions, artists, and the artistic community. For us, it is important that they see the benefits of relicensing some of their important works under Creative Commons licenses, so that those works can be open to global audiences free of cost. The Access to Arts Conference, one such effort, was organized for Indian artists. At the conference they learned about open access, Creative Commons licenses, and ways to get involved with the open community.

After this conference, several notable artists, including Diwan Manna and Gursharan Kaur, relicensed their collections under open licenses. Diwan Manna, a national academy award-winning Indian conceptual artist and photographer, relicensed his collection “Shores of the Unknown” and donated it via Wikimedia Commons under CC BY-SA 4.0. Gursharan Kaur, a well-known Punjabi author, relicensed her collections “History of Sikh Gurus” and “Punjabi Folklore songs of Malwa region” under CC BY-SA 4.0.

Shores of the Unknown New 13 by Conceptual Artist Diwan Manna, CC BY-SA 4.0 via Wikimedia Commons


We have only just begun the journey of making cultural knowledge accessible through digitization and activism on open platforms. We started with the knowledge closest to us and most easily accessible, but we want to expand that scope. We have documented our efforts so that they can serve as examples for local communities around the globe.

Our efforts are limited mainly by the lack of copyright information about old works, by limited funding, and by the difficult working conditions in some of the old libraries.

Despite these limits, we have digitized and transcribed around 22,000 pages from 300 works of old Punjabi literature and made them openly available through free knowledge platforms.

We hope to expand our reach through more campaigns, resource support, and partnerships. Our aim is to complete the digitization of our partner institutions' collections and salvage the heritage that remains in them. Preserved on open-access platforms on the Internet, that heritage will let today's and tomorrow's generations continue to access their history and culture.

Rupika Sharma is the Founder and Strategic Director of the Open Heritage Foundation. She envisions a world where the cultural heritage of the Global South is preserved and made freely accessible online. When she is not advocating for the open movement, you can find her reading novels, sipping caramel frappés, and enjoying K-Pop and Hispanic music.

About this story

This story was written thanks to an open call funded by the Creative Commons Open GLAM Platform. It is part of a series of articles in the Open GLAM Medium publication, supported with the goal of showcasing Open GLAM stories from around the world.

This blog was posted on Medium as Bringing the South Asian Heritage to the Internet: The Story Behind the Project Heritage GLAM on Nov 14, 2020
