SuttaCentral update

We’ve just gone live with the new update for SuttaCentral. Check it out!

For this edition you won’t notice much change on the front end. But that is as planned. When we finished the newly designed site, our IT monks said, “Actually, now we know how to do it properly…” They (being Vens Jhanarato and Nandiya) found working with the previous incarnation, written in PHP, to be cumbersome, and they love Python. So once we finished, they started rewriting the whole thing in Python 3. This was a fantastic contribution, as it means the whole site is streamlined and optimized under the skin to a far greater degree. Running benchmarks on the full Samyutta Nikaya parallels table, we rendered the entire page on the old version in 38 seconds, while the new version stomps in at 1.5 seconds.

But it’s not just about speed. Python is one of the most widely used languages for natural language processing (NLP), and is heavily used at Google. This has already allowed us to create a fantastic, rich search engine: check it out! We already have title search, multiple full text searches, dictionary lookup, and even, for place names, automatic generation of locations on Google Maps. This is only the beginning of some powerful tools for analysis and comprehension of the texts.

More prosaically, we have resolved the former problems with Vietnamese fonts, and in fact have sponsored the design of a special version of Skolar font for Vietnamese. I’ve worked with the font designer, David Březina, to create the special Vietnamese diacritical marks. This is part of our push to use the highest possible standards of web typography in the demanding, multi-language environment of Buddhist texts. These are the words of the Buddha, and they deserve the best.

In addition, there are several new translations online, including additions to the Digha Nikaya and Samyukta Agama. However, our in-house translations are still limited. I hope to complete the corpus of original language texts this year, including Vinaya and Abhidhamma in all versions. Once this is done we can focus on gathering the translations.

25 thoughts on “SuttaCentral update”

  1. Dear Bhante,

    I quite like this new one. However, there is one thing your new update cannot do: tell me what Sutta I’m looking for!
    :D :D :D

    With much respect,

    Dheerayupa

    • Excellent idea! Even better, in fact, would be to find, not which sutta you’re looking for, but which sutta you need. Give us, maybe, ten years?

    • Dear Bhante,

      On a serious note (apologies for teasing you and your team in my previous comment :D), I’m wondering if it is possible to do something similar to ‘Google search’. That is, we could type some keywords and get a list of suttas whose content contains those keywords.

      I did try ‘Jhana’ and the result was a list of suttas that have the word ‘jhana’ as part of their names.

      With gratitude to you and your team,

      Dheerayupa

    • Thank you so much, Benny!

      I regard the Indonesian translations as of great importance, and have had several discussions about them. We definitely intend to incorporate them on SuttaCentral, so that Indonesians can get the benefit of all the great tools we are developing. Until the end of the year, however, we will mainly be focussing on adding the original texts, and on extending our coverage to include the Vinaya and Abhidhamma. When that is done we will turn more attention to the localized translations.

      One of the big challenges for us is dealing with so many texts in different languages. We can handle English, Pali, and Sanskrit pretty well, and some of our team can handle Chinese, but then there’s Bahasa, Thai, Sinhala, Korean, Vietnamese… It would be great to have volunteers to help with the various languages. Actually, it’s usually not so hard to find someone with the language skills, but what we really need is someone with A) language, B) sutta knowledge, and C) IT skills. And that’s a pretty rare combination. Anyway, I just mention it for you to bear in mind for now. Perhaps towards the end of the year we can look to taking the next step.

      metta

      Bhante Sujato

    • Dear Bhante,

      I wish to volunteer my service to your team regarding the Thai version. However, as you said, persons with all the skills required are rare to find. I know Thai, and have limited knowledge of suttas and IT. :( I could read more suttas (when I find some free time from my four book projects :D ) but what kind of IT skills do I need to develop to be able to help you? I would love to do that before I meet you and your team in Perth in late November.

      With respect,

      Dheerayupa

    • Thanks so much, Dheerayupa. But I thought you were already on the team!

      I probably shouldn’t get too much into this, as, like I said, our real push for the translations will probably be next year. But here are the basics of what we need to do.

      In dealing with the canonical texts translated in various languages, we are confronted with thousands of texts. Each of the different translations has its own quirks, and each has its own system of encoding. Almost invariably, we find that the texts we use do not follow the structures of proper HTML, like using “h” tags for heading and “p” tags for paragraphs. The files are usually either cluttered with masses of junk code, or they are basically just text files with little internal structure. So we have to take the files as we find them and transform them into SuttaCentral’s HTML template.

      To do this requires not only an understanding of HTML, but the ability to use the command line to manipulate thousands of files at once. We use tools like Beautiful Soup, HTML Tidy, and others we have developed ourselves. For example, we have one called “emdashar” which automatically corrects and standardizes punctuation across multiple HTML files. We have also developed a method for semi-automatically adding correct Pali diacritical marks to translated texts in any language. As time goes on these tools will be refined and developed further.
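
      To give a rough idea of what batch punctuation cleanup involves, here is a minimal sketch in Python 3. It is not the actual “emdashar” tool; the function names and the three rules (double hyphens, triple dots, repeated spaces) are assumptions chosen for illustration. Text inside tags is left alone, and the directory pass defaults to a dry run so that nothing is overwritten by accident.

```python
import re
from pathlib import Path

# Splits HTML into alternating text and tag pieces; the capturing group
# keeps the tags in the result so the document can be reassembled.
TAG_SPLIT = re.compile(r"(<[^>]+>)")


def normalize_punctuation(text):
    """Apply a few illustrative punctuation rules to plain text."""
    # Two or three hyphens (with optional surrounding spaces) become an em dash.
    text = re.sub(r"\s*-{2,3}\s*", "\u2014", text)
    # Three full stops become a single ellipsis character.
    text = re.sub(r"\.{3}", "\u2026", text)
    # Runs of spaces or tabs collapse to one space.
    text = re.sub(r"[ \t]{2,}", " ", text)
    return text


def normalize_html(html):
    """Normalize punctuation in text nodes only, leaving tags untouched."""
    parts = TAG_SPLIT.split(html)
    return "".join(
        part if part.startswith("<") else normalize_punctuation(part)
        for part in parts
    )


def normalize_tree(root, dry_run=True):
    """Return the .html files under root that would change.

    Files are only rewritten when dry_run is False.
    """
    changed = []
    for path in sorted(Path(root).rglob("*.html")):
        original = path.read_text(encoding="utf-8")
        updated = normalize_html(original)
        if updated != original:
            changed.append(path)
            if not dry_run:
                path.write_text(updated, encoding="utf-8")
    return changed
```

      Calling `normalize_tree("texts")` on a hypothetical `texts` folder would only list the files that differ; passing `dry_run=False` is what actually writes the changes.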

      Most of the heavy lifting is done by Ven Nandiya, while I do the more small scale tweaking of individual files. We are, of course, happy to offer support for whoever wants to help.

      All of these tools are very powerful, which means that they can be easily misused, and can end up making unwanted changes to the texts, so we have to be very careful how they are applied.

      Cleaning and templating the files is only part of the job, however. We also need to ensure that they are numbered correctly. For DN and MN this is simple, but for the shorter suttas, Vinayas, and so on, there are several inconsistent sets of numbering for texts, and we need to get everything in line. One of the most powerful features of SuttaCentral is our URL structure: one simple URL for one sutta. So each sutta needs to be broken down to one file, with a simple name: “mn1/th”, for example, would be the Thai translation of Majjhima Nikaya 1, the Mulapariyaya Sutta. This becomes complex when we have to handle the repetition series and other situations where the notion of a “sutta” starts to become fuzzy.
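
      As an illustration of the “one simple URL for one sutta” idea, the sketch below parses a path such as “mn1/th” into its parts. The pattern, the `SuttaRef` type, and the `file_name` layout are hypothetical, not SuttaCentral’s real code, and the pattern deliberately covers only the simple case of a collection plus a single number; dotted numbers and ranges such as the repetition series would need more work.

```python
import re
from typing import NamedTuple


class SuttaRef(NamedTuple):
    collection: str  # e.g. "mn" for Majjhima Nikaya
    number: int      # sutta number within the collection
    lang: str        # language code of the translation, e.g. "th"


# Simple case only: letters, digits, a slash, then a 2-3 letter language code.
PATH_PATTERN = re.compile(r"^([a-z]+)(\d+)/([a-z]{2,3})$")


def parse_sutta_path(path):
    """Split a path like 'mn1/th' into collection, number, and language."""
    match = PATH_PATTERN.match(path.strip("/").lower())
    if match is None:
        raise ValueError(f"not a simple sutta path: {path!r}")
    collection, number, lang = match.groups()
    return SuttaRef(collection, int(number), lang)


def file_name(ref):
    """Hypothetical on-disk layout: one file per sutta per language."""
    return f"{ref.lang}/{ref.collection}{ref.number}.html"
```

      With these assumptions, `parse_sutta_path("mn1/th")` gives collection `mn`, number `1`, and language `th`, and anything that does not fit the simple pattern raises an error rather than being guessed at.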

      A further aspect, which is less fundamental but still important, is to coordinate the internal references in the files. Usually sutta translations will have various numbering systems for references. You can see these numbers in, say, Ven Bodhi’s books. These refer to different things, typically the volume and page numbers of the PTS Pali edition, or in the case of Chinese texts, of the Taisho. Each of these referencing systems has to be marked up, not in plain text, but as metadata. Where the references are missing, they should be added. This will then allow us to integrate the suttas at deeper and deeper levels. For example, you should be able to refer easily from the paragraph numbers in Ven Bodhi’s editions to the Pali; but there is no Pali text that currently allows this. Another use will be to take, say, the volume and page numbers supplied in the dictionary and automatically open the correct paragraph of the sutta. At a further stage in development, this will enable us to do various kinds of language processing across multiple texts.
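
      To show what “marked up as metadata” could look like in practice, here is a small Python sketch. The input marker format (`[pts 1.12]`) and the output anchor (`class` plus `id`) are both assumptions made up for this example; real source files use many different conventions, and SuttaCentral’s actual markup may differ.

```python
import re

# Hypothetical plain-text marker: "[pts 1.12]" = system, volume, page.
MARKER = re.compile(r"\[(?P<system>[a-z]+)\s+(?P<vol>\d+)\.(?P<page>\d+)\]")


def mark_up_references(text):
    """Replace plain-text reference markers with empty anchor elements.

    The anchor carries the reference as attributes, so it can be targeted
    by a link or a script without appearing in the readable text.
    """
    def to_anchor(match):
        system = match.group("system")
        ref_id = f"{system}{match.group('vol')}.{match.group('page')}"
        return f'<a class="ref {system}" id="{ref_id}"></a>'

    return MARKER.sub(to_anchor, text)
```

      Once a reference lives in an `id` like this, a link ending in `#pts1.12` can open the page at that paragraph, which is the kind of lookup from dictionary citations described above.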

      Anyway, I should really stop there! Like I said, it’s not easy to find people with the skills. But that’s the fun!

    • Dear Bhante,

      I will print out your comment and discuss it with an IT friend. At the moment, what you said sounds intimidating to me, a pen-and-paper person. :D However, I hope that when it is explained in ‘Thai’, not ‘Greek’, I might be able to understand and assess my ability (or potential). :D

      I hope that when I see you in November, I will have more knowledge to discuss with you what I could do for SuttaCentral. :)

      With gratitude,

      Dheerayupa

    • Thanks, Dheerayupa. To be honest, it should be intimidating: it’s not a simple job! But like I said, it will take a few months before we are ready to really focus on the translations, and in the meantime it doesn’t hurt to ask around and see what’s what.

    • Thanks Bhante,

      I am looking forward to it. I hope I am qualified enough. Personally, my zeal is for finding and compiling the early Buddhist materials, having the Nikayas translated first and then going on to other texts as well. I think SuttaCentral would be the best and most effective way to do this.

      Deep bow,

  2. Awesome effort. Female input, especially from a female writer (Dheerayupa?), would make it more accessible to a wider audience.

    • Dear Utopia,

      I believe that in the Utopian land, a female could contribute great input. However, for a person who still reads paper books, instead of e-books, I can only hope that I won’t cause more headache to Bhante! :D

  3. Are Monks allowed to take panadol?…and hopefully make it “user friendly” for those who don’t even get round to read any books :) let alone e-books or thousands of suttas

  4. Hi,

    Even still, I doubt I have time to read all these texts in this life at my age – so will try just listening to Ajahn Brahm’s (and Ajahn Brahmali’s) versions of the texts to at least get some way in this life – hopefully close to nothingness and even Utopia has its problems. :)

    Utopia

  5. Hi

    Obviously Ajahn Brahm’s teachings, though, cover the whole path in a challenging but at the same time enjoyable way, encompassing, simplifying, relating, and living the texts, relevant to the present but based on the path to enlightenment: the Buddha’s teachings.

  6. Dear Bhante,

    Many, many thanks to you, Vens Jhanarato and Nandiya, and everyone else working on this ‘behind the scenes’, including of course all the translators & other supporters. It’s beyond amazing… and not only a much needed resource but user-friendly too, even for a technically-challenged luddite like me!

    Dheerayupa, glad to hear I’m not the only one to read actual physical books, which is all I read (or printed out pdf files), and I write with a pen and paper, except when I want to do e-mail or something like this…. :-)

  7. Not all people interested in Buddhism are really technical people. Actually, it used to be just the domain of the drop-outs, hippies, losers, and old people, not of geeks or high-achieving young people (apparently these kids are even taking cognitive enhancing drugs to get through school these days – the pressure is so high to be perfect and achieve everything by the time you are 5 years of age).

  8. Maybe for your next project you could get Ajahn Brahm to write up another version of the Suttas to go with that collection. Wouldn’t take long!

  9. Bhante/Ajahn/Ayya

    When I typed in a search requirement the net came up with something to do with Australian Buddhist History.

    I was never really quite sure what Hinayana referred to; that is, whether it referred to Theravada Buddhism or was just a name for a certain type of person. Possibly it is the latter, and was never meant to describe Theravada Buddhism, as the term is obviously not correct for Theravada Buddhism: it is quite derogatory.

    The search, though, listed “Hinayana/Theravada”, and the general definition of Hinayana (as stated on Wikipedia) is therefore incorrect if it is referring to Theravada Buddhism.

    Possibly some scholarly people might know how to correct this mistake on the internet.

    Regards

    • Hi Rnik,

      Theravada means “The School of the Elders”. It is the name used by the Buddhists of south and south-east Asia to refer to themselves. Historically it was a term for the group of schools that formed in distinction to the Mahasanghikas around the second century BCE. The only surviving one of these dozen or so schools is the branch that settled in the Mahavihara in Anuradhapura, Sri Lanka, and it is from them that modern Theravadins derive their lineage.

      Hinayana means “Inferior Vehicle”. It is a derogatory term that was used in some Mahayana texts to criticize the followers of the Sarvastivada and other early schools that no longer exist; or, more generally, as a derogatory term to put down anyone they were criticizing at the time. It does not mean “Small Vehicle” (which would be cūlayāna). Hīna means “rubbish”, “inferior”, “crappy”, and so on.

      Until modern times it has never been used by any school of Buddhism to refer to itself. In the 20th century some Theravadins, not understanding the meaning of the term, and some scholars, who should have known better, began using it to refer to the Theravada. In 1950 it was agreed by all present at the World Fellowship of Buddhists that the term was insulting and should not be used to refer to any existing school of Buddhism. These days sensitive Mahayanists such as the Dalai Lama always avoid the term, and use the more historically accurate and neutral word “Shravakayana” (which means “vehicle of the Disciples”) to refer to the early Buddhist schools in general.

    • Dear Bhante,

      “In 1950 it was agreed by all present at the World Fellowship of Buddhists that the term was insulting and should not be used to refer to any existing school of Buddhism.”

      It seems that word travelled slowly in those days :D since I remember that in the late 60s, when I was in school, we called our Buddhism ‘Hinayana’. ;)

      With respect

      Dheerayupa

  10. Bhante/Dheerayupa,

    Thank you for that, but I was given the impression that all schools that were not of the “two higher vehicles”, Mahayana and Vajrayana, were the lower vehicle (putting it mildly). This is untrue.

    Let’s hope, in light of your most recent post above, that these schools put that nonsense in the past where it belongs and follow the Dhamma as it was taught by the Buddha, the whole teachings and not just parts, so as to create good karma for Buddhism.

    (re your post above – how do these people get to stay in this country when other more needy or deserving people are kept out or sent back ie recently a female doctor was not granted another visa because at approx 50 she was considered too old) – Regards
