|Project||WebHelp Output for DocBook|
|Student Name||Kasun Gajasinghe|
|Mentors||David Cramer, |
I am passionate about the open source world and love contributing to free software. I hope that Google Summer of Code will be a great opportunity for me to become part of another open source community, contribute to the development of the project, make new friends, and develop new skills. I believe that GSoC will be an excellent starting point for this.
DocBook is a leading format for documentation and is especially popular with open source projects. I am therefore particularly interested in DocBook and hope to become a permanent member of the DocBook project.
I plan to devote 35-40 hours per week to this project.
With suggestions from my co-mentors, I researched ways of implementing client-side search and came up with the following options:
- Use the Lucene QueryParser
- Use the Java indexer of the htmlsearch demo plugin as a base and add the needed features
The Java indexer is a good starting point. It performs basic indexing and stores the result in JavaScript (.js) files that map keys (words) to the names of the files containing them. It then performs basic searches based on the given keywords. I plan to use it as the base, improve the code, and add new features. I have downloaded the source code and studied it. The proposed enhancements are listed under the Proposal section.
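The word-to-file mapping described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the htmlsearch plugin's code. The function names, the `searchIndex` variable name, the sample pages, and the choice to require every query word to match are all assumptions made for illustration.

```python
import json
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the sorted list of file names that contain it."""
    index = defaultdict(set)
    for file_name, text in pages.items():
        for word in tokenize(text):
            index[word].add(file_name)
    return {word: sorted(files) for word, files in index.items()}

def search(index, query):
    """Return the files that contain every word of the query (assumed behavior)."""
    words = tokenize(query)
    if not words:
        return []
    result = set(index.get(words[0], []))
    for word in words[1:]:
        result &= set(index.get(word, []))
    return sorted(result)

def to_js(index):
    """Serialize the index as a JavaScript assignment (hypothetical variable name)."""
    return "var searchIndex = " + json.dumps(index, sort_keys=True) + ";"

# Hypothetical sample pages standing in for chunked HTML output.
pages = {
    "ch01.html": "Installing the DocBook stylesheets",
    "ch02.html": "Customizing the stylesheets for HTML output",
}
index = build_index(pages)
```

Here `search(index, "docbook stylesheets")` returns only `ch01.html`, because that is the only page containing both words. The string produced by `to_js(index)` could be written to a `.js` file and loaded by the browser, which is the general shape of an index stored in JavaScript files.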
For Table of Contents tree generation, I considered the following approaches:
- A frameset approach with the tree included in a separate file
- Generating the complete TOC in every generated file and styling it to appear as a pane
DocBook is a set of standards and tools for technical documentation. It was originally intended for, and is still primarily used for, technical documentation, but has been extended for use in other domains. The current DocBook schema is available in several schema languages, including RELAX NG and DTD, and is maintained by the DocBook Technical Committee of OASIS. The DocBook Open Repository is a project hosted on SourceForge that maintains a set of XSL stylesheets for converting a DocBook instance into a variety of output formats, including various HTML formats, PDF (via XSL-FO), and man pages. The currently supported HTML output formats include monolithic HTML, chunked HTML, Microsoft HTML Help (.chm), Eclipse documentation plugins, and Java Help.
Search is performed on the client side. For that, I plan to use the “htmlsearch1.04” demo plugin from the DITA Open Toolkit as a base and enhance it with the needed features. As DocBook is listed as one of its supported products, it should be compatible with this project. Further, its license allows it to be used in commercial applications as well.
The enhancements currently planned are:
- Support for stemming and lemmatization for a given query
- Search with Boolean operators (AND, OR)
- Metadata such as 'Prev' and 'Next' in the content pages will be ignored when indexing.
- Improve support for Asian languages such as Japanese (meta tag content is used).
- As searching on the client side may slow down the application, the necessary optimizations will be applied.
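Two of the enhancements listed above, stemming and Boolean operators, can be sketched together as follows. This is an illustration in Python under stated assumptions, not the planned implementation. The suffix-stripping `stem` function is a deliberately naive stand-in for a real stemmer, and the operator handling (uppercase `AND` and `OR`, with `AND` binding tighter and adjacent words treated as `AND`) is an assumed convention.

```python
import re

# Naive English suffix list; an assumption for illustration only.
SUFFIXES = ("ing", "ed", "es", "s")

def stem(word):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(pages):
    """Map each stemmed word to the set of files that contain it."""
    index = {}
    for file_name, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(stem(word), set()).add(file_name)
    return index

def search(index, query):
    """Evaluate a query such as 'a AND b OR c'; AND binds tighter than OR."""
    results = set()
    for clause in re.split(r"\s+OR\s+", query.strip()):
        terms = [t for t in re.split(r"\s+AND\s+|\s+", clause) if t]
        if not terms:
            continue
        matches = None
        for term in terms:
            files = index.get(stem(term.lower()), set())
            matches = files if matches is None else matches & files
        results |= matches
    return sorted(results)

# Hypothetical sample pages.
pages = {
    "ch01.html": "Indexing the generated pages",
    "ch02.html": "Searching an indexed document",
    "ch03.html": "Customizing the stylesheets",
}
index = build_index(pages)
```

Because both the indexed words and the query terms pass through the same `stem` function, a query for `indexes` matches pages that contain `indexing` or `indexed`. A suffix stripper this simple makes mistakes (for example, it reduces `pages` to `pag`, which does not match `page`), which is one reason a proper stemming algorithm would be preferable in practice.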
I plan to use the YUI library for TOC tree generation. I will abandon the frameset-based approach and instead use a CSS-based mechanism in which the TOC is generated in every page and CSS is used to format it for viewing. With this approach, both synchronization with the content file and deep linking happen automatically.
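The idea of generating the TOC in every page can be sketched as below. This is a Python illustration of the concept only. It does not use YUI or the DocBook XSL stylesheets, and the TOC data, the `current` class name, and the `toc` and `content` element ids are hypothetical.

```python
from html import escape

# Hypothetical TOC entries: (title, file name, children).
TOC = [
    ("Introduction", "ch01.html", []),
    ("Installation", "ch02.html", [
        ("Requirements", "ch02s01.html", []),
    ]),
]

def render_toc(entries, current_file):
    """Render nested <ul> markup, marking the entry for current_file."""
    items = []
    for title, file_name, children in entries:
        css_class = ' class="current"' if file_name == current_file else ""
        child_html = render_toc(children, current_file) if children else ""
        items.append(
            f'<li{css_class}><a href="{escape(file_name)}">'
            f"{escape(title)}</a>{child_html}</li>"
        )
    return "<ul>" + "".join(items) + "</ul>"

def render_page(body_html, current_file):
    """Wrap a content chunk with the full TOC; CSS would position the pane."""
    return "".join([
        '<div id="toc">', render_toc(TOC, current_file), "</div>",
        '<div id="content">', body_html, "</div>",
    ])

page = render_page("<p>Requirements text</p>", "ch02s01.html")
```

Since each generated page carries its own copy of the TOC with its own entry already marked, no script is needed to keep the tree in step with the content, and a link straight to `ch02s01.html` shows the correct TOC state. A stylesheet rule such as one that floats `#toc` to the left and highlights `li.current` would give the pane-like appearance.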
The UI will be built with CSS and related technologies and will be somewhat similar to Eclipse Help.
The planned development schedule is given below.
|Community Bonding Period: April 26 - May 24||Get to know the mentor and the community|
Study the required API and features for WebHelp
Prepare the development environment
Look for a good searching approach.
Start designing a good model
|Interim Period: May 24 - July 12||Dividing the development process into stages with the help of the mentor|
Developing the TOC tree using a CSS-based mechanism (YUI)
Implementing the synchronization with the content
Adding an index with the help of the DocBook schema
Designing the client-side search mechanism, taking features such as stemming and lemmatization into consideration, and starting to code it
Designing a better user interface.
|July 12 - July 16||Submitting mid-term evaluations and continuing with the development|
|Interim Period: July 16 - August 9||Completing TOC with synchronization|
Continue developing the search mechanism
Testing the synchronization and searching
Developing the User Interface
|August 9 - August 16||Refining and testing the code and making the necessary improvements|
|August 20||Final evaluation deadline|
|August 30||Submitting required code to Google|
 My Blog: Kasun's Tech Thoughts http://kasunbg.blogspot.com
 Twitter: http://twitter.com/kasunbg
 My Google Code Hosting Profile
(Projects hosted: documentation-aggregation-application, KFinder:A file searcher, cse-checkers(Java), cse-l3-2009-070137m:A Firefox extension)
 DocBook 5.0: The Definitive Guide
 DocBook XSL: The Complete Guide
 dita-users · DITA users yahoo group
 YUI Library
 Documentation Aggregation Application
 Delicious Extension for Google Chrome
 University of Moratuwa, Sri Lanka
 Department of Computer Science & Engineering, Faculty of Engineering