Read the Docs uses Elasticsearch instead of the built-in Sphinx search to provide better search results. Documents are indexed into an Elasticsearch index, and searches are made through the API. All the search code is open source and lives in the GitHub repository. We currently use Elasticsearch 6.3.
Local Development Configuration
Installing and running Elasticsearch
You need to install and run Elasticsearch version 6.3 on your local development machine; installation instructions are available in the official Elasticsearch documentation. Alternatively, you can start an Elasticsearch Docker container by running the following command:
docker run -p 9200:9200 -p 9300:9300 \
    -e "discovery.type=single-node" \
    docker.elastic.co/elasticsearch/elasticsearch:6.3.2
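Once the container is running, one quick way to confirm that the expected version is up is to read the `version.number` field from the JSON that Elasticsearch returns at its root endpoint. This is a minimal sketch that assumes the default `http://localhost:9200` address from the command above; the helper names `fetch_cluster_info` and `is_expected_version` are hypothetical.

```python
import json
import urllib.request


def fetch_cluster_info(url="http://localhost:9200"):
    """Return the JSON document served by Elasticsearch's root endpoint."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.load(response)


def is_expected_version(info, expected_prefix="6.3"):
    """Check that the reported server version is the expected major.minor."""
    number = info.get("version", {}).get("number", "")
    # Compare against "6.3." so that e.g. "6.30.0" is not accepted by mistake.
    return number == expected_prefix or number.startswith(expected_prefix + ".")
```

Calling `is_expected_version(fetch_cluster_info())` against the container above should return `True`; the parsing is kept separate from the HTTP call so it can be checked without a running server.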
Indexing into Elasticsearch
To use search, you first need to index your data into Elasticsearch by running our reindexing management command.
For better performance, we implemented our own version of this command instead of using the built-in management command provided by the django-elasticsearch-dsl package.
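The internals of the custom command are not described here. As an illustration only, one common way to speed up reindexing is to send documents in fixed-size batches, so that each batch costs one request instead of one request per document. The functions `chunked` and `reindex`, the chunk size, and the `send_bulk` callback are all hypothetical.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be positive")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def reindex(documents: Iterable[dict],
            send_bulk: Callable[[List[dict]], None],
            chunk_size: int = 500) -> int:
    """Send documents in batches and return how many bulk requests were made."""
    requests = 0
    for batch in chunked(documents, chunk_size):
        send_bulk(batch)  # hypothetical: one bulk HTTP request per batch
        requests += 1
    return requests
```

With 1,200 documents and `chunk_size=500`, this makes 3 requests instead of 1,200.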
By default, auto indexing is turned off in development mode. To turn it on, set ELASTICSEARCH_DSL_AUTOSYNC to True in your development settings.
After that, the search index updates automatically whenever documentation builds successfully or a project is added.
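A minimal sketch of a local settings override. It assumes a Django settings module used for development; the exact file is not named here. `ELASTICSEARCH_DSL_AUTOSYNC` is the django-elasticsearch-dsl setting mentioned above, and the `ELASTICSEARCH_DSL` connection block follows that package's documented format. The host value assumes the local container started earlier.

```python
# Local development settings override (file location is an assumption).

# Connection used by django-elasticsearch-dsl; points at the local
# Elasticsearch started with the Docker command above.
ELASTICSEARCH_DSL = {
    "default": {
        "hosts": "localhost:9200",
    },
}

# Turn on automatic index updates in development.
ELASTICSEARCH_DSL_AUTOSYNC = True
```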
The search architecture is divided into two parts. One part is responsible for indexing documents and projects, and the other is responsible for querying the index to show relevant results to users. We rely mostly on the django-elasticsearch-dsl package to keep search working; django-elasticsearch-dsl is a wrapper around elasticsearch-dsl that makes it easy to configure with Django.
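On the querying side, what finally reaches Elasticsearch is a request body in its JSON query DSL; elasticsearch-dsl builds such bodies from Python objects. As a rough illustration, and not the project's actual query, this function builds a `_search` body with a `multi_match` query. The field names `title` and `content` and the `^2` boost on `title` are hypothetical.

```python
import json
from typing import Dict, List, Optional


def build_search_body(query: str, fields: Optional[List[str]] = None,
                      size: int = 10) -> Dict:
    """Build a _search request body that matches `query` across several fields."""
    if fields is None:
        # Hypothetical field names; "^2" makes a title match count twice as much.
        fields = ["title^2", "content"]
    return {
        "query": {
            "multi_match": {
                "query": query,
                "fields": fields,
            }
        },
        "size": size,
    }


body = build_search_body("signals")
print(json.dumps(body, indent=2))
```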
All Sphinx documents are indexed into Elasticsearch after a successful build. We currently do not index MkDocs documents into Elasticsearch, but any help with this is welcome.
How we index documentation
After any build finishes successfully, HTMLFile objects are created for each of the HTML files, and the old version's HTMLFile objects are deleted. By default, the django-elasticsearch-dsl package listens to Django's model signals to index and delete documents, but this has a performance drawback: it sends an HTTP request whenever any HTMLFile object is created or deleted. To optimize performance, bulk_post_create and bulk_post_delete signals are dispatched with a list of HTMLFile objects, so that documents can be indexed in Elasticsearch in bulk (bulk_post_create is dispatched for created objects and bulk_post_delete for deleted objects). Both signals are dispatched with the list of HTMLFile instances.
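To show why bulk signals help, here is a self-contained sketch that uses a tiny stand-in for a Django-style signal instead of Django itself. The `Signal` class, the `FakeHTMLFile` type, and the `instance_list` keyword are hypothetical. The point is only that one bulk dispatch delivers the whole list to the receiver in a single call, so the receiver can make one bulk request rather than one request per file.

```python
from dataclasses import dataclass
from typing import Callable, List


class Signal:
    """Minimal stand-in for a Django-style signal (connect + send)."""

    def __init__(self) -> None:
        self._receivers: List[Callable] = []

    def connect(self, receiver: Callable) -> None:
        self._receivers.append(receiver)

    def send(self, sender, **kwargs) -> None:
        for receiver in self._receivers:
            receiver(sender=sender, **kwargs)


@dataclass
class FakeHTMLFile:
    path: str


bulk_post_create = Signal()
bulk_post_delete = Signal()

calls = []


def on_bulk_create(sender, instance_list, **kwargs):
    # Called once per dispatch, however many files were created.
    calls.append(("create", len(instance_list)))


def on_bulk_delete(sender, instance_list, **kwargs):
    calls.append(("delete", len(instance_list)))


bulk_post_create.connect(on_bulk_create)
bulk_post_delete.connect(on_bulk_delete)

old_files = [FakeHTMLFile(f"old{i}.html") for i in range(30)]
new_files = [FakeHTMLFile(f"page{i}.html") for i in range(50)]
bulk_post_delete.send(sender=FakeHTMLFile, instance_list=old_files)
bulk_post_create.send(sender=FakeHTMLFile, instance_list=new_files)
```

Eighty file changes reach the receivers as two calls, one per signal.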
We listen to the bulk_post_create and bulk_post_delete signals in our Search app and index or delete the documentation content from the HTMLFile instances.
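This sketch shows what a receiver might do with the list. The function turns created and deleted pages into the newline-delimited body that Elasticsearch's `_bulk` endpoint expects. Each created page becomes an `index` action followed by its source document, and each removed page becomes a bare `delete` action. The function name `bulk_body`, the index name, the `doc` type, the id scheme, and the page dictionaries are assumptions. Elasticsearch 6.x bulk actions also carry a `_type`.

```python
import json
from typing import Iterable, Mapping


def bulk_body(created: Iterable[Mapping], deleted_ids: Iterable[str],
              index: str = "page", doc_type: str = "doc") -> str:
    """Build an NDJSON _bulk body: index created pages, delete removed ones."""
    lines = []
    for page in created:
        meta = {"_index": index, "_type": doc_type, "_id": page["id"]}
        lines.append(json.dumps({"index": meta}))
        lines.append(json.dumps(dict(page)))  # source document for the index action
    for doc_id in deleted_ids:
        meta = {"_index": index, "_type": doc_type, "_id": doc_id}
        lines.append(json.dumps({"delete": meta}))  # delete takes no source line
    # The _bulk API requires the body to end with a newline.
    return ("\n".join(lines) + "\n") if lines else ""
```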
How we index projects
We also index project information in our search index so that users can search for projects from the main site. django-elasticsearch-dsl listens to the Project model's save and delete signals (such as post_delete) and indexes or deletes the project in Elasticsearch accordingly.
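For contrast with the bulk path, per-object signals produce one index or delete operation for every saved or deleted project. The sketch below mimics that pattern with plain functions. The handler names, the project fields (`slug`, `name`, `description`), and the `RecordingClient` interface are hypothetical; in the real setup django-elasticsearch-dsl wires up the signal handling itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FakeProject:
    slug: str
    name: str
    description: str = ""


class RecordingClient:
    """Hypothetical client that records one request per call."""

    def __init__(self) -> None:
        self.requests: List[Tuple[str, str]] = []
        self.index_data: Dict[str, dict] = {}

    def index(self, doc_id: str, body: dict) -> None:
        self.requests.append(("index", doc_id))
        self.index_data[doc_id] = body

    def delete(self, doc_id: str) -> None:
        self.requests.append(("delete", doc_id))
        self.index_data.pop(doc_id, None)


def on_project_saved(client: RecordingClient, project: FakeProject) -> None:
    # Runs once per saved project, so every save costs one request.
    client.index(project.slug, {"name": project.name,
                                "description": project.description})


def on_project_deleted(client: RecordingClient, project: FakeProject) -> None:
    client.delete(project.slug)
```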
ProjectDocument: used for indexing projects. The django-elasticsearch-dsl signal listener listens to the Project model's signals and then indexes or deletes the project in Elasticsearch.
PageDocument: used for indexing the documentation of projects. django-elasticsearch-dsl's signal-based auto indexing for this document is controlled by ignore_signals = settings.ES_PAGE_IGNORE_SIGNALS, and settings.ES_PAGE_IGNORE_SIGNALS is False both in development and production. As mentioned above, our Search app listens to the bulk_post_create and bulk_post_delete signals and indexes or deletes documentation in Elasticsearch. The signal listeners are in the readthedocs/search/signals.py file. Both signals are dispatched after a successful documentation build.
The fields and Elasticsearch datatypes are specified in PageDocument. The indexable data is taken from a property of HTMLFile that provides a Python dictionary with the document's data.
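To illustrate how declared fields and datatypes relate to the indexable data, the sketch below writes an Elasticsearch 6.x style mapping by hand and keeps only the mapped keys from a page dictionary. The field names, their types, the `PAGE_MAPPING` and `to_index_document` names, and the sample data are hypothetical. In the project itself, the fields are declared on PageDocument through django-elasticsearch-dsl rather than written as raw JSON.

```python
from typing import Dict

# Hypothetical ES 6.x mapping: "doc" is the single mapping type.
# "text" fields are analyzed for full-text search; "keyword" fields match exactly.
PAGE_MAPPING: Dict = {
    "mappings": {
        "doc": {
            "properties": {
                "project": {"type": "keyword"},
                "version": {"type": "keyword"},
                "path": {"type": "keyword"},
                "title": {"type": "text"},
                "content": {"type": "text"},
            }
        }
    }
}


def to_index_document(page_data: Dict, mapping: Dict = PAGE_MAPPING) -> Dict:
    """Keep only the keys that the mapping declares as fields."""
    properties = mapping["mappings"]["doc"]["properties"]
    return {key: value for key, value in page_data.items() if key in properties}
```

Keys that the mapping does not declare, such as raw HTML, are dropped before indexing.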