Data Sources

Where our India PIN code data comes from and how we process it

Primary Data Source — GeoNames

GeoNames is a free geographical database covering all countries of the world, available under the Creative Commons Attribution 4.0 License.

We use the India data file (IN.txt) from GeoNames, which contains postal codes, place names, administrative divisions, and geographic coordinates for postal localities across India.

  • 📄 Format: Tab-separated values (TSV)
  • 🏳️ Country: India (IN)
  • 📊 Records: 155,000+ rows
  • 🔄 Updated: Periodically by GeoNames
  • 📜 License: Creative Commons Attribution 4.0

Data Fields

Each GeoNames record for India contains the following fields that we use in our import pipeline:

Field        | Description           | Used For
country_code | Always "IN" for India | Validation
postal_code  | 6-digit PIN code      | PIN Code records
place_name   | Area / locality name  | Area locations
admin_name1  | State name            | State hierarchy
admin_name2  | District name         | District hierarchy
admin_name3  | City/Taluk name       | City hierarchy
latitude     | Geographic latitude   | Coordinates
longitude    | Geographic longitude  | Coordinates

Our 6-Stage Processing Pipeline

Raw GeoNames data is processed through a 6-stage pipeline before being stored in our database:

1. Parse

Reads the IN.txt file line by line with a generator-based TSV parser, so memory usage stays minimal regardless of file size. Handles encoding edge cases and malformed rows.
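To illustrate the idea, here is a minimal sketch of generator-based parsing. It is written in Python for illustration only and is not the pipeline's actual code. The names `PostalRow` and `parse_rows` are hypothetical, and the column positions assume the 12-column layout described in the GeoNames postal code readme (admin codes sit between the admin names).

```python
from typing import Iterable, Iterator, NamedTuple


class PostalRow(NamedTuple):
    """The eight fields used by the import, kept as raw strings."""
    country_code: str
    postal_code: str
    place_name: str
    admin_name1: str
    admin_name2: str
    admin_name3: str
    latitude: str
    longitude: str


def parse_rows(lines: Iterable[str]) -> Iterator[PostalRow]:
    """Yield one PostalRow per well-formed line; skip malformed lines.

    Because this is a generator, only one line is held in memory at a time.
    """
    for line in lines:
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) < 11:
            continue  # assumption: rows with missing columns are skipped
        # Assumed layout: 0 country, 1 postal code, 2 place, 3 admin name1,
        # 4 admin code1, 5 admin name2, 6 admin code2, 7 admin name3,
        # 8 admin code3, 9 latitude, 10 longitude, 11 accuracy.
        yield PostalRow(cols[0], cols[1], cols[2], cols[3],
                        cols[5], cols[7], cols[9], cols[10])


# Typical use with a file; errors="replace" is one way to survive bad bytes:
#   with open("IN.txt", encoding="utf-8", errors="replace") as fh:
#       for row in parse_rows(fh):
#           ...
```

Passing the open file handle directly keeps the whole read lazy, since Python iterates a text file one line at a time.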

2. Clean

Normalises Unicode characters, expands abbreviations, trims whitespace, and standardises coordinate formats.
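A cleaning step along these lines might look like the following Python sketch (for illustration only, not the pipeline's actual code). The helper names, the abbreviation table, and the choice of four decimal places for coordinates are all assumptions made for the example.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical abbreviation table; the real pipeline's list is not documented here.
ABBREVIATIONS = {
    "S.O": "Sub Office",
    "B.O": "Branch Office",
    "H.O": "Head Office",
}


def clean_name(raw: str) -> str:
    """Normalise Unicode, collapse whitespace, and expand known abbreviations."""
    text = unicodedata.normalize("NFC", raw)   # compose characters consistently
    text = re.sub(r"\s+", " ", text).strip()   # trim and collapse whitespace
    words = [ABBREVIATIONS.get(w.rstrip(".").upper(), w) for w in text.split(" ")]
    return " ".join(words)


def clean_coordinate(raw: str) -> Optional[float]:
    """Parse a coordinate string to a float rounded to 4 decimal places.

    Returns None when the value is not numeric, so a later stage can flag it.
    """
    try:
        return round(float(raw.strip()), 4)
    except ValueError:
        return None
```

Returning `None` instead of raising keeps the cleaning stage free of validation decisions, which belong to the next stage.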

3. Validate

Checks PIN code format (6-digit, valid India range), verifies coordinates are within India's geographic bounds, and flags invalid rows.
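The checks described above can be sketched as follows, in Python for illustration only. The bounding box values are rough approximations chosen for this example (they include the Andaman and Nicobar Islands), the non-zero first digit rule is an assumption about what "valid India range" means, and the function and problem names are hypothetical.

```python
import re
from typing import List

# Six ASCII digits, first digit 1-9. [0-9] is used instead of \d because
# \d would also accept non-ASCII digits such as Devanagari numerals.
PIN_RE = re.compile(r"[1-9][0-9]{5}")

# Approximate bounding box for India (assumed values, not from the source).
LAT_RANGE = (6.0, 37.5)
LON_RANGE = (68.0, 97.5)


def validate(postal_code: str, lat: float, lon: float) -> List[str]:
    """Return a list of problem codes; an empty list means the row is valid."""
    problems = []
    if not PIN_RE.fullmatch(postal_code):
        problems.append("bad_pin")
    if not (LAT_RANGE[0] <= lat <= LAT_RANGE[1]):
        problems.append("lat_out_of_bounds")
    if not (LON_RANGE[0] <= lon <= LON_RANGE[1]):
        problems.append("lon_out_of_bounds")
    return problems
```

Returning a list of problems rather than a boolean makes it possible to flag a row with every reason it failed, instead of stopping at the first one.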

4. Slugify

Generates URL-safe slugs for all location names using PHP's Intl transliteration and custom rules for Indian place names.
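As a rough illustration of slug generation, the Python sketch below substitutes a simpler technique for PHP's Intl transliteration: it decomposes characters with NFKD and drops anything that is not ASCII. That handles Latin letters with diacritics but, unlike a full transliterator, it would discard text in other scripts such as Devanagari. The custom rules for Indian place names are not reproduced here, and the function name `slugify` is hypothetical.

```python
import re
import unicodedata


def slugify(name: str) -> str:
    """Turn a place name into a lowercase, hyphen-separated, ASCII-only slug."""
    # Split accented letters into base letter + combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks and any other non-ASCII characters.
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Replace every run of non-alphanumeric characters with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_only.lower())
    return slug.strip("-")
```

For example, `slugify("Connaught Place")` gives `connaught-place`.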

5. Build Hierarchy

Creates or retrieves State, District, City, and Area location records with correct parent-child relationships. Avoids duplicate insertions using get-or-create logic.
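The get-or-create pattern can be sketched with Python's built-in `sqlite3` module, for illustration only. The `locations` table, its columns, and the function names are hypothetical and are not the project's actual schema.

```python
import sqlite3
from typing import Optional


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical self-referencing locations table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS locations ("
        " id INTEGER PRIMARY KEY,"
        " kind TEXT NOT NULL,"  # 'state', 'district', 'city' or 'area'
        " name TEXT NOT NULL,"
        " parent_id INTEGER REFERENCES locations(id))"
    )


def get_or_create(conn: sqlite3.Connection, kind: str, name: str,
                  parent_id: Optional[int]) -> int:
    """Return the id of a matching row, inserting it first if it is missing."""
    # "IS ?" rather than "= ?" so that a NULL parent (a state) also matches.
    row = conn.execute(
        "SELECT id FROM locations WHERE kind = ? AND name = ? AND parent_id IS ?",
        (kind, name, parent_id),
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO locations (kind, name, parent_id) VALUES (?, ?, ?)",
        (kind, name, parent_id),
    )
    return cur.lastrowid


def build_hierarchy(conn: sqlite3.Connection, state: str, district: str,
                    city: str, area: str) -> int:
    """Walk State -> District -> City -> Area and return the area's id."""
    parent: Optional[int] = None
    for kind, name in (("state", state), ("district", district),
                       ("city", city), ("area", area)):
        parent = get_or_create(conn, kind, name, parent)
    return parent
```

Matching on the parent as well as the name means two districts with the same name in different states remain separate records.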

6. Map Relationships

Creates Area ↔ PIN Code mappings in the area_pincode_map table. One PIN code can serve multiple areas, and one area can have multiple PIN codes.
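A many-to-many mapping of this kind could be modelled as in the Python sketch below, using the built-in `sqlite3` module for illustration only. The table name `area_pincode_map` comes from the description above; the column names, the composite primary key, and the helper functions are assumptions.

```python
import sqlite3
from typing import List


def create_map_table(conn: sqlite3.Connection) -> None:
    """Create the join table; the composite key rejects duplicate pairs."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS area_pincode_map ("
        " area_id INTEGER NOT NULL,"
        " pincode TEXT NOT NULL,"  # TEXT keeps the code as a fixed 6-character string
        " PRIMARY KEY (area_id, pincode))"
    )


def link(conn: sqlite3.Connection, area_id: int, pincode: str) -> None:
    """Record that an area is served by a PIN code; repeat calls are no-ops."""
    conn.execute(
        "INSERT OR IGNORE INTO area_pincode_map (area_id, pincode) VALUES (?, ?)",
        (area_id, pincode),
    )


def pincodes_for_area(conn: sqlite3.Connection, area_id: int) -> List[str]:
    """All PIN codes mapped to one area."""
    rows = conn.execute(
        "SELECT pincode FROM area_pincode_map WHERE area_id = ? ORDER BY pincode",
        (area_id,),
    )
    return [r[0] for r in rows]


def areas_for_pincode(conn: sqlite3.Connection, pincode: str) -> List[int]:
    """All area ids served by one PIN code."""
    rows = conn.execute(
        "SELECT area_id FROM area_pincode_map WHERE pincode = ? ORDER BY area_id",
        (pincode,),
    )
    return [r[0] for r in rows]
```

Because each row holds a single (area, PIN code) pair, the same table answers both directions of the lookup.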

Attribution

Data is provided by GeoNames under the Creative Commons Attribution 4.0 International License.

GeoNames data may not be 100% accurate for every location. If you find an error, please report it to GeoNames directly or contact us through the admin panel.