Machine Learning Datasets

CelebA Dataset

Visualization of the Celeb-A dataset in the Deep Lake UI

Celeb-A dataset

What is Celeb-A Dataset?

The CelebFaces Attributes Dataset (CelebA) consists of more than 200K celebrity images, each with 40 attribute annotations. The images cover large pose variations, heavy background clutter, and a diverse set of people, which makes the dataset well suited to training and testing models for face detection and facial attribute recognition. For example, a model trained on it can learn to identify people who have brown hair, are smiling, or are wearing glasses.

Download Celeb-A Dataset in Python

Instead of downloading the CelebA dataset, you can load it in Python with just one line of code using the open-source Deep Lake package.

Load CelebA Dataset Training Subset in Python

				
import deeplake
# Load the CelebA training subset from the Activeloop hub
ds = deeplake.load("hub://activeloop/celeb-a-train")
				
			

Load CelebA Dataset Validation Subset in Python

				
import deeplake
# Load the CelebA validation subset from the Activeloop hub
ds = deeplake.load("hub://activeloop/celeb-a-val")
				
			

Load CelebA Dataset Testing Subset in Python

				
import deeplake
# Load the CelebA testing subset from the Activeloop hub
ds = deeplake.load("hub://activeloop/celeb-a-test")
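
The three subset paths differ only in the split name, so a small helper can build them. This is a minimal sketch: `celeba_path` and `load_celeba` are hypothetical helper names, not part of Deep Lake, and the `deeplake` import is deferred so that the path helper works even without the package installed.

```python
# Hypothetical helpers (not part of Deep Lake) for the three CelebA subsets.
SPLITS = ("train", "val", "test")


def celeba_path(split: str) -> str:
    """Build the hub path for one CelebA subset, matching the paths used on this page."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    return f"hub://activeloop/celeb-a-{split}"


def load_celeba(split: str):
    """Load one subset with deeplake.load (needs the deeplake package and network access)."""
    import deeplake  # deferred so celeba_path works without deeplake installed

    return deeplake.load(celeba_path(split))
```

With these helpers, `ds = load_celeba("val")` would be equivalent to the validation snippet on this page.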
				
			

CelebA Dataset Structure

CelebA Data Fields
  • image: tensor containing the 178×218 image.
  • bbox: tensor containing the bounding box for the corresponding image.
  • keypoints: tensor containing 63 key points on the face.
  • clock_shadow: tensor to check for a 5 o'clock shadow.
  • arched_eyebrows: tensor to check for arched eyebrows.
  • attractive: tensor to check if attractive or not.
  • bags_under_eyes: tensor to check if bags are under the eyes.
  • bald: tensor to check if bald or not.
  • bangs: tensor to check if bangs are there or not.
  • big_lips: tensor to check if big lips are there or not.
  • big_nose: tensor to check if big nose is there or not.
  • black_hair: tensor to check the presence of black hair.
  • blond_hair: tensor to check if blond hair or not.
  • blurry: tensor to check if the image is blurred.
  • brown_hair: tensor to check the presence of brown hair.
  • bushy_eyebrows: tensor to check the presence of bushy eyebrows.
  • chubby: tensor to check if chubby or not.
  • double_chin: tensor to check the presence of double chin.
  • eyeglasses: tensor to check the presence of eyeglasses.
  • goatee: tensor to check the presence of a goatee in a person.
  • gray_hair: tensor to check the presence of gray hair.
  • heavy_makeup: tensor to check the presence of heavy makeup.
  • high_cheekbones: tensor to check the presence of high cheekbones.
  • male: tensor to check if the person is male.
  • mouth_slightly_open: tensor to check if the mouth is slightly open.
  • mustache: tensor to check the presence of a mustache.
  • narrow_eyes: tensor to check if the eyes are narrow.
  • no_beard: tensor to check if the person has no beard.
  • oval_face: tensor to check if the face is oval.
  • pale_skin: tensor to check if the skin is pale.
  • pointy_nose: tensor to check if the nose is pointy.
  • receding_hairline: tensor to check if the hairline is receding.
  • rosy_cheeks: tensor to check if the cheeks are rosy.
  • sideburns: tensor to check the presence of sideburns.
  • smiling: tensor to check if the person is smiling.
  • straight_hair: tensor to check if the hair is straight.
  • wavy_hair: tensor to check if the hair is wavy.
  • wearing_earrings: tensor to check the presence of earrings.
  • wearing_hat: tensor to check the presence of a hat.
  • wearing_lipstick: tensor to check the presence of lipstick.
  • wearing_necklace: tensor to check the presence of a necklace.
  • wearing_necktie: tensor to check the presence of a necktie.
  • young: tensor to check if the person is young.
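
For multi-label attribute classification, the 40 attribute fields are often packed into a single 0/1 vector per image. The sketch below shows one way to do that in plain Python. It assumes each attribute tensor holds a binary flag per image, which the field list suggests but does not state; `to_multi_hot` and the example sample are hypothetical.

```python
from typing import Dict, List

# The 40 attribute tensor names, in the order listed above.
ATTRIBUTES: List[str] = [
    "clock_shadow", "arched_eyebrows", "attractive", "bags_under_eyes",
    "bald", "bangs", "big_lips", "big_nose", "black_hair", "blond_hair",
    "blurry", "brown_hair", "bushy_eyebrows", "chubby", "double_chin",
    "eyeglasses", "goatee", "gray_hair", "heavy_makeup", "high_cheekbones",
    "male", "mouth_slightly_open", "mustache", "narrow_eyes", "no_beard",
    "oval_face", "pale_skin", "pointy_nose", "receding_hairline",
    "rosy_cheeks", "sideburns", "smiling", "straight_hair", "wavy_hair",
    "wearing_earrings", "wearing_hat", "wearing_lipstick",
    "wearing_necklace", "wearing_necktie", "young",
]


def to_multi_hot(sample: Dict[str, int]) -> List[int]:
    """Pack per-attribute flags into one 40-element 0/1 vector.

    Assumes a truthy value means the attribute is present; attributes
    missing from `sample` are treated as absent.
    """
    return [1 if sample.get(name, 0) else 0 for name in ATTRIBUTES]


# Hypothetical sample: a smiling person wearing eyeglasses.
example = {"smiling": 1, "eyeglasses": 1}
vector = to_multi_hot(example)
```

Keeping the attribute order fixed in one list means the same index always refers to the same attribute when the vector is used as a training target.
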
CelebA Data Splits
  • The CelebA dataset training set is composed of 162,770 images.
  • The CelebA dataset test set is composed of 19,962 images.
  • The CelebA dataset val set is composed of 19,867 images.
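
As a quick sanity check, the split sizes can be summed and expressed as fractions of the whole dataset. The sketch below uses only the counts listed above; the variable names are illustrative.

```python
# Split sizes as listed on this page.
SPLIT_SIZES = {"train": 162_770, "val": 19_867, "test": 19_962}

# Total number of images across all three splits.
total = sum(SPLIT_SIZES.values())

# Share of the dataset that falls into each split.
fractions = {name: size / total for name, size in SPLIT_SIZES.items()}

print(f"total images: {total:,}")
for name, frac in fractions.items():
    print(f"{name}: {frac:.1%}")
```

The total comes to 202,599 images, consistent with the "more than 200K" figure, and the split is roughly 80/10/10.
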

How to use CelebA Dataset with PyTorch and TensorFlow in Python

Train a model on CelebA dataset with PyTorch in Python

Let’s use Deep Lake’s built-in one-line PyTorch dataloader to connect the data to the compute:

				
# Wrap the loaded dataset `ds` in a PyTorch dataloader
dataloader = ds.pytorch(num_workers=0, batch_size=4, shuffle=False)
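
To get a feel for what `batch_size=4` means at this scale, the sketch below counts the batches in one pass over the training subset. It is plain Python and does not call Deep Lake. It assumes `ds` is the 162,770-image training subset and that the final partial batch is kept rather than dropped; check how the dataloader you use actually handles the last batch. `batch_sizes` is a hypothetical helper.

```python
import math
from typing import Iterator


def batch_sizes(num_samples: int, batch_size: int, drop_last: bool = False) -> Iterator[int]:
    """Yield the size of each batch for sequential (unshuffled) batching."""
    full, remainder = divmod(num_samples, batch_size)
    for _ in range(full):
        yield batch_size
    # Keep the final partial batch unless the caller asks to drop it.
    if remainder and not drop_last:
        yield remainder


TRAIN_SIZE = 162_770  # training split size listed on this page
BATCH_SIZE = 4        # matches batch_size=4 in the call above

num_batches = math.ceil(TRAIN_SIZE / BATCH_SIZE)
last_batch = TRAIN_SIZE % BATCH_SIZE or BATCH_SIZE

print(f"batches per epoch: {num_batches:,}, last batch size: {last_batch}")
```

Under these assumptions one epoch has 40,693 batches, and the last one holds only 2 images.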
				
			
Train a model on CelebA dataset with TensorFlow in Python
				
# Expose the loaded dataset `ds` to TensorFlow
dataloader = ds.tensorflow()
				
			

Additional Information about CelebA Dataset

CelebA Dataset Description

  • Homepage: https://mmlab.ie.cuhk.edu.hk/projects/CelebA.html
  • Repository: N/A
  • Paper: Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou: Deep Learning Face Attributes in the Wild, Proceedings of International Conference on Computer Vision (ICCV), 2015
  • Point of Contact: ziwei.liu at ntu.edu.sg
CelebA Dataset Curators

Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou

CelebA Dataset Licensing Information

Deep Lake users may have access to a variety of publicly available datasets. We do not host or distribute these datasets, vouch for their quality or fairness, or claim that you have a license to use the datasets. It is your responsibility to determine whether you have permission to use the datasets under their license.

If you’re a dataset owner and do not want your dataset to be included in this library, please get in touch through a GitHub issue. Thank you for your contribution to the ML community!

CelebA Dataset Citation Information
				
@inproceedings{liu2015faceattributes,
  title = {Deep Learning Face Attributes in the Wild},
  author = {Liu, Ziwei and Luo, Ping and Wang, Xiaogang and Tang, Xiaoou},
  booktitle = {Proceedings of International Conference on Computer Vision (ICCV)},
  month = {December},
  year = {2015}
}
				
			

CelebA Dataset FAQs

What is the CelebA dataset for Python?

The CelebFaces Attributes Dataset (CelebA) consists of more than 200K celebrity images, each with 40 attribute annotations. The images cover large pose variations and heavy background clutter.

What is the CelebA dataset used for?

This dataset is well suited to training and testing models for face detection and, in particular, for recognizing facial attributes such as brown hair, smiling, or wearing glasses. The images cover large pose variations, background clutter, and diverse people, and they come with rich annotations.

How can I use CelebA dataset in PyTorch or TensorFlow?

You can stream the CelebA dataset while training a model in PyTorch or TensorFlow with one line of code using the open-source Python package Activeloop Deep Lake. See the detailed instructions above on how to train a model on the CelebA dataset with PyTorch or with TensorFlow in Python.


© 2022 All Rights Reserved by Snark AI, inc dba Activeloop