
nmehran/charex


charex

String Array Extensions for Numba

Make NumPy's numpy.char string functions available inside Numba-compiled code by importing charex:

import charex

Comparison operations:

  • char.equal
  • char.not_equal
  • char.greater_equal
  • char.less_equal
  • char.greater
  • char.less
  • char.compare_chararrays

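For reference, NumPy 1.26 documents char.equal and the related comparison functions as stripping trailing whitespace before comparing. In that release char.equal is defined as compare_chararrays(x1, x2, "==", True), and compare_chararrays(a1, a2, cmp, rstrip) takes cmp as one of "<", "<=", "==", ">=", ">", "!=". The sketch below is a minimal plain-Python model of those element-wise semantics on lists of str or bytes. It only makes the behavior concrete and is not how charex implements these functions. The helper names compare_model and equal_model are hypothetical.

```python
import operator

# Operator strings accepted by numpy.char.compare_chararrays (per NumPy docs).
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
    "!=": operator.ne,
}


def compare_model(a1, a2, cmp, rstrip):
    """Element-wise comparison of two equal-length sequences of str or bytes.

    Hypothetical model of numpy.char.compare_chararrays semantics:
    optionally strip trailing whitespace, then apply the operator.
    """
    if len(a1) != len(a2):
        raise ValueError("inputs must have the same length")
    op = _OPS[cmp]  # KeyError for an unsupported operator string
    out = []
    for x, y in zip(a1, a2):
        if rstrip:
            x, y = x.rstrip(), y.rstrip()
        out.append(op(x, y))
    return out


def equal_model(a1, a2):
    # Mirrors NumPy 1.26's char.equal: compare_chararrays(..., "==", True).
    return compare_model(a1, a2, "==", True)


print(equal_model(["abc ", "xyz"], ["abc", "xy"]))  # [True, False]
```

The trailing space in "abc " is ignored by equal_model, whereas a plain == on the same strings would return False.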
Occurrence and property information:

  • char.count
  • char.endswith
  • char.startswith
  • char.find
  • char.rfind
  • char.index
  • char.rindex
  • char.str_len
  • char.isalpha
  • char.isalnum
  • char.isspace
  • char.isdecimal
  • char.isdigit
  • char.isnumeric
  • char.istitle
  • char.isupper
  • char.islower
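These functions follow NumPy's convention of applying the matching Python str or bytes method to every element. For example, char.find returns -1 where the substring is absent, while char.index raises ValueError instead. NumPy 1.26 also restricts char.isdecimal and char.isnumeric to Unicode input, which mirrors the fact that Python bytes has no such methods. The sketch below is a plain-Python model of that convention for reference only, not how charex compiles these functions. The elementwise helper is hypothetical.

```python
def elementwise(method, arr, *args):
    """Apply a built-in str/bytes method to each element, numpy.char style.

    Hypothetical helper: `method` names a str/bytes method such as
    "find", "count", "startswith" or "isalpha"; extra args (e.g. sub,
    start, end) are forwarded unchanged.
    """
    return [getattr(s, method)(*args) for s in arr]


print(elementwise("find", ["banana", "apple"], "an"))  # [1, -1]
print(elementwise("find", ["banana"], "an", 2))        # [3] (search starts at index 2)
print(elementwise("count", ["banana", "apple"], "a"))  # [3, 1]
print(elementwise("startswith", [b"data.csv", b"x"], b"data"))  # [True, False]

# U+00BD (vulgar fraction one half) is numeric but not decimal.
print(elementwise("isdecimal", ["123", "\u00bd"]))  # [True, False]
print(elementwise("isnumeric", ["123", "\u00bd"]))  # [True, True]

# bytes has no isdecimal/isnumeric, so the numeric checks are string-only.
print(hasattr(b"1", "isdecimal"))  # False
```

char.str_len corresponds to len() on each element in the same way.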

Supports UTF-32 strings (NumPy U dtype) and ASCII bytes (NumPy S dtype), both as scalars and as contiguous one-dimensional arrays.
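To make the two encodings concrete: NumPy's U dtype stores every character as one 4-byte UTF-32 code unit, its S dtype stores one byte per character, and both pad each element to the width of the longest one. The stdlib-only sketch below computes the per-element size NumPy would allocate and shows the UTF-32 layout. The function names are hypothetical, and the minimum width of one character reflects NumPy creating dtype '<U1' for np.array(['']).

```python
def fixed_width_itemsize(values):
    """Bytes per element NumPy would allocate for a list of str or bytes.

    Hypothetical helper: 'U' arrays use 4 bytes per character (UTF-32),
    'S' arrays use 1 byte per character; all elements are padded to the
    longest one, with a minimum width of one character.
    """
    if not values:
        raise ValueError("need at least one value")
    longest = max(1, max(len(v) for v in values))
    if all(isinstance(v, str) for v in values):
        return 4 * longest
    if all(isinstance(v, bytes) for v in values):
        return longest
    raise TypeError("values must be all str or all bytes")


def to_utf32(s):
    # Little-endian UTF-32 without a BOM: exactly 4 bytes per code point.
    return s.encode("utf-32-le")


print(fixed_width_itemsize(["a", "na\u00efve"]))  # 20 (5 chars * 4 bytes)
print(fixed_width_itemsize([b"ab", b"abcd"]))     # 4
print(to_utf32("A"))                              # b'A\x00\x00\x00'
```

If an input is a strided view such as arr[::2], numpy.ascontiguousarray(arr) returns a contiguous copy that satisfies the contiguity requirement above.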

Benchmarks

Numba's LLVM initialization adds a small one-time overhead, but charex amortizes it as data size grows and outperforms NumPy on the occurrence and property functions.
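One way to observe this trade-off on your own data is to time the first call of a jitted function separately from later calls. With Numba, the first call for a given argument type triggers compilation, so the gap between the two timings approximates the one-time overhead that larger inputs amortize. The harness below is a hypothetical stdlib-only sketch; it is not the charex benchmark code, and first_call_vs_steady is an illustrative name.

```python
import time


def first_call_vs_steady(fn, *args, repeats=5):
    """Return (first_call_seconds, best_later_call_seconds) for fn(*args).

    For a Numba @njit function, the first call includes JIT compilation,
    so first - best roughly measures the one-time compile overhead.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    start = time.perf_counter()
    fn(*args)
    first = time.perf_counter() - start

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        best = min(best, time.perf_counter() - start)
    return first, best


first, best = first_call_vs_steady(sum, range(100_000))
print(f"first call: {first:.6f}s, best later call: {best:.6f}s")
```

Assuming charex registers the listed numpy.char functions with Numba, you could pass a numba.njit function that calls np.char.find on a large array in place of sum, and repeat with increasing array sizes to see where the compile cost stops mattering.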

Comparison Operators

[Charts: comparison operators on bytes and on strings]

Occurrence Information

[Charts: occurrence functions on bytes and on strings]

Property Information

[Charts: property functions on bytes and on strings; numeric property functions on strings]

The benchmarks are generated during testing using charex/tests/test_comparison.py and charex/tests/test_string_information.py.

Last tested 2024-02-23: Numba 0.59.0, NumPy 1.26.3
