Final commit
Loara committed Jun 15, 2021
1 parent a61172c commit 9e42b6c
Showing 8 changed files with 179 additions and 179 deletions.
37 changes: 30 additions & 7 deletions README.md
@@ -10,7 +10,7 @@ C++ library to manage strings and (almost) any kind of encoded data.
Encmetric is released under the GNU Lesser General Public License (LGPL) version 3. For more information see the COPYING and COPYING.LESSER files.

# Build and Install
To build the library you need cmake version 3.20 or later and a C++ compiler that supports C++20 concepts (for example `gcc` v. 10.2 or newer).
To build the library you need cmake version 3.20 or later and a C++ compiler that supports C++20 concepts (for example `gcc` v. 11.2 or newer).

To build and install the library you can run these commands:

@@ -32,14 +32,15 @@ On Arch Linux you can use package `stringsuite` in AUR, additional informations
Clearly you can write and use your own encoding classes.

# Basic usage
## String encodings
## `encmetric` subset
### String encodings
A string encoding class is simply a static class that describes how a string is encoded in that encoding. Encoding classes provided with StringSuite include `ASCII`, `UTF8`, `UTF16BE`, `UTF16LE`, `UTF32BE`, `UTF32LE`, `Latin1`/`ISO_8859_1`, `ISO_8859_2`, `KOI8_R`, `KOI8_U`, `KOI8_RU`.

Encoding classes are usually specified as template arguments of string classes, but you can also choose an encoding dynamically by using the `WIDEchr` template argument. If you initialize a string with the `WIDEchr` template argument, you must pass a pointer to the `EncMetric<unicode>` object representing your encoding; this pointer can be obtained through the `DynEncoding` template. For example, a dynamic pointer to the `UTF8` encoding can be obtained with

const EncMetric<unicode> *utf8 = DynEncoding<UTF8>::instance();

## String dimensions
### String dimensions
Encoded strings have two different types of lengths:

* **size** is simply the number of bytes occupied by the encoded string, same value returned by `strlen` in C and `std::string::length()`, `std::string::size()` in C++
@@ -54,7 +55,7 @@ For example consider the UTF8 encoded string `abè€`: it contains exactly 4 ch

Then the size of the string `abè€` is exactly 1+1+2+3=7 bytes. Unlike `std::string`, strings in StringSuite let you query both the size and the length of any encoded string.
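
The difference between size and length can be checked without StringSuite itself. What follows is a minimal sketch in standard C++17, not the library's implementation: `utf8_length` and `sample` are hypothetical names introduced here, and the helper assumes well-formed UTF-8 input.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of StringSuite): counts the characters
// (code points) of a well-formed UTF-8 byte sequence by counting every byte
// that is not a continuation byte (continuation bytes look like 10xxxxxx).
constexpr std::size_t utf8_length(std::string_view bytes) {
    std::size_t count = 0;
    for (char c : bytes) {
        if ((static_cast<unsigned char>(c) & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// The string "abè€" written as explicit UTF-8 bytes, so the example does not
// depend on the encoding of the source file: a, b, è (C3 A8), € (E2 82 AC).
constexpr std::string_view sample = "ab\xC3\xA8\xE2\x82\xAC";

static_assert(sample.size() == 7);        // size: number of bytes
static_assert(utf8_length(sample) == 4);  // length: number of characters
```

Counting non-continuation bytes only works for UTF-8; other encodings need their own rule for finding character boundaries, which is the job the encoding classes take on.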

## Strings and string views
### Strings and string views
An **adv_string_view** object is simply a view of an existing encoded string (like C `const char *` strings) that doesn't own the pointed-to data, so copying an adv_string_view doesn't copy the underlying string. An `adv_string` object is instead closer to a C++ `std::string`: it allocates enough space to hold its string. Copying or initializing a new adv_string also copies the encoded string, so you should usually prefer adv_string_view over adv_string when you don't need to manipulate your strings.
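
The same ownership distinction exists in the standard library, which makes for a quick illustration. This sketch uses `std::string_view` and `std::string` as stand-ins rather than `adv_string_view` and `adv_string`, whose exact interface is not shown here; all function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Returns true when a copied view still points at the owner's bytes.
inline bool view_copy_shares_storage(const std::string& owner) {
    std::string_view first = owner;
    std::string_view second = first;  // copies a pointer and a size only
    return second.data() == owner.data();
}

// Returns true when a copied string holds equal characters in separate storage.
inline bool string_copy_owns_storage(const std::string& owner) {
    std::string copy = owner;         // copies the characters themselves
    return copy == owner && copy.data() != owner.data();
}

// Small usage example exercising both helpers.
inline void ownership_demo() {
    const std::string owner = "encoded text";
    assert(view_copy_shares_storage(owner));
    assert(string_copy_owns_storage(owner));
}
```

The view copy is cheap because no characters move, which is why views are the better default for read-only access.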

You can initialize new strings and new string views with the `alloc_string` and `new_string_view` functions respectively (or with their constructors if you use gcc 11.2 or later).
@@ -66,10 +67,32 @@ You can initialize new strings and new string views with `alloc_string` and `new

You can perform all the basic string operations on an `adv_string_view`/`adv_string` object; for more information see their class definitions in the `strsuite/encmetric/enc_string.hpp` header file.

## String buffer
An **adv_string_buf** is a simple string buffer that allows you to build new strings. It can also perform **encoding conversions**, so use it whenever you need one (for example, producing UTF8 from a UTF16 string).
### String literals
In StringSuite you can build some UTF string views directly from string literals by using the `_asv` suffix. For example:

Once you create the desired string you can obtain it with one of the following methods:
* literal `u8"..."_asv` returns a UTF-8 encoded string view (`adv_string_view<UTF8>`);
* literal `u"..."_asv` returns a UTF16SYS encoded string view, where UTF16SYS is UTF16LE or UTF16BE depending on the endianness of the current environment;
* literal `U"..."_asv` returns a UTF32SYS encoded string view, where UTF32SYS is UTF32LE or UTF32BE depending on the endianness of the current environment.
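
To make the little-endian versus big-endian distinction concrete, the sketch below serializes one UTF-16 code unit in both byte orders. It is plain C++17 with hypothetical helper names, not StringSuite code; an alias in the style of `UTF16SYS` would correspond to whichever of the two layouts matches the machine's native order.

```cpp
#include <array>
#include <cstdint>

using Bytes2 = std::array<std::uint8_t, 2>;

// Little endian: least significant byte first (the UTF16LE layout).
constexpr Bytes2 utf16_unit_le(char16_t unit) {
    return {static_cast<std::uint8_t>(unit & 0xFF),
            static_cast<std::uint8_t>(unit >> 8)};
}

// Big endian: most significant byte first (the UTF16BE layout).
constexpr Bytes2 utf16_unit_be(char16_t unit) {
    return {static_cast<std::uint8_t>(unit >> 8),
            static_cast<std::uint8_t>(unit & 0xFF)};
}

// U+20AC (the euro sign) is the single code unit 0x20AC in UTF-16.
constexpr Bytes2 euro_le = utf16_unit_le(char16_t{0x20AC});
constexpr Bytes2 euro_be = utf16_unit_be(char16_t{0x20AC});

static_assert(euro_le[0] == 0xAC && euro_le[1] == 0x20);
static_assert(euro_be[0] == 0x20 && euro_be[1] == 0xAC);
```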

**Notice** that although by default you can use the string literal `"..."_asv` to build ASCII string views, some compilers (for example gcc) can change the encoding of these narrow literals (for example via the option `-fexec-charset=`), and StringSuite currently cannot detect this change. Always use `u8` literals, or use the `STS_IO_asv` macro as explained in the next section.

## `io` subset
### `STS_IO_asv` and `IOenc`
UNIX systems work with UTF8 encoded strings by default, whereas Windows uses the UTF16 (little endian) encoding (leaving aside the Windows codepages). StringSuite provides the macro `STS_IO_asv` to build UTF8 literals on UNIX systems and UTF16 literals on Windows.

StringSuite also provides the `IOenc` encoding type alias, which accepts any `STS_IO_asv` string view and works with the basic system IO streams.

adv_string_view<IOenc> u = STS_IO_asv("Hi");
/*
* equivalent to adv_string_view<UTF8> u = u8"Hi"_asv on UNIX systems
* equivalent to adv_string_view<UTF16LE> u = u"Hi"_asv on Windows systems
*/
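
The README does not show how `STS_IO_asv` is defined. One way a platform-dependent literal macro could be built is sketched below; `MY_IO_LITERAL` and `my_io_char` are hypothetical names, and the sketch only selects the literal prefix, whereas the real macro also yields an `adv_string_view<IOenc>`.

```cpp
#include <type_traits>

// Hypothetical macro: prefix a literal with u (UTF-16) on Windows and with
// u8 (UTF-8) elsewhere, using preprocessor token pasting.
#if defined(_WIN32)
#  define MY_IO_LITERAL(s) u##s
#else
#  define MY_IO_LITERAL(s) u8##s
#endif

// Character type of the selected literal: char16_t on Windows; elsewhere
// char in C++17 and char8_t from C++20 onwards.
using my_io_char = std::remove_cv_t<
    std::remove_reference_t<decltype(MY_IO_LITERAL("Hi")[0])>>;

// Two characters plus the terminator, whatever the code unit width is.
static_assert(sizeof(MY_IO_LITERAL("Hi")) == 3 * sizeof(my_io_char));
```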

### Default stdin, stdout, stderr
You can access the console's `IOenc` encoded standard streams `stdin`, `stdout`, `stderr` by calling `get_console_stdin()`, `get_console_stdout()`, `get_console_stderr()` respectively. For all available operations see also the `char_stream.hpp` and `nl_stream.hpp` header files.

### `string_stream`
A **string_stream** is a simple string buffer, defined in the `string_stream.hpp` header, that allows you to build new strings. Once you have built the desired string you can obtain it with one of the following methods:

* `view()`: returns a view of the underlying string buffer. **WARNING**: any buffer modification (for example appending new strings) invalidates all existing views. Use this function with extreme care;
* `move()`: moves the underlying buffer to a new `adv_string` object. After this operation the buffer will be empty;
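
The invalidation warning on `view()` and the emptying behaviour of `move()` have close analogues in the standard library. The sketch below uses `std::string` as the buffer and `std::string_view` as the view; it is not `string_stream` code, and the function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>

// Safe pattern: copy out of a view before the buffer is modified again.
inline std::string snapshot_then_append(std::string& buffer, std::string_view extra) {
    std::string_view view = buffer;  // borrowed: valid only until buffer changes
    std::string snapshot{view};      // owned copy, survives later appends
    buffer.append(extra);            // may reallocate; `view` must not be used now
    return snapshot;
}

// Analogue of move(): hand the contents to a new owner.
inline std::string take_contents(std::string& buffer) {
    std::string result = std::move(buffer);
    buffer.clear();                  // make the "buffer is empty afterwards" state explicit
    return result;
}

// Small usage example exercising both helpers.
inline void buffer_demo() {
    std::string buffer = "abc";
    const std::string snapshot = snapshot_then_append(buffer, "def");
    assert(snapshot == "abc" && buffer == "abcdef");
    const std::string taken = take_contents(buffer);
    assert(taken == "abcdef" && buffer.empty());
}
```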
5 changes: 3 additions & 2 deletions src/CMakeLists.txt
@@ -64,13 +64,14 @@ install(FILES "strsuite/io/enc_io_core.hpp"
"strsuite/io/enc_io_exc.hpp"
"strsuite/io/byte_stream.hpp"
"strsuite/io/char_stream.hpp"
"strsuite/io/string_stream.hpp" DESTINATION include/strsuite/io)
"strsuite/io/string_stream.hpp"
"strsuite/io/nl_stream.hpp" DESTINATION include/strsuite/io)

install(FILES "strsuite/encmetric/chite.tpp"
"strsuite/encmetric/enc_string.tpp"
"strsuite/encmetric/dynstring.tpp" DESTINATION include/strsuite/encmetric)

install(FILES "strsuite/io/char_stream.tpp"
install(FILES "strsuite/io/nl_stream.tpp"
"strsuite/io/string_stream.tpp" DESTINATION include/strsuite/io)

install(FILES "strsuite/encmetric.hpp" DESTINATION include/strsuite)
3 changes: 0 additions & 3 deletions src/strsuite/enc_io.hpp
@@ -16,9 +16,6 @@
You should have received a copy of the GNU Lesser General Public License
along with Encmetric. If not, see <http://www.gnu.org/licenses/>.
*/
#include <strsuite/encmetric/dynstring.hpp>
#include <strsuite/encmetric/all_enc.hpp>
#include <strsuite/encmetric/config.hpp>
#include <strsuite/io/enc_io_core.hpp>
#include <type_traits>

110 changes: 0 additions & 110 deletions src/strsuite/io/char_stream.hpp
@@ -124,116 +124,6 @@ class CharOStream{
const EncMetric<typename T::ctype> *format() const noexcept{ return raw_format().format();}
};

template<general_enctype T>
class NewlineIStream : public CharIStream<T>{
protected:
virtual adv_string_view<T> do_newline() const noexcept=0;
virtual index_result do_is_endl(const byte *b, size_t siz)const noexcept{
auto nl = do_newline();
if(siz < nl.size())
return index_result{false, 0};
const byte *last = b + (siz - nl.size());
return index_result{std::memcmp(last, nl.data(), nl.size()) == 0, nl.size()};
}
virtual adv_string<T> do_getline(std::pmr::memory_resource *all){
basic_ptr ptda{all};
size_t retsiz=0, retlen=0;
tchar_pt<T> base{ptda.memory, this->raw_format()};
tchar_relative<T> to{base};
adv_string_view<T> nl = do_newline();
bool endl = false;
while(!endl){
try{
uint chl = this->char_read(to.convert(), ptda.dimension - retsiz);
to.next(ptda.dimension - retsiz);
retsiz += chl;
retlen++;
endl = do_is_endl(ptda.memory, retsiz).success;
}
catch(IOBufsmall &bs){
ptda.exp_fit(ptda.dimension + bs.get_required_size());
base = base.new_instance(ptda.memory);
}
}
return direct_build_dyn<T>(std::move(ptda), retlen, retsiz, this->raw_format());
}
public:
adv_string_view<T> newline() const noexcept{return do_newline();}
index_result is_endl(const byte *b, size_t siz)const noexcept{ return do_is_endl(b, siz);}
adv_string<T> getline(std::pmr::memory_resource *all=std::pmr::get_default_resource()){ return do_getline(all);}
};

template<general_enctype T>
class NewlineOStream : public CharOStream<T>{
protected:
virtual adv_string_view<T> do_newline() const noexcept=0;
virtual index_result do_is_endl(const byte *b, size_t siz)const noexcept{
auto nl = do_newline();
if(siz < nl.size())
return index_result{false, 0};
const byte *last = b + (siz - nl.size());
return index_result{std::memcmp(last, nl.data(), nl.size()) == 0, nl.size()};
}
virtual size_t do_putnl(){
auto nl = do_newline();
return this->do_string_write(nl);
}
public:
adv_string_view<T> newline() const noexcept{return do_newline();}
index_result is_endl(const byte *b, size_t siz)const noexcept{ return do_is_endl(b, siz);}
size_t putNL(){ return do_putnl();}
size_t endl(){
auto ret = this->do_putnl();
this->do_flush();
return ret;
}
template<general_enctype S>
size_t print(const adv_string_view<S> &str){
return this->string_write(str);
}
template<general_enctype S>
size_t println(const adv_string_view<S> &str){
size_t par = this->string_write(str);
par += endl();
return par;
}
};

/*
* Treat byte streams as char streams
template<general_enctype T>
class C_B_IStream : public CharIStream<T>{
private:
raw_buf c_buffer;
ByteIStream *input;
EncMetric_info<T> enc;
protected:
virtual uint do_char_read(tchar_pt<T> pt, size_t buf);
virtual void do_close() {input->close();}
virtual void do_flush() {
input->flush();
c_buffer.raw_clear();
}
virtual EncMetric_info<T> do_encmetric() const noexcept {return enc;}
public:
C_B_IStream(ByteIStream *in, EncMetric_info<T> e, std::pmr::memory_resource *all = std::pmr::get_default_resource()) : c_buffer{all}, input{in}, enc{e} {}
C_B_IStream(ByteIStream *in, std::pmr::memory_resource *all = std::pmr::get_default_resource()) requires strong_enctype<T> : C_B_IStream{in, EncMetric_info<T>{}, all} {}
C_B_IStream(ByteIStream *in, const EncMetric<typename T::ctype> *e, std::pmr::memory_resource *all = std::pmr::get_default_resource()) requires widenc<T> : C_B_IStream{in, EncMetric_info<T>{e}, all} {}
};
template<general_enctype T>
class C_B_OStream : public CharOStream<T>{
protected:
virtual uint do_char_write(const_tchar_pt<T>, size_t);
virtual void do_close() {input->close();}
virtual void do_flush() {input->flush();}
virtual EncMetric_info<T> do_encmetric() const noexcept {return enc;}
};
*/
#include <strsuite/io/char_stream.tpp>
}


56 changes: 0 additions & 56 deletions src/strsuite/io/char_stream.tpp

This file was deleted.

2 changes: 1 addition & 1 deletion src/strsuite/io/enc_io_core.hpp
@@ -20,7 +20,7 @@
#include <strsuite/encmetric/config.hpp>
#include <strsuite/encmetric/all_enc.hpp>
#include <strsuite/io/enc_io_exc.hpp>
#include <strsuite/io/char_stream.hpp>
#include <strsuite/io/nl_stream.hpp>

namespace sts{

66 changes: 66 additions & 0 deletions src/strsuite/io/nl_stream.hpp
@@ -0,0 +1,66 @@
#pragma once
/*
This file is part of Encmetric.
Copyright (C) 2021 Paolo De Donato.
Encmetric is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
Encmetric is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License
along with Encmetric. If not, see <http://www.gnu.org/licenses/>.
*/
#include <strsuite/io/string_stream.hpp>

namespace sts{

template<general_enctype T>
class NewlineIStream : public CharIStream<T>{
protected:
virtual adv_string_view<T> do_newline() const noexcept=0;
virtual index_result do_is_endl(const byte *b, size_t siz)const noexcept;
virtual adv_string<T> do_getline(std::pmr::memory_resource *all);
public:
adv_string_view<T> newline() const noexcept{return do_newline();}
index_result is_endl(const byte *b, size_t siz)const noexcept{ return do_is_endl(b, siz);}
adv_string<T> getline(std::pmr::memory_resource *all=std::pmr::get_default_resource()){ return do_getline(all);}
};

template<general_enctype T>
class NewlineOStream : public CharOStream<T>{
protected:
virtual adv_string_view<T> do_newline() const noexcept=0;
virtual index_result do_is_endl(const byte *b, size_t siz)const noexcept;
virtual size_t do_putnl();
public:
adv_string_view<T> newline() const noexcept{return do_newline();}
index_result is_endl(const byte *b, size_t siz)const noexcept{ return do_is_endl(b, siz);}
size_t putNL(){ return do_putnl();}
size_t endl(){
auto ret = this->do_putnl();
this->do_flush();
return ret;
}
template<general_enctype S>
size_t print(const adv_string_view<S> &str){
return this->string_write(str);
}
template<general_enctype S>
size_t println(const adv_string_view<S> &str){
size_t par = this->string_write(str);
par += endl();
return par;
}
};

#include <strsuite/io/nl_stream.tpp>
}
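
Before this commit, `do_is_endl` had an inline body in `char_stream.hpp` that reported whether the bytes read so far end with the stream's newline sequence. A standalone sketch of the same suffix test follows. It uses `std::string_view` instead of raw `byte` pointers and a hypothetical `suffix_result` in place of `index_result`, so it illustrates the logic only, not the library's types.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for index_result: a success flag plus the newline size.
struct suffix_result {
    bool success;
    std::size_t size;
};

// Reports whether `data` ends with the newline sequence `nl`.
// Mirrors the removed inline body: a too-short buffer yields {false, 0},
// otherwise the size field always carries the newline's size.
constexpr suffix_result ends_with_newline(std::string_view data, std::string_view nl) {
    if (data.size() < nl.size()) {
        return {false, 0};
    }
    const std::string_view tail = data.substr(data.size() - nl.size());
    return {tail == nl, nl.size()};
}

static_assert(ends_with_newline("line\r\n", "\r\n").success);
static_assert(!ends_with_newline("line\r", "\r\n").success);
```

Comparing only the tail keeps each check proportional to the newline's size, which matters because `do_getline` repeats the check after every character it reads.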


