Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add documentation about how to put a string in progmem #3

Closed
mutantbob opened this issue Mar 24, 2022 · 22 comments
Closed

add documentation about how to put a string in progmem #3

mutantbob opened this issue Mar 24, 2022 · 22 comments
Assignees

Comments

@mutantbob
Copy link

I have spent the afternoon experimenting with how to put a string in progmem.

Using

#[cfg_attr(target_arch = "avr", link_section = ".progmem.data")]
static MESSAGE3: &str = "apple";

only puts the pointer and count of the slice in progmem. The actual characters still end up in RAM.

It is possible to use something like

#[cfg_attr(target_arch = "avr", link_section = ".progmem.data")]
static MESSAGE6: &[u8; 9] = b"dai kenja";

But if you try

#[cfg_attr(target_arch = "avr", link_section = ".progmem.data")]
static MESSAGE5: &[u8; 13] = b"dai 大賢者 kenja";

you get an error:

error: byte constant must be ASCII. Use a \xHH escape for a non-ASCII byte
@mutantbob
Copy link
Author

It is also unclear how to access data from include_bytes!

#[cfg_attr(target_arch = "avr", link_section = ".progmem.data")]
static MESSAGE4: &[u8] = include_bytes!("parts1.txt");

seems like it creates a [u8;N] , because core::mem::size_of_val(MESSAGE4) reports 2031 (which is the size of the file). However, trying to unsafe{ read_value( (&MESSAGE4[0] as *const u8).offset(i) ) } does not return values that match what is in parts1.txt .

@Cryptjar
Copy link
Owner

Thanks for bringing up these pain points. This is definitely something that should be mentioned in the documentation.

Generally, you should not store any reference in progmem, because otherwise only the reference is in progmem instead of the actual data. Thus, you should rather use arrays directly, e.g.:

#[cfg_attr(target_arch = "avr", link_section = ".progmem.data")]
static MESSAGE6: [u8; 9] = *b"dai kenja";

#[cfg_attr(target_arch = "avr", link_section = ".progmem.data")]
static MESSAGE4: [u8; 2031] = *include_bytes!("parts1.txt");

Non-ASCII byte literals, well, seem not to be supported by Rust, so either use the proposed manual UTF-8 encoding or maybe a char array:

#[cfg_attr(target_arch = "avr", link_section = ".progmem.data")]
static MESSAGE5_1: [u8; 19] = *b"dai \xE5\xA4\xA7\xE8\xB3\xA2\xE8\x80\x85 kenja";

#[cfg_attr(target_arch = "avr", link_section = ".progmem.data")]
static MESSAGE5_2: [char; 13] = ['d', 'a', 'i', ' ', '大', '賢', '者', ' ', 'k', 'e', 'n', 'j', 'a'];

Neither look particularly nice, but as far as I know, there is not much else you can do. Maybe one day something like this could work, but for now this is sadly not const yet:

static MESSAGE5_3: [u8; 19] = "dai 大賢者 kenja".as_bytes().try_into().unwrap();

@Cryptjar Cryptjar self-assigned this Mar 25, 2022
@mutantbob
Copy link
Author

I have vague ideas for a procedural macro that lets you code a string literal and it turns into a wrapper around a progmem byte array. Maybe one day I'll make time to write it if no one beats me to it. Another complication is that ufmt's uWrite is all about &str and doesn't have a byte-by-byte API.

@Cryptjar
Copy link
Owner

Cryptjar commented Apr 2, 2022

Your suggestion to do something with macros, got me intrigue. Admittedly, I mostly use include_bytes and ASCII. But I guess, it would be quite useful, if one can simply specify a "normal" string literal, let it be stored as a byte array, while being able to use it like &str. So the first thing that come to my mind is wrapper struct around some [u8;N] and implementing Deref on it. Such a wrapper struct should work nicely with the progmem macro, something like this:

progmem! {
    static progmem TEXT: ByteString<19> = ByteString::new(
        "dai 大賢者 kenja"
    );
}
// usage:
let text_buffer = TEXT.load(); // The temporary DRAM buffer for `TEXT`
let text: &str = &text_buffer; // Just derefs to `str`

Update:

One can use that to offer convenient in-place string wrapping, e.g.:

fn print(s: &str) {
    // do something with `s`
}
// Put the literal as byte array into progmem and load it here as `&str`
print(progmem_str!("dai 大賢者 kenja"));

Which would be just syntactic sugar for an ad-hoc progmem static and load.

@mutantbob
Copy link
Author

The best of all worlds would be if you can use ufmt to "print" a literal string that would not fit in RAM, but the least awkward way to do that would be to transfer the string byte-by-byte from progmem to the serial port or whatever, and ufmt does not have a byte interface at the moment. Workarounds would include doing 8(?)-byte slices at a time, probably using from_utf8_unchecked.

@Cryptjar
Copy link
Owner

Cryptjar commented Apr 4, 2022

Hmm, reading strings byte-by-byte is an interesting thought. The ByteString type is supposed to be wrapped in a ProgMem struct (i.e. yielding a static with type ProgMem<ByteString>), thus it must be read all at once, but it can be stored on stack in order to deref it as &str (as whole).

In order to support partially reading of a str, I could imagine a different struct, which itself wraps a ProgMem<[u8;N]> — let's call it: PmString. This way, a PmString could give access to arbitrary parts of the str, however, it can no longer deref to &str (because nothing has been loaded that could be referenced).

Also, due to the variable length encoding of UTF-8, reading fixed blocks of bytes (such as 8 at a time) and then using from_utf8_unchecked on it would be undefined behavior. Instead, it must be read literally byte-by-byte until a valid chunk (1-4 bytes) has been read that can be turned into a str or char. Therefore, I imagine that PmString would offer a char-iterator that lazily reads char-by-char from progmem, which seems sufficient for ufmt usage (in fact PmString could directly implement uDisplay using such an iterator).

However, at this point, I wounder whether this byte-wise reading & parsing is detrimental for performance, and whether, using a char array would be better than a UTF-8 byte-array for such a use-case. On the other hand, a char array could require up to 4x as much progmem.

Do you have further thoughts on this?

Update

Actually, it's much harder to get a char array out of a str at compile-time than a byte array. I think, the UTF-8 parsing must suffice.

@Cryptjar
Copy link
Owner

Cryptjar commented Apr 9, 2022

The usage style I implemented so far is:

use avr_progmem::progmem;

// Imagine having something implementing `ufmt::uWrite`
let mut writer = /* snip */;

progmem! {
    // declare a string with the additonal `string` pseudo-keyword.
    // This will wrap the given `&str` in a `PmString`
    static progmem string TEXT = "dai 大賢者 kenja";
}

// Either load at once (potential high RAM usage)
let buffer = TEXT.load();
// and use it as `&str`
writer.write_str(&*buffer).unwrap();

// Or load it lazily (one char at a time)
for c in TEXT.chars() {
    writer.write_char(c).unwrap();
}

// Or use the `uDisplay` impl which is also lazy
ufmt::uformat!(&mut writer, "{}", TEXT).unwrap();

Additionally, for single-use strings, there are two macros for in-line strings:

use avr_progmem::progmem_str as F;
use avr_progmem::progmem_display as D;

// Imagine having something implementing `ufmt::uWrite`
let mut writer = /* snip */;

// In-line progmem string accessible as temporary `&str`
// Needs enough RAM to store the entire string
writer.write_str(F!("dai 大賢者 kenja")).unwrap();

// In-line progmem string accessible as `impl Display + uDisplay`
// Needs only a minimal amount of RAM
ufmt::uwrite!(&mut writer, "{}", D!("dai 大賢者 kenja")).unwrap();

Would this cover your use-case?

@mutantbob
Copy link
Author

I think so. The funniest part is that I wrote some of this last week (the ability to turn [u8;N] into a character iterator).

You even made your own copy of next_code_point too. Hopefully rust-lang/rust#95788 gets merged one day and you will be able to use that instead.

The final transformation will be when the pieces of the format specification to the uwrite! macro are stored in progmem.
For example

uwriteln!(serial, "would you like your {} with a side of {}?", main, side);

would become

ufmt::uDisplay::fmt(D!("would you like your "), serial);
ufmt::uDisplay::fmt(main, serial);
ufmt::uDisplay::fmt(D!(" with a side of "), serial);
ufmt::uDisplay::fmt(side, serial);
ufmt::uDisplay::fmt(D!("?\n"), serial);

which probably isn't exactly how the macro works, but you get the idea. I suspect that most (if not all) of the work in this crate is done and it's just a question of tweaking ufmt (which, given that japaric/ufmt#32 has been sitting there since January, could take a while)

@Cryptjar
Copy link
Owner

I think so.

Nice. I have few more things that I will implement before I will release a new v0.2.0 with all these changes. I hope, I can get it done this week.

You even made your own copy of next_code_point too. Hopefully rust-lang/rust#95788 gets merged one day and you will be able to use that instead.

It'd be interesting whether your next_code_point_val RP gets accepted. I wonder whether it would make more sense to have a function like decode_utf8 (maybe as a safe alternative to next_code_point_val) besides decode_utf16.

The final transformation will be when the pieces of the format specification to the uwrite! macro are stored in progmem.

That looks more complicated, probably needs a proc-macro. I suppose that could be its own crate.

@mutantbob
Copy link
Author

One thing I am fuzzy on, is "how do we check that a particular definition actually ends up in progmem instead of RAM?"
Another thing I want to investigate is "which techniques explode if the string does not fit in RAM?"

I am also having difficulty writing code that uses this.

[dependencies.avr-progmem]
#version = "*"
git="https://github.com/Cryptjar/avr-progmem-rs.git"
branch="v0.2"
#path="../../avr-progmem-rs"
features=["ufmt"]

But I still get stuff like

error[E0277]: the trait bound `PmString<5_usize>: uDisplay` is not satisfied
  --> string1/src/main.rs:12:7
   |
9  | pub fn x<D: ufmt::uDisplay>(blah: &D) {}
   |             -------------- required by this bound in `x`
...
12 |     x(five);
   |       ^^^^ the trait `uDisplay` is not implemented for `PmString<5_usize>`

I can see the code at line 511 of string.rs .

@mutantbob
Copy link
Author

The good news is that

fn exp3<W: uWrite>(serial: &mut W) -> Result<(), <W as uWrite>::Error> {
    progmem! {
        static progmem string LC = include_str!("lovecraft.txt");
    }

    for ch in LC.chars() {
        let _ = serial.write_char(ch)?;
    }
    serial.write_char('\n')?;
    Ok(())
}

works just fine, even though lovecraft.txt is an 11k file which has no chance of fitting in the RAM of an arduino UNO.

@mutantbob
Copy link
Author

OK, I think the uDisplay is because my code tree is using ufmt branch="ptr_width_16_fix" while you are using rev = "e6e574625e89bce9a5cb907eae43bfd45672f768" .

After banging on Cargo.toml with increasingly large rocks, I was able to get it to compile and run with reasonable output.

@Cryptjar
Copy link
Owner

One thing I am fuzzy on, is "how do we check that a particular definition actually ends up in progmem instead of RAM?" Another thing I want to investigate is "which techniques explode if the string does not fit in RAM?"

I am also having difficulty writing code that uses this.

[dependencies.avr-progmem]
#version = "*"
git="https://github.com/Cryptjar/avr-progmem-rs.git"
branch="v0.2"
#path="../../avr-progmem-rs"
features=["ufmt"]

But I still get stuff like

error[E0277]: the trait bound `PmString<5_usize>: uDisplay` is not satisfied
  --> string1/src/main.rs:12:7
   |
9  | pub fn x<D: ufmt::uDisplay>(blah: &D) {}
   |             -------------- required by this bound in `x`
...
12 |     x(five);
   |       ^^^^ the trait `uDisplay` is not implemented for `PmString<5_usize>`

I can see the code at line 511 of string.rs .

That is strange, this should work already.

If I create a fresh crate it works for me:

[package]
name = "test-avr-progmem-3"
version = "0.1.0"
edition = "2018"

[dependencies]
ufmt = "0.1"

[dependencies.avr-progmem]
git="https://github.com/Cryptjar/avr-progmem-rs.git"
branch="v0.2"
#![feature(const_option)]

avr_progmem::progmem! {
	static progmem string FOO = "Foobar";
}

pub fn x<D: ufmt::uDisplay>(blah: &D) {}

fn main() {
    x(&FOO);
}

That compiles just fine for me (you don't even need the ufmt feature, because it is a default feature) with:

cargo +nightly-2021-01-07 build --target avr-atmega328p.json -Z build-std=core

Maybe you need to do a cargo update?

@Cryptjar
Copy link
Owner

OK, I think the uDisplay is because my code tree is using ufmt branch="ptr_width_16_fix" while you are using rev = "e6e574625e89bce9a5cb907eae43bfd45672f768" .

After banging on Cargo.toml with increasingly large rocks, I was able to get it to compile and run with reasonable output.

Yes that looks like it, I mean avr-progmem officially uses the crates.io version of ufmt, so if you use something like:

[dependencies]
ufmt = {git = "https://github.com/mrk-its/ufmt.git", branch = "ptr_width_16_fix"}

that would be incompatible, using this, I get the same error as you.

If you instead patch it, like I did in avr-progmem, it should work again:

[dependencies]
ufmt = "0.1"

[patch.crates-io]
# Fixes formatting of u32
ufmt = { git = "https://github.com/mrk-its/ufmt.git", branch = "ptr_width_16_fix" }

@mutantbob
Copy link
Author

after I harmonized the ufmt crate revisions, this is one of the tests I used:

fn exp4<W: uWrite>(serial: &mut W) -> Result<(), <W as uWrite>::Error> {
    progmem! {
        static progmem string LC = include_str!("lovecraft.txt");
    }

    uwrite!(serial, "{}", LC);
    Ok(())
}

I'm not sure if that would be a good example for the docstrings.

But right now I think the specific issue this ticket was originally about is solved. The code works in my testing (ignoring the rev -vs- branch weirdness that will be irrelevant if ufmt ever merges that fix). The docstrings on the string module look good.

@Cryptjar
Copy link
Owner

I'm not sure if that would be a good example for the docstrings.

If have something to expand the docs, I am open to any PRs.

But right now I think the specific issue this ticket was originally about is solved. The code works in my testing (ignoring the rev -vs- branch weirdness that will be irrelevant if ufmt ever merges that fix). The docstrings on the string module look good.

Good to hear. I'm going to close this issue, when I released the new version on crates.io.

@mutantbob
Copy link
Author

One last thing: the initializer for PmString returns an option, which in turn requires #![feature(const_option)] .

If you ever feel like going nuclear, you might experiment with my procedural macro:

use proc_macro::TokenStream;

use quote::quote;
use syn::parse::{Parse, ParseStream};
use syn::{token::Static, Ident, LitStr, Token};

struct Arguments {
    static_k: Static,
    varname: Ident,
    eq: Token![=],
    string_literal: LitStr,
    semicolon: Token![;],
}

impl Parse for Arguments {
    fn parse(input: ParseStream) -> syn::Result<Self> {
        let static_k = input.parse()?;
        let varname = input.parse()?;
        let eq: Token![=] = input.parse()?;
        let string_literal = input.parse()?;
        let semicolon: Token![;] = input.parse()?;

        Ok(Arguments {
            static_k,
            varname,
            eq,
            string_literal,
            semicolon,
        })
    }
}

#[proc_macro]
pub fn avr_progmem_str(t_stream: TokenStream) -> TokenStream {
    let macro_args = syn::parse_macro_input!(t_stream as Arguments);

    let Arguments {
        static_k,
        varname,
        eq,
        string_literal,
        semicolon,
    } = macro_args;

    let string_value = string_literal.value();
    let string_bytes = string_value.as_bytes();
    let tokens = string_bytes
        .iter()
        .map(|b| quote!(#b , ))
        .collect::<Vec<_>>();
    let count = string_bytes.len();

    quote!(
        #[cfg_attr(target_arch = "avr", link_section = ".progmem.data")]
        #static_k #varname : avr_progmem::ProgMem<[u8;#count]> #eq unsafe { avr_progmem::ProgMem::new([ #(#tokens)* ]) } #semicolon
    )
    .into()
}

its syntax is a little different than yours, but could be adjusted to match.

@Cryptjar
Copy link
Owner

That looks interesting. But I can't just include a proc-macro into the avr-progmem crate, because proc-macros need to be defined in a separate crate. On that notice, I think this macro can be generalized into a macro that just turns arbitrary (Unicode) string literals into fixed-sized byte arrays, e.g. so could be used like this:

let bytes: [u8; 13] = byte_string!("dai 大賢者 kenja");

Having such a macro, would allow using the PmString::from_array constructor, which does not return an Option (the option is only there to match the length of &str or &[u8] to the const generic N on PmString). If you want, you could easily make a crate for such a byte_string macro. I would certainly use it to get rid of the Option::unwrap in my progmem macro, because I don't like the unwrap either, it's just a necessity, because as of now I don't have a better way to turn a string literal into a fixed-sized array.

@mutantbob
Copy link
Author

Until I get around to that... https://github.com/mutantbob/avr-progmem-string/blob/master/src/lib.rs

@mutantbob
Copy link
Author

I added a procedural macro string_as_bytes! to the repo I linked above. I'm not sure it makes things more ergonomic for the end user.

@Cryptjar
Copy link
Owner

I added a procedural macro string_as_bytes! to the repo I linked above. I'm not sure it makes things more ergonomic for the end user.

Well, the nice thing about string_as_bytes! macro is that I can use it in my progmem macro, so that users who only use literal strings no longer need the #![feature(const_option)].

See: 156651d

@Cryptjar
Copy link
Owner

I just released v0.2.0, string handling should now be much easier.
Thanks for your input on this topic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants